## Abstract

We discuss the analysis of data from single-nucleotide polymorphism arrays comparing tumour and normal tissues. The data consist of sequences of indicators for loss of heterozygosity (LOH) and involve three nested levels of repetition: chromosomes for a given patient, regions within chromosomes and single-nucleotide polymorphisms nested within regions. We propose to analyse these data by using a semiparametric model for multilevel repeated binary data. At the top level of the hierarchy we assume a sampling model for the observed binary LOH sequences that arises from a partial exchangeability argument. This implies a mixture of Markov chains model, in which the mixture is defined with respect to the Markov transition probabilities. We assume a non-parametric prior for the random mixing measure. The resulting model takes the form of a semiparametric random-effects model with the matrix of transition probabilities being the random effects. The model includes appropriate dependence assumptions for the two remaining levels of the hierarchy, i.e. for regions within chromosomes and for chromosomes within patient. We use the model to identify regions of increased LOH in a data set coming from a study of treatment-related leukaemia in children with an initial cancer diagnosis. The model successfully identifies the desired regions and performs well compared with other available alternatives.
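To make the top-level sampling model more concrete, the following is a minimal simulation sketch of a mixture of two-state Markov chains whose mixing measure is drawn from a truncated stick-breaking approximation to a Dirichlet process. It is an illustration only, not the authors' implementation: the function names, the uniform base measure on the transition probabilities, the truncation level `n_atoms` and the concentration `alpha` are all assumptions, and the region-level and chromosome-level dependence described in the abstract is ignored.

```python
import random


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process (assumed truncation)."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # last atom absorbs the leftover mass
    return weights


def simulate_chain(p01, p11, length, rng, start=0):
    """Binary Markov chain; p01 = P(1 | previous 0), p11 = P(1 | previous 1)."""
    seq = [start]
    for _ in range(length - 1):
        p_one = p11 if seq[-1] == 1 else p01
        seq.append(1 if rng.random() < p_one else 0)
    return seq


def simulate_mixture(n_seqs, length, alpha=1.0, n_atoms=20, seed=0):
    """Draw LOH-like sequences, each from a chain picked by the mixing measure."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    # Assumed base measure: independent Uniform(0, 1) transition probabilities.
    atoms = [(rng.random(), rng.random()) for _ in range(n_atoms)]
    sequences = []
    for _ in range(n_seqs):
        k = rng.choices(range(n_atoms), weights=weights)[0]
        p01, p11 = atoms[k]
        sequences.append(simulate_chain(p01, p11, length, rng))
    return weights, atoms, sequences


def transition_counts(seq):
    """2x2 table of transition counts; counts[a][b] is the number of a -> b moves."""
    counts = [[0, 0], [0, 0]]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts


if __name__ == "__main__":
    _, _, seqs = simulate_mixture(n_seqs=3, length=30)
    for s in seqs:
        print("".join(map(str, s)), transition_counts(s))
```

Under partial exchangeability, the probability of a sequence depends only on its initial state and its transition counts, which is why `transition_counts` is the natural summary to inspect for each simulated sequence.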

Original language | English (US) |
---|---|
Pages (from-to) | 419-431 |
Number of pages | 13 |
Journal | Journal of the Royal Statistical Society. Series C: Applied Statistics |
Volume | 57 |
Issue number | 4 |
State | Published - Sep 2008 |
Externally published | Yes |

## Keywords

- Dirichlet process
- Loss of heterozygosity
- Partial exchangeability
- Semiparametric random effects

## ASJC Scopus subject areas

- Statistics and Probability
- Statistics, Probability and Uncertainty