Estimating genome-wide copy number using allele specific mixture models

Wenyi Wang, Benilton Carvalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, Rafael A. Irizarry

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations


Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because they are believed to be an underlying cause of cancer. More than a decade ago, comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology only provides a 10 Mb resolution, which limits the ability to detect copy number alterations spanning small regions. It is widely believed that a copy number alteration as small as one base can have significant downstream effects; thus, microarray manufacturers have developed technologies that provide much higher resolution. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. CGH arrays use a two-color hybridization, usually comparing a sample of interest to a reference sample, which to some degree removes the probe effect. However, the resolution is not nearly high enough to provide single-point copy number estimates. Various groups have proposed statistical procedures that pool data from neighboring locations to successfully improve precision. However, these procedures need to average across relatively large regions to work effectively, thus greatly reducing the resolution.
Recently, regression-type models that account for probe effects have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution specifically designed for single-point estimation that provides various advantages over the existing methodology. We use a 314-sample database, constructed with public datasets, to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy numbers. With the estimated models in place, we can compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (

Original language: English (US)
Title of host publication: Research in Computational Molecular Biology - 11th Annual International Conference, RECOMB 2007, Proceedings
Publisher: Springer Verlag
Number of pages: 14
ISBN (Print): 3540716807, 9783540716808
State: Published - 2007
Event: 11th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2007 - Oakland, CA, United States
Duration: Apr 21 2007 to Apr 25 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4453 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 11th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2007
Country/Territory: United States
City: Oakland, CA

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
