Estimating genome-wide copy number using allele-specific mixture models

Wenyi Wang, Benilton Carvalho, Nathaniel D. Miller, Jonathan A. Pevsner, Aravinda Chakravarti, Rafael A. Irizarry

Research output: Contribution to journal › Article

Abstract

Genomic changes such as copy number alterations are among the major underlying causes of human phenotypic variation among normal and disease subjects. Array comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology provides only a >30-kb resolution, which limits the ability to detect copy number alterations spanning small regions. Higher resolution technologies such as single nucleotide polymorphism (SNP) microarrays allow detection of copy number alterations at least as small as several thousand base pairs. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. Various groups have proposed statistical procedures that pool data from neighboring locations to successfully improve precision. However, these procedures need to average across relatively large regions to work effectively, thus greatly reducing resolution. Recently, regression-type models that account for probe effects have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution, specifically designed for single-point estimation, that provides various advantages over the existing methodology. We use a 314-sample database to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy number. We can then compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (www.bioconductor.org).
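
The posterior-probability step described in the abstract can be illustrated with a minimal sketch. It is not the authors' model or the oligo package's interface: it assumes a single log-scale intensity per SNP, Gaussian class-conditional distributions, and total rather than allele-specific copy number. The function names, component means, standard deviations, and prior weights below are all hypothetical values chosen for illustration.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def posterior_copy_number(intensity, components, priors):
    """Posterior probability of each copy number state given one intensity.

    components maps copy number -> (mean, sd) of the assumed conditional
    distribution; priors maps copy number -> prior probability.
    """
    log_weights = {
        cn: math.log(priors[cn]) + log_normal_pdf(intensity, mean, sd)
        for cn, (mean, sd) in components.items()
    }
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_weights.values())
    weights = {cn: math.exp(lw - top) for cn, lw in log_weights.items()}
    total = sum(weights.values())
    return {cn: w / total for cn, w in weights.items()}


def call_copy_number(intensity, components, priors):
    """Return (most probable copy number, its posterior as a confidence)."""
    post = posterior_copy_number(intensity, components, priors)
    best = max(post, key=post.get)
    return best, post[best]


# Hypothetical parameters; in practice these would be fitted to training data.
COMPONENTS = {0: (7.0, 0.4), 1: (9.0, 0.4), 2: (10.0, 0.4), 3: (10.6, 0.4)}
PRIORS = {0: 0.01, 1: 0.04, 2: 0.90, 3: 0.05}

if __name__ == "__main__":
    for x in (7.0, 9.1, 10.0, 11.2):
        cn, conf = call_copy_number(x, COMPONENTS, PRIORS)
        print(f"intensity={x:.1f} -> copy number {cn} (posterior {conf:.3f})")
```

The call is the state with the largest posterior, and that posterior doubles as the confidence measure for the call, which mirrors the prediction rule the abstract describes.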

Original language: English (US)
Pages (from-to): 857-866
Number of pages: 10
Journal: Journal of Computational Biology
Volume: 15
Issue number: 7
DOI: https://doi.org/10.1089/cmb.2007.0148
State: Published - Sep 1 2008

Fingerprint

Mixture Model
Genome
Genes
Alleles
Technology
Microarrays
Nucleotides
Polymorphism
Comparative Genomic Hybridization
Probe
Throughput
Base Pairing
Single Nucleotide Polymorphism
Confidence Measure
Point Estimation
Comparative Genomics
Software
Posterior Probability
Databases

Keywords

  • Algorithms
  • Computational molecular biology
  • DNA arrays

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

Wang, W, Carvalho, B, Miller, ND, Pevsner, JA, Chakravarti, A & Irizarry, RA 2008, 'Estimating genome-wide copy number using allele-specific mixture models', Journal of Computational Biology, vol. 15, no. 7, pp. 857-866. https://doi.org/10.1089/cmb.2007.0148
@article{f3ea00755b0e44b1ac728e4b18213c25,
title = "Estimating genome-wide copy number using allele-specific mixture models",
keywords = "Algorithms, Computational molecular biology, DNA arrays",
author = "Wenyi Wang and Benilton Carvalho and Miller, {Nathaniel D.} and Pevsner, {Jonathan A.} and Aravinda Chakravarti and Irizarry, {Rafael A.}",
year = "2008",
month = "9",
day = "1",
doi = "10.1089/cmb.2007.0148",
language = "English (US)",
volume = "15",
pages = "857--866",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "7",

}

TY - JOUR

T1 - Estimating genome-wide copy number using allele-specific mixture models

AU - Wang, Wenyi

AU - Carvalho, Benilton

AU - Miller, Nathaniel D.

AU - Pevsner, Jonathan A.

AU - Chakravarti, Aravinda

AU - Irizarry, Rafael A.

PY - 2008/9/1

Y1 - 2008/9/1

KW - Algorithms

KW - Computational molecular biology

KW - DNA arrays

UR - http://www.scopus.com/inward/record.url?scp=51349100415&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=51349100415&partnerID=8YFLogxK

U2 - 10.1089/cmb.2007.0148

DO - 10.1089/cmb.2007.0148

M3 - Article

C2 - 18707534

AN - SCOPUS:51349100415

VL - 15

SP - 857

EP - 866

JO - Journal of Computational Biology

JF - Journal of Computational Biology

SN - 1066-5277

IS - 7

ER -