TY - JOUR
T1 - GPHMM
T2 - An integrated hidden Markov model for identification of copy number alteration and loss of heterozygosity in complex tumor samples using whole genome SNP arrays
AU - Li, Ao
AU - Liu, Zongzhi
AU - Lezon-Geyda, Kimberly
AU - Sarkar, Sudipa
AU - Lannin, Donald
AU - Schulz, Vincent
AU - Krop, Ian
AU - Winer, Eric
AU - Harris, Lyndsay
AU - Tuck, David
N1 - Funding Information:
Funding for open access charge: Department of Defense (grant W81XWH-04-1-0549 to L.H.); Yale Center of Excellence in Molecular Hematology P30 DK072442-03 NIDDK (to D.T. and V.S.); Susan G. Komen Foundation (grant number FAS0703853 to D.L.).
PY - 2011/7
Y1 - 2011/7
AB - There is an increasing interest in using single nucleotide polymorphism (SNP) genotyping arrays for profiling chromosomal rearrangements in tumors, as they allow simultaneous detection of copy number and loss of heterozygosity with high resolution. Critical issues such as signal baseline shift due to aneuploidy, normal cell contamination, and the presence of GC content bias have been reported to dramatically alter SNP array signals and complicate accurate identification of aberrations in cancer genomes. To address these issues, we propose a novel Global Parameter Hidden Markov Model (GPHMM) to unravel tangled genotyping data generated from tumor samples. In contrast to other HMM methods, a distinct feature of GPHMM is that the issues mentioned above are quantitatively modeled by global parameters and integrated within the statistical framework. We developed an efficient EM algorithm for parameter estimation. We evaluated performance on three data sets and show that GPHMM can correctly identify chromosomal aberrations in tumor samples containing as few as 10% cancer cells. Furthermore, we demonstrated that the estimation of global parameters in GPHMM provides information about the biological characteristics of tumor samples and the quality of genotyping signal from SNP array experiments, which is helpful for data quality control and outlier detection in cohort studies.
UR - http://www.scopus.com/inward/record.url?scp=79960268207&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79960268207&partnerID=8YFLogxK
U2 - 10.1093/nar/gkr014
DO - 10.1093/nar/gkr014
M3 - Article
C2 - 21398628
AN - SCOPUS:79960268207
SN - 0305-1048
VL - 39
SP - 4928
EP - 4941
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 12
ER -