TY - JOUR
T1 - Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants
AU - Athamanolap, Pornpat
AU - Parekh, Vishwa
AU - Fraley, Stephanie I.
AU - Agarwal, Vatsal
AU - Shin, Dong J.
AU - Jacobs, Michael A.
AU - Wang, Tza Huei
AU - Yang, Samuel
N1 - Publisher Copyright: © 2014 Athamanolap et al.
PY - 2014/10/2
Y1 - 2014/10/2
AB - High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurate genotyping of an unknown sample for which a large number of possible variants may exist will require an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning methods and learned tolerance for reaction condition deviations. We tested this method in silico through multiple cross-validations using curves generated from 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae and demonstrated over 99% accuracy with 8 training curves per serotype. In vitro verification of the algorithm was tested using sequence variants of a cancer-related gene and demonstrated 100% accuracy with 3 training curves per sequence variant. The machine learning algorithm enabled reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.
UR - http://www.scopus.com/inward/record.url?scp=84907484079&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84907484079&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0109094
DO - 10.1371/journal.pone.0109094
M3 - Article
C2 - 25275518
AN - SCOPUS:84907484079
SN - 1932-6203
VL - 9
JO - PLOS ONE
JF - PLOS ONE
IS - 10
M1 - 0109094
ER -