Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants

Pornpat Athamanolap, Vishwa Parekh, Stephanie I. Fraley, Vatsal Agarwal, Dong J. Shin, Michael Jacobs, Tza Huei Wang, Samuel Yang

Research output: Contribution to journal › Article

Abstract

High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurately genotyping an unknown sample for which a large number of possible variants may exist requires an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning methods and learned tolerance for reaction condition deviations. We tested this method in silico through multiple cross-validations using curves generated from 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae, and demonstrated over 99% accuracy with 8 training curves per serotype. We verified the algorithm in vitro using sequence variants of a cancer-related gene and demonstrated 100% accuracy with 3 training curves per sequence variant. The machine learning algorithm enabled reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.
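The abstract describes comparing an unknown melt curve against a library of known variants, using training curves collected under varied reaction conditions and then checking the result by cross-validation. The sketch below illustrates that general workflow under explicit assumptions. It is not the authors' algorithm. The following are all hypothetical choices made for illustration:

- the two-state sigmoid melt model;
- the ±0.15 °C run-to-run Tm drift and the read-noise level;
- the variant names and Tm values;
- the nearest-centroid classifier on normalized negative-derivative (-dF/dT) curves.

Leave-one-out cross-validation stands in, loosely, for the cross-validation reported above.

```python
import math
import random

STEP = 0.1                                       # temperature step in °C (assumed)
TEMPS = [75.0 + STEP * i for i in range(111)]    # 75.0 to 86.0 °C
# Hypothetical variants and melting temperatures, for illustration only.
VARIANT_TM = {"variant_A": 80.0, "variant_B": 80.6, "variant_C": 81.2}


def melt_curve(tm, width, temps):
    """Assumed two-state model: fluorescence falls sigmoidally around tm."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


def normalize(curve):
    """Min-max scale to [0, 1] so run-to-run amplitude differences drop out."""
    lo, hi = min(curve), max(curve)
    span = hi - lo
    return [(y - lo) / span for y in curve]


def neg_derivative(curve, step):
    """-dF/dT by forward differences; the peak sits near the melting temperature."""
    return [-(curve[i + 1] - curve[i]) / step for i in range(len(curve) - 1)]


def features(curve, step):
    return neg_derivative(normalize(curve), step)


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroidHRM:
    """Stand-in classifier: one averaged derivative curve (centroid) per variant."""

    def fit(self, curves, labels, step):
        sums, counts = {}, {}
        for curve, label in zip(curves, labels):
            f = features(curve, step)
            if label not in sums:
                sums[label], counts[label] = [0.0] * len(f), 0
            sums[label] = [s + x for s, x in zip(sums[label], f)]
            counts[label] += 1
        self.centroids = {k: [s / counts[k] for s in v] for k, v in sums.items()}
        self.step = step
        return self

    def predict(self, curve):
        f = features(curve, self.step)
        return min(self.centroids, key=lambda k: distance(f, self.centroids[k]))


def simulate_runs(n_per_variant, seed=0):
    """Training curves with assumed Tm drift and additive read noise per run."""
    rng = random.Random(seed)
    curves, labels = [], []
    for label, tm in VARIANT_TM.items():
        for _ in range(n_per_variant):
            shift = rng.uniform(-0.15, 0.15)
            curve = melt_curve(tm + shift, 0.5, TEMPS)
            curves.append([y + rng.gauss(0.0, 0.002) for y in curve])
            labels.append(label)
    return curves, labels


def leave_one_out_accuracy(curves, labels, step):
    """Hold out each curve in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(curves)):
        model = NearestCentroidHRM().fit(
            curves[:i] + curves[i + 1:], labels[:i] + labels[i + 1:], step
        )
        correct += model.predict(curves[i]) == labels[i]
    return correct / len(curves)


if __name__ == "__main__":
    curves, labels = simulate_runs(8)            # 8 curves per variant
    print(leave_one_out_accuracy(curves, labels, STEP))
```

The sketch classifies on the derivative curve rather than on raw fluorescence. Normalization followed by differentiation removes amplitude differences between runs, which leaves mainly the shape and position of the melting transition to distinguish variants. A learned classifier such as a support vector machine could replace the nearest-centroid rule without changing the surrounding workflow.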

Original language: English (US)
Pages (from-to): e109094
Journal: PLoS One
Volume: 9
Issue number: 9
DOIs: 10.1371/journal.pone.0109094
State: Published - 2014

Fingerprint

artificial intelligence
genotyping
Learning systems
Classifiers
Learning algorithms
Genes
serotypes
methodology
Streptococcus pneumoniae
Neoplasm Genes
Computer Simulation
Machine Learning
taxonomy
neoplasms
genes
sampling
Serogroup

ASJC Scopus subject areas

  • Medicine (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants. / Athamanolap, Pornpat; Parekh, Vishwa; Fraley, Stephanie I.; Agarwal, Vatsal; Shin, Dong J.; Jacobs, Michael; Wang, Tza Huei; Yang, Samuel.

In: PLoS One, Vol. 9, No. 9, 2014, p. e109094.


@article{ef9fcc89b9274907b523e9c5814f4d66,
title = "Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants",
abstract = "High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurate genotyping of an unknown sample for which a large number of possible variants may exist will require an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning methods and learned tolerance for reaction condition deviations. We tested this method in silico through multiple cross-validations using curves generated from 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae and demonstrated over 99{\%} accuracy with 8 training curves per serotype. In vitro verification of the algorithm was tested using sequence variants of a cancer-related gene and demonstrated 100{\%} accuracy with 3 training curves per sequence variant. The machine learning algorithm enabled reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.",
author = "Pornpat Athamanolap and Vishwa Parekh and Fraley, {Stephanie I.} and Vatsal Agarwal and Shin, {Dong J.} and Michael Jacobs and Wang, {Tza Huei} and Samuel Yang",
year = "2014",
doi = "10.1371/journal.pone.0109094",
language = "English (US)",
volume = "9",
pages = "e109094",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "9",

}

TY - JOUR

T1 - Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants

AU - Athamanolap, Pornpat

AU - Parekh, Vishwa

AU - Fraley, Stephanie I.

AU - Agarwal, Vatsal

AU - Shin, Dong J.

AU - Jacobs, Michael

AU - Wang, Tza Huei

AU - Yang, Samuel

PY - 2014

Y1 - 2014

N2 - High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurate genotyping of an unknown sample for which a large number of possible variants may exist will require an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning methods and learned tolerance for reaction condition deviations. We tested this method in silico through multiple cross-validations using curves generated from 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae and demonstrated over 99% accuracy with 8 training curves per serotype. In vitro verification of the algorithm was tested using sequence variants of a cancer-related gene and demonstrated 100% accuracy with 3 training curves per sequence variant. The machine learning algorithm enabled reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.

AB - High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurate genotyping of an unknown sample for which a large number of possible variants may exist will require an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning methods and learned tolerance for reaction condition deviations. We tested this method in silico through multiple cross-validations using curves generated from 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae and demonstrated over 99% accuracy with 8 training curves per serotype. In vitro verification of the algorithm was tested using sequence variants of a cancer-related gene and demonstrated 100% accuracy with 3 training curves per sequence variant. The machine learning algorithm enabled reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.

UR - http://www.scopus.com/inward/record.url?scp=84991543740&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84991543740&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0109094

DO - 10.1371/journal.pone.0109094

M3 - Article

VL - 9

SP - e109094

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 9

ER -