Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants

Pornpat Athamanolap; Vishwa Parekh; Stephanie I. Fraley; Vatsal Agarwal; Dong J. Shin; Michael A. Jacobs; Tza Huei Wang; Samuel Yang

doi:10.1371/journal.pone.0109094

School of Medicine

Research output: Contribution to journal › Article › peer-review

24 Scopus citations

Abstract

High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurate genotyping of an unknown sample for which a large number of possible variants may exist will require an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning methods and learned tolerance for reaction condition deviations. We tested this method in silico through multiple cross-validations using curves generated from 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae and demonstrated over 99% accuracy with 8 training curves per serotype. In vitro verification of the algorithm was tested using sequence variants of a cancer-related gene and demonstrated 100% accuracy with 3 training curves per sequence variant. The machine learning algorithm enabled reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.

Original language	English (US)
Article number	0109094
Journal	PloS one
Volume	9
Issue number	10
DOIs	https://doi.org/10.1371/journal.pone.0109094
State	Published - Oct 2 2014

ASJC Scopus subject areas

General

Cite this

@article{ef9fcc89b9274907b523e9c5814f4d66,
  title = "Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants",
  abstract = "High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurate genotyping of an unknown sample for which a large number of possible variants may exist will require an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning methods and learned tolerance for reaction condition deviations. We tested this method in silico through multiple cross-validations using curves generated from 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae and demonstrated over 99% accuracy with 8 training curves per serotype. In vitro verification of the algorithm was tested using sequence variants of a cancer-related gene and demonstrated 100% accuracy with 3 training curves per sequence variant. The machine learning algorithm enabled reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.",
  author = "Pornpat Athamanolap and Vishwa Parekh and Fraley, {Stephanie I.} and Vatsal Agarwal and Shin, {Dong J.} and Jacobs, {Michael A.} and Wang, {Tza Huei} and Samuel Yang",
  note = "Publisher Copyright: {\textcopyright} 2014 Athamanolap et al.",
  year = "2014",
  month = oct,
  day = "2",
  doi = "10.1371/journal.pone.0109094",
  language = "English (US)",
  volume = "9",
  journal = "PloS one",
  issn = "1932-6203",
  publisher = "Public Library of Science",
  number = "10",
}

TY  - JOUR
T1  - Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants
AU  - Athamanolap, Pornpat
AU  - Parekh, Vishwa
AU  - Fraley, Stephanie I.
AU  - Agarwal, Vatsal
AU  - Shin, Dong J.
AU  - Jacobs, Michael A.
AU  - Wang, Tza Huei
AU  - Yang, Samuel
PY  - 2014/10/2
Y1  - 2014/10/2
N2  - High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurate genotyping of an unknown sample for which a large number of possible variants may exist will require an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning methods and learned tolerance for reaction condition deviations. We tested this method in silico through multiple cross-validations using curves generated from 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae and demonstrated over 99% accuracy with 8 training curves per serotype. In vitro verification of the algorithm was tested using sequence variants of a cancer-related gene and demonstrated 100% accuracy with 3 training curves per sequence variant. The machine learning algorithm enabled reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.

UR  - http://www.scopus.com/inward/record.url?scp=84907484079&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84907484079&partnerID=8YFLogxK
U2  - 10.1371/journal.pone.0109094
DO  - 10.1371/journal.pone.0109094
M3  - Article
C2  - 25275518
AN  - SCOPUS:84907484079
SN  - 1932-6203
VL  - 9
JO  - PloS one
JF  - PloS one
IS  - 10
M1  - 0109094
ER  - 
