TY - JOUR
T1 - Missense variants in CFTR nucleotide-binding domains predict quantitative phenotypes associated with cystic fibrosis disease severity
AU - Masica, David L.
AU - Sosnay, Patrick R.
AU - Raraigh, Karen S.
AU - Cutting, Garry R.
AU - Karchin, Rachel
N1 - Funding Information:
This work was supported by the US CF Foundation (KARCHI1210 and CUTTIN11A0).
Publisher Copyright:
© The Author 2014.
PY - 2014/10/28
Y1 - 2014/10/28
N2 - Predicting the impact of genetic variation on human health remains an important and difficult challenge. Often, algorithmic classifiers are tasked with predicting binary traits (e.g. positive or negative for a disease) from missense variation. Though useful, this arrangement is limiting and contrived, because human diseases often comprise a spectrum of severities, rather than a discrete partitioning of patient populations. Furthermore, labeling variants as causal or benign can be error prone, which is problematic for training supervised learning algorithms (the so-called garbage in, garbage out phenomenon). We explore the potential value of training classifiers using continuous-valued quantitative measurements, rather than binary traits. Using 20 variants from cystic fibrosis transmembrane conductance regulator (CFTR) nucleotide-binding domains and six quantitative measures of cystic fibrosis (CF) severity, we trained classifiers to predict CF severity from CFTR variants. Employing cross validation, classifier prediction and measured clinical/functional values were significantly correlated for four of six quantitative traits (correlation P-values from 1.35 × 10⁻⁴ to 4.15 × 10⁻³). Classifiers were also able to stratify variants by three clinically relevant risk categories with 85-100% accuracy, depending on which of the six quantitative traits was used for training. Finally, we characterized 11 additional CFTR variants using clinical sweat chloride testing, two functional assays, or all three diagnostics, and validated our classifier using blind prediction. Predictions were within the measured sweat chloride range for seven of eight variants, and captured the differential impact of specific variants on the two functional assays. This work demonstrates a promising and novel framework for assessing the impact of genetic variation.
AB - Predicting the impact of genetic variation on human health remains an important and difficult challenge. Often, algorithmic classifiers are tasked with predicting binary traits (e.g. positive or negative for a disease) from missense variation. Though useful, this arrangement is limiting and contrived, because human diseases often comprise a spectrum of severities, rather than a discrete partitioning of patient populations. Furthermore, labeling variants as causal or benign can be error prone, which is problematic for training supervised learning algorithms (the so-called garbage in, garbage out phenomenon). We explore the potential value of training classifiers using continuous-valued quantitative measurements, rather than binary traits. Using 20 variants from cystic fibrosis transmembrane conductance regulator (CFTR) nucleotide-binding domains and six quantitative measures of cystic fibrosis (CF) severity, we trained classifiers to predict CF severity from CFTR variants. Employing cross validation, classifier prediction and measured clinical/functional values were significantly correlated for four of six quantitative traits (correlation P-values from 1.35 × 10⁻⁴ to 4.15 × 10⁻³). Classifiers were also able to stratify variants by three clinically relevant risk categories with 85-100% accuracy, depending on which of the six quantitative traits was used for training. Finally, we characterized 11 additional CFTR variants using clinical sweat chloride testing, two functional assays, or all three diagnostics, and validated our classifier using blind prediction. Predictions were within the measured sweat chloride range for seven of eight variants, and captured the differential impact of specific variants on the two functional assays. This work demonstrates a promising and novel framework for assessing the impact of genetic variation.
UR - http://www.scopus.com/inward/record.url?scp=84926484027&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84926484027&partnerID=8YFLogxK
U2 - 10.1093/hmg/ddu607
DO - 10.1093/hmg/ddu607
M3 - Article
C2 - 25489051
AN - SCOPUS:84926484027
VL - 24
SP - 1908
EP - 1917
JO - Human Molecular Genetics
JF - Human Molecular Genetics
SN - 0964-6906
IS - 7
ER -