Missense variants in CFTR nucleotide-binding domains predict quantitative phenotypes associated with cystic fibrosis disease severity

David L. Masica, Patrick Ryan Sosnay, Karen S. Raraigh, Garry R Cutting, Rachel Karchin

Research output: Contribution to journal › Article

Abstract

Predicting the impact of genetic variation on human health remains an important and difficult challenge. Often, algorithmic classifiers are tasked with predicting binary traits (e.g. positive or negative for a disease) from missense variation. Though useful, this arrangement is limiting and contrived, because human diseases often comprise a spectrum of severities, rather than a discrete partitioning of patient populations. Furthermore, labeling variants as causal or benign can be error prone, which is problematic for training supervised learning algorithms (the so-called garbage in, garbage out phenomenon). We explore the potential value of training classifiers using continuous-valued quantitative measurements, rather than binary traits. Using 20 variants from cystic fibrosis transmembrane conductance regulator (CFTR) nucleotide-binding domains and six quantitative measures of cystic fibrosis (CF) severity, we trained classifiers to predict CF severity from CFTR variants. Employing cross validation, classifier prediction and measured clinical/functional values were significantly correlated for four of six quantitative traits (correlation P-values from 1.35 × 10⁻⁴ to 4.15 × 10⁻³). Classifiers were also able to stratify variants by three clinically relevant risk categories with 85-100% accuracy, depending on which of the six quantitative traits was used for training. Finally, we characterized 11 additional CFTR variants using clinical sweat chloride testing, two functional assays, or all three diagnostics, and validated our classifier using blind prediction. Predictions were within the measured sweat chloride range for seven of eight variants, and captured the differential impact of specific variants on the two functional assays. This work demonstrates a promising and novel framework for assessing the impact of genetic variation.

Original language: English (US)
Article number: ddu607
Pages (from-to): 1908-1917
Number of pages: 10
Journal: Human Molecular Genetics
Volume: 24
Issue number: 7
DOI: 10.1093/hmg/ddu607
State: Published - Oct 28 2014

Fingerprint

Cystic Fibrosis Transmembrane Conductance Regulator
Cystic Fibrosis
Nucleotides
Sweat
Phenotype
Chlorides
Learning
Health
Population

ASJC Scopus subject areas

  • Genetics
  • Genetics(clinical)
  • Molecular Biology

Cite this

Missense variants in CFTR nucleotide-binding domains predict quantitative phenotypes associated with cystic fibrosis disease severity. / Masica, David L.; Sosnay, Patrick Ryan; Raraigh, Karen S.; Cutting, Garry R; Karchin, Rachel.

In: Human Molecular Genetics, Vol. 24, No. 7, ddu607, 28.10.2014, p. 1908-1917.

@article{b642ee1b1fd94afcbf814cb440530700,
title = "Missense variants in CFTR nucleotide-binding domains predict quantitative phenotypes associated with cystic fibrosis disease severity",
abstract = "Predicting the impact of genetic variation on human health remains an important and difficult challenge. Often, algorithmic classifiers are tasked with predicting binary traits (e.g. positive or negative for a disease) from missense variation. Though useful, this arrangement is limiting and contrived, because human diseases often comprise a spectrum of severities, rather than a discrete partitioning of patient populations. Furthermore, labeling variants as causal or benign can be error prone, which is problematic for training supervised learning algorithms (the so-called garbage in, garbage out phenomenon). We explore the potential value of training classifiers using continuous-valued quantitative measurements, rather than binary traits. Using 20 variants from cystic fibrosis transmembrane conductance regulator (CFTR) nucleotide-binding domains and six quantitative measures of cystic fibrosis (CF) severity, we trained classifiers to predict CF severity from CFTR variants. Employing cross validation, classifier prediction and measured clinical/functional values were significantly correlated for four of six quantitative traits (correlation P-values from 1.35 × 10$^{-4}$ to 4.15 × 10$^{-3}$). Classifiers were also able to stratify variants by three clinically relevant risk categories with 85-100{\%} accuracy, depending on which of the six quantitative traits was used for training. Finally, we characterized 11 additional CFTR variants using clinical sweat chloride testing, two functional assays, or all three diagnostics, and validated our classifier using blind prediction. Predictions were within the measured sweat chloride range for seven of eight variants, and captured the differential impact of specific variants on the two functional assays. This work demonstrates a promising and novel framework for assessing the impact of genetic variation.",
author = "Masica, {David L.} and Sosnay, {Patrick Ryan} and Raraigh, {Karen S.} and Cutting, {Garry R} and Karchin, Rachel",
year = "2014",
month = "10",
day = "28",
doi = "10.1093/hmg/ddu607",
language = "English (US)",
volume = "24",
pages = "1908--1917",
journal = "Human Molecular Genetics",
issn = "0964-6906",
publisher = "Oxford University Press",
number = "7",
}

TY - JOUR

T1 - Missense variants in CFTR nucleotide-binding domains predict quantitative phenotypes associated with cystic fibrosis disease severity

AU - Masica, David L.

AU - Sosnay, Patrick Ryan

AU - Raraigh, Karen S.

AU - Cutting, Garry R

AU - Karchin, Rachel

PY - 2014/10/28

Y1 - 2014/10/28

N2 - Predicting the impact of genetic variation on human health remains an important and difficult challenge. Often, algorithmic classifiers are tasked with predicting binary traits (e.g. positive or negative for a disease) from missense variation. Though useful, this arrangement is limiting and contrived, because human diseases often comprise a spectrum of severities, rather than a discrete partitioning of patient populations. Furthermore, labeling variants as causal or benign can be error prone, which is problematic for training supervised learning algorithms (the so-called garbage in, garbage out phenomenon). We explore the potential value of training classifiers using continuous-valued quantitative measurements, rather than binary traits. Using 20 variants from cystic fibrosis transmembrane conductance regulator (CFTR) nucleotide-binding domains and six quantitative measures of cystic fibrosis (CF) severity, we trained classifiers to predict CF severity from CFTR variants. Employing cross validation, classifier prediction and measured clinical/functional values were significantly correlated for four of six quantitative traits (correlation P-values from 1.35 × 10⁻⁴ to 4.15 × 10⁻³). Classifiers were also able to stratify variants by three clinically relevant risk categories with 85-100% accuracy, depending on which of the six quantitative traits was used for training. Finally, we characterized 11 additional CFTR variants using clinical sweat chloride testing, two functional assays, or all three diagnostics, and validated our classifier using blind prediction. Predictions were within the measured sweat chloride range for seven of eight variants, and captured the differential impact of specific variants on the two functional assays. This work demonstrates a promising and novel framework for assessing the impact of genetic variation.

UR - http://www.scopus.com/inward/record.url?scp=84926484027&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84926484027&partnerID=8YFLogxK

U2 - 10.1093/hmg/ddu607

DO - 10.1093/hmg/ddu607

M3 - Article

C2 - 25489051

AN - SCOPUS:84926484027

VL - 24

SP - 1908

EP - 1917

JO - Human Molecular Genetics

JF - Human Molecular Genetics

SN - 0964-6906

IS - 7

M1 - ddu607

ER -