Multiple imputation of missing phenotype data for QTL mapping

Jennifer F. Bobb, Daniel O Scharfstein, Michael J. Daniels, Francis S. Collins, Samir Kelada

Research output: Contribution to journal (Article)

Abstract

Missing phenotype data can be a major hurdle to mapping quantitative trait loci (QTL). Though in many cases experiments may be designed to minimize the occurrence of missing data, it is often unavoidable in practice; thus, statistical methods to account for missing data are needed. In this paper we describe an approach for conjoining multiple imputation and QTL mapping. Methods are applied to map genes associated with increased breathing effort in mice after lung inflammation due to allergen challenge in developing lines of the Collaborative Cross, a new mouse genetics resource. Missing data poses a particular challenge in this study because the desired phenotype summary to be mapped is a function of incompletely observed dose-response curves. Comparison of the multiple imputation approach to two naive approaches for handling missing data suggests that these simpler methods may yield poor results: ignoring missing data through a complete case analysis may lead to incorrect conclusions, while using a last observation carried forward procedure, which does not account for uncertainty in the imputed values, may lead to anti-conservative inference. The proposed approach is widely applicable to other studies with missing phenotype data.
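
The abstract does not say how results from the imputed datasets are combined. As an illustration only, the sketch below uses Rubin's rules, a standard way to pool an estimate across m imputed datasets; this is an assumption and may not match the procedure used in the paper. The function name pool_rubin and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's rules (assumed, not from the paper).

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical effect estimates from three imputed datasets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance exceeds the within-imputation variance whenever the imputations disagree. A single-imputation procedure such as last observation carried forward has no between-imputation term, which is one way to see why it can give anti-conservative inference.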

Original language: English (US)
Article number: 29
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 10
Issue number: 1
DOI: 10.2202/1544-6115.1676
State: Published - 2011

Fingerprint

Quantitative Trait Loci
Multiple Imputation
Missing Data
Phenotype
Allergens
Data handling
Statistical methods
Genes
Mouse
Dose-response Curve
Inflammation
Uncertainty
Pneumonia
Respiration
Lung
Experiments
Statistical method
Observation
Gene
Minimise

Keywords

  • missing data
  • multiple imputation
  • quantitative trait loci

ASJC Scopus subject areas

  • Genetics
  • Molecular Biology
  • Statistics and Probability
  • Computational Mathematics
  • Medicine(all)

Cite this

Multiple imputation of missing phenotype data for QTL mapping. / Bobb, Jennifer F.; Scharfstein, Daniel O; Daniels, Michael J.; Collins, Francis S.; Kelada, Samir.

In: Statistical Applications in Genetics and Molecular Biology, Vol. 10, No. 1, 29, 2011.

Bobb, Jennifer F. ; Scharfstein, Daniel O ; Daniels, Michael J. ; Collins, Francis S. ; Kelada, Samir. / Multiple imputation of missing phenotype data for QTL mapping. In: Statistical Applications in Genetics and Molecular Biology. 2011 ; Vol. 10, No. 1.
@article{950c2d3733c04733bc757ff41d30c478,
title = "Multiple imputation of missing phenotype data for QTL mapping",
keywords = "missing data, multiple imputation, quantitative trait loci",
author = "Bobb, {Jennifer F.} and Scharfstein, {Daniel O} and Daniels, {Michael J.} and Collins, {Francis S.} and Kelada, Samir",
year = "2011",
doi = "10.2202/1544-6115.1676",
language = "English (US)",
volume = "10",
journal = "Statistical Applications in Genetics and Molecular Biology",
issn = "1544-6115",
publisher = "Berkeley Electronic Press",
number = "1",

}

TY - JOUR

T1 - Multiple imputation of missing phenotype data for QTL mapping

AU - Bobb, Jennifer F.

AU - Scharfstein, Daniel O

AU - Daniels, Michael J.

AU - Collins, Francis S.

AU - Kelada, Samir

PY - 2011

Y1 - 2011

KW - missing data

KW - multiple imputation

KW - quantitative trait loci

UR - http://www.scopus.com/inward/record.url?scp=79960023255&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79960023255&partnerID=8YFLogxK

U2 - 10.2202/1544-6115.1676

DO - 10.2202/1544-6115.1676

M3 - Article

C2 - 24683667

AN - SCOPUS:79960023255

VL - 10

JO - Statistical Applications in Genetics and Molecular Biology

JF - Statistical Applications in Genetics and Molecular Biology

SN - 1544-6115

IS - 1

M1 - 29

ER -