Multiple Imputation and Random Forests (MIRF) for unobservable, high-dimensional data

Bareng A.S. Nonyane, Andrea S. Foulkes

Research output: Contribution to journal › Article

Abstract

Understanding the genetic underpinnings of complex diseases requires consideration of sophisticated analytical methods designed to uncover intricate associations across multiple predictor variables. At the same time, knowledge of whether single nucleotide polymorphisms within a gene are on the same (in cis) or on different (in trans) chromosomal copies may provide crucial information about measures of disease progression. In association studies of unrelated individuals, allelic phase is generally unobservable, generating an additional analytical challenge. In this manuscript, we describe a novel approach that combines multiple imputation and random forests for this high-dimensional, unobservable data setting. An application to a cohort of HIV-1-infected individuals receiving anti-retroviral therapies is presented. A simulation study is also presented to characterize method performance.

Original language: English (US)
Article number: 12
Journal: The international journal of biostatistics
Volume: 3
Issue number: 1
State: Published - 2007
Externally published: Yes

Keywords

  • Genotype
  • Haplotype
  • HIV-1
  • Lipids
  • Phase
  • Random forests
  • Recursive partitioning

ASJC Scopus subject areas

  • Medicine(all)
  • Statistics, Probability and Uncertainty
  • Statistics and Probability

Cite this

Multiple Imputation and Random Forests (MIRF) for unobservable, high-dimensional data. / Nonyane, Bareng A.S.; Foulkes, Andrea S.

In: The international journal of biostatistics, Vol. 3, No. 1, 12, 2007.

@article{7d28be5735764254ad132b8e6ebfb789,
title = "Multiple Imputation and Random Forests (MIRF) for unobservable, high-dimensional data",
abstract = "Understanding the genetic underpinnings of complex diseases requires consideration of sophisticated analytical methods designed to uncover intricate associations across multiple predictor variables. At the same time, knowledge of whether single nucleotide polymorphisms within a gene are on the same (in cis) or on different (in trans) chromosomal copies may provide crucial information about measures of disease progression. In association studies of unrelated individuals, allelic phase is generally unobservable, generating an additional analytical challenge. In this manuscript, we describe a novel approach that combines multiple imputation and random forests for this high-dimensional, unobservable data setting. An application to a cohort of HIV-1-infected individuals receiving anti-retroviral therapies is presented. A simulation study is also presented to characterize method performance.",
keywords = "Genotype, Haplotype, HIV-1, Lipids, Phase, Random forests, Recursive partitioning",
author = "Nonyane, {Bareng A.S.} and Foulkes, {Andrea S.}",
year = "2007",
language = "English (US)",
volume = "3",
journal = "International Journal of Biostatistics",
issn = "1557-4679",
publisher = "Berkeley Electronic Press",
number = "1",

}

TY - JOUR

T1 - Multiple Imputation and Random Forests (MIRF) for unobservable, high-dimensional data

AU - Nonyane, Bareng A.S.

AU - Foulkes, Andrea S.

PY - 2007

Y1 - 2007

N2 - Understanding the genetic underpinnings of complex diseases requires consideration of sophisticated analytical methods designed to uncover intricate associations across multiple predictor variables. At the same time, knowledge of whether single nucleotide polymorphisms within a gene are on the same (in cis) or on different (in trans) chromosomal copies may provide crucial information about measures of disease progression. In association studies of unrelated individuals, allelic phase is generally unobservable, generating an additional analytical challenge. In this manuscript, we describe a novel approach that combines multiple imputation and random forests for this high-dimensional, unobservable data setting. An application to a cohort of HIV-1-infected individuals receiving anti-retroviral therapies is presented. A simulation study is also presented to characterize method performance.

AB - Understanding the genetic underpinnings of complex diseases requires consideration of sophisticated analytical methods designed to uncover intricate associations across multiple predictor variables. At the same time, knowledge of whether single nucleotide polymorphisms within a gene are on the same (in cis) or on different (in trans) chromosomal copies may provide crucial information about measures of disease progression. In association studies of unrelated individuals, allelic phase is generally unobservable, generating an additional analytical challenge. In this manuscript, we describe a novel approach that combines multiple imputation and random forests for this high-dimensional, unobservable data setting. An application to a cohort of HIV-1-infected individuals receiving anti-retroviral therapies is presented. A simulation study is also presented to characterize method performance.

KW - Genotype

KW - Haplotype

KW - HIV-1

KW - Lipids

KW - Phase

KW - Random forests

KW - Recursive partitioning

UR - http://www.scopus.com/inward/record.url?scp=34548133733&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548133733&partnerID=8YFLogxK

M3 - Article

C2 - 22550652

AN - SCOPUS:34548133733

VL - 3

JO - International Journal of Biostatistics

JF - International Journal of Biostatistics

SN - 1557-4679

IS - 1

M1 - 12

ER -