Mapping complex traits using Random Forests.

Alexandre Bureau; Josée Dupuis; Brooke Hayward; Kathleen Falls; Paul Van Eerdewegh

doi:10.1186/1471-2156-4-s1-s64

Research output: Contribution to journal › Article › peer-review

33 Scopus citations

Abstract

Random Forest is a prediction technique based on growing trees on bootstrap samples of data, in conjunction with a random selection of explanatory variables to define the best split at each node. In the case of a quantitative outcome, the tree predictor takes on a numerical value. We applied Random Forest to the first replicate of the Genetic Analysis Workshop 13 simulated data set, with the sibling pairs as our units of analysis and identity by descent (IBD) at selected loci as our explanatory variables. With the knowledge of the true model, we performed two sets of analyses on three phenotypes: HDL, triglycerides, and glucose. The goal was to approach the mapping of complex traits from a multivariate perspective. The first set of analyses mimics a candidate gene approach with a high proportion of true genes among the predictors while the second set represents a genome scan analysis using microsatellite markers. Random Forest was able to identify a few of the major genes influencing the phenotypes, such as baseline HDL and triglycerides, but failed to identify the major genes regulating baseline glucose levels.
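The technique summarized in the abstract (trees grown on bootstrap samples, a random subset of explanatory variables tried at each node, a numerical prediction at each leaf) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis: the data are simulated here rather than taken from the Genetic Analysis Workshop 13 data set, the outcome is a toy quantitative value, all names (`random_forest`, `mtry`, `causal`, and so on) are hypothetical, and variable importance is approximated by summed impurity reduction, which may differ from the importance measure used in the paper.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(X, y, rows, features):
    """Return (gain, feature, threshold, left, right) for the best split, or None."""
    parent = sse([y[i] for i in rows])
    best = None
    for f in features:
        # Every observed value except the largest is a candidate threshold.
        for t in sorted({X[i][f] for i in rows})[:-1]:
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            gain = parent - sse([y[i] for i in left]) - sse([y[i] for i in right])
            if best is None or gain > best[0]:
                best = (gain, f, t, left, right)
    return best

def grow(X, y, rows, mtry, depth, importance, rng):
    """Grow one regression tree; a leaf is the mean outcome of its rows."""
    leaf = sum(y[i] for i in rows) / len(rows)
    if depth == 0 or len(rows) < 10:
        return leaf
    features = rng.sample(range(len(X[0])), mtry)  # random subset at each node
    split = best_split(X, y, rows, features)
    if split is None or split[0] <= 0:
        return leaf
    gain, f, t, left, right = split
    importance[f] += gain  # impurity-based importance (an assumption of this sketch)
    return (f, t,
            grow(X, y, left, mtry, depth - 1, importance, rng),
            grow(X, y, right, mtry, depth - 1, importance, rng))

def predict(tree, x):
    """Follow the splits down to a leaf value."""
    while isinstance(tree, tuple):
        f, t, low_branch, high_branch = tree
        tree = low_branch if x[f] <= t else high_branch
    return tree

def random_forest(X, y, n_trees=50, mtry=3, depth=3, seed=0):
    """Grow n_trees trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    importance = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in X]  # sample with replacement
        trees.append(grow(X, y, rows, mtry, depth, importance, rng))
    return trees, importance

def forest_predict(trees, x):
    """Average the numerical predictions of all trees."""
    return sum(predict(tree, x) for tree in trees) / len(trees)

# Toy data (hypothetical): one row per sibling pair, one column per locus,
# entries are the proportion of alleles shared IBD (0, 0.5 or 1).
data_rng = random.Random(1)
n_pairs, n_loci, causal = 300, 8, 2
X = [[data_rng.choice([0.0, 0.5, 0.5, 1.0]) for _ in range(n_loci)]
     for _ in range(n_pairs)]
# Toy outcome that depends only on IBD sharing at the "causal" locus.
y = [4.0 - 3.0 * x[causal] + data_rng.gauss(0, 1) for x in X]

trees, importance = random_forest(X, y)
ranked = sorted(range(n_loci), key=lambda j: -importance[j])
low = [0.5] * n_loci
low[causal] = 0.0
high = [0.5] * n_loci
high[causal] = 1.0
print("loci ranked by importance:", ranked)
print("predicted outcome at IBD 0 vs 1:", forest_predict(trees, low), forest_predict(trees, high))
```

Ranking loci by importance is the step that corresponds to "identifying" genes in the abstract: a locus that truly influences the outcome should tend to be chosen for splits more often, and with larger gains, than unlinked loci.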

Original language	English (US)
Journal	BMC genetics
Volume	4 Suppl 1
DOIs	https://doi.org/10.1186/1471-2156-4-s1-s64
State	Published - 2003
Externally published	Yes

ASJC Scopus subject areas

Genetics
Genetics (clinical)

Cite this

@article{9f86bfbd6446496d84d2f8a78e9adf15,
title = "Mapping complex traits using Random Forests.",
abstract = "Random Forest is a prediction technique based on growing trees on bootstrap samples of data, in conjunction with a random selection of explanatory variables to define the best split at each node. In the case of a quantitative outcome, the tree predictor takes on a numerical value. We applied Random Forest to the first replicate of the Genetic Analysis Workshop 13 simulated data set, with the sibling pairs as our units of analysis and identity by descent (IBD) at selected loci as our explanatory variables. With the knowledge of the true model, we performed two sets of analyses on three phenotypes: HDL, triglycerides, and glucose. The goal was to approach the mapping of complex traits from a multivariate perspective. The first set of analyses mimics a candidate gene approach with a high proportion of true genes among the predictors while the second set represents a genome scan analysis using microsatellite markers. Random Forest was able to identify a few of the major genes influencing the phenotypes, such as baseline HDL and triglycerides, but failed to identify the major genes regulating baseline glucose levels.",
author = "Alexandre Bureau and Jos{\'e}e Dupuis and Brooke Hayward and Kathleen Falls and {Van Eerdewegh}, Paul",
year = "2003",
doi = "10.1186/1471-2156-4-s1-s64",
language = "English (US)",
volume = "4 Suppl 1",
journal = "BMC genetics",
issn = "1471-2156",
publisher = "BioMed Central",
}

TY - JOUR
T1 - Mapping complex traits using Random Forests.
AU - Bureau, Alexandre
AU - Dupuis, Josée
AU - Hayward, Brooke
AU - Falls, Kathleen
AU - Van Eerdewegh, Paul
PY - 2003
Y1 - 2003
N2 - Random Forest is a prediction technique based on growing trees on bootstrap samples of data, in conjunction with a random selection of explanatory variables to define the best split at each node. In the case of a quantitative outcome, the tree predictor takes on a numerical value. We applied Random Forest to the first replicate of the Genetic Analysis Workshop 13 simulated data set, with the sibling pairs as our units of analysis and identity by descent (IBD) at selected loci as our explanatory variables. With the knowledge of the true model, we performed two sets of analyses on three phenotypes: HDL, triglycerides, and glucose. The goal was to approach the mapping of complex traits from a multivariate perspective. The first set of analyses mimics a candidate gene approach with a high proportion of true genes among the predictors while the second set represents a genome scan analysis using microsatellite markers. Random Forest was able to identify a few of the major genes influencing the phenotypes, such as baseline HDL and triglycerides, but failed to identify the major genes regulating baseline glucose levels.

AB - Random Forest is a prediction technique based on growing trees on bootstrap samples of data, in conjunction with a random selection of explanatory variables to define the best split at each node. In the case of a quantitative outcome, the tree predictor takes on a numerical value. We applied Random Forest to the first replicate of the Genetic Analysis Workshop 13 simulated data set, with the sibling pairs as our units of analysis and identity by descent (IBD) at selected loci as our explanatory variables. With the knowledge of the true model, we performed two sets of analyses on three phenotypes: HDL, triglycerides, and glucose. The goal was to approach the mapping of complex traits from a multivariate perspective. The first set of analyses mimics a candidate gene approach with a high proportion of true genes among the predictors while the second set represents a genome scan analysis using microsatellite markers. Random Forest was able to identify a few of the major genes influencing the phenotypes, such as baseline HDL and triglycerides, but failed to identify the major genes regulating baseline glucose levels.
UR - http://www.scopus.com/inward/record.url?scp=34248632806&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34248632806&partnerID=8YFLogxK
U2 - 10.1186/1471-2156-4-s1-s64
DO - 10.1186/1471-2156-4-s1-s64
M3 - Article
C2 - 14975132
AN - SCOPUS:34248632806
SN - 1471-2156
VL - 4 Suppl 1
JO - BMC genetics
JF - BMC genetics
ER -
