Survival models with preclustered gene groups as covariates

Kai Kammers; Michel Lang; Jan G. Hengstler; Marcus Schmidt; Jörg Rahnenführer

doi:10.1186/1471-2105-12-478

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

Background: An important application of high dimensional gene expression measurements is the risk prediction and the interpretation of the variables in the resulting survival models. A major problem in this context is the typically large number of genes compared to the number of observations (individuals). Feature selection procedures can generate predictive models with high prediction accuracy and at the same time low model complexity. However, interpretability of the resulting models is still limited due to little knowledge on many of the remaining selected genes. Thus, we summarize genes as gene groups defined by the hierarchically structured Gene Ontology (GO) and include these gene groups as covariates in the hazard regression models. Since expression profiles within GO groups are often heterogeneous, we present a new method to obtain subgroups with coherent patterns. We apply preclustering to genes within GO groups according to the correlation of their gene expression measurements.

Results: We compare Cox models for modeling disease free survival times of breast cancer patients. Besides classical clinical covariates we consider genes, GO groups and preclustered GO groups as additional genomic covariates. Survival models with preclustered gene groups as covariates have similar prediction accuracy as models built only with single genes or GO groups.

Conclusions: The preclustering information enables a more detailed analysis of the biological meaning of covariates selected in the final models. Compared to models built only with single genes there is additional functional information contained in the GO annotation, and compared to models using GO groups as covariates the preclustering yields coherent representative gene expression profiles.
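The abstract describes preclustering the genes of each GO group by the correlation of their expression profiles and then using coherent subgroups, through representative profiles, as covariates in Cox models. It does not state which clustering algorithm, cut-off, or summary statistic is used, so the Python sketch below fills these in with assumptions: average-linkage agglomerative clustering on Pearson correlation, a hypothetical merge threshold `min_corr`, and the mean of z-scored profiles as the representative. The names `precluster` and `representative` and the toy data are illustrative only and are not the authors' implementation.

```python
from itertools import combinations
from statistics import fmean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression vectors."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def precluster(profiles, min_corr=0.5):
    """Hypothetical preclustering of the genes in one GO group.

    profiles: dict mapping gene -> expression values (one per patient).
    Average-linkage agglomerative clustering: the two clusters with the
    highest mean pairwise correlation are merged, as long as that mean
    is at least min_corr (an assumed cut-off, not taken from the paper).
    """
    genes = list(profiles)
    corr = {frozenset((g, h)): pearson(profiles[g], profiles[h])
            for g, h in combinations(genes, 2)}
    clusters = [[g] for g in genes]
    while len(clusters) > 1:
        best, best_pair = None, None
        for i, j in combinations(range(len(clusters)), 2):
            avg = fmean(corr[frozenset((g, h))]
                        for g in clusters[i] for h in clusters[j])
            if best is None or avg > best:
                best, best_pair = avg, (i, j)
        if best < min_corr:
            break  # remaining clusters are not coherent enough to merge
        i, j = best_pair  # i < j, so deleting j does not shift i
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def representative(profiles, cluster):
    """Assumed summary: mean of the z-scored profiles of a cluster's genes."""
    def zscore(v):
        m, s = fmean(v), pstdev(v)
        return [(a - m) / s for a in v]
    scaled = [zscore(profiles[g]) for g in cluster]
    return [fmean(col) for col in zip(*scaled)]  # one value per patient


# Toy GO group: g1 and g2 rise together, g3 moves in the opposite direction.
go_group = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.1, 6.0, 8.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
clusters = precluster(go_group)
print(sorted(sorted(c) for c in clusters))  # [['g1', 'g2'], ['g3']]
print(representative(go_group, clusters[0]))
```

Each representative profile would then enter the Cox model as a single covariate next to the clinical covariates, so a heterogeneous GO group contributes one covariate per coherent subgroup instead of one averaged-out signal. Fitting the Cox model itself is left to a survival analysis library.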

Original language	English (US)
Article number	478
Journal	BMC Bioinformatics
Volume	12
Issue number	1
DOIs	https://doi.org/10.1186/1471-2105-12-478
State	Published - Dec 16 2011
Externally published	Yes

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

Cite this

@article{b46732009b64485c99f342407bc3c0e7,
  title = "Survival models with preclustered gene groups as covariates",
  abstract = "Background: An important application of high dimensional gene expression measurements is the risk prediction and the interpretation of the variables in the resulting survival models. A major problem in this context is the typically large number of genes compared to the number of observations (individuals). Feature selection procedures can generate predictive models with high prediction accuracy and at the same time low model complexity. However, interpretability of the resulting models is still limited due to little knowledge on many of the remaining selected genes. Thus, we summarize genes as gene groups defined by the hierarchically structured Gene Ontology (GO) and include these gene groups as covariates in the hazard regression models. Since expression profiles within GO groups are often heterogeneous, we present a new method to obtain subgroups with coherent patterns. We apply preclustering to genes within GO groups according to the correlation of their gene expression measurements. Results: We compare Cox models for modeling disease free survival times of breast cancer patients. Besides classical clinical covariates we consider genes, GO groups and preclustered GO groups as additional genomic covariates. Survival models with preclustered gene groups as covariates have similar prediction accuracy as models built only with single genes or GO groups. Conclusions: The preclustering information enables a more detailed analysis of the biological meaning of covariates selected in the final models. Compared to models built only with single genes there is additional functional information contained in the GO annotation, and compared to models using GO groups as covariates the preclustering yields coherent representative gene expression profiles.",
  author = "Kai Kammers and Michel Lang and Hengstler, {Jan G.} and Marcus Schmidt and J{\"o}rg Rahnenf{\"u}hrer",
  note = "Funding Information: The work on this paper has been supported by the German Research Foundation (DFG) within the Research Training Group ``Statistical Modelling'', project C2 and the Collaborative Research Center SFB 876 ``Providing Information by Resource-Constrained Analysis'', project A3.",
  year = "2011",
  month = dec,
  day = "16",
  doi = "10.1186/1471-2105-12-478",
  language = "English (US)",
  volume = "12",
  journal = "BMC Bioinformatics",
  issn = "1471-2105",
  publisher = "BioMed Central",
  number = "1",
}

TY  - JOUR
T1  - Survival models with preclustered gene groups as covariates
AU  - Kammers, Kai
AU  - Lang, Michel
AU  - Hengstler, Jan G.
AU  - Schmidt, Marcus
AU  - Rahnenführer, Jörg
N1  - Funding Information: The work on this paper has been supported by the German Research Foundation (DFG) within the Research Training Group “Statistical Modelling”, project C2 and the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”, project A3.
PY  - 2011/12/16
Y1  - 2011/12/16
N2  - Background: An important application of high dimensional gene expression measurements is the risk prediction and the interpretation of the variables in the resulting survival models. A major problem in this context is the typically large number of genes compared to the number of observations (individuals). Feature selection procedures can generate predictive models with high prediction accuracy and at the same time low model complexity. However, interpretability of the resulting models is still limited due to little knowledge on many of the remaining selected genes. Thus, we summarize genes as gene groups defined by the hierarchically structured Gene Ontology (GO) and include these gene groups as covariates in the hazard regression models. Since expression profiles within GO groups are often heterogeneous, we present a new method to obtain subgroups with coherent patterns. We apply preclustering to genes within GO groups according to the correlation of their gene expression measurements. Results: We compare Cox models for modeling disease free survival times of breast cancer patients. Besides classical clinical covariates we consider genes, GO groups and preclustered GO groups as additional genomic covariates. Survival models with preclustered gene groups as covariates have similar prediction accuracy as models built only with single genes or GO groups. Conclusions: The preclustering information enables a more detailed analysis of the biological meaning of covariates selected in the final models. Compared to models built only with single genes there is additional functional information contained in the GO annotation, and compared to models using GO groups as covariates the preclustering yields coherent representative gene expression profiles.
AB  - Background: An important application of high dimensional gene expression measurements is the risk prediction and the interpretation of the variables in the resulting survival models. A major problem in this context is the typically large number of genes compared to the number of observations (individuals). Feature selection procedures can generate predictive models with high prediction accuracy and at the same time low model complexity. However, interpretability of the resulting models is still limited due to little knowledge on many of the remaining selected genes. Thus, we summarize genes as gene groups defined by the hierarchically structured Gene Ontology (GO) and include these gene groups as covariates in the hazard regression models. Since expression profiles within GO groups are often heterogeneous, we present a new method to obtain subgroups with coherent patterns. We apply preclustering to genes within GO groups according to the correlation of their gene expression measurements. Results: We compare Cox models for modeling disease free survival times of breast cancer patients. Besides classical clinical covariates we consider genes, GO groups and preclustered GO groups as additional genomic covariates. Survival models with preclustered gene groups as covariates have similar prediction accuracy as models built only with single genes or GO groups. Conclusions: The preclustering information enables a more detailed analysis of the biological meaning of covariates selected in the final models. Compared to models built only with single genes there is additional functional information contained in the GO annotation, and compared to models using GO groups as covariates the preclustering yields coherent representative gene expression profiles.
UR  - http://www.scopus.com/inward/record.url?scp=83455203586&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=83455203586&partnerID=8YFLogxK
U2  - 10.1186/1471-2105-12-478
DO  - 10.1186/1471-2105-12-478
M3  - Article
C2  - 22177110
AN  - SCOPUS:83455203586
SN  - 1471-2105
VL  - 12
JO  - BMC Bioinformatics
JF  - BMC Bioinformatics
IS  - 1
M1  - 478
ER  - 