Phenotype harmonization and cross-study collaboration in GWAS consortia: The GENEVA experience

Siiri N. Bennett, Neil Caporaso, Annette L. Fitzpatrick, Arpana Agrawal, Kathleen Barnes, Heather A. Boyd, Marilyn C. Cornelis, Nadia Hansel, Gerardo Heiss, John A. Heit, Jae Hee Kang, Steven J. Kittner, Peter Kraft, William Lowe, Mary L. Marazita, Kristine R. Monroe, Louis R. Pasquale, Erin M. Ramos, Rob M. van Dam, Jenna Udren, Kayleen Williams

Research output: Contribution to journal › Article

Abstract

Genome-wide association study (GWAS) consortia and collaborations formed to detect genetic loci for common phenotypes or investigate gene-environment (G×E) interactions are increasingly common. While these consortia effectively increase sample size, phenotype heterogeneity across studies represents a major obstacle that limits successful identification of these associations. Investigators are faced with the challenge of how to harmonize previously collected phenotype data obtained using different data collection instruments which cover topics in varying degrees of detail and over diverse time frames. This process has not been described in detail. We describe here some of the strategies and pitfalls associated with combining phenotype data from varying studies. Using the Gene Environment Association Studies (GENEVA) multi-site GWAS consortium as an example, this paper provides an illustration to guide GWAS consortia through the process of phenotype harmonization and describes key issues that arise when sharing data across disparate studies. GENEVA is unusual in the diversity of disease endpoints and so the issues it faces as its participating studies share data will be informative for many collaborations. Phenotype harmonization requires identifying common phenotypes, determining the feasibility of cross-study analysis for each, preparing common definitions, and applying appropriate algorithms. Other issues to be considered include genotyping timeframes, coordination of parallel efforts by other collaborative groups, analytic approaches, and imputation of genotype data. GENEVA's harmonization efforts and policy of promoting data sharing and collaboration, not only within GENEVA but also with outside collaborations, can provide important guidance to ongoing and new consortia.
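
The four harmonization steps named in the abstract (identifying common phenotypes, determining feasibility, preparing common definitions, applying algorithms) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the study names, the field names (`smk`, `ever_smoked`, `smokes_now`), the three-level smoking definition, and the mapping rules are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Step 3 (common definition, hypothetical): smoking status takes one of
# these three values in every study after harmonization.
COMMON_LEVELS = ("never", "former", "current")


def study_a_smoking(record: dict) -> Optional[str]:
    """Hypothetical study A stores one coded field: 0, 1, or 2."""
    return {0: "never", 1: "former", 2: "current"}.get(record.get("smk"))


def study_b_smoking(record: dict) -> Optional[str]:
    """Hypothetical study B asks two yes/no questions instead."""
    ever = record.get("ever_smoked")
    now = record.get("smokes_now")
    if ever is None:
        return None  # cannot classify without the first answer
    if not ever:
        return "never"
    if now is None:
        return None  # ever-smoker with unknown current status
    return "current" if now else "former"


# Step 1 (common phenotypes): which phenotypes each study can supply,
# and the per-study algorithm that maps raw data to the common definition.
ALGORITHMS: Dict[str, Dict[str, Callable[[dict], Optional[str]]]] = {
    "study_a": {"smoking_status": study_a_smoking},
    "study_b": {"smoking_status": study_b_smoking},
    "study_c": {},  # hypothetical study that did not collect smoking data
}


def feasible_studies(phenotype: str) -> List[str]:
    """Step 2 (feasibility): studies that have a mapping for the phenotype."""
    return sorted(s for s, algos in ALGORITHMS.items() if phenotype in algos)


def harmonize(study: str, phenotype: str, records: List[dict]) -> List[Optional[str]]:
    """Step 4 (apply algorithm): map raw records to the common levels.

    Values that cannot be mapped are returned as None rather than guessed.
    """
    algorithm = ALGORITHMS[study][phenotype]
    return [algorithm(r) for r in records]


if __name__ == "__main__":
    print(feasible_studies("smoking_status"))
    print(harmonize("study_a", "smoking_status", [{"smk": 2}, {"smk": 9}]))
    print(harmonize("study_b", "smoking_status",
                    [{"ever_smoked": True, "smokes_now": False}]))
```

Keeping one mapping function per study and phenotype makes the feasibility check a simple lookup, and returning `None` for unmappable values keeps missingness explicit instead of silently assigning a category.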

Original language: English (US)
Pages (from-to): 159-173
Number of pages: 15
Journal: Genetic Epidemiology
Volume: 35
Issue number: 3
DOI: https://doi.org/10.1002/gepi.20564
State: Published - Apr 2011

Fingerprint

Genome-Wide Association Study
Phenotype
Genes
Information Dissemination
Gene-Environment Interaction
Genetic Loci
Feasibility Studies
Sample Size
Genotype
Research Personnel

Keywords

  • Consortia
  • GENEVA
  • Genome-wide association studies
  • Harmonization
  • Phenotype

ASJC Scopus subject areas

  • Genetics (clinical)
  • Epidemiology

Cite this

Bennett, S. N., Caporaso, N., Fitzpatrick, A. L., Agrawal, A., Barnes, K., Boyd, H. A., ... Williams, K. (2011). Phenotype harmonization and cross-study collaboration in GWAS consortia: The GENEVA experience. Genetic Epidemiology, 35(3), 159-173. https://doi.org/10.1002/gepi.20564

Phenotype harmonization and cross-study collaboration in GWAS consortia: The GENEVA experience. / Bennett, Siiri N.; Caporaso, Neil; Fitzpatrick, Annette L.; Agrawal, Arpana; Barnes, Kathleen; Boyd, Heather A.; Cornelis, Marilyn C.; Hansel, Nadia; Heiss, Gerardo; Heit, John A.; Kang, Jae Hee; Kittner, Steven J.; Kraft, Peter; Lowe, William; Marazita, Mary L.; Monroe, Kristine R.; Pasquale, Louis R.; Ramos, Erin M.; van Dam, Rob M.; Udren, Jenna; Williams, Kayleen.

In: Genetic Epidemiology, Vol. 35, No. 3, 04.2011, p. 159-173.

Bennett, SN, Caporaso, N, Fitzpatrick, AL, Agrawal, A, Barnes, K, Boyd, HA, Cornelis, MC, Hansel, N, Heiss, G, Heit, JA, Kang, JH, Kittner, SJ, Kraft, P, Lowe, W, Marazita, ML, Monroe, KR, Pasquale, LR, Ramos, EM, van Dam, RM, Udren, J & Williams, K 2011, 'Phenotype harmonization and cross-study collaboration in GWAS consortia: The GENEVA experience', Genetic Epidemiology, vol. 35, no. 3, pp. 159-173. https://doi.org/10.1002/gepi.20564
Bennett SN, Caporaso N, Fitzpatrick AL, Agrawal A, Barnes K, Boyd HA et al. Phenotype harmonization and cross-study collaboration in GWAS consortia: The GENEVA experience. Genetic Epidemiology. 2011 Apr;35(3):159-173. https://doi.org/10.1002/gepi.20564
Bennett, Siiri N.; Caporaso, Neil; Fitzpatrick, Annette L.; Agrawal, Arpana; Barnes, Kathleen; Boyd, Heather A.; Cornelis, Marilyn C.; Hansel, Nadia; Heiss, Gerardo; Heit, John A.; Kang, Jae Hee; Kittner, Steven J.; Kraft, Peter; Lowe, William; Marazita, Mary L.; Monroe, Kristine R.; Pasquale, Louis R.; Ramos, Erin M.; van Dam, Rob M.; Udren, Jenna; Williams, Kayleen. / Phenotype harmonization and cross-study collaboration in GWAS consortia: The GENEVA experience. In: Genetic Epidemiology. 2011; Vol. 35, No. 3. pp. 159-173.
@article{0bdc4529a3ae473fbd00a97ab0fe4f09,
title = "Phenotype harmonization and cross-study collaboration in GWAS consortia: The GENEVA experience",
abstract = "Genome-wide association study (GWAS) consortia and collaborations formed to detect genetic loci for common phenotypes or investigate gene-environment (GE) interactions are increasingly common. While these consortia effectively increase sample size, phenotype heterogeneity across studies represents a major obstacle that limits successful identification of these associations. Investigators are faced with the challenge of how to harmonize previously collected phenotype data obtained using different data collection instruments which cover topics in varying degrees of detail and over diverse time frames. This process has not been described in detail. We describe here some of the strategies and pitfalls associated with combining phenotype data from varying studies. Using the Gene Environment Association Studies (GENEVA) multi-site GWAS consortium as an example, this paper provides an illustration to guide GWAS consortia through the process of phenotype harmonization and describes key issues that arise when sharing data across disparate studies. GENEVA is unusual in the diversity of disease endpoints and so the issues it faces as its participating studies share data will be informative for many collaborations. Phenotype harmonization requires identifying common phenotypes, determining the feasibility of cross-study analysis for each, preparing common definitions, and applying appropriate algorithms. Other issues to be considered include genotyping timeframes, coordination of parallel efforts by other collaborative groups, analytic approaches, and imputation of genotype data. GENEVA's harmonization efforts and policy of promoting data sharing and collaboration, not only within GENEVA but also with outside collaborations, can provide important guidance to ongoing and new consortia.",
keywords = "Consortia, GENEVA, Genome-wide association studies, Harmonization, Phenotype",
author = "Bennett, {Siiri N.} and Neil Caporaso and Fitzpatrick, {Annette L.} and Arpana Agrawal and Kathleen Barnes and Boyd, {Heather A.} and Cornelis, {Marilyn C.} and Nadia Hansel and Gerardo Heiss and Heit, {John A.} and Kang, {Jae Hee} and Kittner, {Steven J.} and Peter Kraft and William Lowe and Marazita, {Mary L.} and Monroe, {Kristine R.} and Pasquale, {Louis R.} and Ramos, {Erin M.} and {van Dam}, {Rob M.} and Jenna Udren and Kayleen Williams",
year = "2011",
month = "4",
doi = "10.1002/gepi.20564",
language = "English (US)",
volume = "35",
pages = "159--173",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "3",

}

TY - JOUR

T1 - Phenotype harmonization and cross-study collaboration in GWAS consortia

T2 - The GENEVA experience

AU - Bennett, Siiri N.

AU - Caporaso, Neil

AU - Fitzpatrick, Annette L.

AU - Agrawal, Arpana

AU - Barnes, Kathleen

AU - Boyd, Heather A.

AU - Cornelis, Marilyn C.

AU - Hansel, Nadia

AU - Heiss, Gerardo

AU - Heit, John A.

AU - Kang, Jae Hee

AU - Kittner, Steven J.

AU - Kraft, Peter

AU - Lowe, William

AU - Marazita, Mary L.

AU - Monroe, Kristine R.

AU - Pasquale, Louis R.

AU - Ramos, Erin M.

AU - van Dam, Rob M.

AU - Udren, Jenna

AU - Williams, Kayleen

PY - 2011/4

Y1 - 2011/4

N2 - Genome-wide association study (GWAS) consortia and collaborations formed to detect genetic loci for common phenotypes or investigate gene-environment (GE) interactions are increasingly common. While these consortia effectively increase sample size, phenotype heterogeneity across studies represents a major obstacle that limits successful identification of these associations. Investigators are faced with the challenge of how to harmonize previously collected phenotype data obtained using different data collection instruments which cover topics in varying degrees of detail and over diverse time frames. This process has not been described in detail. We describe here some of the strategies and pitfalls associated with combining phenotype data from varying studies. Using the Gene Environment Association Studies (GENEVA) multi-site GWAS consortium as an example, this paper provides an illustration to guide GWAS consortia through the process of phenotype harmonization and describes key issues that arise when sharing data across disparate studies. GENEVA is unusual in the diversity of disease endpoints and so the issues it faces as its participating studies share data will be informative for many collaborations. Phenotype harmonization requires identifying common phenotypes, determining the feasibility of cross-study analysis for each, preparing common definitions, and applying appropriate algorithms. Other issues to be considered include genotyping timeframes, coordination of parallel efforts by other collaborative groups, analytic approaches, and imputation of genotype data. GENEVA's harmonization efforts and policy of promoting data sharing and collaboration, not only within GENEVA but also with outside collaborations, can provide important guidance to ongoing and new consortia.

KW - Consortia

KW - GENEVA

KW - Genome-wide association studies

KW - Harmonization

KW - Phenotype

UR - http://www.scopus.com/inward/record.url?scp=79952503255&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952503255&partnerID=8YFLogxK

U2 - 10.1002/gepi.20564

DO - 10.1002/gepi.20564

M3 - Article

C2 - 21284036

AN - SCOPUS:79952503255

VL - 35

SP - 159

EP - 173

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - 3

ER -