False discovery rates in somatic mutation studies of cancer

Lorenzo Trippa, Giovanni Parmigiani

Research output: Contribution to journal › Article

Abstract

The purpose of cancer genome sequencing studies is to determine the nature and types of alterations present in a typical cancer and to discover genes mutated at high frequencies. In this article we discuss statistical methods for the analysis of somatic mutation frequency data generated in these studies. We place special emphasis on a two-stage study design introduced by Sjöblom et al. [Science 314 (2006) 268-274]. In this context, we describe and compare statistical methods for constructing scores that can be used to prioritize candidate genes for further investigation and to assess the statistical significance of the candidates thus identified. Controversy has surrounded the reliability of the false discovery rate estimates provided by the approximations used in early cancer genome studies. To address this controversy, we develop a semiparametric Bayesian model that provides an accurate fit to the data. We use this model to generate a large collection of realistic scenarios and evaluate alternative approaches on this collection. Our assessment is impartial, in that the model used for generating data is not used by any of the approaches compared, and objective, in that the scenarios are generated by a model that fits the data. Our results quantify the conservative control of the false discovery rate with the Benjamini and Hochberg method compared to the empirical Bayes approach and the multiple testing method proposed in Storey [J. R. Stat. Soc. Ser. B Stat. Methodol. 64 (2002) 479-498]. Simulation results also show a negligible departure from the target false discovery rate for the methodology used in Sjöblom et al.
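
As a rough illustration of why the Benjamini and Hochberg method is described as conservative relative to Storey's approach, the sketch below implements the textbook forms of both procedures. It is not the authors' code and does not reproduce the two-stage design or the semiparametric Bayesian model; the function names, the tuning value `lam = 0.5`, and the p-values are all made up for illustration.

```python
from typing import List, Sequence


def bh_reject(pvalues: Sequence[float], alpha: float) -> List[bool]:
    """Benjamini-Hochberg step-up rule: reject the k smallest p-values,
    where k is the largest rank with p_(k) <= k * alpha / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def storey_pi0(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """Storey-type estimate of the proportion of true nulls:
    the share of p-values above lam, rescaled by 1 / (1 - lam)."""
    m = len(pvalues)
    return min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))


def storey_fdr(pvalues: Sequence[float], threshold: float, lam: float = 0.5) -> float:
    """Estimated FDR when rejecting every p-value <= threshold:
    pi0_hat * m * threshold / max(number rejected, 1)."""
    m = len(pvalues)
    rejected = sum(p <= threshold for p in pvalues)
    return min(1.0, storey_pi0(pvalues, lam) * m * threshold / max(rejected, 1))


# Hypothetical p-values, one per gene (illustrative only).
pvals = [0.001, 0.004, 0.012, 0.03, 0.2, 0.4, 0.55, 0.7, 0.85, 0.95]
print(bh_reject(pvals, alpha=0.05))
print(storey_pi0(pvals), storey_fdr(pvals, threshold=0.03))
```

Setting the null proportion to 1 in `storey_fdr` gives the estimate that the Benjamini and Hochberg rule implicitly relies on, so whenever `storey_pi0` returns a value below 1 the Storey-type estimate is smaller at the same threshold.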

Original language: English (US)
Pages (from-to): 1360-1378
Number of pages: 19
Journal: Annals of Applied Statistics
Volume: 5
Issue number: 2 B
DOI: 10.1214/10-AOAS438
State: Published - Jun 2011
Externally published: Yes

Keywords

  • Cancer genome studies
  • False discovery rate
  • Genome-wide studies
  • Multiple hypothesis testing
  • Somatic mutations

ASJC Scopus subject areas

  • Statistics, Probability and Uncertainty
  • Modeling and Simulation
  • Statistics and Probability

Cite this

False discovery rates in somatic mutation studies of cancer. / Trippa, Lorenzo; Parmigiani, Giovanni.

In: Annals of Applied Statistics, Vol. 5, No. 2 B, 06.2011, p. 1360-1378.

Trippa, Lorenzo ; Parmigiani, Giovanni. / False discovery rates in somatic mutation studies of cancer. In: Annals of Applied Statistics. 2011 ; Vol. 5, No. 2 B. pp. 1360-1378.
@article{282d71e6bb474e96992106c1e942059a,
title = "False discovery rates in somatic mutation studies of cancer",
abstract = "The purpose of cancer genome sequencing studies is to determine the nature and types of alterations present in a typical cancer and to discover genes mutated at high frequencies. In this article we discuss statistical methods for the analysis of somatic mutation frequency data generated in these studies. We place special emphasis on a two-stage study design introduced by Sj{\"o}blom et al. [Science 314 (2006) 268-274]. In this context, we describe and compare statistical methods for constructing scores that can be used to prioritize candidate genes for further investigation and to assess the statistical significance of the candidates thus identified. Controversy has surrounded the reliability of the false discovery rate estimates provided by the approximations used in early cancer genome studies. To address this controversy, we develop a semiparametric Bayesian model that provides an accurate fit to the data. We use this model to generate a large collection of realistic scenarios and evaluate alternative approaches on this collection. Our assessment is impartial, in that the model used for generating data is not used by any of the approaches compared, and objective, in that the scenarios are generated by a model that fits the data. Our results quantify the conservative control of the false discovery rate with the Benjamini and Hochberg method compared to the empirical Bayes approach and the multiple testing method proposed in Storey [J. R. Stat. Soc. Ser. B Stat. Methodol. 64 (2002) 479-498]. Simulation results also show a negligible departure from the target false discovery rate for the methodology used in Sj{\"o}blom et al.",
keywords = "Cancer genome studies, False discovery rate, Genome-wide studies, Multiple hypothesis testing, Somatic mutations",
author = "Lorenzo Trippa and Giovanni Parmigiani",
year = "2011",
month = "6",
doi = "10.1214/10-AOAS438",
language = "English (US)",
volume = "5",
pages = "1360--1378",
journal = "Annals of Applied Statistics",
issn = "1932-6157",
publisher = "Institute of Mathematical Statistics",
number = "2 B",

}

TY - JOUR

T1 - False discovery rates in somatic mutation studies of cancer

AU - Trippa, Lorenzo

AU - Parmigiani, Giovanni

PY - 2011/6

Y1 - 2011/6

N2 - The purpose of cancer genome sequencing studies is to determine the nature and types of alterations present in a typical cancer and to discover genes mutated at high frequencies. In this article we discuss statistical methods for the analysis of somatic mutation frequency data generated in these studies. We place special emphasis on a two-stage study design introduced by Sjöblom et al. [Science 314 (2006) 268-274]. In this context, we describe and compare statistical methods for constructing scores that can be used to prioritize candidate genes for further investigation and to assess the statistical significance of the candidates thus identified. Controversy has surrounded the reliability of the false discovery rate estimates provided by the approximations used in early cancer genome studies. To address this controversy, we develop a semiparametric Bayesian model that provides an accurate fit to the data. We use this model to generate a large collection of realistic scenarios and evaluate alternative approaches on this collection. Our assessment is impartial, in that the model used for generating data is not used by any of the approaches compared, and objective, in that the scenarios are generated by a model that fits the data. Our results quantify the conservative control of the false discovery rate with the Benjamini and Hochberg method compared to the empirical Bayes approach and the multiple testing method proposed in Storey [J. R. Stat. Soc. Ser. B Stat. Methodol. 64 (2002) 479-498]. Simulation results also show a negligible departure from the target false discovery rate for the methodology used in Sjöblom et al.

KW - Cancer genome studies

KW - False discovery rate

KW - Genome-wide studies

KW - Multiple hypothesis testing

KW - Somatic mutations

UR - http://www.scopus.com/inward/record.url?scp=84862797765&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84862797765&partnerID=8YFLogxK

U2 - 10.1214/10-AOAS438

DO - 10.1214/10-AOAS438

M3 - Article

AN - SCOPUS:84862797765

VL - 5

SP - 1360

EP - 1378

JO - Annals of Applied Statistics

JF - Annals of Applied Statistics

SN - 1932-6157

IS - 2 B

ER -