Hierarchical Bayesian analysis of somatic mutation data in cancer

Jie Ding, Lorenzo Trippa, Xiaogang Zhong, Giovanni Parmigiani

Research output: Contribution to journal (Article)

Abstract

Identifying genes underlying cancer development is critical to cancer biology and has important implications across prevention, diagnosis and treatment. Cancer sequencing studies aim at discovering genes with high frequencies of somatic mutations in specific types of cancer, as these genes are potential driving factors (drivers) for cancer development. We introduce a hierarchical Bayesian methodology to estimate gene-specific mutation rates and driver probabilities from somatic mutation data and to shed light on the overall proportion of drivers among sequenced genes. Our methodology applies to different experimental designs used in practice, including one-stage, two-stage and candidate gene designs. Also, sample sizes are typically small relative to the rarity of individual mutations. Via a shrinkage method borrowing strength from the whole genome in assessing individual genes, we reinforce inference and address the selection effects induced by multistage designs. Our simulation studies show that the posterior driver probabilities provide a nearly unbiased false discovery rate estimate. We apply our methods to pancreatic and breast cancer data, contrast our results to previous estimates and provide estimated proportions of drivers for these two types of cancer.
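As an illustration of how posterior driver probabilities can yield a false discovery rate estimate, the sketch below uses the standard Bayesian FDR calculation: for a list of genes selected by thresholding, the estimated FDR is the average of one minus the posterior driver probability over the selected genes. This is a minimal sketch and not the authors' implementation. The function name `estimated_fdr`, the threshold, and the probability values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def estimated_fdr(driver_probs: List[float], threshold: float) -> Tuple[List[int], float]:
    """Select genes with posterior driver probability >= threshold.

    Returns the indices of the selected genes and the estimated false
    discovery rate of that list (hypothetical helper, for illustration only).
    """
    selected = [i for i, p in enumerate(driver_probs) if p >= threshold]
    if not selected:
        # No discoveries, so there are no false discoveries to count.
        return selected, 0.0
    # A selected gene is a passenger (false discovery) with probability 1 - p,
    # so the expected fraction of false discoveries is the mean of 1 - p.
    fdr = sum(1.0 - driver_probs[i] for i in selected) / len(selected)
    return selected, fdr


# Made-up posterior driver probabilities for five genes (not real data).
probs = [0.99, 0.95, 0.60, 0.10, 0.02]
selected, fdr = estimated_fdr(probs, threshold=0.5)
print(selected, round(fdr, 3))
```

Raising the threshold shortens the selected list and lowers the estimated FDR, which is the trade-off a posterior-probability ranking makes explicit.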

Original language: English (US)
Pages (from-to): 883-903
Number of pages: 21
Journal: Annals of Applied Statistics
Volume: 7
Issue number: 2
DOI: https://doi.org/10.1214/12-AOAS604
State: Published - Jun 2013
Externally published: Yes

Fingerprint

Bayesian Analysis
Cancer
Mutation
Genes
Gene
Driver
Proportion
Estimate
Methodology
Shrinkage
Experimental design
Breast Cancer
Sequencing
Biology
Design of experiments
Sample Size
Genome
Simulation Study

Keywords

  • Drivers and passengers
  • Hierarchical Bayesian model
  • Pancreatic and breast cancer
  • Somatic mutations

ASJC Scopus subject areas

  • Statistics, Probability and Uncertainty
  • Modeling and Simulation
  • Statistics and Probability

Cite this

Hierarchical Bayesian analysis of somatic mutation data in cancer. / Ding, Jie; Trippa, Lorenzo; Zhong, Xiaogang; Parmigiani, Giovanni.

In: Annals of Applied Statistics, Vol. 7, No. 2, 06.2013, p. 883-903.

Ding, J, Trippa, L, Zhong, X & Parmigiani, G 2013, 'Hierarchical Bayesian analysis of somatic mutation data in cancer', Annals of Applied Statistics, vol. 7, no. 2, pp. 883-903. https://doi.org/10.1214/12-AOAS604
Ding, Jie ; Trippa, Lorenzo ; Zhong, Xiaogang ; Parmigiani, Giovanni. / Hierarchical Bayesian analysis of somatic mutation data in cancer. In: Annals of Applied Statistics. 2013 ; Vol. 7, No. 2. pp. 883-903.
@article{e3890f8a327e4604aaa283bc86ffd021,
title = "Hierarchical Bayesian analysis of somatic mutation data in cancer",
abstract = "Identifying genes underlying cancer development is critical to cancer biology and has important implications across prevention, diagnosis and treatment. Cancer sequencing studies aim at discovering genes with high frequencies of somatic mutations in specific types of cancer, as these genes are potential driving factors (drivers) for cancer development. We introduce a hierarchical Bayesian methodology to estimate gene-specific mutation rates and driver probabilities from somatic mutation data and to shed light on the overall proportion of drivers among sequenced genes. Our methodology applies to different experimental designs used in practice, including one-stage, two-stage and candidate gene designs. Also, sample sizes are typically small relative to the rarity of individual mutations. Via a shrinkage method borrowing strength from the whole genome in assessing individual genes, we reinforce inference and address the selection effects induced by multistage designs. Our simulation studies show that the posterior driver probabilities provide a nearly unbiased false discovery rate estimate. We apply our methods to pancreatic and breast cancer data, contrast our results to previous estimates and provide estimated proportions of drivers for these two types of cancer.",
keywords = "Drivers and passengers, Hierarchical Bayesian model, Pancreatic and breast cancer, Somatic mutations",
author = "Jie Ding and Lorenzo Trippa and Xiaogang Zhong and Giovanni Parmigiani",
year = "2013",
month = "6",
doi = "10.1214/12-AOAS604",
language = "English (US)",
volume = "7",
pages = "883--903",
journal = "Annals of Applied Statistics",
issn = "1932-6157",
publisher = "Institute of Mathematical Statistics",
number = "2",
}

TY - JOUR

T1 - Hierarchical Bayesian analysis of somatic mutation data in cancer

AU - Ding, Jie

AU - Trippa, Lorenzo

AU - Zhong, Xiaogang

AU - Parmigiani, Giovanni

PY - 2013/6

Y1 - 2013/6

N2 - Identifying genes underlying cancer development is critical to cancer biology and has important implications across prevention, diagnosis and treatment. Cancer sequencing studies aim at discovering genes with high frequencies of somatic mutations in specific types of cancer, as these genes are potential driving factors (drivers) for cancer development. We introduce a hierarchical Bayesian methodology to estimate gene-specific mutation rates and driver probabilities from somatic mutation data and to shed light on the overall proportion of drivers among sequenced genes. Our methodology applies to different experimental designs used in practice, including one-stage, two-stage and candidate gene designs. Also, sample sizes are typically small relative to the rarity of individual mutations. Via a shrinkage method borrowing strength from the whole genome in assessing individual genes, we reinforce inference and address the selection effects induced by multistage designs. Our simulation studies show that the posterior driver probabilities provide a nearly unbiased false discovery rate estimate. We apply our methods to pancreatic and breast cancer data, contrast our results to previous estimates and provide estimated proportions of drivers for these two types of cancer.

AB - Identifying genes underlying cancer development is critical to cancer biology and has important implications across prevention, diagnosis and treatment. Cancer sequencing studies aim at discovering genes with high frequencies of somatic mutations in specific types of cancer, as these genes are potential driving factors (drivers) for cancer development. We introduce a hierarchical Bayesian methodology to estimate gene-specific mutation rates and driver probabilities from somatic mutation data and to shed light on the overall proportion of drivers among sequenced genes. Our methodology applies to different experimental designs used in practice, including one-stage, two-stage and candidate gene designs. Also, sample sizes are typically small relative to the rarity of individual mutations. Via a shrinkage method borrowing strength from the whole genome in assessing individual genes, we reinforce inference and address the selection effects induced by multistage designs. Our simulation studies show that the posterior driver probabilities provide a nearly unbiased false discovery rate estimate. We apply our methods to pancreatic and breast cancer data, contrast our results to previous estimates and provide estimated proportions of drivers for these two types of cancer.

KW - Drivers and passengers

KW - Hierarchical Bayesian model

KW - Pancreatic and breast cancer

KW - Somatic mutations

UR - http://www.scopus.com/inward/record.url?scp=84879531491&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84879531491&partnerID=8YFLogxK

U2 - 10.1214/12-AOAS604

DO - 10.1214/12-AOAS604

M3 - Article

AN - SCOPUS:84879531491

VL - 7

SP - 883

EP - 903

JO - Annals of Applied Statistics

JF - Annals of Applied Statistics

SN - 1932-6157

IS - 2

ER -