A Decision-Theory Approach to Interpretable Set Analysis for High-Dimensional Data

Simina M. Boca, Héctor Corrada Bravo, Brian Caffo, Jeffrey T. Leek, Giovanni Parmigiani

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

A key problem in high-dimensional significance analysis is to find pre-defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision-theory approach to the analysis of gene sets which focuses on estimating the fraction of non-null variables in a set. We introduce the idea of "atoms," non-overlapping sets based on the original pre-defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision-theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self-contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene-set and brain ROI data analyses.
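The decision rule described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and several pieces are assumptions, since the abstract gives no formulas: atoms are taken to be groups of variables that share exactly the same set memberships; each variable is assumed to come with a posterior probability of being non-null; the afdr of an atom is taken to be the average posterior null probability of its members; and the loss is taken to be w * (false discoveries) + (1 - w) * (missed discoveries). Under those assumptions, including an atom lowers the expected loss exactly when its afdr is at most 1 - w. The names `build_atoms`, `select_atoms`, and the toy data are hypothetical.

```python
from collections import defaultdict


def build_atoms(annotations):
    """Split overlapping sets into non-overlapping atoms.

    Assumption: an atom is the group of variables that share exactly
    the same pattern of set membership.
    """
    membership = defaultdict(set)
    for set_name, members in annotations.items():
        for var in members:
            membership[var].add(set_name)

    atoms = defaultdict(list)
    for var, sets in membership.items():
        atoms[frozenset(sets)].append(var)
    return {key: sorted(members) for key, members in atoms.items()}


def select_atoms(atoms, prob_non_null, w):
    """Keep atoms whose assumed afdr is at most 1 - w.

    Assumed loss: w * false discoveries + (1 - w) * missed discoveries.
    Selecting an atom costs w * sum(1 - p) in expectation; skipping it
    costs (1 - w) * sum(p). Selecting is better when mean(1 - p) <= 1 - w.
    """
    selected = {}
    for key, members in atoms.items():
        afdr = sum(1.0 - prob_non_null[v] for v in members) / len(members)
        if afdr <= 1.0 - w:
            selected[key] = afdr
    return selected


# Toy data (hypothetical): two overlapping gene sets and made-up
# posterior probabilities that each gene is non-null.
annotations = {"A": ["g1", "g2", "g3"], "B": ["g3", "g4"]}
prob_non_null = {"g1": 0.9, "g2": 0.8, "g3": 0.3, "g4": 0.1}

atoms = build_atoms(annotations)
selected = select_atoms(atoms, prob_non_null, w=0.5)
for key, afdr in selected.items():
    print(sorted(key), atoms[key], round(afdr, 3))
```

In this toy case the two sets split into three atoms (genes only in A, genes in both A and B, genes only in B), and the reported union is built from whichever atoms pass the threshold. Raising w penalizes false discoveries more heavily and therefore tightens the threshold.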

Original language: English (US)
Pages (from-to): 614-623
Number of pages: 10
Journal: Biometrics
Volume: 69
Issue number: 3
State: Published - Sep 2013

Keywords

  • Atomic false discovery rate
  • Gene-sets
  • Hypothesis testing
  • Set-level inference

ASJC Scopus subject areas

  • Statistics and Probability
  • General Biochemistry, Genetics and Molecular Biology
  • General Immunology and Microbiology
  • General Agricultural and Biological Sciences
  • Applied Mathematics
