Post-Selection Inference Following Aggregate Level Hypothesis Testing in Large-Scale Genomic Data

Ruth Heller, Nilanjan Chatterjee, Abba Krieger, Jianxin Shi

Research output: Contribution to journal › Article

Abstract

In many genomic applications, hypothesis tests are performed for powerful identification of signals by aggregating test statistics across units within naturally defined classes. Following class-level testing, it is naturally of interest to identify the lower-level units that contain true signals. Testing the individual units within a class without taking into account the fact that the class was selected using an aggregate-level test statistic will produce biased inference. We develop a hypothesis testing framework that guarantees control of false positive rates conditional on the fact that the class was selected. Specifically, we develop procedures for calculating unit-level p-values that allow rejection of null hypotheses while controlling two types of conditional error rates, one relating to the family-wise error rate and the other relating to the false discovery rate. We use simulation studies to illustrate the validity and power of the proposed procedure in comparison to several possible alternatives. We illustrate the power of the method in a natural application involving whole-genome expression quantitative trait loci (eQTL) analysis across 17 tissue types using data from The Cancer Genome Atlas (TCGA) Project. Supplementary materials for this article are available online.
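The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure; it is a toy setting built on assumptions that do not come from the article: every unit statistic is an independent standard normal (so all unit-level nulls are true), each class has 5 units, and a class is selected when its sum of squared statistics exceeds 11.07, approximately the 95th percentile of a chi-square distribution with 5 degrees of freedom. The function names `two_sided_p`, `conditional_p`, and `simulate` are hypothetical. The conditional p-value here conditions on the other units' statistics, so the selection event reduces to a truncation of the unit's own statistic.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def conditional_p(z, others_sum_sq, threshold):
    """P-value for one unit, given that its class was selected.

    Assumption (illustrative only): the class is selected when the sum of
    squared statistics exceeds `threshold`. Holding the other units fixed,
    selection is equivalent to z**2 > threshold - others_sum_sq, so the null
    distribution of z is a standard normal truncated to that region.
    """
    t = max(0.0, threshold - others_sum_sq)
    return two_sided_p(z) / two_sided_p(math.sqrt(t))


def simulate(n_classes=100_000, m=5, threshold=11.07, alpha=0.05, seed=0):
    """Return (naive, conditional) rejection rates among selected classes."""
    rng = random.Random(seed)
    naive_hits = cond_hits = n_units = 0
    for _ in range(n_classes):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]  # all units are null
        total = sum(v * v for v in z)
        if total <= threshold:
            continue  # class not selected by the aggregate test
        for v in z:
            n_units += 1
            naive_hits += two_sided_p(v) <= alpha
            cond_hits += conditional_p(v, total - v * v, threshold) <= alpha
    return naive_hits / n_units, cond_hits / n_units


naive_rate, cond_rate = simulate()
print(f"naive: {naive_rate:.3f}  conditional: {cond_rate:.3f}")
```

Under these assumptions the naive rejection rate among selected classes should come out well above the nominal 0.05, while the rate based on the truncated null should stay close to it. Real applications involve dependence, unknown nuisance parameters, and multiplicity across units, which this sketch does not address.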

Original language: English (US)
Pages (from-to): 1-14
Number of pages: 14
Journal: Journal of the American Statistical Association
DOI: 10.1080/01621459.2017.1375933
State: Accepted/In press - Jun 25 2018

Fingerprint

Hypothesis Testing
Genomics
Unit
Test Statistic
Genome
Quantitative Trait Loci
Testing
Atlas
Hypothesis Test
p-Value
False Positive
Rejection
Null hypothesis
Biased
Error Rate
Cancer
Class
Inference
Hypothesis testing
Simulation Study

Keywords

  • Conditional p-value
  • False discovery rate
  • Multiple testing
  • Selective inference

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Post-Selection Inference Following Aggregate Level Hypothesis Testing in Large-Scale Genomic Data. / Heller, Ruth; Chatterjee, Nilanjan; Krieger, Abba; Shi, Jianxin.

In: Journal of the American Statistical Association, 25.06.2018, p. 1-14.

@article{2764dbd02cae48ddb98b91e1c9f57fd5,
title = "Post-Selection Inference Following Aggregate Level Hypothesis Testing in Large-Scale Genomic Data",
abstract = "In many genomic applications, hypothesis tests are performed for powerful identification of signals by aggregating test statistics across units within naturally defined classes. Following class-level testing, it is naturally of interest to identify the lower-level units that contain true signals. Testing the individual units within a class without taking into account the fact that the class was selected using an aggregate-level test statistic will produce biased inference. We develop a hypothesis testing framework that guarantees control of false positive rates conditional on the fact that the class was selected. Specifically, we develop procedures for calculating unit-level p-values that allow rejection of null hypotheses while controlling two types of conditional error rates, one relating to the family-wise error rate and the other relating to the false discovery rate. We use simulation studies to illustrate the validity and power of the proposed procedure in comparison to several possible alternatives. We illustrate the power of the method in a natural application involving whole-genome expression quantitative trait loci (eQTL) analysis across 17 tissue types using data from The Cancer Genome Atlas (TCGA) Project. Supplementary materials for this article are available online.",
keywords = "Conditional p-value, False discovery rate, Multiple testing, Selective inference",
author = "Ruth Heller and Nilanjan Chatterjee and Abba Krieger and Jianxin Shi",
year = "2018",
month = "6",
day = "25",
doi = "10.1080/01621459.2017.1375933",
language = "English (US)",
pages = "1--14",
journal = "Journal of the American Statistical Association",
issn = "0162-1459",
publisher = "Taylor and Francis Ltd.",

}

TY - JOUR

T1 - Post-Selection Inference Following Aggregate Level Hypothesis Testing in Large-Scale Genomic Data

AU - Heller, Ruth

AU - Chatterjee, Nilanjan

AU - Krieger, Abba

AU - Shi, Jianxin

PY - 2018/6/25

Y1 - 2018/6/25

N2 - In many genomic applications, hypothesis tests are performed for powerful identification of signals by aggregating test statistics across units within naturally defined classes. Following class-level testing, it is naturally of interest to identify the lower-level units that contain true signals. Testing the individual units within a class without taking into account the fact that the class was selected using an aggregate-level test statistic will produce biased inference. We develop a hypothesis testing framework that guarantees control of false positive rates conditional on the fact that the class was selected. Specifically, we develop procedures for calculating unit-level p-values that allow rejection of null hypotheses while controlling two types of conditional error rates, one relating to the family-wise error rate and the other relating to the false discovery rate. We use simulation studies to illustrate the validity and power of the proposed procedure in comparison to several possible alternatives. We illustrate the power of the method in a natural application involving whole-genome expression quantitative trait loci (eQTL) analysis across 17 tissue types using data from The Cancer Genome Atlas (TCGA) Project. Supplementary materials for this article are available online.

KW - Conditional p-value

KW - False discovery rate

KW - Multiple testing

KW - Selective inference

UR - http://www.scopus.com/inward/record.url?scp=85049152891&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049152891&partnerID=8YFLogxK

U2 - 10.1080/01621459.2017.1375933

DO - 10.1080/01621459.2017.1375933

M3 - Article

AN - SCOPUS:85049152891

SP - 1

EP - 14

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

SN - 0162-1459

ER -