Post-Selection Inference Following Aggregate Level Hypothesis Testing in Large-Scale Genomic Data

Ruth Heller; Nilanjan Chatterjee; Abba Krieger; Jianxin Shi

doi:10.1080/01621459.2017.1375933

Bloomberg School of Public Health

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

In many genomic applications, hypothesis tests are performed for powerful identification of signals by aggregating test statistics across units within naturally defined classes. Following class-level testing, it is naturally of interest to identify the lower-level units that contain true signals. Testing the individual units within a class without taking into account the fact that the class was selected using an aggregate-level test statistic will produce biased inference. We develop a hypothesis testing framework that guarantees control of false positive rates conditional on the fact that the class was selected. Specifically, we develop procedures for calculating unit-level p-values that allow rejection of null hypotheses while controlling two types of conditional error rates, one relating to the family-wise error rate and the other relating to the false discovery rate. We use simulation studies to illustrate the validity and power of the proposed procedure in comparison to several possible alternatives. We illustrate the power of the method in a natural application involving whole-genome expression quantitative trait loci (eQTL) analysis across 17 tissue types using data from The Cancer Genome Atlas (TCGA) Project. Supplementary materials for this article are available online.
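
The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure; it only shows why naive unit-level testing is problematic after selection. It assumes a toy setting that does not come from the article: every unit is null with a standard normal test statistic, the aggregate statistic is the sum of squared unit statistics, and a class is selected when its aggregate statistic falls in the top `class_alpha` fraction. The function names `two_sided_p` and `simulate` and all parameter values are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(m=5, class_alpha=0.05, unit_alpha=0.05, n_sim=100_000, seed=0):
    """Compare naive unit-level rejection rates with and without selection.

    Assumed toy model (not from the article): all m units in every class
    are null, with independent N(0, 1) test statistics.
    """
    rng = random.Random(seed)
    classes = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n_sim)]

    # Aggregate-level statistic: sum of squared unit statistics per class.
    agg = [sum(z * z for z in zs) for zs in classes]

    # Empirical selection cutoff: keep roughly the top class_alpha fraction.
    cutoff = sorted(agg)[int((1.0 - class_alpha) * n_sim)]
    selected = [zs for zs, s in zip(classes, agg) if s > cutoff]

    # Naive test of the first unit in each class, ignoring selection.
    hits_all = sum(two_sided_p(zs[0]) <= unit_alpha for zs in classes)
    hits_selected = sum(two_sided_p(zs[0]) <= unit_alpha for zs in selected)

    return {
        "n_selected": len(selected),
        "rate_all": hits_all / n_sim,
        "rate_selected": hits_selected / len(selected),
    }


if __name__ == "__main__":
    result = simulate()
    print(result)
```

In this toy model, `rate_all` should sit close to `unit_alpha`, while `rate_selected` should come out well above it, because classes are selected precisely when their unit statistics are jointly large. Controlling error rates conditional on selection, as the article proposes, is meant to remove this inflation.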

Original language	English (US)
Pages (from-to)	1770-1783
Number of pages	14
Journal	Journal of the American Statistical Association
Volume	113
Issue number	524
DOIs	https://doi.org/10.1080/01621459.2017.1375933
State	Published - Oct 2 2018

Keywords

Conditional p-value
False discovery rate
Multiple testing
Selective inference

ASJC Scopus subject areas

Statistics and Probability
Statistics, Probability and Uncertainty

Cite this

@article{2764dbd02cae48ddb98b91e1c9f57fd5,

title = "Post-Selection Inference Following Aggregate Level Hypothesis Testing in Large-Scale Genomic Data",

abstract = "In many genomic applications, hypotheses tests are performed for powerful identification of signals by aggregating test-statistics across units within naturally defined classes. Following class-level testing, it is naturally of interest to identify the lower level units which contain true signals. Testing the individual units within a class without taking into account the fact that the class was selected using an aggregate-level test-statistic, will produce biased inference. We develop a hypothesis testing framework that guarantees control for false positive rates conditional on the fact that the class was selected. Specifically, we develop procedures for calculating unit level p-values that allows rejection of null hypotheses controlling for two types of conditional error rates, one relating to family-wise rate and the other relating to false discovery rate. We use simulation studies to illustrate validity and power of the proposed procedure in comparison to several possible alternatives. We illustrate the power of the method in a natural application involving whole-genome expression quantitative trait loci (eQTL) analysis across 17 tissue types using data from The Cancer Genome Atlas (TCGA) Project. Supplementary materials for this article are available online.",

keywords = "Conditional p-value, False discovery rate, Multiple testing, Selective inference",

author = "Ruth Heller and Nilanjan Chatterjee and Abba Krieger and Jianxin Shi",

note = "Funding Information: Study supported by the National Cancer Institute Intramural Research Program. Ruth Heller acknowledges support by Israel Science Foundation grant no. 1049/16. Publisher Copyright: {\textcopyright} 2018 American Statistical Association.",

year = "2018",

month = oct,

day = "2",

doi = "10.1080/01621459.2017.1375933",

language = "English (US)",

volume = "113",

pages = "1770--1783",

journal = "Journal of the American Statistical Association",

issn = "0162-1459",

publisher = "Taylor and Francis Ltd.",

number = "524",

}

TY - JOUR

T1 - Post-Selection Inference Following Aggregate Level Hypothesis Testing in Large-Scale Genomic Data

AU - Heller, Ruth

AU - Chatterjee, Nilanjan

AU - Krieger, Abba

AU - Shi, Jianxin

N1 - Funding Information: Study supported by the National Cancer Institute Intramural Research Program. Ruth Heller acknowledges support by Israel Science Foundation grant no. 1049/16. Publisher Copyright: © 2018 American Statistical Association.

PY - 2018/10/2

Y1 - 2018/10/2

N2 - In many genomic applications, hypotheses tests are performed for powerful identification of signals by aggregating test-statistics across units within naturally defined classes. Following class-level testing, it is naturally of interest to identify the lower level units which contain true signals. Testing the individual units within a class without taking into account the fact that the class was selected using an aggregate-level test-statistic, will produce biased inference. We develop a hypothesis testing framework that guarantees control for false positive rates conditional on the fact that the class was selected. Specifically, we develop procedures for calculating unit level p-values that allows rejection of null hypotheses controlling for two types of conditional error rates, one relating to family-wise rate and the other relating to false discovery rate. We use simulation studies to illustrate validity and power of the proposed procedure in comparison to several possible alternatives. We illustrate the power of the method in a natural application involving whole-genome expression quantitative trait loci (eQTL) analysis across 17 tissue types using data from The Cancer Genome Atlas (TCGA) Project. Supplementary materials for this article are available online.

AB - In many genomic applications, hypotheses tests are performed for powerful identification of signals by aggregating test-statistics across units within naturally defined classes. Following class-level testing, it is naturally of interest to identify the lower level units which contain true signals. Testing the individual units within a class without taking into account the fact that the class was selected using an aggregate-level test-statistic, will produce biased inference. We develop a hypothesis testing framework that guarantees control for false positive rates conditional on the fact that the class was selected. Specifically, we develop procedures for calculating unit level p-values that allows rejection of null hypotheses controlling for two types of conditional error rates, one relating to family-wise rate and the other relating to false discovery rate. We use simulation studies to illustrate validity and power of the proposed procedure in comparison to several possible alternatives. We illustrate the power of the method in a natural application involving whole-genome expression quantitative trait loci (eQTL) analysis across 17 tissue types using data from The Cancer Genome Atlas (TCGA) Project. Supplementary materials for this article are available online.

KW - Conditional p-value

KW - False discovery rate

KW - Multiple testing

KW - Selective inference

UR - http://www.scopus.com/inward/record.url?scp=85049152891&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049152891&partnerID=8YFLogxK

U2 - 10.1080/01621459.2017.1375933

DO - 10.1080/01621459.2017.1375933

M3 - Article

AN - SCOPUS:85049152891

SN - 0162-1459

VL - 113

SP - 1770

EP - 1783

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

IS - 524

ER -
