A practical guide to methods controlling false discoveries in computational biology

Keegan Korthauer, Patrick K. Kimes, Claire Duvallet, Alejandro Reyes, Ayshwarya Subramanian, Mingxiang Teng, Chinmay Shukla, Eric J. Alm, Stephanie Hicks

Research output: Contribution to journal (Article)

Abstract

Background: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology.

Results: Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses.

Conclusions: Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries.
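As an illustration of what a classic FDR method that uses "only p values as input" can look like, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. The abstract does not name the classic methods it benchmarks, so the choice of Benjamini-Hochberg here is an assumption, and the function name `benjamini_hochberg` and the `alpha` parameter are illustrative rather than taken from the article.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Sketch of Benjamini-Hochberg adjustment (illustrative, not the article's code).

    Returns (adjusted_pvalues, rejected), both in the same order as the input.
    """
    m = len(pvalues)
    if m == 0:
        return [], []

    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down to the smallest, scaling each by
    # m / rank and enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min

    # A hypothesis is rejected when its adjusted p value is at most alpha.
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected


if __name__ == "__main__":
    # Made-up p values, purely for demonstration.
    example = [0.01, 0.04, 0.03, 0.005]
    adj, rej = benjamini_hochberg(example, alpha=0.05)
    for p, q, r in zip(example, adj, rej):
        print(f"p={p:.3f}  adjusted={q:.3f}  rejected={r}")
```

Note that every hypothesis is treated identically here; the covariate-aware methods described in the abstract differ in that they use side information to prioritize, weight, or group hypotheses before or during this kind of thresholding.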

Original language: English (US)
Article number: 118
Journal: Genome biology
Volume: 20
Issue number: 1
DOI: https://doi.org/10.1186/s13059-019-1716-1
State: Published - Jun 4 2019

Fingerprint

Computational Biology
bioinformatics
methodology
method
rate
Benchmarking
statistical analysis
researchers
case studies
Research Personnel

Keywords

  • ChIP-seq
  • False discovery rate
  • Gene set analysis
  • GWAS
  • Microbiome
  • Multiple hypothesis testing
  • RNA-seq
  • scRNA-seq

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Cell Biology

Cite this

Korthauer, K., Kimes, P. K., Duvallet, C., Reyes, A., Subramanian, A., Teng, M., ... Hicks, S. (2019). A practical guide to methods controlling false discoveries in computational biology. Genome biology, 20(1), [118]. https://doi.org/10.1186/s13059-019-1716-1

A practical guide to methods controlling false discoveries in computational biology. / Korthauer, Keegan; Kimes, Patrick K.; Duvallet, Claire; Reyes, Alejandro; Subramanian, Ayshwarya; Teng, Mingxiang; Shukla, Chinmay; Alm, Eric J.; Hicks, Stephanie.

In: Genome biology, Vol. 20, No. 1, 118, 04.06.2019.

Korthauer, K, Kimes, PK, Duvallet, C, Reyes, A, Subramanian, A, Teng, M, Shukla, C, Alm, EJ & Hicks, S 2019, 'A practical guide to methods controlling false discoveries in computational biology', Genome biology, vol. 20, no. 1, 118. https://doi.org/10.1186/s13059-019-1716-1
Korthauer K, Kimes PK, Duvallet C, Reyes A, Subramanian A, Teng M et al. A practical guide to methods controlling false discoveries in computational biology. Genome biology. 2019 Jun 4;20(1). 118. https://doi.org/10.1186/s13059-019-1716-1
Korthauer, Keegan ; Kimes, Patrick K. ; Duvallet, Claire ; Reyes, Alejandro ; Subramanian, Ayshwarya ; Teng, Mingxiang ; Shukla, Chinmay ; Alm, Eric J. ; Hicks, Stephanie. / A practical guide to methods controlling false discoveries in computational biology. In: Genome biology. 2019 ; Vol. 20, No. 1.
@article{bbcc09794e614be6b26fa5044db85539,
title = "A practical guide to methods controlling false discoveries in computational biology",
abstract = "Background: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology. Results: Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses. Conclusions: Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries.",
keywords = "ChIP-seq, False discovery rate, Gene set analysis, GWAS, Microbiome, Multiple hypothesis testing, RNA-seq, scRNA-seq",
author = "Keegan Korthauer and Kimes, {Patrick K.} and Claire Duvallet and Alejandro Reyes and Ayshwarya Subramanian and Mingxiang Teng and Chinmay Shukla and Alm, {Eric J.} and Stephanie Hicks",
year = "2019",
month = "6",
day = "4",
doi = "10.1186/s13059-019-1716-1",
language = "English (US)",
volume = "20",
journal = "Genome Biology",
issn = "1474-7596",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - A practical guide to methods controlling false discoveries in computational biology

AU - Korthauer, Keegan

AU - Kimes, Patrick K.

AU - Duvallet, Claire

AU - Reyes, Alejandro

AU - Subramanian, Ayshwarya

AU - Teng, Mingxiang

AU - Shukla, Chinmay

AU - Alm, Eric J.

AU - Hicks, Stephanie

PY - 2019/6/4

Y1 - 2019/6/4

N2 - Background: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology. Results: Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses. Conclusions: Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries.

KW - ChIP-seq

KW - False discovery rate

KW - Gene set analysis

KW - GWAS

KW - Microbiome

KW - Multiple hypothesis testing

KW - RNA-seq

KW - scRNA-seq

UR - http://www.scopus.com/inward/record.url?scp=85066850876&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066850876&partnerID=8YFLogxK

U2 - 10.1186/s13059-019-1716-1

DO - 10.1186/s13059-019-1716-1

M3 - Article

C2 - 31164141

AN - SCOPUS:85066850876

VL - 20

JO - Genome Biology

JF - Genome Biology

SN - 1474-7596

IS - 1

M1 - 118

ER -