A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data

Bipolar Disorders Working Group of the Psychiatric Genomics Consortium

Research output: Contribution to journal (Article)

Abstract

A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., ‘loops’) within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical-experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical- and continuous-covariates, as well as sparsity within the data. We demonstrate these practical features with two examples; the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS).
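The loop-counting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the data have been binarized to entries of +1 or -1, in which case a 2-by-2 loop has rank 1 exactly when the product of its four entries is +1. The names `loop_scores`, `peel`, and `planted_example` are hypothetical, and the covariate, control, and sparsity corrections mentioned above are not attempted.

```python
import numpy as np

def loop_scores(B):
    """Row and column loop scores for a matrix B with entries in {-1, +1}.

    A loop (rows j, j'; columns k, k') is rank-1 exactly when
    B[j,k] * B[j,k'] * B[j',k] * B[j',k'] equals +1, and rank-2 when it is -1.
    Summing that product over all j', k, k' gives, for each row j, the number
    of rank-1 loops minus the number of rank-2 loops through that row.
    Degenerate loops (j' == j or k' == k) add the same constant to every row.
    """
    B = np.asarray(B, dtype=float)
    row_gram = B @ B.T   # row_gram[j, j'] = sum_k B[j,k] * B[j',k]
    col_gram = B.T @ B   # col_gram[k, k'] = sum_j B[j,k] * B[j,k']
    return (row_gram ** 2).sum(axis=1), (col_gram ** 2).sum(axis=1)

def peel(D, n_rows, n_cols):
    """Hypothetical helper: drop the lowest-scoring row or column, one at a
    time, until an n_rows-by-n_cols submatrix remains."""
    B = np.where(np.asarray(D) > 0, 1.0, -1.0)  # binarize by sign (assumption)
    rows = np.arange(B.shape[0])
    cols = np.arange(B.shape[1])
    while len(rows) > n_rows or len(cols) > n_cols:
        r, c = loop_scores(B[np.ix_(rows, cols)])
        # Trim whichever side currently has the larger excess over its target.
        if len(rows) - n_rows >= len(cols) - n_cols:
            rows = np.delete(rows, np.argmin(r))
        else:
            cols = np.delete(cols, np.argmin(c))
    return rows, cols

def planted_example(seed=0, size=60, block=20):
    """Random +/-1 matrix with a rank-1 bicluster planted in the top-left corner."""
    rng = np.random.default_rng(seed)
    D = rng.choice([-1.0, 1.0], size=(size, size))
    u = rng.choice([-1.0, 1.0], size=block)
    v = rng.choice([-1.0, 1.0], size=block)
    D[:block, :block] = np.outer(u, v)
    return D

D = planted_example()
rows, cols = peel(D, 20, 20)
print(sorted(rows.tolist()))
print(sorted(cols.tolist()))
```

In this toy setting the surviving rows and columns should concentrate on the planted block, because rows and columns inside a rank-1 bicluster take part in far more rank-1 loops than the rest. Removing one index per step keeps the sketch short; a practical version would likely remove a fraction of the rows and columns per iteration.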

Original language: English (US)
Article number: e1006105
Journal: PLoS Computational Biology
Volume: 14
Issue number: 5
DOI: 10.1371/journal.pcbi.1006105
State: Published - May 1 2018
Externally published: Yes

Fingerprint

Biclustering
Genome-Wide Association Study
Gene expression
Covariates
Counting
Genome
Genes
matrix
Gene Expression Analysis
Large Data
Accumulate
Spectral Methods
Experimental design
Sparsity
Categorical
Design of experiments
Demonstrate

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Ecology
  • Molecular Biology
  • Genetics
  • Cellular and Molecular Neuroscience
  • Computational Theory and Mathematics

Cite this

A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data. / Bipolar Disorders Working Group of the Psychiatric Genomics Consortium.

In: PLoS Computational Biology, Vol. 14, No. 5, e1006105, 01.05.2018.


Bipolar Disorders Working Group of the Psychiatric Genomics Consortium. / A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data. In: PLoS Computational Biology. 2018; Vol. 14, No. 5.
@article{29dd836580c24041be127643522aa722,
title = "A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data",
abstract = "A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., ‘loops’) within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical-experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical- and continuous-covariates, as well as sparsity within the data. We demonstrate these practical features with two examples; the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS).",
author = "{Bipolar Disorders Working Group of the Psychiatric Genomics Consortium} and Rangan, {Aaditya V.} and McGrouther, {Caroline C.} and John Kelsoe and Nicholas Schork and Eli Stahl and Qian Zhu and Arjun Krishnan and Vicky Yao and Olga Troyanskaya and Seda Bilaloglu and Preeti Raghavan and Sarah Bergen and Anders Jureus and Mikael Landen",
year = "2018",
month = "5",
day = "1",
doi = "10.1371/journal.pcbi.1006105",
language = "English (US)",
volume = "14",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "5",
}

TY - JOUR

T1 - A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data

AU - Bipolar Disorders Working Group of the Psychiatric Genomics Consortium

AU - Rangan, Aaditya V.

AU - McGrouther, Caroline C.

AU - Kelsoe, John

AU - Schork, Nicholas

AU - Stahl, Eli

AU - Zhu, Qian

AU - Krishnan, Arjun

AU - Yao, Vicky

AU - Troyanskaya, Olga

AU - Bilaloglu, Seda

AU - Raghavan, Preeti

AU - Bergen, Sarah

AU - Jureus, Anders

AU - Landen, Mikael

PY - 2018/5/1

Y1 - 2018/5/1

N2 - A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., ‘loops’) within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical-experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical- and continuous-covariates, as well as sparsity within the data. We demonstrate these practical features with two examples; the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS).

UR - http://www.scopus.com/inward/record.url?scp=85048172661&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048172661&partnerID=8YFLogxK

U2 - 10.1371/journal.pcbi.1006105

DO - 10.1371/journal.pcbi.1006105

M3 - Article

VL - 14

JO - PLoS Computational Biology

JF - PLoS Computational Biology

SN - 1553-734X

IS - 5

M1 - e1006105

ER -