TY - JOUR
T1 - Co-expression analysis is biased by a mean-correlation relationship
AU - Wang, Yi
AU - Hicks, Stephanie C.
AU - Hansen, Kasper D.
N1 - Publisher Copyright:
The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/2/13
Y1 - 2020/2/13
N2 - Estimates of correlation between pairs of genes in co-expression analysis are commonly used to construct networks among genes using gene expression data. Here, we show that the distribution of such correlations depends on the expression level of the involved genes, which we refer to as a mean-correlation relationship in RNA-seq data, both bulk and single-cell. This dependence introduces a bias in co-expression analysis whereby highly expressed genes are more likely to be highly correlated. Such a relationship is not observed in protein-protein interaction data, suggesting that it does not reflect biology. Ignoring this bias can lead to missing potentially biologically relevant pairs of genes that are lowly expressed, such as transcription factors. To address this problem, we introduce spatial quantile normalization (SpQN), a method for normalizing local distributions in a correlation matrix. We show that spatial quantile normalization removes the mean-correlation relationship and corrects the expression bias in network reconstruction.
AB - Estimates of correlation between pairs of genes in co-expression analysis are commonly used to construct networks among genes using gene expression data. Here, we show that the distribution of such correlations depends on the expression level of the involved genes, which we refer to as a mean-correlation relationship in RNA-seq data, both bulk and single-cell. This dependence introduces a bias in co-expression analysis whereby highly expressed genes are more likely to be highly correlated. Such a relationship is not observed in protein-protein interaction data, suggesting that it does not reflect biology. Ignoring this bias can lead to missing potentially biologically relevant pairs of genes that are lowly expressed, such as transcription factors. To address this problem, we introduce spatial quantile normalization (SpQN), a method for normalizing local distributions in a correlation matrix. We show that spatial quantile normalization removes the mean-correlation relationship and corrects the expression bias in network reconstruction.
UR - http://www.scopus.com/inward/record.url?scp=85098903061&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098903061&partnerID=8YFLogxK
U2 - 10.1101/2020.02.13.944777
DO - 10.1101/2020.02.13.944777
M3 - Article
AN - SCOPUS:85098903061
JO - Advances in Water Resources
JF - Advances in Water Resources
SN - 0309-1708
ER -