A systematic evaluation of single-cell RNA-sequencing imputation methods

Wenpin Hou; Zhicheng Ji; Hongkai Ji; Stephanie C. Hicks

doi:10.1186/s13059-020-02132-x

Bloomberg School of Public Health

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Background: The rapid development of single-cell RNA-sequencing (scRNA-seq) technologies has led to the emergence of many methods for removing systematic technical noises, including imputation methods, which aim to address the increased sparsity observed in single-cell data. Although many imputation methods have been developed, there is no consensus on how methods compare to each other.

Results: Here, we perform a systematic evaluation of 18 scRNA-seq imputation methods to assess their accuracy and usability. We benchmark these methods in terms of the similarity between imputed cell profiles and bulk samples and whether these methods recover relevant biological signals or introduce spurious noise in downstream differential expression, unsupervised clustering, and pseudotemporal trajectory analyses, as well as their computational run time, memory usage, and scalability. Methods are evaluated using data from both cell lines and tissues and from both plate- and droplet-based single-cell platforms.

Conclusions: We found that the majority of scRNA-seq imputation methods outperformed no imputation in recovering gene expression observed in bulk RNA-seq. However, the majority of the methods did not improve performance in downstream analyses compared to no imputation, in particular for clustering and trajectory analysis, and thus should be used with caution. In addition, we found substantial variability in the performance of the methods within each evaluation aspect. Overall, MAGIC, kNN-smoothing, and SAVER were found to outperform the other methods most consistently.

Original language	English (US)
Article number	218
Journal	Genome Biology
Volume	21
Issue number	1
DOIs	https://doi.org/10.1186/s13059-020-02132-x
State	Published - Aug 27 2020

Keywords

Benchmark
Gene expression
Imputation
Single-cell RNA-sequencing

ASJC Scopus subject areas

Ecology, Evolution, Behavior and Systematics
Genetics
Cell Biology

Cite this

@article{76daec452a1946adb38b7fdaffa64d41,

title = "A systematic evaluation of single-cell RNA-sequencing imputation methods",

abstract = "Background: The rapid development of single-cell RNA-sequencing (scRNA-seq) technologies has led to the emergence of many methods for removing systematic technical noises, including imputation methods, which aim to address the increased sparsity observed in single-cell data. Although many imputation methods have been developed, there is no consensus on how methods compare to each other. Results: Here, we perform a systematic evaluation of 18 scRNA-seq imputation methods to assess their accuracy and usability. We benchmark these methods in terms of the similarity between imputed cell profiles and bulk samples and whether these methods recover relevant biological signals or introduce spurious noise in downstream differential expression, unsupervised clustering, and pseudotemporal trajectory analyses, as well as their computational run time, memory usage, and scalability. Methods are evaluated using data from both cell lines and tissues and from both plate- and droplet-based single-cell platforms. Conclusions: We found that the majority of scRNA-seq imputation methods outperformed no imputation in recovering gene expression observed in bulk RNA-seq. However, the majority of the methods did not improve performance in downstream analyses compared to no imputation, in particular for clustering and trajectory analysis, and thus should be used with caution. In addition, we found substantial variability in the performance of the methods within each evaluation aspect. Overall, MAGIC, kNN-smoothing, and SAVER were found to outperform the other methods most consistently.",

keywords = "Benchmark, Gene expression, Imputation, Single-cell RNA-sequencing",

author = "Hou, Wenpin and Ji, Zhicheng and Ji, Hongkai and Hicks, {Stephanie C.}",

note = "Funding Information: This work is supported by the National Institutes of Health grants R01HG010889 and R01HG009518 to HJ and R00HG009007 to SCH. SCH is also supported by CZF2019-002443 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. WH and SCH are supported by Alex{\textquoteright}s Lemonade Stand Foundation. Publisher Copyright: {\textcopyright} 2020 The Author(s).",

year = "2020",

month = aug,

day = "27",

doi = "10.1186/s13059-020-02132-x",

language = "English (US)",

volume = "21",

journal = "Genome Biology",

issn = "1474-7596",

publisher = "BioMed Central",

number = "1",

}

TY - JOUR

T1 - A systematic evaluation of single-cell RNA-sequencing imputation methods

AU - Hou, Wenpin

AU - Ji, Zhicheng

AU - Ji, Hongkai

AU - Hicks, Stephanie C.

N1 - Funding Information: This work is supported by the National Institutes of Health grants R01HG010889 and R01HG009518 to HJ and R00HG009007 to SCH. SCH is also supported by CZF2019-002443 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. WH and SCH are supported by Alex’s Lemonade Stand Foundation. Publisher Copyright: © 2020 The Author(s).

PY - 2020/8/27

Y1 - 2020/8/27

AB - Background: The rapid development of single-cell RNA-sequencing (scRNA-seq) technologies has led to the emergence of many methods for removing systematic technical noises, including imputation methods, which aim to address the increased sparsity observed in single-cell data. Although many imputation methods have been developed, there is no consensus on how methods compare to each other. Results: Here, we perform a systematic evaluation of 18 scRNA-seq imputation methods to assess their accuracy and usability. We benchmark these methods in terms of the similarity between imputed cell profiles and bulk samples and whether these methods recover relevant biological signals or introduce spurious noise in downstream differential expression, unsupervised clustering, and pseudotemporal trajectory analyses, as well as their computational run time, memory usage, and scalability. Methods are evaluated using data from both cell lines and tissues and from both plate- and droplet-based single-cell platforms. Conclusions: We found that the majority of scRNA-seq imputation methods outperformed no imputation in recovering gene expression observed in bulk RNA-seq. However, the majority of the methods did not improve performance in downstream analyses compared to no imputation, in particular for clustering and trajectory analysis, and thus should be used with caution. In addition, we found substantial variability in the performance of the methods within each evaluation aspect. Overall, MAGIC, kNN-smoothing, and SAVER were found to outperform the other methods most consistently.

KW - Benchmark

KW - Gene expression

KW - Imputation

KW - Single-cell RNA-sequencing

UR - http://www.scopus.com/inward/record.url?scp=85090012386&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85090012386&partnerID=8YFLogxK

U2 - 10.1186/s13059-020-02132-x

DO - 10.1186/s13059-020-02132-x

M3 - Article

C2 - 32854757

AN - SCOPUS:85090012386

SN - 1474-7596

VL - 21

JO - Genome Biology

JF - Genome Biology

IS - 1

M1 - 218

ER -
