TY - JOUR
T1 - Universal prediction of cell-cycle position using transfer learning
AU - Zheng, Shijie C.
AU - Stein-O’Brien, Genevieve
AU - Augustin, Jonathan J.
AU - Slosberg, Jared
AU - Carosso, Giovanni A.
AU - Winer, Briana
AU - Shin, Gloria
AU - Bjornsson, Hans T.
AU - Goff, Loyal A.
AU - Hansen, Kasper D.
N1 - Funding Information:
This project has been made possible in part by grant number CZF2019-002443 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award R01GM121459. This work was additionally supported by awards from the National Science Foundation (IOS-1665692), the National Institute on Aging (R01AG066768), and the Maryland Stem Cell Research Foundation (2016-MSCRFI-2805). GSO is supported by postdoctoral fellowship awards from the Kavli Neurodiscovery Institute, the Johns Hopkins Provost Award Program, and the BRAIN Initiative in partnership with the National Institute of Neurological Disorders (K99NS122085).
Publisher Copyright:
© 2021, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Background: The cell cycle is a highly conserved, continuous process which controls faithful replication and division of cells. Single-cell technologies have enabled increasingly precise measurements of the cell cycle both as a biological process of interest and as a possible confounding factor. Despite its importance and conservation, there is no universally applicable approach to infer position in the cell cycle with high resolution from single-cell RNA-seq data. Results: Here, we present tricycle, an R/Bioconductor package, to address this challenge by leveraging key features of the biology of the cell cycle, the mathematical properties of principal component analysis of periodic functions, and the use of transfer learning. We estimate a cell-cycle embedding using a fixed reference dataset and project new data into this reference embedding, an approach that overcomes key limitations of learning a dataset-dependent embedding. Tricycle then predicts a cell-specific position in the cell cycle based on the data projection. The accuracy of tricycle compares favorably to gold-standard experimental assays, which generally require specialized measurements in specifically constructed in vitro systems. Using internal controls which are available for any dataset, we show that tricycle predictions generalize to datasets with multiple cell types, across tissues, species, and even sequencing assays. Conclusions: Tricycle generalizes across datasets and is highly scalable and applicable to atlas-level single-cell RNA-seq data.
KW - Cell cycle
KW - Single-cell RNA-sequencing
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85123974426&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123974426&partnerID=8YFLogxK
U2 - 10.1186/s13059-021-02581-y
DO - 10.1186/s13059-021-02581-y
M3 - Article
C2 - 35101061
AN - SCOPUS:85123974426
SN - 1474-7596
VL - 23
JO - Genome Biology
JF - Genome Biology
IS - 1
M1 - 41
ER -