TY - JOUR
T1 - projectR
T2 - An R/Bioconductor package for transfer learning via PCA, NMF, correlation, and clustering
AU - Sharma, Gaurav
AU - Colantuoni, Carlo
AU - Goff, Loyal A.
AU - Fertig, Elana J.
AU - Stein-O’Brien, Genevieve
N1 - Publisher Copyright:
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2019/8/6
Y1 - 2019/8/6
N2 - Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important to large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation, and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis. Availability: projectR is available on Bioconductor and at https://github.com/genesofeve/projectR. Contact: gsteinobrien@jhmi.edu; ejfertig@jhmi.edu
AB - Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important to large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation, and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis. Availability: projectR is available on Bioconductor and at https://github.com/genesofeve/projectR. Contact: gsteinobrien@jhmi.edu; ejfertig@jhmi.edu
UR - http://www.scopus.com/inward/record.url?scp=85095643038&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095643038&partnerID=8YFLogxK
U2 - 10.1101/726547
DO - 10.1101/726547
M3 - Article
AN - SCOPUS:85095643038
JO - bioRxiv
JF - bioRxiv
ER -