ProjectR: An R/Bioconductor package for transfer learning via PCA, NMF, correlation and clustering

Gaurav Sharma; Carlo Colantuoni; Loyal A. Goff; Elana J. Fertig; Genevieve Stein-O'Brien

doi:10.1093/bioinformatics/btaa183

School of Medicine

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important in the analysis of large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset.

Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis.
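For factorization features (PCA or NMF), the abstract's idea of relating source features to a target dataset can be illustrated by estimating how strongly each source pattern is used in each target sample. The Python sketch below shows one generic way to do this. It uses least-squares projection of target samples onto source loadings, W = (P^T P)^-1 P^T Y, with rows matched by gene. This is an assumption for illustration only. It is not projectR's implementation or API, and the function `project` and the toy matrices are hypothetical.

```python
# Hypothetical sketch: generic least-squares projection of target data onto
# source loadings. This is NOT projectR's implementation or API.
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a row-major matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two row-major matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b for square a by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("source patterns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def project(loadings: Matrix, target: Matrix) -> Matrix:
    """Estimate per-sample weights of source patterns in target data.

    loadings: genes x k matrix learned on the source dataset (e.g. PCA or NMF).
    target:   genes x samples matrix from the target dataset, same gene order.
    Returns the k x samples weights W minimizing ||target - loadings @ W||^2,
    i.e. W = (P^T P)^-1 P^T Y.
    """
    pt = transpose(loadings)
    return solve(matmul(pt, loadings), matmul(pt, target))


# Toy usage: two hypothetical patterns over four shared genes, and two
# target samples built as known mixtures of those patterns.
P = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
Y = [[2.0, 0.5], [2.0, 0.5], [1.0, 3.0], [1.0, 3.0]]
print(project(P, Y))  # [[2.0, 0.5], [1.0, 3.0]]
```

Solving the small k x k normal equations keeps the sketch dependency-free. For ill-conditioned loadings, a QR-based or regularized solver would likely be the safer choice.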

Original language	English (US)
Pages (from-to)	3592-3593
Number of pages	2
Journal	Bioinformatics
Volume	36
Issue number	11
DOIs	https://doi.org/10.1093/bioinformatics/btaa183
State	Published - Jun 1 2020

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{f01abc44d28245d7a99b48f2f1b157b6,
  title = "{ProjectR}: An {R/Bioconductor} package for transfer learning via {PCA}, {NMF}, correlation and clustering",
  abstract = "Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important in the analysis of large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis.",
  author = "Sharma, Gaurav and Colantuoni, Carlo and Goff, Loyal A. and Fertig, Elana J. and Stein-O'Brien, Genevieve",
  year = "2020",
  month = jun,
  day = "1",
  doi = "10.1093/bioinformatics/btaa183",
  language = "English (US)",
  volume = "36",
  pages = "3592--3593",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
  number = "11",
}

TY  - JOUR
T1  - ProjectR: An R/Bioconductor package for transfer learning via PCA, NMF, correlation and clustering
AU  - Sharma, Gaurav
AU  - Colantuoni, Carlo
AU  - Goff, Loyal A.
AU  - Fertig, Elana J.
AU  - Stein-O'Brien, Genevieve
PY  - 2020/6/1
Y1  - 2020/6/1
N2  - Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important in the analysis of large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis.
AB  - Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important in the analysis of large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis.
UR  - http://www.scopus.com/inward/record.url?scp=85084129101&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85084129101&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btaa183
DO  - 10.1093/bioinformatics/btaa183
M3  - Article
C2  - 32167521
AN  - SCOPUS:85084129101
SN  - 1367-4803
VL  - 36
SP  - 3592
EP  - 3593
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 11
ER  -
