Motivation: Bayesian factorization methods, including Coordinated Gene Activity in Pattern Sets (CoGAPS), are emerging as powerful analysis tools for single cell data. However, these methods have greater computational costs than their gradient-based counterparts. These costs are often prohibitive for analysis of large single-cell datasets. Many such methods can be run in parallel which enables this limitation to be overcome by running on more powerful hardware. However, the constraints imposed by the prior distributions in CoGAPS limit the applicability of parallelization methods to enhance computational efficiency for single-cell analysis. Results: We upgraded CoGAPS in Version 3 to overcome the computational limitations of Bayesian matrix factorization for single cell data analysis. This software includes a new parallelization framework that is designed around the sequential updating steps of the algorithm to enhance computational efficiency. These algorithmic advances were coupled with new software architecture and sparse data structures to reduce the memory overhead for single-cell data. Altogether, these updates to CoGAPS enhance the efficiency of the algorithm so that it can analyze 1000 times more cells, enabling factorization of large single-cell data sets. Availability: CoGAPS is available as a Bioconductor package and the source code is provided at github.com/FertigLab/CoGAPS. All efficiency updates to enable single-cell analysis available as of version 3.2.
ASJC Scopus subject areas
- Biochemistry, Genetics and Molecular Biology(all)
- Agricultural and Biological Sciences(all)
- Immunology and Microbiology(all)
- Pharmacology, Toxicology and Pharmaceutics(all)