SC3s: efficient scaling of single cell consensus clustering to millions of cells
Published version
Peer-reviewed
Type
Article
Authors
Quah, Fu Xiang https://orcid.org/0000-0002-4616-5324
Hemberg, Martin https://orcid.org/0000-0001-8895-5239
Abstract
BACKGROUND: Today it is possible to profile the transcriptome of individual cells, and a key step in the analysis of these datasets is unsupervised clustering. For very large datasets, efficient algorithms are required so that analyses can be completed within reasonable time and memory limits. RESULTS: Here, we present a highly efficient k-means-based approach, and we demonstrate that its time and memory requirements scale favorably with the number of cells. CONCLUSIONS: We have demonstrated that our streaming k-means clustering algorithm gives state-of-the-art performance while its resource requirements scale favorably for up to 2 million cells.
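The abstract does not spell out the SC3s algorithm, but the memory argument behind streaming k-means can be illustrated. Below is a minimal sketch of classic online (MacQueen-style) k-means in pure Python, assuming cells arrive in batches as dense feature vectors. Each point moves only its nearest centre, by a step of 1/count, so memory stays proportional to k × d no matter how many cells are streamed. The names `StreamingKMeans` and `run_demo`, the two-blob toy data, the starting centres and the batch size are all hypothetical. This is not the SC3s implementation, and the consensus step across multiple clusterings is not shown.

```python
import random
from typing import Iterable, Iterator, List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class StreamingKMeans:
    """Hypothetical online k-means (MacQueen-style update), not SC3s itself.

    Only k centres and k counts are stored, so memory does not grow with
    the number of streamed points.
    """

    def __init__(self, initial_centres: Sequence[Sequence[float]]):
        self.centres: List[Vector] = [list(c) for c in initial_centres]
        self.counts: List[int] = [0] * len(self.centres)

    def nearest(self, x: Sequence[float]) -> int:
        """Index of the centre closest to x."""
        return min(range(len(self.centres)),
                   key=lambda j: _sq_dist(x, self.centres[j]))

    def partial_fit(self, batch: Iterable[Sequence[float]]) -> None:
        """Consume one batch; each point nudges its nearest centre."""
        for x in batch:
            j = self.nearest(x)
            self.counts[j] += 1
            eta = 1.0 / self.counts[j]  # step size that keeps a running mean
            c = self.centres[j]
            for i, xi in enumerate(x):
                c[i] += eta * (xi - c[i])

    def predict(self, batch: Iterable[Sequence[float]]) -> List[int]:
        """Assign each point to its nearest centre."""
        return [self.nearest(x) for x in batch]


def batches(points: List[Vector], size: int) -> Iterator[List[Vector]]:
    """Yield consecutive chunks, mimicking data read from disk in pieces."""
    for start in range(0, len(points), size):
        yield points[start:start + size]


def run_demo(seed: int = 0) -> StreamingKMeans:
    """Toy example (assumed data): two 2-D Gaussian blobs stand in for cells."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(500)]
    data += [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(500)]
    rng.shuffle(data)
    model = StreamingKMeans(initial_centres=[[1.0, 1.0], [4.0, 4.0]])
    for batch in batches(data, 100):
        model.partial_fit(batch)
    return model


print(run_demo().centres)
```

Because each centre is a running mean of the points assigned to it, the final centres land near the blob means (0, 0) and (5, 5). Unlike batch k-means, the sketch never needs the full data matrix in memory at once. The trade-off is that a single pass depends on the order of the stream and on the starting centres.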
Keywords
Software, k-Means clustering, Streaming clustering, scRNAseq
Journal Title
BMC Bioinformatics
Journal ISSN
1471-2105
Publisher
Springer Science and Business Media LLC