
VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data

Accepted version
Peer-reviewed

Type

Article

Abstract

Effective clustering of biomedical data is crucial in precision medicine, enabling accurate stratification of patients or samples. However, the growth in availability of high-dimensional categorical data, including ’omics data, necessitates computationally efficient clustering algorithms. We present VICatMix, a variational Bayesian finite mixture model designed for the clustering of categorical data. The use of variational inference (VI) in its training allows the model to outperform competitors in terms of computational time and scalability, while maintaining high accuracy. VICatMix furthermore performs variable selection, enhancing its performance on high-dimensional, noisy data. The proposed model incorporates summarisation and model averaging to mitigate poor local optima in VI, allowing for improved estimation of the true number of clusters simultaneously with feature saliency. We demonstrate the performance of VICatMix with both simulated and real-world data, including applications to datasets from The Cancer Genome Atlas (TCGA), showing its use in cancer subtyping and driver gene discovery. We demonstrate VICatMix’s potential utility in integrative cluster analysis with different ’omics datasets, enabling the discovery of novel disease subtypes.

Availability

VICatMix is freely available as an R package via CRAN, incorporating C++ for faster computation, at https://CRAN.R-project.org/package=VICatMix

Keywords

31 Biological Sciences, 3102 Bioinformatics and Computational Biology, 46 Information and Computing Sciences, 4905 Statistics, 49 Mathematical Sciences, 4611 Machine Learning, Genetics, Women's Health, Cancer, Bioengineering, Cancer Genomics, Precision Medicine, Networking and Information Technology R&D (NITRD), Human Genome, 2.5 Research design and methodologies (aetiology)

Journal Title

Bioinformatics Advances

Journal ISSN

2635-0041

Publisher

Oxford University Press (OUP)

Sponsorship

Medical Research Council (MR/S027602/1)