Topological Data Analysis for Unsupervised Feature Selection in Large Scale Spatial Omics Data Sets
Published version
Peer-reviewed
Abstract
Spatial transcriptomics studies are becoming increasingly large and commonplace, necessitating simultaneous analysis of a large number of spatially resolved variables. Correspondingly, a diverse range of methodologies have been proposed to compare the spatial expression structure of genes. Here, we apply persistent homology, a method from topological data analysis, to produce a continuous quantification of spatial structure in a given gene’s expression, and show how this can be used for downstream tasks such as spatially variable gene (SVG) identification. We explore the unique advantages of topology for this task, deriving biologically meaningful insights into kidney disease and myocardial infarction using public spatial transcriptomics data. We also show how the non-parametric nature of homology enables our methodology to extend naturally to other spatial omics modalities, demonstrating this on a spatial metabolomics sample. Our work showcases the advantages of using a continuous quantification of spatial structure over p-value-based approaches to SVG identification, the potential for developing unified methods for the analysis of different spatial omics modalities, and the utility of persistent homology in big data applications.
Description
Acknowledgements: We would like to thank those who worked to generate, organise and share the publicly available spatial transcriptomics datasets analysed in this paper.
Journal Title
Bulletin of Mathematical Biology
Journal ISSN
0092-8240
1522-9602
Volume Title
88
Publisher
Springer Science and Business Media LLC
Rights and licensing
Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by/4.0/
Sponsorship
Medical Research Council (G117871)

