Topological Data Analysis for Unsupervised Feature Selection in Large Scale Spatial Omics Data Sets

Published version
Peer-reviewed

Abstract

Spatial transcriptomics studies are becoming increasingly large and commonplace, necessitating simultaneous analysis of a large number of spatially resolved variables. Correspondingly, a diverse range of methodologies have been proposed to compare the spatial expression structure of genes. Here, we apply persistent homology, a method from topological data analysis, to produce a continuous quantification of spatial structure in a given gene’s expression, and show how this can be used for downstream tasks such as spatially variable gene (SVG) identification. We explore the unique advantages of topology for this task, deriving biologically meaningful insights into kidney disease and myocardial infarction using public spatial transcriptomics data. We also show how the non-parametric nature of homology enables our methodology to extend naturally to other spatial omics modalities, demonstrating this on a spatial metabolomics sample. Our work showcases the advantages of using a continuous quantification of spatial structure over p-value-based approaches to SVG identification, the potential for developing unified methods for the analysis of different spatial omics modalities, and the utility of persistent homology in big data applications.

Description

Acknowledgements: We would like to thank those who worked to generate, organise and share the publicly available spatial transcriptomics datasets analysed in this paper.

Journal Title

Bulletin of Mathematical Biology

Journal ISSN

0092-8240
1522-9602

Volume Title

88

Publisher

Springer Science and Business Media LLC

Rights and licensing

Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by/4.0/

Sponsorship

Medical Research Council (G117871)