Topological and geometric inference of data
The overarching problem under consideration is to determine the structure of the subspace on which a distribution is supported, given only a finite noisy sample thereof. The special case in which the subspace is an embedded manifold is given particular attention owing to its conceptual elegance, and asymptotic bounds are obtained on the admissible level of noise such that the manifold can be recovered up to homotopy equivalence.
Attention then turns to how this can be accomplished in practice. Following ideas from topological data analysis, simplicial complexes are used as discrete analogues of spaces suitable for computation. By utilising the prior assumption that the data lie on a manifold, topologically inspired techniques are proposed for refining the simplicial complex to better approximate this manifold. These techniques are applied to the problem of nonlinear dimensionality reduction and are found to improve the accuracy of reconstruction on several synthetic and real-world datasets.
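As an illustration of how a simplicial complex can stand in for a sampled space, the following minimal sketch builds a Vietoris-Rips complex (up to triangles) on a perturbed sample of a circle and computes its first two Betti numbers over the rationals. It is not the refinement method proposed in this work; the function names, the scale parameter `eps`, and the deterministic perturbation used in place of random noise are all assumptions made for the example.

```python
import itertools
import numpy as np


def perturbed_circle(n, amplitude):
    """n points on the unit circle, radii alternately pushed out/in (hypothetical test data)."""
    angles = 2 * np.pi * np.arange(n) / n
    radii = 1.0 + amplitude * (-1.0) ** np.arange(n)
    return np.column_stack((radii * np.cos(angles), radii * np.sin(angles)))


def rips_complex(points, eps):
    """Edges and triangles of the Vietoris-Rips complex at scale eps."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
             if dist[i, j] <= eps]
    edge_set = set(edges)
    # A triangle is included exactly when all three of its edges are present.
    triangles = [(a, b, c) for a, b, c in itertools.combinations(range(n), 3)
                 if (a, b) in edge_set and (a, c) in edge_set and (b, c) in edge_set]
    return edges, triangles


def betti_numbers(n_vertices, edges, triangles):
    """Betti numbers (b0, b1) from the ranks of the boundary matrices."""
    edge_index = {e: k for k, e in enumerate(edges)}
    d1 = np.zeros((n_vertices, len(edges)))
    for k, (i, j) in enumerate(edges):
        d1[i, k], d1[j, k] = -1.0, 1.0
    d2 = np.zeros((len(edges), len(triangles)))
    for k, (a, b, c) in enumerate(triangles):
        d2[edge_index[(b, c)], k] = 1.0
        d2[edge_index[(a, c)], k] = -1.0
        d2[edge_index[(a, b)], k] = 1.0
    # Guard against empty matrices, whose rank is zero by convention.
    rank1 = int(np.linalg.matrix_rank(d1)) if edges else 0
    rank2 = int(np.linalg.matrix_rank(d2)) if triangles else 0
    b0 = n_vertices - rank1
    b1 = len(edges) - rank1 - rank2
    return b0, b1


if __name__ == "__main__":
    pts = perturbed_circle(12, 0.05)
    edges, triangles = rips_complex(pts, eps=0.7)
    print(betti_numbers(len(pts), edges, triangles))
```

At the scale chosen here only neighbouring sample points are joined, so the complex is a single loop with one connected component and one one-dimensional hole, matching the circle. At a much smaller scale the complex degenerates to isolated points, which shows why the choice of scale relative to the noise level matters.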
The second chapter focuses on extending this work to the case where the ambient space is non-Euclidean. The interfaces between topological data analysis, functional data analysis, and shape analysis are explored. Lipschitz bounds are proved which relate several metrics on the space of positive semidefinite matrices; they are then interpreted in the context of topological data analysis. These results are applied to diffusion tensor imaging and phonology.
The final chapter explores the case where the points are non-uniformly distributed over the embedded subspace. In particular, a method is proposed to overcome the shortcomings of the witness complex construction when the sampling density varies widely. The theory of multidimensional persistence is leveraged to provide a succinct setting in which the structure of the data can be interpreted as a generalised stratified space.