Statistical inference in high-dimensional matrix models

Löffler, Matthias

Statistical inference in high-dimensional matrix models

Repository URI

https://www.repository.cam.ac.uk/handle/1810/298064

Repository DOI

https://doi.org/10.17863/CAM.45122

Files

Thesis file (1.43 MB)

Type

Thesis

Authors

Löffler, Matthias

Abstract

Matrix models are ubiquitous in modern statistics. For instance, they are used in finance to assess interdependence of assets, in genomics to impute missing data and in movie recommender systems to model the relationship between users and movie ratings. Typically such models are either high-dimensional, meaning that the number of parameters may exceed the number of data points by many orders of magnitudes, or nonparametric in the sense that the quantity of interest is an infinite dimensional operator. This leads to new algorithms and also to new theoretical phenomena that may occur when estimating a parameter of interest or functionals of it or when constructing confidence sets. In this thesis, we will exemplarily consider three such matrix models and develop statistical theory for them: Matrix completion, Principal Component Analysis (PCA) with Gaussian data and transition operators of Markov chains. \ \ We start with matrix completion and investigate the existence of adaptive confidence sets in the 'Bernoulli' and 'trace-regression' models. In the 'Bernoulli' model we show that adaptive confidence sets do not exist when the variance of the errors is unknown, whereas we give an explicit construction in the ’trace-regression’ model. Finally, in the known variance case, we show that adaptive confidence sets do also exist in the 'Bernoulli' model based on a testing argument. \ \ Next, we consider PCA in a Gaussian observation model with complexity measured by the effective rank, the reciprocal of the percentage of variance explained by the first principal component. We investigate estimation of linear functionals of eigenvectors and prove Berry-Essen type bounds. Due to the high-dimensionality of the problem we discover a new phenomenon: The plug-in estimator based on the sample eigenvector can have non-negligible bias and hence may be not $n$ -consistent anymore. We show how to de-bias this estimator, achieving $n$ -convergence rates, and prove exact matching minimax lower bounds. \ \ Finally, we consider nonparametric estimation of the transition operator of a Markov chain and its transition density. We assume that the singular values of the transition operator decay exponentially. For example, this assumption is fulfilled by discrete, low frequency observations of periodised, reversible stochastic differential equations. Using penalization techniques from low rank matrix estimation we develop a new algorithm and show improved convergence rates.

Date

2019-06-21

Advisors

Nickl, Richard

Keywords

High-dimensional Statistics, Low-rank inference, PCA

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights

Sponsorship

Financial support of ERC grant UQMSI/647812 and EPSRC grant EP/L016516/1

Collections

Theses - Pure Mathematics and Mathematical Statistics