Optimal estimation in high-dimensional and nonparametric models
Repository URI
Repository DOI
Change log
Authors
Abstract
Minimax optimality is a key property of an estimation procedure in statistical modelling. This thesis looks at several problems in high-dimensional and nonparametric statistics and proposes novel estimation procedures. It then provides statistical guarantees on the performance of these methods and establishes whether those are computationally tractable.
In the first chapter, a new estimator for the volume of a convex set is proposed. The estimator is minimax optimal and also efficient non-asymptotically: it is nearly unbiased with minimal variance among all unbiased oracle-type estimators. Our approach is based on a Poisson point process model and as an ingredient, we prove that the convex hull is a sufficient and complete statistic. No hypotheses on the boundary of the convex set are imposed. In a numerical study, we show that the estimator outperforms earlier estimators for the volume. In addition, an improved set estimator for the convex body itself is proposed.
The second chapter extends the results of the first chapter and develops a unified framework for estimating the volume of a set in
The third chapter considers the problem of link prediction, based on partial observation of a large network, and on side information associated to its vertices. The generative model is formulated as a matrix logistic regression. The performance of the model is analysed in a high-dimensional regime under a structural assumption. The minimax rate for the Frobenius-norm risk is established and a combinatorial estimator based on the penalised maximum likelihood approach is shown to achieve it. Furthermore, it is shown that this rate cannot be attained by any (randomised) polynomial-time algorithm under a computational complexity assumption.
The trade-off between computational efficiency and statistical optimality is discussed throughout the thesis. For estimating the volume of a set from the class of convex or weakly-convex sets in high dimensions, we propose minimax optimal estimators in the first and second chapters. However, they cannot be computed using a polynomial-time algorithm in dimensions higher than three. Analogously, the proposed minimax optimal estimator for a prediction task in the matrix logistic regression problem in the third chapter cannot be computed in polynomial time. The third chapter further identifies a computational lower bound in the regression problem, thereby revealing the gap between the best possible rate of convergence of a polynomial-time algorithm and the minimax optimal rate.