Repository logo
 

Optimal estimation in high-dimensional and nonparametric models


Type

Thesis

Change log

Authors

Baldin, Nikolay 

Abstract

Minimax optimality is a key property of an estimation procedure in statistical modelling. This thesis looks at several problems in high-dimensional and nonparametric statistics and proposes novel estimation procedures. It then provides statistical guarantees on the performance of these methods and establishes whether those are computationally tractable.

In the first chapter, a new estimator for the volume of a convex set is proposed. The estimator is minimax optimal and also efficient non-asymptotically: it is nearly unbiased with minimal variance among all unbiased oracle-type estimators. Our approach is based on a Poisson point process model and as an ingredient, we prove that the convex hull is a sufficient and complete statistic. No hypotheses on the boundary of the convex set are imposed. In a numerical study, we show that the estimator outperforms earlier estimators for the volume. In addition, an improved set estimator for the convex body itself is proposed.

The second chapter extends the results of the first chapter and develops a unified framework for estimating the volume of a set in Rd based on observations of points uniformly distributed over the set. The framework applies to all classes of sets satisfying one simple axiom: a class is assumed to be intersection stable. No further hypotheses on the boundary of the set are imposed; in particular, the class of convex sets and the class of weakly-convex sets are covered by the framework. We introduce the so-called wrapping hull, a generalization of the convex hull, and prove that it is a sufficient and complete statistic. The proposed estimator of the volume is simply the volume of the wrapping hull scaled with an appropriate factor. It is shown to be consistent for all classes of sets satisfying the axiom and mimics an unbiased estimator with uniformly minimal variance. The construction and proofs hinge upon an interplay between probabilistic and geometric arguments. The tractability of the framework is numerically confirmed in a variety of examples.

The third chapter considers the problem of link prediction, based on partial observation of a large network, and on side information associated to its vertices. The generative model is formulated as a matrix logistic regression. The performance of the model is analysed in a high-dimensional regime under a structural assumption. The minimax rate for the Frobenius-norm risk is established and a combinatorial estimator based on the penalised maximum likelihood approach is shown to achieve it. Furthermore, it is shown that this rate cannot be attained by any (randomised) polynomial-time algorithm under a computational complexity assumption.

The trade-off between computational efficiency and statistical optimality is discussed throughout the thesis. For estimating the volume of a set from the class of convex or weakly-convex sets in high dimensions, we propose minimax optimal estimators in the first and second chapters. However, they cannot be computed using a polynomial-time algorithm in dimensions higher than three. Analogously, the proposed minimax optimal estimator for a prediction task in the matrix logistic regression problem in the third chapter cannot be computed in polynomial time. The third chapter further identifies a computational lower bound in the regression problem, thereby revealing the gap between the best possible rate of convergence of a polynomial-time algorithm and the minimax optimal rate.

Description

Date

2018-11-04

Advisors

Berthet, Quentin

Keywords

Nonparametric statistics, Computational lower bounds, Logistic regression, Link prediction, Volume estimation, Convex hull, UMVU, Stopping set, Exact oracle inequality

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
Sponsorship
I gratefully acknowledge the funding sources that made my PhD work possible. I was funded by the European Research Council (ERC) Grant No. 647812.