Repository logo
 

STATISTICAL AND COMPUTATIONAL TRADE-OFFS IN ESTIMATION OF SPARSE PRINCIPAL COMPONENTS


No Thumbnail Available

Type

Article

Change log

Authors

Wang, Tengyao 
Berthet, Quentin 
Samworth, Richard J 

Abstract

In recent years, sparse principal component analysis has emerged as an extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators have been proposed; some of these are fast to compute, while others are known to achieve the minimax optimal rate over certain Gaussian or sub-Gaussian classes. In this paper, we show that, under a widely-believed assumption from computational complexity theory, there is a fundamental trade-off between statistical and computational performance in this problem. More precisely, working with new, larger classes satisfying a restricted covariance concentration condition, we show that there is an effective sample size regime in which no randomised polynomial time algorithm can achieve the minimax optimal rate. We also study the theoretical performance of a (polynomial time) variant of the well-known semidefinite relaxation estimator, revealing a subtle interplay between statistical and computational efficiency.

Description

Keywords

Computational lower bounds, planted clique problem, polynomial time algorithm, sparse principal component analysis

Journal Title

ANNALS OF STATISTICS

Conference Name

Journal ISSN

0090-5364

Volume Title

44

Publisher

Institute of Mathematical Statistics
Sponsorship
Engineering and Physical Sciences Research Council (EP/J017213/1)
Leverhulme Trust (PLP-2014-353)
The first author is supported by a Benefactors’ scholarship from St John’s College, Cambridge. The second author was supported by Air Force of Scientific Research (AFOSR) grant FA9550-14-1-0098 at the Center for the Mathematics of Information at the California Institute of Technology. The third author is supported by Engineering and Physical Sciences Research Council Early Career Fellowship EP/J017213/1.