Modern Methods for Variable Significance Testing
Abstract
This thesis concerns the ubiquitous statistical problem of variable significance testing. The first chapter contains an account of classical approaches to variable significance testing, including different perspectives on how to formalise the notion of 'variable significance'. The historical development is contrasted with more recent methods that are adapted both to the scale of modern datasets and to the power of advanced machine learning techniques. This chapter also includes a description of and motivation for the theoretical framework that permeates the rest of the thesis: providing theoretical guarantees that hold uniformly over large classes of distributions.
The second chapter deals with testing the null hypothesis that Y ⊥ X | Z, where X and Y take values in separable Hilbert spaces, with a focus on applications to functional data. The first main result of the chapter shows that for functional data it is impossible to construct a non-trivial test for conditional independence, even when assuming that the data are jointly Gaussian. A novel regression-based test, called the Generalised Hilbertian Covariance Measure (GHCM), is presented, and theoretical guarantees for uniform asymptotic Type I error control are provided; the key assumption requires that the product of the mean squared errors of regressing Y on Z and X on Z converges to zero faster than 1/n.
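The regression-based construction behind the GHCM can be illustrated in its simplest, scalar form (a GCM-type statistic): regress Y on Z and X on Z, then test whether the products of the resulting residuals average to zero. The sketch below is a simplified scalar illustration, not the Hilbertian procedure from the chapter; the regression method (a random forest) and the name `gcm_style_test` are illustrative choices.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor


def gcm_style_test(X, Y, Z):
    """Scalar sketch of a regression-based conditional independence test.

    X, Y are 1-d response arrays; Z is a 2-d array of conditioning variables.
    Regress Y on Z and X on Z, then studentise the mean of the residual
    products; under the null the statistic is approximately N(0, 1).
    """
    # Residuals from the two nonparametric regressions on Z
    ry = Y - RandomForestRegressor(random_state=0).fit(Z, Y).predict(Z)
    rx = X - RandomForestRegressor(random_state=0).fit(Z, X).predict(Z)
    R = rx * ry
    T = np.sqrt(len(R)) * R.mean() / R.std()
    p_value = 2 * stats.norm.sf(abs(T))  # two-sided normal p-value
    return T, p_value
```

The studentisation is what delivers the uniform guarantees discussed above: the test's validity then rests only on the regression errors being small enough, not on a correctly specified model.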
The third and final chapter analyses the problem of nonparametric variable significance testing by testing for conditional mean independence, that is, testing the null hypothesis that E(Y | X, Z) = E(Y | Z) for real-valued Y. A test, called the Projected Covariance Measure (PCM), is derived by considering a family of studentised test statistics and choosing a member of this family in a data-driven way that balances the robustness and power properties of the resulting test. The test is regression-based and is computed by splitting a set of observations of (X, Y, Z) into two sets of equal size: one half is used to learn (nonparametrically) a projection of Y onto X and Z, and the second half is used to test whether the expected conditional correlation between the projection and Y given Z vanishes. The chapter contains general conditions that ensure uniform asymptotic Type I error control of the resulting test by imposing conditions on the mean squared errors of the regressions involved. A modification of the PCM using additional sample splitting and employing spline regression is shown to achieve the minimax optimal separation rate between null and alternative under Hölder smoothness assumptions on the regression functions and the conditional density of X given Z = z. Simulation studies show that the test maintains the strong Type I error control of methods such as the Generalised Covariance Measure (GCM) while having power against a broader class of alternatives.
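The sample-splitting scheme described above can be sketched as follows. This is a heavily simplified illustration of the PCM idea, not the procedure analysed in the chapter (which includes further refinements to the projection and studentisation); the regression method and the names `pcm_style_test` and `fhat` are placeholder choices.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor


def pcm_style_test(X, Y, Z, seed=0):
    """Simplified sketch of a PCM-style test of E(Y|X,Z) = E(Y|Z).

    X, Z are 2-d arrays; Y is a 1-d array. The sample is split in half:
    the first half learns a projection f(X, Z) of Y; the second half tests
    for vanishing expected conditional correlation between f and Y given Z.
    """
    rng = np.random.default_rng(seed)
    n = len(Y)
    idx = rng.permutation(n)
    a, b = idx[: n // 2], idx[n // 2:]
    XZ = np.hstack([X, Z])

    # Half one: learn the projection of Y onto (X, Z)
    f = RandomForestRegressor(random_state=0).fit(XZ[a], Y[a])
    fhat = f.predict(XZ[b])

    # Half two: remove the Z-dependence from both Y and the projection
    m_y = RandomForestRegressor(random_state=0).fit(Z[b], Y[b]).predict(Z[b])
    m_f = RandomForestRegressor(random_state=0).fit(Z[b], fhat).predict(Z[b])

    # Studentised mean of the conditional-correlation terms
    L = (Y[b] - m_y) * (fhat - m_f)
    T = np.sqrt(len(L)) * L.mean() / L.std()
    # One-sided p-value: under the alternative the projection is
    # positively correlated with Y given Z
    return T, stats.norm.sf(T)
```

The data-driven choice of projection is what gives the test power against a broad class of alternatives: rather than fixing a direction in advance, the first half of the sample estimates the direction in which E(Y | X, Z) deviates from E(Y | Z).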
Advisors
Shah, Rajen