
Model Selection, Uniform Inference and Nonparametric Regression





De Boeck, Alexis 


Model selection in the nonparametric regression model is inevitable, since any nonparametric estimator requires tuning parameters to be specified before it is feasible. It is, however, standard practice to carry over the theory of nonparametric estimators derived for a fixed model to the case where the tuning parameters are no longer fixed but chosen by, possibly data-driven, model-selection algorithms. This theory is not necessarily valid, as the model-selection step is not taken into account. This thesis contributes to the nonparametric econometrics and statistics literature and, in particular, to the theory of series estimators, by showing that such estimators have desirable properties and that valid inference is possible even when a model-selection step precedes estimation. The first chapter is concerned with K-fold cross-validation and shows that the cross-validated least-squares estimator predicts the response as well as the infeasible best linear predictor, whose dimension may diverge with the sample size. This property, known as risk consistency, is uncommon in econometrics, but it has the benefit that it holds under few and very weak conditions. The risk-consistency result relies crucially on a non-asymptotic analysis of the difference between the prediction error of the cross-validated estimator and that of the best linear predictor. As the dimension of the parameters may diverge, this set-up covers both the high-dimensional linear model and the nonparametric regression model, which reduces the need for duplicate theories. An extensive Monte Carlo experiment corroborates the theoretical results by showing that the non-asymptotic bound becomes arbitrarily small as the sample size diverges. The second chapter returns to more classical statistics and econometrics by studying the uniform consistency of the series estimator for the conditional mean function and its linear functionals. 
The uniformity holds both over the support of the covariates and over the models considered. Under high-level assumptions, a non-asymptotic linearisation result delivers uniform rates of convergence for the series estimator. By verifying the high-level assumptions, case-specific rates can easily be derived. For example, the series estimator attains, up to a small logarithmic penalty, the minimax rate of convergence for functions lying in a Hölder ball. The results from the second chapter form the basis for the inference procedure proposed in the final chapter, which constructs valid uniform confidence bands for the series estimator. The uniform confidence bands are valid in the sense that they control the asymptotic size for the conditional mean function, or its linear functionals, seen as a process in the covariates and the models considered. Given that the results hold uniformly over the models considered, the inference procedure is valid regardless of which model-selection algorithm delivers the final model used to estimate the parameters of interest. The key quantity is the maximal t-statistic, correctly studentised using an estimator for the standard error. The theory relies on the uniform linearisation result from chapter two and on the concept of strong approximations, or couplings, as the limit distribution of the maximal t-statistic does not exist. A Monte Carlo study establishes that the uniform confidence bands have the correct coverage even in finite samples. The chapter concludes with an application testing for shape restrictions on the demand function for gasoline in the US using a cross-validated series estimator.
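
As a rough illustration of the model-selection step studied in the first chapter, the sketch below chooses the dimension of a least-squares series estimator by K-fold cross-validation. It is not taken from the thesis: the polynomial basis, the simulated data, the candidate dimensions and the names series_design and cv_select_dimension are all assumptions made for the example.

```python
import numpy as np


def series_design(x, k):
    """Polynomial series basis 1, x, ..., x^(k-1) (an assumed basis choice)."""
    return np.vander(x, N=k, increasing=True)


def cv_select_dimension(x, y, dims, n_folds=5, seed=0):
    """Pick the dimension in `dims` with the smallest K-fold CV squared error."""
    rng = np.random.default_rng(seed)
    # Randomly partition the observation indices into n_folds folds.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cv_err = {}
    for k in dims:
        sq_err = 0.0
        for held_out in folds:
            train = np.ones(len(y), dtype=bool)
            train[held_out] = False
            # Least-squares fit on the training folds only.
            beta, *_ = np.linalg.lstsq(
                series_design(x[train], k), y[train], rcond=None
            )
            # Squared prediction error on the held-out fold.
            resid = y[held_out] - series_design(x[held_out], k) @ beta
            sq_err += np.sum(resid ** 2)
        cv_err[k] = sq_err / len(y)
    best = min(cv_err, key=cv_err.get)
    return best, cv_err


# Simulated example (hypothetical data-generating process).
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 400)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(400)
k_hat, errors = cv_select_dimension(x, y, dims=range(1, 11))
```

Here k_hat is the selected number of series terms and errors maps each candidate dimension to its cross-validated mean squared prediction error. The thesis's risk-consistency result concerns how the prediction error of an estimator selected in this way compares with that of the best linear predictor; the sketch only shows the selection mechanics.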





Linton, Oliver




Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
ESRC (1642244)