High-dimensional regression with potential prior information on variable importance
Authors
Abstract
In many high-dimensional regression problems, vague prior information may be
available on the importance of predictors. Examples include an ordering of the
variables given by their empirical
variances (which is typically discarded through standardisation), the lag of
predictors when fitting autoregressive models in time series settings, or the
level of missingness of the variables. Whilst such orderings may not match the
true importance of variables, we argue that there is little to be lost, and
potentially much to be gained, by using them. We propose a simple scheme
involving fitting a sequence of models indicated by the ordering. We show that
the computational cost for fitting all models when ridge regression is used is
no more than for a single fit of ridge regression, and describe a strategy for
Lasso regression that makes use of previous fits to greatly speed up fitting
the entire sequence of models. We propose to select a final estimator by
cross-validation and provide a general result on the quality of the best
performing estimator on a test set selected from among a number
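The scheme described in the abstract, fitting a sequence of nested models indicated by an ordering and choosing among them by cross-validation, can be illustrated with a minimal sketch. This is not the authors' algorithm: it refits ridge regression from scratch for every model size, so it does not reproduce the computational savings the abstract claims. The function names, the synthetic data, the fixed penalty `lam`, the grid `sizes`, and the absence of an intercept are all assumptions made for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept): (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def cv_error_nested(X, y, order, sizes, lam=1.0, n_folds=5, seed=0):
    """Cross-validated mean squared error for each nested model.

    Model j uses the first sizes[j] variables in `order`. Every model is
    refit naively on every fold (hypothetical helper, not the paper's method).
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    errors = np.zeros(len(sizes))
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        for j, k in enumerate(sizes):
            cols = order[:k]
            beta = ridge_fit(X[train_mask][:, cols], y[train_mask], lam)
            resid = y[test_idx] - X[test_idx][:, cols] @ beta
            errors[j] += np.sum(resid ** 2)
    return errors / n


# Synthetic data (assumed): column scales decrease with the column index,
# and only the five highest-variance columns carry signal.
rng = np.random.default_rng(1)
n, p = 100, 50
scales = np.linspace(3.0, 0.5, p)
X = rng.standard_normal((n, p)) * scales
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + rng.standard_normal(n)

# Order variables by decreasing empirical variance, one of the orderings
# mentioned in the abstract, and skip the usual standardisation step.
order = np.argsort(-X.var(axis=0))
sizes = [5, 10, 20, 50]
errors = cv_error_nested(X, y, order, sizes)
best_k = sizes[int(np.argmin(errors))]
print("CV errors by model size:", dict(zip(sizes, np.round(errors, 3))))
print("selected model size:", best_k)
```

In practice the penalty would be tuned jointly with the model size; it is held fixed here only to keep the sketch short.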
Journal ISSN
1573-1375