High-dimensional regression with potential prior information on variable importance
Publication Date
2022-06
Journal Title
Statistics and Computing
ISSN
0960-3174
Publisher
Springer Science and Business Media LLC
Type
Article
This Version
VoR
Citation
Stokell, B., & Shah, R. (2022). High-dimensional regression with potential prior information on variable
importance. Statistics and Computing. https://doi.org/10.1007/s11222-022-10110-5
Abstract
Vague prior information on the importance of predictors may be available in a
variety of high-dimensional regression settings. Examples include the ordering
of the variables given by their empirical
variances (which is typically discarded through standardisation), the lag of
predictors when fitting autoregressive models in time series settings, or the
level of missingness of the variables. Whilst such orderings may not match the
true importance of variables, we argue that there is little to be lost, and
potentially much to be gained, by using them. We propose a simple scheme
involving fitting a sequence of models indicated by the ordering. We show that
the computational cost for fitting all models when ridge regression is used is
no more than for a single fit of ridge regression, and describe a strategy for
Lasso regression that makes use of previous fits to greatly speed up fitting
the entire sequence of models. We propose to select a final estimator by
cross-validation and provide a general result on the quality of the best
performing estimator on a test set selected from among a number $M$ of
competing estimators in a high-dimensional linear regression setting. Our
result requires no sparsity assumptions and shows that only a $\log M$ price is
incurred compared to the unknown best estimator. We demonstrate the
effectiveness of our approach when applied to missing or corrupted data, and
in time series settings. An R package is available on GitHub.
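
The scheme of fitting a sequence of models indicated by an ordering can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation or their R package: the function names `ridge_fit` and `nested_ridge_path` are hypothetical, each nested model is refitted naively from scratch (so it does not reproduce the computational savings described in the abstract), and a single hold-out split stands in for cross-validation.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (no intercept): (X'X + lam I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def nested_ridge_path(X_train, y_train, X_val, y_val, order, lam=1.0):
    """Fit ridge on the first k variables of `order` for k = 1..p and
    return the k (and coefficients) with the smallest validation MSE.

    Assumption: naive refitting for every k, purely for illustration.
    """
    best = None
    errors = []
    for k in range(1, len(order) + 1):
        idx = order[:k]  # the k variables ranked most important a priori
        beta = ridge_fit(X_train[:, idx], y_train, lam)
        mse = float(np.mean((y_val - X_val[:, idx] @ beta) ** 2))
        errors.append(mse)
        if best is None or mse < best[0]:
            best = (mse, k, beta)
    return best[1], best[2], errors

# Synthetic example: order the variables by decreasing empirical variance,
# one of the orderings mentioned in the abstract.
rng = np.random.default_rng(0)
scales = np.linspace(3.0, 0.5, 10)            # assumed column scales
X = rng.normal(size=(200, 10)) * scales
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)
order = np.argsort(-X.var(axis=0))            # largest variance first
k, beta, errors = nested_ridge_path(X[:150], y[:150], X[150:], y[150:], order)
print("selected model size:", k)
```

Because the candidate models are nested along the ordering, the selection step only compares as many estimators as there are variables, which is the regime in which the abstract's $\log M$ result applies.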
Keywords
stat.ME, stat.CO, stat.ML, 62J07
Sponsorship
Engineering and Physical Sciences Research Council (EP/N031938/1)
Identifiers
External DOI: https://doi.org/10.1007/s11222-022-10110-5
This record's URL: https://www.repository.cam.ac.uk/handle/1810/337325