Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models

Published version
Peer-reviewed

Type

Article

Authors

Veerman, JR 
Leday, GGR 
van de Wiel, MA 

Abstract

For high-dimensional linear regression models, we review and compare several estimators of variances τ² and σ² of the random slopes and errors, respectively. These variances relate directly to ridge regression penalty λ and heritability index h², often used in genetics. Several estimators of these, either based on cross-validation (CV) or maximum marginal likelihood (MML), are also discussed. The comparisons include several cases of the high-dimensional covariate matrix such as multi-collinear covariates and data-derived ones. Moreover, we study robustness against model misspecifications such as sparse instead of dense effects and non-Gaussian errors. An example on weight gain data with genomic covariates confirms the good performance of MML compared to CV. Several extensions are presented. First, to the high-dimensional linear mixed effects model, with REML as an alternative to MML. Second, to the conjugate Bayesian setting, shown to be a good alternative. Third, and most prominently, to generalized linear models for which we derive a computationally efficient MML estimator by re-writing the marginal likelihood as an n-dimensional integral. For Poisson and Binomial ridge regression, we demonstrate the superior accuracy of the resulting MML estimator of λ as compared to CV. Software is provided to enable reproduction of all results.
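
To make the link between the variance components, the ridge penalty and heritability concrete, here is a minimal sketch for the linear case only. It is not the authors' software. It assumes the Gaussian random-effects model y = Xβ + ε with β_j ~ N(0, τ²) and ε_i ~ N(0, σ²), so that marginally y ~ N(0, τ² XXᵀ + σ² I), and it uses the standard identities λ = σ²/τ² and, for standardized covariates, h² = pτ²/(pτ² + σ²). The function name mml_ridge, the grid search over log λ and the simulated data are illustrative assumptions.

```python
import numpy as np


def mml_ridge(X, y, log_lambda_grid=None):
    """Grid-search maximum marginal likelihood for y ~ N(0, tau2 * X X^T + sigma2 * I).

    Hypothetical helper: tau2 is profiled out analytically for each candidate
    lambda = sigma2 / tau2, and lambda is chosen on a grid.
    """
    n, p = X.shape
    # Eigendecomposition of the n x n Gram matrix; the cost does not grow with p
    # beyond forming X X^T once.
    d, U = np.linalg.eigh(X @ X.T)
    d = np.clip(d, 0.0, None)          # guard against tiny negative eigenvalues
    u2 = (U.T @ y) ** 2                # squared rotated responses

    if log_lambda_grid is None:
        log_lambda_grid = np.linspace(-10.0, 15.0, 2001)
    lam = np.exp(log_lambda_grid)[:, None]      # shape (G, 1)
    s = d[None, :] + lam                        # eigenvalues of X X^T + lambda * I

    # Profile estimate of tau2 for each lambda, then -2 * profile log-likelihood
    # (additive constants dropped).
    tau2_hat = (u2[None, :] / s).sum(axis=1) / n
    neg2ll = n * np.log(tau2_hat) + np.log(s).sum(axis=1)

    k = int(np.argmin(neg2ll))
    lam_hat = float(lam[k, 0])
    tau2 = float(tau2_hat[k])
    sigma2 = lam_hat * tau2
    # Heritability under the assumption of standardized covariates.
    h2 = p * tau2 / (p * tau2 + sigma2)
    return {"lambda": lam_hat, "tau2": tau2, "sigma2": sigma2, "h2": h2}


# Illustrative use on simulated data (dense Gaussian effects, Gaussian errors).
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) * np.sqrt(0.002)   # true tau2 = 0.002
y = X @ beta + rng.standard_normal(n)            # true sigma2 = 1
print(mml_ridge(X, y))
```

Profiling out τ² reduces the search to one dimension, which is why a plain grid suffices here; a numerical optimizer over (τ², σ²) would be an equally reasonable choice. The sketch does not cover the generalized linear model case described in the abstract.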

Keywords

Random effects, Ridge regression, Empirical Bayes, Marginal likelihood, Cross-validation

Journal Title

Communications in Statistics: Simulation and Computation

Journal ISSN

0361-0918
1532-4141

Publisher

Taylor & Francis

Sponsorship

MRC (via King's College London) (155108)
Gwenaël Leday was supported by the Medical Research Council, grant number MR/M004421.