
A data-adaptive method for investigating effect heterogeneity with high-dimensional covariates in Mendelian randomization



Tian, Haodong 
Tom, Brian DM 
Burgess, Stephen 


Abstract

Background: Mendelian randomization is a popular method for causal inference with observational data that uses genetic variants as instrumental variables. Similarly to a randomized trial, a standard Mendelian randomization analysis estimates the population-averaged effect of an exposure on an outcome. Dividing the population into subgroups can reveal effect heterogeneity to inform who would most benefit from intervention on the exposure. However, as covariates are measured post-“randomization”, naive stratification typically induces collider bias in stratum-specific estimates.

Method: We extend a previously proposed stratification method (the “doubly-ranked method”) to form strata based on a single covariate, and introduce a data-adaptive random forest method to calculate stratum-specific estimates that are robust to collider bias based on a high-dimensional covariate set. We also propose measures based on the Q statistic to assess heterogeneity between stratum-specific estimates (to understand whether estimates are more variable than expected due to chance alone) and variable importance (to identify the key drivers of effect heterogeneity).

Result: We show that the effect of body mass index (BMI) on lung function is heterogeneous, depending most strongly on hip circumference and weight. While for most individuals, the predicted effect of increasing BMI on lung function is negative, it is positive for some individuals and strongly negative for others.

Conclusion: Our data-adaptive approach allows for the exploration of effect heterogeneity in the relationship between an exposure and an outcome within a Mendelian randomization framework. This can yield valuable insights into disease aetiology and help identify specific groups of individuals who would derive the greatest benefit from targeted interventions on the exposure.
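The abstract refers to measures based on the Q statistic for judging whether stratum-specific estimates vary more than chance alone would explain. As a rough illustration only, the sketch below computes the standard Cochran's Q from a set of stratum estimates and their standard errors. It is not the authors' implementation, and the paper's own Q-based measures may differ in detail. The function name `cochran_q` and the example numbers are hypothetical.

```python
def cochran_q(estimates, std_errors):
    """Standard Cochran's Q for K stratum-specific estimates.

    Assumption: this is the textbook inverse-variance form of Q,
    not necessarily the exact measure proposed in the paper.
    Returns (Q, degrees of freedom, pooled estimate).
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")

    # Inverse-variance weights: more precise strata count for more.
    weights = [1.0 / se ** 2 for se in std_errors]

    # Inverse-variance weighted (fixed-effect) pooled estimate.
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)

    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))

    return q, len(estimates) - 1, pooled


if __name__ == "__main__":
    # Hypothetical stratum-specific estimates and standard errors (illustrative only).
    stratum_estimates = [-0.12, -0.05, 0.03, -0.20]
    stratum_std_errors = [0.04, 0.05, 0.06, 0.05]
    q, df, pooled = cochran_q(stratum_estimates, stratum_std_errors)
    print(f"Q = {q:.2f} on {df} degrees of freedom; pooled estimate = {pooled:.3f}")
```

Under the null hypothesis of a common effect across strata, Q is approximately chi-squared distributed with K - 1 degrees of freedom, so a value of Q well above K - 1 suggests heterogeneity beyond chance.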


Acknowledgements: For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.


Keywords: Genetics, Heterogeneous effect, Stratification, Instrumental variable, Random forest, Variable importance

Journal Title

BMC Medical Research Methodology


Springer Science and Business Media LLC
Medical Research Council (MC_UU_00002/2)
Wellcome Trust and the Royal Society (204623/Z/16/Z)