
Selecting Invalid Instruments to Improve Mendelian Randomization with Two-Sample Summary Data.



Patel, Ashish 
DiTraglia, Francis J 
Zuber, Verena 


Mendelian randomization (MR) is a widely used method to estimate the causal relationship between a risk factor and disease. A fundamental part of any MR analysis is to choose appropriate genetic variants as instrumental variables. Genome-wide association studies often reveal that hundreds of genetic variants may be robustly associated with a risk factor, but in some situations investigators may have greater confidence in the instrument validity of only a smaller subset of variants. Nevertheless, the use of additional instruments may be optimal from the perspective of mean squared error even if they are slightly invalid; a small bias in estimation may be a price worth paying for a larger reduction in variance. For this purpose, we consider a method for "focused" instrument selection whereby genetic variants are selected to minimise the estimated asymptotic mean squared error of causal effect estimates. In a setting of many weak and locally invalid instruments, we propose a novel strategy to construct confidence intervals for post-selection focused estimators that guards against the worst-case loss in asymptotic coverage. In empirical applications that (i) validate lipid drug targets and (ii) investigate vitamin D effects on a wide range of outcomes, our findings suggest that the optimal selection of instruments involves not only a small number of biologically justified instruments but also many potentially invalid instruments.
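
The bias-variance trade-off described above can be illustrated with a minimal sketch. It is not the authors' focused selection procedure: it assumes a fixed-effect inverse-variance weighted (IVW) estimator, treats the variant-exposure associations as known without error, and uses hypothetical toy values for the associations (`gamma`), the direct pleiotropic effects (`alpha`), and the standard errors of the variant-outcome associations (`sigma`). In practice the bias is unknown and must be estimated, which is the step the focused approach addresses.

```python
# Toy illustration (assumed setup, not the paper's method): compare the mean
# squared error of an IVW estimate built from a small set of valid instruments
# with one built from a larger set that includes slightly invalid instruments.

def ivw_bias_variance(gamma, alpha, sigma):
    """Bias and variance of a fixed-effect IVW estimate, treating gamma as known."""
    weight = sum(g * g / (s * s) for g, s in zip(gamma, sigma))
    bias = sum(g * a / (s * s) for g, a, s in zip(gamma, alpha, sigma)) / weight
    variance = 1.0 / weight
    return bias, variance


def mse(gamma, alpha, sigma):
    """Mean squared error = squared bias + variance."""
    bias, variance = ivw_bias_variance(gamma, alpha, sigma)
    return bias ** 2 + variance


def subset(values, indices):
    """Pick the entries of `values` at the given instrument indices."""
    return [values[i] for i in indices]


# Hypothetical summary data for five variants.
gamma = [0.10, 0.12, 0.08, 0.11, 0.09]     # variant-exposure associations
alpha = [0.0, 0.0, 0.005, -0.004, 0.003]   # direct (pleiotropic) effects; 0 means valid
sigma = [0.02, 0.02, 0.02, 0.02, 0.02]     # SEs of variant-outcome associations

core = [0, 1]            # instruments believed to be valid
full = [0, 1, 2, 3, 4]   # core set plus slightly invalid instruments

mse_core = mse(subset(gamma, core), subset(alpha, core), subset(sigma, core))
mse_full = mse(subset(gamma, full), subset(alpha, full), subset(sigma, full))

print(f"MSE, core instruments only: {mse_core:.5f}")
print(f"MSE, all instruments:       {mse_full:.5f}")
```

With these toy values the larger set introduces a small bias but reduces the variance by more, so its mean squared error is lower than that of the core set alone; with larger direct effects the comparison would reverse.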



Mendelian randomization, focused information criterion, post-selection inference

Journal Title

Ann Appl Stat


Institute of Mathematical Statistics
Wellcome Trust (225790/Z/22/Z)
Wellcome Trust (204623/Z/16/Z)
National Institute for Health and Care Research (IS-BRC-1215-20014)