Statistical issues in Mendelian randomization: use of genetic instrumental variables for assessing causal associations

Burgess, S

doi:10.17863/CAM.16219

Repository URI

http://www.dspace.cam.ac.uk/handle/1810/242184
https://www.repository.cam.ac.uk/handle/1810/242184

Repository DOI

https://doi.org/10.17863/CAM.16219

Files

thesis.pdf (7.2 MB)

Type

Thesis

Authors

Burgess, Stephen

https://orcid.org/0000-0001-5365-8760

Abstract

Mendelian randomization is an epidemiological method for using genetic variation to estimate, from observational data, the causal effect of a change in a modifiable phenotype on an outcome. A genetic variant satisfying the assumptions of an instrumental variable for the phenotype of interest can be used to divide a population into subgroups which differ systematically only in the phenotype. This gives a causal estimate which is asymptotically free of bias from confounding and reverse causation. However, the variance of the causal estimate is large compared to traditional regression methods, requiring large amounts of data and necessitating methods for efficient data synthesis. Additionally, if the association between the genetic variant and the phenotype is not strong, then in finite samples the causal estimate suffers from “weak instrument” bias in the direction of the observational association. This bias may convince a researcher that an observed association is causal. If the causal parameter estimated is an odds ratio, then the parameter of association will differ depending on whether it is viewed as a population-averaged causal effect or a personal causal effect conditional on covariates. We introduce a Bayesian framework for instrumental variable analysis, which is less susceptible to weak instrument bias than traditional two-stage methods, has correct coverage with weak instruments, and is able to efficiently combine gene–phenotype–outcome data from multiple heterogeneous sources. Methods for imputing missing genetic data are developed, allowing multiple genetic variants to be used without reduction in sample size.
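The instrumental variable idea described above can be illustrated with a small simulation. This is a minimal sketch and is not taken from the thesis: the data-generating model, the parameter values (`gene_effect`, `causal_effect`) and the function names are all assumptions chosen for illustration, and the estimator shown is the simple ratio (Wald) estimator rather than the Bayesian method the thesis introduces.

```python
import random

def simulate(n=20000, gene_effect=0.5, causal_effect=0.3, seed=1):
    """Simulate genotype g, phenotype x and outcome y (hypothetical model).

    An unmeasured confounder u affects both x and y, so the
    observational regression of y on x is biased.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = rng.randint(0, 1) + rng.randint(0, 1)  # allele count 0, 1 or 2
        u = rng.gauss(0, 1)                         # unmeasured confounder
        xi = gene_effect * gi + u + rng.gauss(0, 1)
        yi = causal_effect * xi + u + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y

def slope(a, b):
    """Least-squares slope from regressing b on a."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var = sum((ai - mean_a) ** 2 for ai in a)
    return cov / var

g, x, y = simulate()

# Observational estimate: confounded by u, so biased away from 0.3.
observational = slope(x, y)

# Ratio (Wald) estimate: gene-outcome slope over gene-phenotype slope.
ratio_estimate = slope(g, y) / slope(g, x)

print(f"observational slope: {observational:.2f}")
print(f"ratio IV estimate:   {ratio_estimate:.2f}")
```

Under these assumed parameters the observational slope is pulled well above the true effect of 0.3 by the confounder, while the ratio estimate is centred near 0.3 but is noticeably more variable. Reducing `gene_effect` or `n` in this sketch weakens the instrument and makes the ratio estimate less stable.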
We focus on the question of a binary outcome, illustrating how the collapsing of the odds ratio over heterogeneous strata in the population means that the two-stage and the Bayesian methods estimate a population-averaged marginal causal effect similar to that estimated by a randomized trial, but which typically differs from the conditional effect estimated by standard regression methods. We show how these methods can be adjusted to give an estimate closer to the conditional effect. We apply the methods and techniques discussed to data on the causal effect of C-reactive protein on fibrinogen and coronary heart disease, concluding with an overall estimate of causal association based on the totality of available data from 42 studies.
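The non-collapsibility of the odds ratio mentioned above can be seen in a short worked calculation. This is a hedged sketch and not an example from the thesis: the two strata, their baseline log-odds and the conditional odds ratio of 2 are assumed values. The exposure is taken to be independent of stratum, so there is no confounding, yet the marginal odds ratio still differs from the conditional one.

```python
import math

def expit(z):
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Assumed values: two equally sized strata with different baseline risk,
# and the same conditional odds ratio of 2 for exposure in each stratum.
log_or = math.log(2.0)
baseline_log_odds = [-2.0, 2.0]

# Conditional (within-stratum) odds ratios.
conditional_ors = []
for b in baseline_log_odds:
    p_unexposed = expit(b)
    p_exposed = expit(b + log_or)
    conditional_ors.append(odds(p_exposed) / odds(p_unexposed))

# Marginal (population-averaged) risks: average over the two strata.
p_unexposed_marginal = sum(expit(b) for b in baseline_log_odds) / 2
p_exposed_marginal = sum(expit(b + log_or) for b in baseline_log_odds) / 2
marginal_or = odds(p_exposed_marginal) / odds(p_unexposed_marginal)

print("conditional odds ratios:", [round(v, 3) for v in conditional_ors])
print("marginal odds ratio:", round(marginal_or, 3))
```

With these assumed inputs both conditional odds ratios equal 2, whereas the marginal odds ratio is about 1.35, which is closer to the null. This is the sense in which a population-averaged estimate can differ from the conditional effect even when no confounding is present.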

Keywords

Causal inference, Instrumental variables, Mendelian randomization, Bayesian methods, Meta-analysis, Missing data, Non-collapsibility

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 2.0 UK: England & Wales

Sponsorship

This work was supported by the U.K. Medical Research Council [grant number U.1052.00.001].

Collections

Theses - Pure Mathematics and Mathematical Statistics