Multivariate linear mixed models for statistical genetics


Type

Thesis

Authors

Casale, Francesco Paolo 

Abstract

In the last decade, genome-wide association studies have helped to advance our understanding of the genetic architecture of many important traits, including diseases. However, the statistical analysis of genotype-phenotype associations remains challenging for several reasons. First, many traits have polygenic architectures, which means that they are controlled by a large number of variants with small individual effects. Second, as increasingly deep phenotype data are being generated, there is a need for multivariate analysis approaches that leverage multiple related phenotypes while retaining computational efficiency. Third, genetic analyses are confronted by strong confounding factors that can create spurious associations when not properly accounted for in the statistical model. Here, we derive more flexible methods that integrate genetic effects across variants and multiple quantitative traits. To do so, we build on the classical linear mixed model (LMM), a widely adopted framework for genetic studies.

The first contribution of this thesis is mtSet, an efficient mixed-model approach that enables genome-wide association testing between sets of genetic variants and multiple traits while accounting for confounding factors. In both simulations and real-data applications we demonstrate that mtSet effectively combines the advantages of variant-set and multi-trait analyses.

Next, we present a new model for gene-context interactions that builds on mtSet. The proposed interaction set test (iSet) yields increased statistical power for detecting polygenic interactions. Additionally, iSet enables the identification of genetic loci that are associated with different configurations of causal variants across contexts. After benchmarking the proposed method using simulated data, we consider two applications to real datasets, where we investigate genetic effects on gene expression across different cellular contexts and sex-specific genetic effects on lipid levels.

Finally, we describe LIMIX, a software framework for the flexible implementation of different LMMs. Most of the models considered in this thesis, including mtSet and iSet, are implemented and available in LIMIX. A unique aspect of the software is an inference framework that allows a large class of genetic models to be defined and, in many cases, to be efficiently fitted by exploiting specific algebraic properties. We demonstrate the utility of this software suite in two applied collaboration projects.

Taken together, this thesis demonstrates the value of flexible and integrative modelling in genetics and contributes new statistical methods for genetic analysis. These approaches generalise previous models, yet retain the computational efficiency that is needed to tackle large genetic datasets.

Advisors

Stegle, Oliver

Keywords

linear mixed model, statistical genetics, GWAS, multivariate

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

EMBL-European Bioinformatics Institute