
Statistical co-analysis of high-dimensional association studies


Type

Thesis

Authors

Liley, Albert James  https://orcid.org/0000-0002-0049-8238

Abstract

Modern medical practice and science involve complex phenotypic definitions. Understanding patterns of association across this range of phenotypes requires co-analysis of high-dimensional association studies in order to characterise shared and distinct elements. In this thesis I address several problems in this area, with the unifying aim of making more efficient use of available data. The main application of these methods is in the analysis of genome-wide association studies (GWAS) and similar studies.

First, I developed methodology for a Bayesian conditional false discovery rate (cFDR) for leveraging GWAS results using summary statistics from a related disease. I extended an existing method to enable a shared-control design, increasing power and applicability, and developed an approximate bound on the false discovery rate (FDR) for the procedure. Using the new method I identified several new variant-disease associations. I then developed a second application of shared-control design in the context of study replication, enabling improvement in power at the cost of changing the spectrum of sensitivity to systematic errors in study cohorts. This has application in studies of rare diseases and in between-case analyses.

I then developed a method for partially characterising heterogeneity within a disease by modelling the bivariate distribution of case-control and within-case effect sizes. Using an adaptation of a likelihood-ratio test, this allows an assessment to be made of whether disease heterogeneity corresponds to differences in disease pathology. I applied this method to a range of simulated and real datasets, enabling insight into the cause of heterogeneity in autoantibody positivity in type 1 diabetes (T1D). Finally, I investigated the relation of subtypes of juvenile idiopathic arthritis (JIA) to adult diseases, using modified genetic risk scores and linear discriminants in a penalised regression framework.

This thesis contributes a range of methodological developments for the comparative analysis of high-dimensional association studies. Methods such as these will have wide application in the analysis of GWAS and similar areas, particularly in the development of stratified medicine.

Date

2018-01-16

Advisors

Wallace, Chris
Mckinney, Eoin
Todd, John Andrew

Keywords

genetics, disease heterogeneity, multivariate Gaussian, statistical methods, two-stage association testing, gwas, non-parametric methods, false discovery rate, genetic risk scores, lasso, penalised regression, two-groups model, empirical Bayes, statistics, biostatistics, autoimmune disease, juvenile idiopathic arthritis, type 1 diabetes, autoimmune thyroid disease, shared controls, autoantibody, statistical leverage, effect size distribution

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

I was supported by a grant from the NIHR Cambridge BRC