
Statistical Techniques to Fine Map the Related Genetic Aetiology of Autoimmune Diseases





Fortune, Mary Doris


Genome-wide association studies (GWAS) have uncovered many genetic regions that are associated with autoimmune disease risk. In this thesis, I present methods that I have developed to build upon these studies and enable the analysis of the causal variants of these diseases.

Colocalization methods disentangle whether potential causal variants are shared or distinct in related diseases, and enable the discovery of novel associations below the single-trait significance threshold. However, existing approaches require independent datasets to accomplish this. I extended two methods to allow for the shared-control design; one of these extensions also enables fine mapping in the case of shared variants. My analysis of four autoimmune diseases identified 90 regions associated with at least one disease, 33 of which were associated with two or more disorders; 14 of these showed evidence of distinct causal variants.

Once associated variants have been identified, we may wish to test some aggregate property, such as enrichment within an annotation of interest. However, the null distribution of GWAS signals that show association with a trait while preserving the correlation expected from linkage disequilibrium is complicated. I present an algorithm which computes the expected output of a GWAS, given any arbitrary definition of "null", and hence can be used to simulate the null distribution required for such a test.

Commonly, GWAS report only summary data, which makes determining which genetic variants are causal more difficult; the strongest signal may merely be correlated with the true causal variant. I have developed a statistical method for fine mapping a region, requiring only GWAS p-values and publicly available reference datasets. I sample from the space of potential causal models, rejecting those leading to expected summary data excessively different from that observed. This removes the need for the assumption of a single causal variant. In contrast to other summary statistic methods which allow for multiple causal variants, it does not depend upon the availability of effect size estimates or the allelic direction of effect, and it can infer whether the pattern of association is likely caused by a non-genotyped SNP without requiring imputation. I discuss the effect of choice of reference dataset, and the implications for other summary statistics techniques.
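
The sample-and-reject idea described above can be illustrated with a generic rejection sampler. The following is a minimal sketch under stated assumptions, not the algorithm developed in the thesis: the function name `abc_fine_map`, the normal prior on causal Z scores, the RMSE distance, and the tolerance are all hypothetical choices made for illustration. It assumes the standard approximation that the expected Z score at each SNP is the LD-weighted sum of the Z scores at the causal SNPs, and it compares absolute values only, since p-values carry no direction of effect.

```python
import numpy as np

def abc_fine_map(abs_z_obs, ld, n_draws=20000, max_causal=2, tol=1.0, seed=0):
    """Toy rejection sampler over causal-SNP configurations (illustrative only).

    abs_z_obs : observed |Z| per SNP, e.g. derived from two-sided p-values
    ld        : SNP-by-SNP correlation (r) matrix from a reference panel
    Returns (inclusion frequency per SNP among accepted draws, number accepted).
    """
    rng = np.random.default_rng(seed)
    n_snps = len(abs_z_obs)
    counts = np.zeros(n_snps)
    accepted = 0
    for _ in range(n_draws):
        # Draw a candidate causal model: how many causal SNPs, and which ones.
        k = rng.integers(1, max_causal + 1)
        causal = rng.choice(n_snps, size=k, replace=False)
        # Assumed prior on the Z scores at the causal SNPs (hypothetical choice).
        lam = rng.normal(0.0, 5.0, size=k)
        # Expected |Z| at every SNP under this model: LD-weighted sum of causal Z.
        expected = np.abs(ld[:, causal] @ lam)
        # Reject models whose expected summary data are too far from observed.
        dist = np.sqrt(np.mean((expected - abs_z_obs) ** 2))
        if dist < tol:
            counts[causal] += 1
            accepted += 1
    if accepted == 0:
        return counts, 0
    return counts / accepted, accepted
```

Because candidate models may contain more than one causal SNP, nothing in this sketch relies on a single-causal-variant assumption; the reference panel enters only through the `ld` matrix, which is why the choice of reference dataset matters for approaches of this kind.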




Wallace, Chris


Statistics, Genomics, Genetics, GWAS, Autoimmune Disease, Mathematics


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
My PhD was funded by the Wellcome Trust (through the Wellcome Trust Mathematical Genomics and Medicine PhD).