Statistical methods to improve understanding of the genetic basis of complex diseases
Robust statistical methods that utilise the vast amounts of genetic data now available are required to resolve the genetic aetiology of complex human diseases, including immune-mediated diseases. The first essential step is the use of genome-wide association studies (GWAS) to identify regions of the genome that determine susceptibility to a given complex disease. The identified regions can then be fine-mapped with the aim of deducing the specific sequence variants that are causal for the disease of interest.
Functional genomic data is now routinely generated from high-throughput experiments. This data can reveal clues relating to disease biology, for example elucidating the functional genomic annotations that are enriched for disease-associated variants. In this thesis I describe a novel methodology based on the conditional false discovery rate (cFDR) that leverages functional genomic data with genetic association data to increase statistical power for GWAS discovery whilst controlling the FDR. I demonstrate the practical potential of my method through applications to asthma and type 1 diabetes (T1D) and validate my results using the larger, independent, UK Biobank data resource.
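To make the idea of a conditional false discovery rate concrete, the sketch below implements the classical empirical cFDR estimator, cFDR(p | q) ≈ p · #{i : q_i ≤ q} / #{i : p_i ≤ p, q_i ≤ q}, for the simple case in which the auxiliary covariate q is itself a p-value. This is not the method developed in this thesis, which accommodates functional genomic covariates; the function name `empirical_cfdr` and the toy inputs are assumptions made purely for illustration.

```python
import numpy as np


def empirical_cfdr(p, q):
    """Classical empirical cFDR estimate for each (p_i, q_i) pair.

    p: principal GWAS p-values.
    q: auxiliary covariate, assumed here to be p-value-like
       (smaller values indicate stronger prior evidence).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    out = np.empty_like(p)
    for i in range(len(p)):
        in_q = q <= q[i]                   # variants at least as extreme in q
        n_q = in_q.sum()
        n_pq = (in_q & (p <= p[i])).sum()  # ...and at least as extreme in p
        # n_pq >= 1 because variant i always satisfies both conditions.
        out[i] = min(1.0, p[i] * n_q / n_pq)
    return out


# Toy, made-up values: four variants with principal and auxiliary p-values.
toy_p = [0.001, 0.5, 0.02, 0.8]
toy_q = [0.01, 0.9, 0.02, 0.7]
toy_cfdr = empirical_cfdr(toy_p, toy_q)
```

Variants with a small p and a small q receive a small cFDR estimate, which is the mechanism by which auxiliary information can increase power. The quadratic loop is chosen for readability rather than speed.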
Fine-mapping is used to derive credible sets of putative causal variants in GWAS-associated regions. I show that these sets are generally over-conservative because fine-mapping data sets are not randomly sampled, but are instead drawn from the subset of data sets with the largest effect sizes. I develop a method to derive credible sets that contain fewer variants whilst still containing the true causal variant with high probability. I use my method to improve the resolution of fine-mapping studies for T1D and ankylosing spondylitis. This enables a more efficient allocation of resources in the expensive functional follow-up studies that are used to identify the true causal variants among the prioritised sets of variants.
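For readers unfamiliar with credible sets, the standard construction can be sketched as follows: rank variants by posterior probability of causality and keep adding them until the cumulative probability reaches the chosen threshold. This is the conventional procedure, not the adjusted method developed in this thesis; the function name `credible_set`, the single-causal-variant assumption and the toy posterior probabilities are all illustrative assumptions.

```python
import numpy as np


def credible_set(pp, threshold=0.95):
    """Return (variant indices, claimed coverage) of the smallest credible set.

    pp: posterior probabilities of causality for the variants in one region,
        assumed to come from a single-causal-variant model.
    """
    pp = np.asarray(pp, dtype=float)
    pp = pp / pp.sum()                      # normalise defensively
    order = np.argsort(-pp, kind="stable")  # indices, largest probability first
    cumulative = np.cumsum(pp[order])
    # First position at which the cumulative probability reaches the threshold.
    n_keep = int(np.searchsorted(cumulative, threshold, side="left")) + 1
    n_keep = min(n_keep, len(pp))
    return order[:n_keep].tolist(), float(cumulative[n_keep - 1])


# Toy, made-up posterior probabilities for four variants.
variants, coverage = credible_set([0.05, 0.6, 0.1, 0.25], threshold=0.9)
```

Here `coverage` is the sum of the posterior probabilities in the set, which is the quantity conventionally reported as the coverage of the credible set.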
Whilst GWAS investigate genome-wide patterns of association, it is likely that studying a specific biological factor using a variety of data sources will give a more detailed perspective on disease pathogenesis. Taking a more holistic approach, I utilise a variety of genetic and functional genomic data in a range of statistical genetics techniques to try to decipher the role of the Ikaros family of transcription factors in T1D pathogenesis. I find that T1D-associated variants are enriched in Ikaros binding sites in immune-relevant cell types, but that there is no evidence of epistatic effects between causal variants residing in the Ikaros gene region and variants residing in genome-wide binding sites of Ikaros, suggesting that these sets of variants are not acting synergistically to influence T1D risk.
Taken together, this thesis develops and examines a range of statistical methods to aid understanding of the genetic basis of complex human diseases, with specific application to immune-mediated diseases.