Rare Genetic Variants and Cancer Susceptibility
Abstract
Genetic susceptibility to breast cancer is known to be conferred by common variants, identified through GWAS, together with some rarer variants conferring higher disease risks. The latter, identified through genetic linkage or targeted sequencing studies, include protein-truncating variants and some rare missense variants in ATM, BARD1, BRCA1, BRCA2, CHEK2, RAD51C, RAD51D, PALB2 and TP53. However, these variants together explain less than half the familial relative risk of breast cancer. Similarly, for other cancers, GWAS have identified many susceptibility loci, and rare variants have been identified in a few genes, e.g. from familial studies. Across cancers, a large proportion of the heritability remains unexplained, and the contribution of rare variants across the whole coding landscape is largely unknown. Rare variants may be one source of the missing heritability. GWAS are underpowered to detect associations with rare variants, but recent advances in NGS datasets, particularly the availability of WES and WGS data for large case-control and cohort studies, facilitate the investigation of rare variants across the whole exome or whole genome, with no prior hypothesis of association.
In Chapter 3, we performed a meta-analysis across three large whole-exome sequencing datasets from the Breast Cancer Association Consortium and UK Biobank. Burden tests were performed for protein-truncating and rare missense variants in over 15,000 genes, and family history information was incorporated to increase the power to detect associations. Associations between protein-truncating variants and breast cancer risk were identified at exome-wide significance for ATM, BRCA1, BRCA2, CHEK2 and PALB2, together with MAP3K1. Associations at P < 1 × 10⁻⁴ were additionally identified for LZTR1, ATRIP and BARD1. For deleterious rare missense variants or protein-truncating variants, we additionally identified an association for CDKN2A at exome-wide significance. The overall contribution to the heritability of breast cancer from coding variants in genes beyond those previously known was estimated to be small.
In Chapter 4, we describe in more detail the method used in Chapter 3 of incorporating family history information to increase the power to detect associations. This is particularly important in large cohort studies, where unaffected individuals greatly outnumber affected individuals. We showed theoretically, and using Monte Carlo simulations, that treating unaffected individuals with a family history as cases with a weighting of 0.5 relative to true cases increases the power to detect associations for variants with moderate effect sizes, and demonstrated this for breast, bowel, lung and prostate cancer.
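The idea of weighting family history at 0.5 can be illustrated with a small Monte Carlo sketch. This is not the thesis's implementation: the function name `simulate_power`, all parameter values, the single simulated relative, the assumption that a carrier's relative carries the variant with probability 0.5, and the pooled-variance z-test are all assumptions made here for illustration only.

```python
import random


def simulate_power(use_family_history, n=4000, carrier_freq=0.02,
                   base_risk=0.05, rel_risk=2.0, fh_weight=0.5,
                   n_sims=100, z_crit=1.96, seed=1):
    """Estimate, by simulation, how often a carrier vs non-carrier
    comparison exceeds the critical value (hypothetical toy model)."""
    rng = random.Random(seed)
    detections = 0
    for _ in range(n_sims):
        count = [0, 0]        # number of non-carriers, carriers
        total = [0.0, 0.0]    # sum of phenotype scores per group
        total_sq = 0.0        # sum of squared phenotype scores
        for _ in range(n):
            carrier = rng.random() < carrier_freq
            affected = rng.random() < (base_risk * rel_risk if carrier else base_risk)
            # Assumption: one first-degree relative, who shares a carrier's
            # variant with probability 0.5 (population frequency otherwise).
            rel_carrier = rng.random() < (0.5 if carrier else carrier_freq)
            rel_affected = rng.random() < (base_risk * rel_risk if rel_carrier else base_risk)
            if affected:
                y = 1.0                      # true case
            elif use_family_history and rel_affected:
                y = fh_weight                # unaffected, affected relative
            else:
                y = 0.0                      # control
            g = 1 if carrier else 0
            count[g] += 1
            total[g] += y
            total_sq += y * y
        if min(count) == 0:
            continue  # no carriers (or no non-carriers): counted as no detection
        # Pooled within-group variance, as in a regression of y on carrier status.
        within_ss = total_sq - total[0] ** 2 / count[0] - total[1] ** 2 / count[1]
        pooled_var = within_ss / (n - 2)
        if pooled_var <= 0:
            continue
        se = (pooled_var * (1 / count[0] + 1 / count[1])) ** 0.5
        z = (total[1] / count[1] - total[0] / count[0]) / se
        if abs(z) > z_crit:
            detections += 1
    return detections / n_sims


if __name__ == "__main__":
    print("power, cases only:         ", simulate_power(False))
    print("power, with family history:", simulate_power(True))
```

Because both calls use the same seed and draw the same number of random values per individual, they score identical simulated cohorts, so any difference in estimated power reflects the phenotype definition alone. Setting `rel_risk=1.0` gives a rough check of the false-positive rate under this toy model.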
In Chapter 5, we extended the WES analysis to protein-truncating variants (PTVs) in genes and the risk of 10 cancers in UK Biobank: breast, bowel, lung, prostate, pancreatic, endometrial, ovarian, kidney and bladder cancer, as well as malignant melanoma. We identified many genes associated with cancer risk at exome-wide significance, including a novel association between PTVs in NHEJ1 and ovarian cancer risk. This gene encodes a DNA repair factor essential for non-homologous end joining. Some genes were associated with multiple cancers, e.g. ATM was associated with 6 cancers at P < 1 × 10⁻³.
In Chapter 6, we estimate the contribution of PTVs in genes to the heritability of each cancer and assess the overlap of genes associated with multiple cancers. Ovarian cancer had the greatest estimated proportion of genes associated with risk (0.037) and the greatest proportion of the familial relative risk attributable to PTVs in genes (45.9%). Some genes had a posterior probability > 0.8 of being associated with multiple cancers: APC, ATM, BAP1, BRCA1, BRCA2, CHEK2, MAP3K1, MLH1, MSH2, MSH6 and PALB2. In the joint cancer models, there was a significant enrichment of genes associated with the cancer pairs breast-prostate, breast-ovarian, bowel-endometrial and breast-pancreatic. ATM was associated with the most cancer pairs (9 pairs) with posterior probability > 0.8. The results increase our understanding of the role of genes in multiple cancer types and highlight the enrichment of tumour suppressor and DNA repair genes among cancer susceptibility genes.
Finally, in Chapter 7, we describe an analysis of rare non-coding variants and breast cancer risk using whole-genome sequencing in UK Biobank. We performed burden tests incorporating family history, and robust SKAT-O tests, for the untranslated regions (UTRs) of ~19,000 genes and for ~35,000 putative promoter regions. After replication, no regions were associated at P < 1 × 10⁻⁴. Further analysis may therefore focus on rare variants in more distant regulatory elements, e.g. enhancer regions, which are enriched for GWAS loci.
Collectively, the findings in this thesis provide significant insights into the genetic architecture of many cancer types and highlight the central role of tumour suppressor and DNA repair genes. Further research will be important to confirm the observed associations and provide more precise risk estimates. If confirmed, these findings could help improve the risk management of high-risk patients and help prevent future cancer cases.

