
The use of whole exome sequencing data to identify candidate genes involved in cancer and benign tumour predisposition





Fewings, Eleanor Rose


The development of whole exome sequencing has transformed the study of disease predisposition. Sequencing both large disease sets and smaller rare disease families enables the identification of new predisposition variants and may provide clinical insight into disease management. There is no standard protocol for analysing exome sequencing data. Outside of extremely large sequencing studies including thousands of individuals, statistical approaches are often underpowered to detect rare disease-associated variants. Aggregation of variants into functionally related regions, including genes, gene clusters, and pathways, allows for the detection of biological processes that, when interrupted, may impact disease risk. In silico functional studies can also be used to further understand how variants disrupt biological processes and to identify genotype-phenotype relationships. This study describes the exploration of sequencing datasets from cancers and benign tumour diseases including: i) hereditary diffuse gastric cancer, ii) sweat duct proliferation tumours, iii) adrenocortical carcinoma, and iv) breast cancer. Each set underwent germline whole exome sequencing followed by additional tumour or targeted sequencing to identify associated predisposition genes. Gene ontology enrichment analysis identified variants within a cluster of risk genes involved in double strand break repair as associated with hereditary diffuse gastric cancer risk. This cluster included PALB2, in which loss of function variants were found, using externally collated data, to be significantly associated with hereditary diffuse gastric cancer risk. Germline protein-affecting variants in the myosin gene MYH9 were identified in all individuals with a rare sweat duct proliferative syndrome, suggesting a role for MYH9 in skin development, regulation and tumorigenesis.
These MYH9 variants were analysed in silico to identify a genotype-phenotype relationship between the clinical presentation and variants in the ATP binding pocket of the protein. Tumour matched normal sequence data from adrenocortical carcinoma cases were used to elucidate the role of Lynch syndrome genes in disease pathogenesis. Within the breast cancer set, candidate genes were selected to undergo targeted sequencing in a larger set of cases to further explore their role in breast cancer risk. Risk-associated genes identified within this study may ultimately aid in the diagnosis and management of disease. This thesis has also generated multiple novel tools and sequencing analysis techniques that may be of use for further studies by aiding in the prioritisation of candidate variants. The described techniques will support researchers working on rare, statistically underpowered datasets and provide standard analysis pipelines for a range of dataset sizes and types, including familial data and unrelated individuals.





Tischkowitz, Marc


Cancer, Genetics, Predisposition, Sequencing, Bioinformatics, Breast Cancer, Hereditary Diffuse Gastric Cancer


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge