Probabilistic modelling of somatic alterations in bulk tissue and single cells using repeat DNA

cam.depositDate: 2021-12-12
cam.restriction: thesis_access_embargoed
cam.supervisor: Lynch, Andy Graeme
cam.supervisor: Morrissey, Edward Robert
dc.contributor.author: Abujudeh, Samer
dc.date.accessioned: 2021-12-16T02:11:59Z
dc.date.available: 2021-12-16T02:11:59Z
dc.date.submitted: 2019-03-08
dc.date.updated: 2021-12-12T15:17:18Z
dc.description.abstract: Chromosomal instability characterises several cancer types, in which large-scale structural alterations of the genome accumulate at an increased rate. An important class of structural alterations are somatic copy number alterations (SCNAs). SCNAs have been shown to be major drivers of oncogenesis and are associated with prognosis and response to therapies. Current sequencing and array-based methods that are used to infer SCNAs are cost-prohibitive for widespread clinical use. A low-cost, simple and more clinically applicable method to amplify and sequence more than 10,000 repeat regions across the genome was recently developed, called FAST-SeqS. However, current computational methods do not make effective use of this low-cost assay. This limits its application to clinical medicine and to biomedical research. In this thesis, I develop conliga, a probabilistic generative model and associated inference algorithms to infer relative copy number from FAST-SeqS data at the amplicon level. I implement this method in R and C++ and provide the software as an open-source tool. By applying conliga and FAST-SeqS to oesophageal adenocarcinoma and related conditions, I show that it has similar performance to QDNAseq applied to low-coverage whole-genome sequencing, which is a more expensive and laborious alternative for SCNA profiling. I explore several aspects of FAST-SeqS data and show that sample-specific biases can affect SCNA inferences. By extending the conliga model, I demonstrate that these biases can be jointly inferred with SCNA profiles. I validate these extensions by comparing the results to inferences obtained from whole genome sequencing in prostate cancer samples. I show that the variants present in FAST-SeqS data can be used to infer tumour purity, ploidy and allele-specific copy number. This has potential application in large-scale cancer genome studies to identify samples with sufficient purity before performing high-coverage whole-genome sequencing. Finally, I describe preliminary data showing that the FAST-SeqS protocol can be applied to single cells, enabling further extensions of the conliga model which could lead to the inference of SCNAs in single cells.
dc.identifier.doi: 10.17863/CAM.79000
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/331546
dc.language.iso: eng
dc.publisher.institution: University of Cambridge
dc.rights: All Rights Reserved
dc.rights.uri: https://www.rioxx.net/licenses/all-rights-reserved/
dc.subject: Somatic copy number alterations
dc.subject: Chromosomal instability
dc.subject: Probabilistic modelling
dc.subject: Bayesian inference
dc.subject: Bayesian nonparametrics
dc.subject: Hidden Markov models
dc.subject: Hierarchical Dirichlet processes
dc.subject: conliga
dc.subject: FAST-SeqS
dc.subject: Oesophageal adenocarcinoma
dc.subject: Barrett's oesophagus
dc.subject: Prostate cancer
dc.subject: Tumour purity
dc.subject: Single-cell copy number
dc.subject: Low-cost and clinically applicable copy number profiling
dc.subject: Repeat DNA
dc.subject: Long interspersed nuclear elements
dc.subject: LINE1
dc.title: Probabilistic modelling of somatic alterations in bulk tissue and single cells using repeat DNA
dc.type: Thesis
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: Doctor of Philosophy (PhD)
pubs.funder-project-id: Wellcome Trust (102273/Z/13/Z)
pubs.licence-display-name: Apollo Repository Deposit Licence Agreement
pubs.licence-identifier: apollo-deposit-licence-2-1
rioxxterms.licenseref.uri: https://www.rioxx.net/licenses/all-rights-reserved/
rioxxterms.type: Thesis

Files

Original bundle
Name: samer_abujudeh_thesis_final.pdf
Size: 35.32 MB
Format: Adobe Portable Document Format
Description: Thesis
Licence
https://www.rioxx.net/licenses/all-rights-reserved/