Estimating telomere length from whole genome sequencing data


Type

Thesis

Authors

Farmery, James Henry Royston (ORCID: https://orcid.org/0000-0002-5715-1665)

Abstract

This thesis details the development of two computational tools, Telomerecat and Parabam, as well as their applications to whole genome sequencing (WGS) data.

Telomerecat is a tool for estimating telomere length from WGS data. Its main strength is its broad applicability, which stems from a number of advantages over previous attempts to estimate telomere length from WGS. Chief among these is that it makes no assumption about the chromosome count or genome size of the input samples. Telomerecat is therefore well suited to analysing cancer samples, where such assumptions are unfounded, and it is also applicable to non-human samples, a first for tools of its kind. Furthermore, a novel method for filtering reads derived from interstitial telomere sequences means that it does not rely on previously applied analyses, which are a source of bias.

The other tool described in this thesis is Parabam. Parabam is the first tool of its kind to allow users to apply a function, in parallel, to all of the reads in sequence alignment files. Parabam also includes a novel method for iterating over index-sorted sequence files as if they were name-sorted. We provide evidence that Parabam is a quicker way to create complex subsets and statistics from sequence alignment files.

In the latter half of the thesis we detail two applications of Telomerecat to large-scale WGS projects. The first application, to the Prostate ICGC UK cohort, reveals previously unreported associations between telomere length and both previously identified molecular subtypes and cancer stage. In the second application, to the NIHR BioResource - Rare Disease cohort, we discover a previously unidentified variant in DKC1 that we propose is directly linked to short telomeres and an immunodeficient phenotype.

Description

Date

2017-09-30

Advisors

Lynch, Andy Graeme

Keywords

telomere, telomerecat, parabam, prostate, cancer, rare blood disease, dyskeratosis congenita

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

Funded by Cancer Research UK PhD Studentship