
Proteogenomics for Personalised Molecular Profiling


Type

Thesis

Authors

Schlaffner, Christoph Norbert  https://orcid.org/0000-0003-2717-3406

Abstract

Technological advancements in mass spectrometry allowing quantification of almost complete proteomes make proteomics a key platform for generating unique functional molecular data. Furthermore, the integrative analysis of genomic and proteomic data, termed proteogenomics, has emerged as a new field revealing insights into gene expression regulation, cell signalling, and disease processes. However, the lack of software tools for high-throughput integration and for unbiased modification and variant detection hinders efforts towards large-scale proteogenomics studies. The main objectives of this work are to address these issues by developing and applying new software tools and data analysis methods. Firstly, I address the mapping of peptide sequences to reference genomes. I introduce a novel tool for high-throughput mapping and highlight its unique features, which facilitate quantitative and post-translational modification mapping while accounting for amino acid substitutions. I benchmark its performance. Furthermore, I offer an additional tool that permits generation of web-accessible hubs of genome-wide mappings. To enable unbiased identification of post-translational modifications and amino acid substitutions in high-resolution mass spectrometry data, I present algorithmic updates to the mass-tolerant blind spectrum comparison tool ‘MS SMiV’. I demonstrate the applicability of the changes by benchmarking against a published mass-tolerant database search of a high-resolution tandem mass spectrometry dataset. I then present the application of ‘MS SMiV’ to a panel of 50 colorectal cancer cell lines. I show that the adaptation of ‘MS SMiV’ outperforms traditional sequence-database-based identification of single amino acid variants. Furthermore, I highlight the utility of mass-tolerant spectrum matching, in combination with isobaric-labelled quantitative proteomics, in distinguishing between post-translational modifications and amino acid variants of similar mass.
In the last part of this work, I integrate both tools with a high-throughput proteogenomic identification pipeline and apply it to a pilot study of chondrocytes derived from 12 osteoarthritic individuals. I show the value of this approach in identifying variation between individuals and across molecular levels, and highlight it with individual examples. I show that multiplexed proteogenomics can be used to infer genotypes of individuals.

Date

2017-06-16

Advisors

Bender, Andreas
Choudhary, Jyoti

Keywords

proteogenomics, proteomics, genomics, bioinformatics, open-source software, annotation, mass spectrometry, personalised profiling, personalised medicine, large-scale, post-translational modification, amino acid variant, single nucleotide variant, isoform, osteoarthritis, workflow, analysis, computational biology, spectrum library searching, spectral similarity

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

This work was supported by NIH grant (U41HG007234) to the GENCODE project and Wellcome Trust grant (WT098051) to the Sanger Institute.