Comprehensive analysis of high-throughput experiments for investigating transcription and transcriptional regulation

Toedling, Joern Michael

Repository URI

https://www.repository.cam.ac.uk/handle/1810/267885

Repository DOI

https://doi.org/10.17863/CAM.13811

Files

Thesis (3.44 MB)

Type

Thesis

Authors

Toedling, Joern Michael

Abstract

As the number of fully sequenced genomes grows, efforts are shifting towards the investigation of functional aspects. One research focus is the transcriptome, the set of all transcribed genomic features. We aspire to understand which features constitute the transcriptome, in which context they are transcribed and how their transcription is regulated. Studies that aim to answer these questions frequently make use of high-throughput technologies that allow multiple genomic regions, or transcribed copies of genomic regions, to be investigated in parallel. In this dissertation, I present three high-throughput studies in which I have been involved, in which data obtained from oligonucleotide tiling microarrays or large-scale cDNA sequencing provided insights into the transcriptome and transcriptional regulation in the model organisms Saccharomyces cerevisiae and Mus musculus. Interpretation of such high-throughput data poses two major computational tasks. First, the primary statistical analysis includes quality assessment, data normalisation and identification of significantly affected targets, i.e. regions of the genome deemed transcribed or involved in transcriptional regulation. Second, in an integrative bioinformatic analysis, the identified targets need to be interpreted in the context of the current genome annotation and related experimental results. I provide details of these individual steps as they were conducted in the three studies. Both primary and integrative analysis require functional, extensible and well-documented software that implements individual analysis steps, allows for concise visualisation of intermediate and final results and facilitates the construction of automated, programmed workflows. Ideally, such software is optimised with respect to scalability, reproducibility and methodical scope of the analyses. This dissertation contains details of two such software packages in the Bioconductor project, which I (co-)developed.

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Collections

Theses - European Bioinformatics Institute (EMBL-EBI)