
Elucidating the function and biogenesis of small non-coding RNAs using novel computational methods & machine learning.


Authors

Abstract

The discovery of nucleic acids in 1868 by Friedrich Miescher was the prologue to an exciting new era in biology, full of scientific breakthroughs and accomplishments. Since then, RNAs have been shown to play an indispensable role in biological processes such as the coding, decoding, regulation and expression of genes. In particular, the discovery of small non-coding RNAs, and especially miRNAs, first in C. elegans and thereafter in almost all animals and plants, started to fill in the puzzle of the complex gene regulatory network present within cells. The aim of this thesis is to shed more light on the features and functionality of small RNAs. In particular, we focus on the function and biogenesis of miRNAs and piRNAs across multiple species, employing advanced computational methods and machine learning.

We first introduce a novel method (Chimira) for the identification of miRNAs from sets of animal and plant hairpin precursors, along with post-transcriptional terminal modifications that are not encoded by the genome. This method allows the prevalence of miRNA isoforms to be characterised across different cell types and/or conditions. We have applied Chimira within a larger study that examines the effect of terminal uridylation on RNA degradation in oocytes and in cells at either the embryonic or the adult stage. This study showed that uridylation is the predominant transcriptional regulation mechanism in oocytes, while it does not retain the same functionality on mRNAs and miRNAs in either embryonic or adult cells.

We then move on to a large-scale analysis of small RNA-Seq datasets in order to identify potential modification signatures across specific conditions and cell types or tissues in human and mouse. We extracted the full modification profiles across 461 samples, unveiling a high prevalence of modification signatures, mainly of 1 to 4 nucleotides. Additionally, samples of the same cell type and/or condition tend to cluster together based on their miRNA modification profiles, while miRNA gene precursors in close genomic proximity showed a significant degree of co-expression. We also elucidate the determinant factors in strand selection during miRNA biogenesis and update the miRBase annotation with corrected miRNA isoform sequences.

Next, we introduce a novel computational method (mirnovo) that uses machine learning to predict miRNAs from RNA-Seq data, with or without a reference genome. We demonstrate its efficiency by applying it to multiple datasets, including single cells and RNase III-deficient samples, lending support to previous studies on the existence of non-canonical miRNA biogenesis pathways. Following this, we explore and provide support for a novel piRNA biogenesis pathway in mouse that is independent of the MILI enzyme. Finally, we explore the efficiency of CRISPR/Cas9-induced editing of miRNA targets based on the computationally predicted accessibility of the targeted regions in the genome.

We have publicly released two novel web-based computational methods and one online resource with results regarding miRNA biogenesis and function. The findings presented in this study represent another step forward in elucidating RNA functionality, and we believe they will be of benefit to the scientific community.

Description

Date

2017-08-15

Advisors

Enright, Anton

Keywords

bioinformatics, non-coding RNAs, miRNAs, machine learning, piRNAs, computational biology, miRNA modifications, chimira, mirnovo, miratlas, epigenetics, post-transcriptional modifications

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

EMBL