
Assessing the usage of alternative transcripts in human tissues from RNA-seq and proteomics data


Type

Thesis


Authors

Cardoso Marcelino dos Santos, Sérgio Miguel 

Abstract

Alternative splicing is an important step in the regulation of gene expression in eukaryotes, through which a single gene can express different transcript isoforms. We can now use RNA sequencing (RNA-seq) data to identify which isoforms of each gene are expressed in a specific condition by quantifying the expression level of each isoform, although isoform quantification remains a difficult task. In this way, we can better understand how prevalent this process is and how often a gene expresses different isoforms. It can also be evaluated whether all isoforms of a gene are about equally expressed or whether one dominant isoform is significantly more expressed than the others. Moreover, by applying this analysis to different tissues, it can be assessed whether splicing changes between conditions and whether such a change has a biological role. A dataset of 32 normal human tissues was used in this study. The results show that, although alternative splicing can lead to the expression of different transcripts of a gene, many genes have an n-fold dominant transcript, that is, a transcript expressed at a level n times higher than that of the second most expressed one. On average, 68% of protein-coding genes expressed in a given tissue have a 2-fold dominant transcript and 47% have a 5-fold dominant transcript. The dominant transcript of a gene tends to be the same across tissues, but there are cases where the dominant isoform switches between tissues; these cases are designated switch events. For a given pair of tissues, there are on average around thirty 2-fold switch events and just below four 5-fold switch events. The switching exons often overlap significantly, and the most common types of alternative splicing are alternative 3’ selection (24% of the cases) and alternative 5’ selection (21%). To evaluate the conservation of the transcripts, the dominant transcripts were compared to APPRIS principal isoforms.
These isoforms are annotated based on their function, protein structure, and cross-species conservation. Of the 2-fold dominant transcripts, 69.2% are APPRIS principal isoforms, as are 81.1% of the 5-fold dominant transcripts. It was also observed that 80% of the switches involve no protein domain changes. Similar results were obtained when the same analysis was repeated on the GTEx dataset, which has a much higher number of samples and contains data from 54 conditions. In this case, on average 59% of the genes expressed in a given condition had a 2-fold dominant transcript and 31% had a 5-fold dominant transcript. The number of switch events was again low relative to the number of dominant transcripts, indicating that dominant transcripts tend to be conserved across normal tissues. A comparative analysis of the matching tissues common to the two datasets was also performed and, although the datasets are different, some switch events were found in both. Five examples of 5-fold switches involving domain swaps were analysed in detail, revealing that the types of genes affected by switches can be quite distinct and that the protein domains that change between isoforms can vary in both number and function. The tissues most over-represented in these switch events were skeletal muscle, testis and cerebral cortex. These results show that, in most cases, changes in alternative splicing do not change transcripts significantly and, correspondingly, the changes at the protein level are minor. This and similar observations indicate that alternative splicing may not be the main process responsible for generating protein diversity.
The last study presented in this thesis analyses how RNA-seq data can be integrated with a data-independent acquisition (DIA) mass spectrometry method, SWATH-MS (sequential window acquisition of all theoretical spectra-mass spectrometry), to study the impact on the proteome of depleting PRPF8, a core spliceosomal component. The results show that intron retention events lead to decreased protein abundance. It is also shown that differential transcript usage and gene expression affect protein abundance, altering it proportionally to transcript levels. Overall, some links between transcript and protein levels are revealed, and it is demonstrated how perturbed systems can be used in the study of alternative splicing.
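
The definitions of an n-fold dominant transcript and of a switch event given in the abstract can be illustrated with a minimal Python sketch. It is not the pipeline used in the thesis. The function names, the transcript IDs and expression values, the use of a greater-than-or-equal comparison, and the treatment of a gene with a single expressed transcript as dominant are all assumptions made for illustration.

```python
from typing import Dict, Optional


def dominant_transcript(expression: Dict[str, float], fold: float) -> Optional[str]:
    """Return the n-fold dominant transcript of one gene in one tissue, or None.

    `expression` maps transcript IDs to expression levels (hypothetical values).
    A transcript is called n-fold dominant here if its level is at least
    `fold` times that of the second most expressed transcript (assumed rule).
    """
    # Ignore transcripts that are not expressed at all.
    expressed = {tid: level for tid, level in expression.items() if level > 0}
    if not expressed:
        return None

    # Rank transcripts from most to least expressed.
    ranked = sorted(expressed, key=expressed.get, reverse=True)
    top = ranked[0]
    if len(ranked) == 1:
        # Assumption: a lone expressed transcript counts as dominant.
        return top

    second = ranked[1]
    return top if expressed[top] >= fold * expressed[second] else None


def is_switch_event(tissue_a: Dict[str, float],
                    tissue_b: Dict[str, float],
                    fold: float) -> bool:
    """True if both tissues have an n-fold dominant transcript and they differ."""
    dom_a = dominant_transcript(tissue_a, fold)
    dom_b = dominant_transcript(tissue_b, fold)
    return dom_a is not None and dom_b is not None and dom_a != dom_b


# Hypothetical expression levels for one gene in three tissues.
muscle = {"T1": 50.0, "T2": 8.0, "T3": 1.0}
testis = {"T1": 3.0, "T2": 40.0}
cortex = {"T1": 10.0, "T2": 6.0}

print(dominant_transcript(muscle, fold=5))      # T1
print(dominant_transcript(cortex, fold=2))      # None
print(is_switch_event(muscle, testis, fold=5))  # True
```

With these made-up values, T1 is 5-fold dominant in muscle and T2 is 5-fold dominant in testis, so the pair counts as a 5-fold switch event, whereas cortex has no 2-fold dominant transcript and cannot take part in a switch.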


Date

2018-09-28

Advisors

Brazma, Alvis

Keywords

alternative splicing, rna-seq, differential transcript usage, switch event, dominant transcript

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge