
Field | Value | Language
dc.contributor.author | Barzine, Mitra | en
dc.date.accessioned | 2021-05-08T00:24:17Z |
dc.date.available | 2021-05-08T00:24:17Z |
dc.date.submitted | 2020-07 | en
dc.identifier.uri | https://www.repository.cam.ac.uk/handle/1810/322134 |
dc.description.abstract | With the improvement of high-throughput technologies during the last decade, several studies exploring normal gene expression in human tissues have been published. Many studies examine the transcriptome with RNA sequencing (RNA-Seq), and others probe the proteome with unlabelled bottom-up Mass Spectrometry. As the sampling of undiseased tissues is difficult, the community often refers to expression atlases, which collate these studies, to support or validate new findings. Despite many overlapping tissues between the studies, few atlases attempt to integrate all the data. In this thesis, I investigate the consistency of gene expression across tissues and studies in humans, using transcriptomics captured with high-throughput sequencing (RNA-Seq) and proteomics generated with label-free bottom-up Mass Spectrometry (MS). After describing the transcriptomic and proteomic data and their state-of-the-art processing (Chapter 2), I review several identified sources of bias and my approaches to limit their effects (Chapter 3). The integration of the various transcriptomic datasets (Chapter 4) shows that the biological signal dominates the technical noise for RNA-Seq data: tissue samples correlate more strongly with identical tissues in other studies than with other tissues in the same dataset. Globally, genes show similar expression profiles across studies for a given set of tissues. All gene categories are involved, including the tissue-specific genes and the ubiquitously expressed ones. After briefly discussing comparisons of proteomic data, I introduce a new proteomic quantification method, PPKM (Chapter 5). The PPKM method allows me to quantify about twice as many proteins as the usual methods.
A limited number of previous studies combining high-throughput transcriptomics and proteomics have reported varying levels of correlation between protein and mRNA expression. I show that, for most tissues, we can observe quite good correlation levels (i.e. significantly better than expected by chance), even when the samples have different biological and technical backgrounds because they were independently sourced. Many genes share similar patterns of expression between the two biological layers, e.g. genes that have a protein detected in a single tissue are more likely to have their mRNA showing specificity for the same tissue. Additionally, three groups of genes present functional enrichments of biological processes. Genes with highly correlated protein and mRNA expression across tissues are enriched in catabolic processes. Genes with the most anticorrelated expression are enriched for ribosomes and ncRNA regulation. Genes with a protein detected in a single tissue are enriched in signalling processes. Overall, this thesis describes a global picture of the current consolidated knowledge we can extract from the joint study of public transcriptomic and proteomic data. Beyond confirming or improving observations reported in the literature, this work provides new insights into the ubiquitous and tissue-specific genes. To the best of my knowledge, this work has also established the most extensive list of genes with robust transcriptomic and proteomic expression across tissues and studies. Furthermore, it shows that joint study approaches can help the development of new methods, like the new proteomic PPKM quantification method. Finally, the highlighting of distinct functional enrichment profiles for groups of genes across tissues and studies lays a framework for further research. | en
dc.description.sponsorship | EMBL International PhD Programme | en
dc.rights | Attribution 4.0 International | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en
dc.subject | transcriptomics | en
dc.subject | RNA-seq | en
dc.subject | human | en
dc.subject | tissue | en
dc.subject | gene expression | en
dc.subject | proteomics | en
dc.subject | Label free | en
dc.subject | MS/MS | en
dc.subject | data exploratory analyses | en
dc.subject | data visualisation | en
dc.subject | mRNA | en
dc.subject | protein | en
dc.subject | correlation | en
dc.subject | meta-analyses | en
dc.title | Investigating normal human gene expression in tissues with high-throughput transcriptomic and proteomic data. | en
dc.type | Thesis |
dc.type.qualificationlevel | Doctoral | en
dc.type.qualificationname | Doctor of Philosophy (PhD) | en
dc.publisher.institution | University of Cambridge | en
dc.identifier.doi | 10.17863/CAM.69592 |
rioxxterms.licenseref.uri | http://www.rioxx.net/licenses/all-rights-reserved | en
dc.contributor.orcid | Barzine, Mitra [0000-0002-0860-9510] |
dc.publisher.url | https://www.ebi.ac.uk/research/publications/theses | en
rioxxterms.type | Thesis | en
dc.publisher.college | Hughes Hall |
dc.type.qualificationtitle | PhD in Life Sciences | en
cam.supervisor | Brazma, Alvis |
datacite.issupplementedby.doi | 10.5281/zenodo.4644394 | en


Files in this item


There are no files associated with this item.


Except where otherwise noted, this item's licence is described as Attribution 4.0 International