Transcriptomic and proteomic dysregulation in neuropsychiatric disorders

Erady, Chaitanya

DOI: https://doi.org/10.17863/CAM.96635

Repository URI

https://www.repository.cam.ac.uk/handle/1810/349823

Repository DOI

https://doi.org/10.17863/CAM.96635

Files

Thesis (23.77 MB)

Type

Thesis

Authors

Erady, Chaitanya

Abstract

Genomic factors contribute to the development of neuropsychiatric disorders such as schizophrenia (SCZ), bipolar disorder (BD), and major depressive disorder (MDD), but the causal molecular players remain unclear. In this thesis, we therefore evaluated disorder-associated disruptions within several genomic elements and considered how they contribute to disorder development. These elements were genes, transcripts, novel open reading frames (nORFs; unannotated ORFs in the genome with recent evidence of coding potential), and their proteins. We combined transcriptomic and proteomic methods and used large RNA-Seq and microarray datasets to achieve adequate statistical power for the proposed analyses. Chapter 1 summarises the literature on previously identified transcriptomic and proteomic dysregulation in neuropsychiatric disorders and introduces the motivation for the work presented in this thesis.

In Chapter 2, we evaluated sex-stratified, cell-count-corrected case-control differences in gene expression for five MDD RNA-Seq and microarray datasets derived from peripheral blood mononuclear cells (PBMCs) or whole blood. Where possible, we also evaluated differences in transcript expression and transcript usage. To improve detection of gene expression signals, we used two meta-analytic approaches, a weighted Z-score and a Bayesian framework-based method. These approaches compared 2,375 MDD cases with 3,606 control (CNT) samples, making this the largest such meta-analysis to date. We identified sex-specific dysregulation of gene and transcript expression. Pathway enrichment analysis showed that MDD was associated with dysregulation of innate and adaptive immunity, translation, the complement cascade and chromatin organisation. We also evaluated theoretically motivated gene lists associated with MDD (e.g., cell cycle genes) within each sex. This analysis identified sex-specific dysregulation of replication-dependent histones, ribosomal genes, OXPHOS (electron transport chain) genes and nuclear-encoded mitochondrial genes.
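
The abstract does not say how the weighted Z-score meta-analysis weights each study. The sketch below shows one common form of the technique. It assumes a sample-size-weighted Stouffer scheme, with a weight of sqrt(N) per study and the sign of each study's effect restored to its Z-score. This is an illustrative assumption, not the thesis code. The function name `weighted_z` and its inputs are hypothetical, and per-study p-values are assumed to lie in (0, 1].

```python
from math import sqrt, copysign
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0, sd 1

def weighted_z(pvalues, effects, sample_sizes):
    """Hypothetical helper: combine per-study two-sided p-values for one gene.

    Assumes Stouffer's method with weights w_i = sqrt(N_i) (an assumption,
    not necessarily the weighting used in the thesis).
    """
    num = 0.0
    den = 0.0
    for p, beta, n in zip(pvalues, effects, sample_sizes):
        z = _STD_NORMAL.inv_cdf(1 - p / 2)  # |Z| implied by a two-sided p-value
        z = copysign(z, beta)               # restore the direction of effect
        w = sqrt(n)                         # sample-size weight
        num += w * z
        den += w * w
    z_meta = num / sqrt(den)
    p_meta = 2 * (1 - _STD_NORMAL.cdf(abs(z_meta)))  # two-sided meta p-value
    return z_meta, p_meta

if __name__ == "__main__":
    # Two equally sized studies with concordant effects reinforce each other.
    print(weighted_z([0.05, 0.05], [0.3, 0.3], [100, 100]))
```

Under these assumptions, discordant effect directions cancel out. Larger studies contribute more to the combined Z-score.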

In Chapter 3, we tested the hypothesis that gene expression in MDD is heterogeneous. We identified sex-stratified clusters of co-expressed genes across five MDD case-control studies and evaluated gene expression profiles in theoretically motivated MDD subgroups. Weighted gene co-expression network analysis (WGCNA) identified gene modules related to response to wounding, nitrogen catabolism, and protein localisation that differed between female MDD cases and controls. We also identified MDD subgroups using a data-driven approach. However, these gene expression-based subgroups did not correlate significantly with MDD-associated phenotypic traits. We then revisited a previously reported inflamed subgroup of MDD, defined using cell count data, and found enrichment of immune-related pathways in both males and females. Furthermore, using C-reactive protein as a marker of inflammation status, we identified dysregulated B cell signalling in male MDD cases compared with healthy controls. These results indicate that MDD subgroups defined by phenotypic traits have distinct transcriptomic profiles, which should motivate future studies to perform subgroup-based analyses of MDD.
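
WGCNA begins by building a gene-gene adjacency matrix from expression correlations and only then clusters genes into modules. The sketch below covers only that first step for an unsigned network, where the adjacency is a_ij = |cor(x_i, x_j)|^β. The soft-thresholding power β = 6 is an assumed value often used for unsigned networks, not the value used in the thesis. The helpers `pearson` and `unsigned_adjacency` are hypothetical. Topological overlap and module detection are omitted, and genes are assumed to have non-constant expression.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def unsigned_adjacency(expr, beta=6):
    """Hypothetical helper: expr maps gene -> expression across the same samples.

    Returns a symmetric nested dict adj[g1][g2] = |cor(g1, g2)| ** beta.
    Raising |cor| to a power (soft thresholding) suppresses weak correlations
    while keeping strong ones close to 1.
    """
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, g1 in enumerate(genes):
        for g2 in genes[i:]:
            a = abs(pearson(expr[g1], expr[g2])) ** beta
            adj[g1][g2] = a
            adj[g2][g1] = a  # adjacency is symmetric
    return adj
```

In this unsigned form, perfectly positively and perfectly negatively correlated genes both receive adjacency 1. A moderate correlation of 0.8 is shrunk to about 0.26.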

In Chapter 4, we explored the role of nORFs and their protein products in neuropsychiatric disorders. nORFs are present within both protein-coding and noncoding regions of the human genome. They represent an unexplored pool of genomic products within previously investigated regions and could help us understand neuropsychiatric disorders. Because the functional relevance of nORFs in the human genome is poorly understood, we systematically curated nORFs from three online sources and processed them extensively. We then evaluated their genomic functions, which revealed involvement of nORFs in nucleoside binding, protein binding, and kinase-related processes. Compared with canonical ORFs within genes annotated as protein-coding, nORFs were shorter and more exon-rich. Such evidence of noncanonical translation highlights the need to re-evaluate the long-standing binary categorisation of the genome into protein-coding and noncoding regions.

In Chapter 5, we described a proteogenomic pipeline for nORF identification. The pipeline matches proteomic data against genomic data to identify novel, unannotated peptide sequences such as those encoded by nORFs. However, proteomic datasets are not always available, and some of the largest consortia, such as GTEx (the Genotype-Tissue Expression project) and TCGA (The Cancer Genome Atlas), provide only RNA-Seq data. To address this limitation, we developed a transcriptomic pipeline that identifies nORFs within transcripts assembled from RNA-Seq data. We validated both pipelines using datasets derived from post-mortem human brains and from mouse B and T cells. Together, they identified 3,054 transcribed nORFs (transcriptomic pipeline) and 1,658 translated nORFs (proteogenomic pipeline).
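
The abstract does not describe how the transcriptomic pipeline scans assembled transcripts for ORFs. The sketch below illustrates only the generic idea of ORF scanning and is not the thesis pipeline. It assumes ATG start codons, the standard stop codons TAA, TAG and TGA, forward-strand reading frames only, and an arbitrary minimum length. The function `find_orfs` and its `min_codons` parameter are hypothetical. A real pipeline would also need to handle the reverse strand and alternative start codons, and would compare each hit against existing annotation to flag it as novel.

```python
# Standard stop codons (an assumption: the standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=10):
    """Hypothetical helper: scan the three forward frames of a transcript.

    Returns (start, end, frame) tuples with 0-based, end-exclusive
    coordinates that include the stop codon. ORFs must start with ATG and
    contain at least min_codons codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and keep scanning this frame
    return orfs

if __name__ == "__main__":
    print(find_orfs("CC" + "ATG" + "AAA" * 3 + "TAA" + "GG", min_codons=4))
```

Keeping the first in-frame ATG before each stop reports the longest ORF in that stretch. Nested downstream start sites are not reported separately.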

In Chapter 6, we applied the transcriptomic and proteogenomic pipelines to identify nORFs in SCZ and BD samples. Commonly available RNA-Seq libraries are prepared using either rRNA depletion or poly(A) selection. rRNA depletion removes highly abundant rRNAs that hinder detection of low-abundance transcripts, whereas poly(A) selection enriches for mRNAs with poly(A) tails. Because nORFs can lie within transcripts lacking poly(A) tails, we required RNA-Seq datasets prepared without poly(A) selection. Such datasets were available for SCZ and BD but not for MDD. We identified 3,103 transcribed nORFs, of which 44 and 61 were differentially expressed in SCZ and BD, respectively. We also identified 21 translated nORFs. nORFs differentially expressed in SCZ and BD were enriched for DNase I hypersensitive sites (DHSs) and histone modifications, suggesting involvement in gene regulation. Two of the translated nORFs expressed in SCZ and BD samples, but not in controls, were located near candidate genes of interest: DISC1FP1 and STXBP1. Overlaps between transcribed nORFs and SCZ- and BD-specific risk genes revealed that nORFs were enriched within SCZ-specific loci on chromosome 2. Functional predictions for the transcribed and translated nORFs suggested potential DNA-binding, enzymatic and cytoskeletal roles. nORF expression also differed between males and females, and between subgroups of SCZ cases with and without psychosis. Together, these findings indicate that nORFs are valuable genomic components that warrant further investigation in neuropsychiatric and other disorders.

In Chapter 7, we discuss the major findings of this thesis on transcriptomic and proteomic dysregulation in neuropsychiatric disorders and how they add to our understanding of the human genome. We place in broader clinical perspective what is known about the genetic architecture of neuropsychiatric disorders, including the novel genetic elements described in this thesis. We also consider how this understanding could aid the future development of diagnostic and therapeutic strategies for these debilitating conditions.

Date

2022-12-31

Advisors

Bullmore, Edward
Lynall, Mary-Ellen

Keywords

Bipolar disorder, Gene expression, Major depressive disorder, Neuropsychiatric, Novel open reading frames, Proteomics, Schizophrenia, Transcriptomics

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Collections

Theses - Psychiatry