Relationships between chromatin features and genome regulation
Regulation of gene expression is an essential process for all living organisms. Transcriptional regulation, which is associated with chromatin, is governed by: (1) DNA sequence, which creates regulatory sites (promoters, enhancers and silencers) where sequence motifs and features (e.g. CpG) can attract transcription factors (TFs) and influence chromatin structure or RNA polymerase II (Pol II) binding, initiation and elongation; and (2) non-sequence, epigenetic factors: histone modifications, TF binding, chromatin remodelling (histone placement, eviction and reconstitution), and non-coding RNA regulation. These factors interact with one another, forming a complex regulatory network. In this thesis I describe computational studies of heterochromatin factors in the regulation of gene and repeat expression, an analysis of active regulatory elements, and global analyses of large datasets in C. elegans.
I first show that a team of heterochromatin factors (HPL-2/HP1, LIN-13, LIN-61, LET-418/Mi-2, and the H3K9me2 histone methyltransferase MET-2/SETDB1) collaborates with the piRNA and nuclear RNAi pathways to silence repetitive elements and protect the germline. I also show that the TACBGTA motif is particularly enriched at repeats and at heterochromatin factor binding sites, and that repeat elements are derepressed in the soma during normal C. elegans ageing.
I then describe work on active regulatory regions. I show that CFP-1/CXXC1 binds CpG-dense, nucleosome-depleted promoters and, along with SET-2, is required for H3K4me3 deposition at these loci. Using expression profiling, I determined that the majority of CFP-1 binding targets are not significantly mis-regulated in cfp-1 mutants, but are weakly upregulated when analysed in bulk. I also show that CFP-1 functionally interacts with the Sin3S/HDAC complex. In cfp-1 mutants I observed both loss and gain of SIN-3 binding, depending on chromatin context.
Finally, I performed a data-driven study of a large collection of ChIP-seq profiles using non-parametric sparse factor analysis (NSFA) and compared it with other unsupervised machine learning algorithms. This study uncovered interactions and structure in genomic datasets. In addition, I present a collection of computational tools and methods that I developed to facilitate the processing, storage, retrieval, annotation, and analysis of large genomic datasets.