Understanding Genomes Through Engineered Structural Variation


Type

Thesis

Authors

Koeppel, Jonas 

Abstract

Sequencing of the human genome has provided us with a detailed map of its content. While enormous progress has been made towards understanding the 1% of the human genome that is protein coding, we are still mostly in the dark about the function and relevance of the remaining 99%. Progress has been difficult because the non-coding genome is vast, the individual nucleotides hold less information, and we have lacked the tools to engineer and probe it to the necessary extent. This is beginning to change with the advent of ‘search and replace’ genome engineering technologies such as CRISPR prime editing. I leveraged the ability of prime editors to insert recognition sequences for recombinases at high throughput to engineer genomes at an unprecedented scale. In the process, I made discoveries about the biology of genome engineering, structural variation, and gene regulation.

I first outlined the determinants of short sequence insertion using prime editing by systematically measuring the frequency of insertion for 3,604 short sequences in four target sites of three human cell lines with varying DNA repair contexts. I characterized how insertion sequence length and two cellular DNA processing pathways affected the incorporation rate. I reaffirmed that DNA mismatch repair suppressed the insertion of shorter sequences and made the discovery that 3’ flap nucleases TREX1 and TREX2 suppressed the insertion of longer sequences. I further delineated the effects of nucleotide composition and secondary structure of the insertion sequence on editing rates.

Next, I targeted a prime editor to the high copy number LINE-1 retrotransposon to insert hundreds of recombinase sites into a single human genome. These engineered cell lines provided a latent substrate for large-scale genome randomization. After induction with Cre recombinase, I mapped thousands of deletions, inversions, extrachromosomal circular DNA, translocations, and fold-back inversions and tracked their abundance over time. Sequencing surviving variants and comparing them to early ones revealed strong selection pressures against creating non-segregable derivative chromosomes or deleting essential genes. However, it also demonstrated that haploid human cell lines could survive while losing megabases of DNA. I isolated 21 cell clones and linked variants to gene expression changes for three clones with multiple Cre-induced rearrangements.

Finally, I used prime editing to insert loxPsym sites into the regulatory region of the OTX2 developmental transcription factor. Cre recombinase induced stochastic deletions and inversions across the recombinase sites, and created diverse and novel enhancer arrangements. By endogenously fusing OTX2 with a fluorophore and sorting, I could associate alternative regulatory architectures with OTX2 expression and track changes in CpG methylation and chromatin accessibility. I discovered that three enhancers in a 20 kb cluster drove 50% of OTX2 expression and that moving the cluster closer to the transcription start site while simultaneously deleting intermediate regulatory elements resulted in strong OTX2 expression.

The strategies presented here to more efficiently insert short DNA sequences with prime editing, shuffle DNA, and rearrange regulatory regions offer a fundamentally new approach to randomizing mammalian genomes, which will open new avenues to go beyond the 1% of coding sequence and study the 99% of underexplored regions. The data garnered from molecular phenotyping of novel genome architectures after randomization will allow predictive models to learn parameters beyond the limited diversity of our DNA.

Date

2024-01-15

Advisors

Parts, Leopold

Keywords

Gene regulation, Genome engineering, Genomics, Synthetic Biology, Synthetic Genomics

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
Sponsorship
Jonas Koeppel was supported by Wellcome (grant no. 220540/Z/20/A, ‘Wellcome Sanger Institute Quinquennial Review 2021–2026’ and grant no. 206194)
Relationships
Is supplemented by: