Understanding Genomes Through Engineered Structural Variation

Koeppel, Jonas

doi:https://doi.org/10.17863/CAM.107311

Repository URI

https://www.repository.cam.ac.uk/handle/1810/366340

Repository DOI

https://doi.org/10.17863/CAM.107311

Files

Primary Thesis (36.24 MB)

Type

Thesis

Authors

Koeppel, Jonas

Abstract

Sequencing of the human genome has provided us with a detailed map of its content. While enormous progress has been made towards understanding the 1% of the human genome that is protein coding, we are still mostly in the dark about the function and relevance of the remaining 99%. Progress has been difficult because the non-coding genome is vast, the individual nucleotides hold less information, and we have lacked the tools to engineer and probe it to the necessary extent. This is beginning to change with the advent of ‘search and replace’ genome engineering technologies such as CRISPR prime editing. I leveraged the ability of prime editors to insert recognition sequences for recombinases at high throughput to engineer genomes at an unprecedented scale. In the process, I made discoveries about the biology of genome engineering, structural variation, and gene regulation.

I first outlined the determinants of short sequence insertion using prime editing by systematically measuring the frequency of insertion for 3,604 short sequences in four target sites of three human cell lines with varying DNA repair contexts. I characterized how insertion sequence length and two cellular DNA processing pathways affected the incorporation rate. I reaffirmed that DNA mismatch repair suppressed the insertion of shorter sequences and made the discovery that 3’ flap nucleases TREX1 and TREX2 suppressed the insertion of longer sequences. I further delineated the effects of nucleotide composition and secondary structure of the insertion sequence on editing rates.

Next, I targeted a prime editor to the high copy number LINE-1 retrotransposon to insert hundreds of recombinase sites into a single human genome. These engineered cell lines provided a latent substrate for large-scale genome randomization. After induction with Cre recombinase, I mapped thousands of deletions, inversions, extrachromosomal circular DNA, translocations, and fold-back inversions and tracked their abundance over time. Sequencing surviving variants and comparing them to early ones revealed strong selection pressures against creating non-segregable derivative chromosomes or deleting essential genes. However, it also demonstrated that haploid human cell lines could survive while losing megabases of DNA. I isolated 21 cell clones and linked variants to gene expression changes for three clones with multiple Cre-induced rearrangements.

Finally, I used prime editing to insert loxPsym sites into the regulatory region of the OTX2 developmental transcription factor. Cre recombinase induced stochastic deletions and inversions across the recombinase sites, and created diverse and novel enhancer arrangements. By endogenously fusing OTX2 with a fluorophore and sorting, I could associate alternative regulatory architectures with OTX2 expression and track changes in CpG methylation and chromatin accessibility. I discovered that three enhancers in a 20 kb cluster drove 50% of OTX2 expression and that moving the cluster closer to the transcription start site while simultaneously deleting intermediate regulatory elements resulted in strong OTX2 expression.

The strategies presented here to more efficiently insert short DNA sequences with prime editing, shuffle DNA, and rearrange regulatory regions give a fundamentally new approach to randomizing mammalian genomes which will open new avenues to go beyond the 1% of coding sequence and study the 99% of underexplored regions. The data garnered from molecular phenotyping of novel genome architectures after randomization will allow predictive models to learn parameters beyond the limited diversity of our DNA.

Date

2024-01-15

Advisors

Parts, Leopold

Keywords

Gene regulation, Genome engineering, Genomics, Synthetic Biology, Synthetic Genomics

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International (CC BY 4.0)

Sponsorship

Jonas Koeppel was supported by Wellcome (grant no. 220540/Z/20/A, ‘Wellcome Sanger Institute Quinquennial Review 2021–2026’ and grant no. 206194)

Relationships

Is supplemented by:

https://doi.org/doi:10.17989/ENCSR882CQE
https://doi.org/doi:10.17989/ENCSR460TQV
https://doi.org/doi:10.17989/ENCSR891YFI
https://doi.org/doi:10.17989/ENCSR121CWP
https://doi.org/doi:10.17989/ENCSR450JTP
https://doi.org/doi:10.17989/ENCSR895KTN
https://doi.org/doi:10.17989/ENCSR131DVD
https://doi.org/doi:10.17989/ENCSR000EJR
https://doi.org/doi:10.17989/ENCSR000DTW
https://doi.org/doi:10.1073/pnas.1518552112

Collections

Theses - Wellcome Sanger Institute
