
Insights into the genomic histories of diverse human populations using whole-genome sequencing analysis.


Type

Thesis

Authors

Almarri, Mohamed 

Abstract

Despite progress in sampling many populations, human genomics research is still not fully reflective of the diversity found globally. Understudied populations limit our knowledge of genetic variation and population history, and their inclusion is needed to ensure they benefit from future developments in genomic medicine. In this thesis, I extend our understanding of global genetic diversity and population history through two main projects. The first focuses on structural variation in a diverse set of 54 human populations that are part of the Human Genome Diversity Project (HGDP-CEPH) panel. Using whole-genome sequences previously produced at the Wellcome Sanger Institute, I generated a comprehensive catalogue of structural variation, identifying a total of 126,018 variants, of which 78% are novel. Some reach high frequency and are private to continental groups or even individual populations, including regionally restricted runaway duplications and putatively introgressed variants from archaic hominins. By de novo assembly of 25 genomes using linked-read sequencing, I discovered 1,643 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference genome, highlighting the limitation of a single human reference genome. In the second project, I collected and analysed a dataset of 137 high-coverage physically phased genome sequences from eight Middle Eastern populations using linked-read sequencing. Focusing on population history using single nucleotide variants, I found no genetic traces of archaeologically documented early expansions out of Africa in present-day populations in the region. I show that Arabian populations have the lowest Neanderthal ancestry of all non-African populations tested, which is explained by their elevated Basal Eurasian ancestry.
By comparing Levantine and Arabian historical population sizes, I find a divergence that starts before the Neolithic era, when Levantines expanded while Arabians maintained small populations that could have derived ancestry from local Epipaleolithic hunter-gatherers. All populations suffered a bottleneck overlapping archaeologically documented aridification events, with Arabians decreasing in size with the onset of the desert climate in Arabia ~6 kya, while the Levantine bottleneck overlaps the 4.2-kiloyear aridification event. I also identify an ancestry that is associated with the spread of Semitic languages across the region during the Bronze Age. Finally, I identify novel variants that show evidence of selection, including signals of polygenic selection. This thesis fills an important gap in the study of diverse human populations, although further work is needed to sequence and characterize additional genetically underrepresented groups.

Description

Date

2021-01-31

Advisors

Tyler-Smith, Chris
Hurles, Matthew

Keywords

Arabia, Middle East, Levant, Basal Eurasian, Selection, Demographic history, HGDP, Human Genome Diversity Project, Structural Variation, Adaptation

Qualification

Awarding Institution

University of Cambridge

Sponsorship

Government of Dubai - Dubai Police GHQ