Reconstructing Past Population Dynamics and Movements from Genomic and Environmental Data
Abstract
Understanding how past changes shaped today's global ecology is a central topic in evolutionary studies. The difficulty of obtaining direct evidence, such as fossils and other remains, has greatly limited our ability to understand this process. The last decade has seen major advances that allow better reconstruction of the past. In terms of data, sequencing techniques, especially for ancient DNA, provide direct evidence of the genetic history of a sample. Better modelling techniques also allow us to use larger-scale data and to integrate them with other lines of evidence, such as global climate reconstructions, to reconstruct the past using ecological models. This thesis explores new computational methods for reconstructing the past of different species with varying data availability, and for answering the ecological questions behind them.
Ancient genomics has been a powerhouse for studies of human history, but the expanding size and complexity of datasets have become major challenges for analysis and interpretation. In the second chapter, I studied how to reconstruct human genetic history with biobank-scale genomic data, a challenge of both scalability and reliability. I built a new framework, NORNE, for detecting individual-level genetic connectivity from unphased genotype data with high scalability. I show that NORNE can detect fine-scale genetic structure with a resolution similar to state-of-the-art methods whilst being much more efficient, enabling it to analyse extremely large datasets. Applying NORNE to a dataset of ancient DNA from over 6,000 humans across the globe, I show that it can capture both broad and fine-scale genetic structure of the past. Furthermore, with geographical and temporal information, NORNE captures connections that indicate major migration events and changes in mobility in the past. I then show how NORNE can be used to reveal individual-level migration histories over the past 6,000 years, with case studies from the UK, Hungary and Roopkund Lake.
New methods in human genetic studies can, however, also spark controversy if used improperly. In the third chapter, I examined a popular non-linear dimensionality reduction tool, UMAP, in the context of high-dimensional biological data. UMAP has been widely used for inferring population structure and cell clusters since its development, but its mathematical principles are non-trivial for most biologists, which often leads to misuse and misinterpretation, and sometimes to controversy. I dissected each step of UMAP, using extensive illustrations and experiments to show how it works and the potential pitfalls at each step.
How environmental factors contribute to differentiation is a long-standing question in evolution. The next two chapters address this question through two examples, modelling past species dynamics from climate, geography and genetic data with Climate Informed Spatial Genetics Models (CISGeM). In Chapter 4, I examined the role of climate in shaping the ecological history of leopards. One of the most widely distributed large felids, the leopard shows a stark difference in the degree of subspecies separation between Asia and Africa: Asia is home to 7 of the 8 subspecies and Africa to only one, despite similar land area. The extinction of the European leopard subspecies has also been hypothesized to be associated with climate change during the Last Glacial Maximum, yet this hypothesis has not been formally tested. Here I explore the drivers behind these phenomena using climate and genetic reconstructions. I found that climate is one of the main drivers of this contrasting pattern of differentiation. A stable African climate has enabled stable migration of leopards, whilst in Asia climate fluctuations have created multiple isolated refugia that facilitated differentiation into subspecies. I also found climate fluctuations in the late Pleistocene to be sufficient to explain the extinction of European leopards during the last glaciation.
In the last chapter, I used CISGeM to examine how climate has shaped another large felid, the tiger. First, by revisiting previous genetic modelling results, I show that genetic data alone cannot identify a single demographic history for tigers (an example of equifinality despite a large genomic dataset). I then used CISGeM, which combines genetics, ecological niche models and palaeoclimate reconstructions, to reconstruct the demography of tigers. I found that, as in African leopards, population structure among extant tigers is weak because stable climate conditions have maintained strong connectivity. Importantly, I show that the extinct Caspian tigers were shaped by recurrent gene flow from both India and Northeast Asia, an example of complex metapopulation dynamics that could not be inferred from the simpler genetic models routinely used to analyse this sort of data.
In conclusion, this thesis explores new computational approaches for reconstructing the past dynamics of species from genomic and climate data, in order to answer some of the most challenging questions in evolution and ecology. In light of the growing variety and size of available data, these advances can provide foundations for a better understanding of the past.
