Phylogenetic inference using ancient environmental DNA
Ancient environmental DNA (aeDNA) has revolutionized our ability to describe and analyze biological communities in space and time by allowing for joint sequencing of entire ecosystems across thousands of years. However, because samples contain damaged, short fragments from multiple individuals or taxa, the field has been so far limited in its scope, and aeDNA has only been applied to population and phylogenetic studies in the last few years. In this thesis, I first build a theoretical coalescent framework to analyze error in supervised binning algorithms, which assign reads from environmental samples to individual taxa in a reference database. Under this framework, I determine the expected error rate under a wide range of parameters and the degradation in assignment accuracy as samples diverge from their closest reference sequence, and with incompleteness of reference sequences. Second, I describe a phylogenetic placement algorithm for non-recombining sequences such as mitochondria or chloroplast DNA, and apply this method to Mammuthus or mammoth and Equus or horse samples from an Arctic-wide aeDNA dataset spanning the last 50,000 years. This analysis demonstrates the potential existence of a previously undiscovered clade of mammoths, and extends the survival of an existing clade. Next, I report one of the first whole genome ancient environmental DNA studies, using DNA extracted from 14-16,000 year old cave soil with material from two closely related species, Ursus arctos or the American black bear and Arctodus simus or the extinct giant short-faced bear. By comparing the ancient sequence against a modern reference panel of black bears and a high-quality fossil giant short-faced bear reference, I infer evolutionary relationships between the Late Pleistocene populations and their modern relatives. Lastly, I molecularly date an ancient environmental Betula or birch tree chloroplast sequence from Northern Greenland, confirming that it was approximately 2 million years old, the oldest DNA to be successfully sequenced so far. All together, this work demonstrates the ability to infer phylogenies and population histories of individual taxa from ancient environmental DNA.