Power Law Tails In Phylogenetic Systems
Proceedings of the National Academy of Sciences of the United States of America
National Academy of Sciences
Colwell, L., & Qin, C. (2018). Power Law Tails In Phylogenetic Systems. Proceedings of the National Academy of Sciences of the United States of America. https://doi.org/10.1073/pnas.1711913115
Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters: the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence-distal sites that control biological function, it is necessary to remove or downweight the eigenvectors of the covariance matrix with the largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
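The truncation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `one_hot_alignment` and `truncate_top_eigenmodes`, the 21-letter alphabet, and the choice of how many eigenvectors `k` to remove are all assumptions made here for illustration. The sketch one-hot encodes a toy alignment, forms the sample covariance matrix, and subtracts the contribution of the `k` eigenvectors with the largest eigenvalues.

```python
import numpy as np


def one_hot_alignment(seqs, alphabet="ACDEFGHIKLMNPQRSTVWY-"):
    """One-hot encode equal-length aligned sequences (hypothetical helper).

    Returns an array of shape (n_sequences, length * len(alphabet)).
    """
    index = {c: i for i, c in enumerate(alphabet)}
    n, length, q = len(seqs), len(seqs[0]), len(alphabet)
    x = np.zeros((n, length * q))
    for s, seq in enumerate(seqs):
        for pos, c in enumerate(seq):
            x[s, pos * q + index[c]] = 1.0
    return x


def truncate_top_eigenmodes(cov, k):
    """Remove the k largest-eigenvalue modes from a symmetric matrix.

    The value of k is an assumption of this sketch; the abstract does not
    say how many eigenvectors to remove or how to downweight them.
    """
    if k == 0:
        return cov.copy()
    # eigh returns eigenvalues in ascending order for symmetric input.
    vals, vecs = np.linalg.eigh(cov)
    top_vals = vals[-k:]
    top_vecs = vecs[:, -k:]
    # Subtract sum_i lambda_i v_i v_i^T over the k largest modes.
    return cov - (top_vecs * top_vals) @ top_vecs.T


# Toy alignment (made-up sequences, for illustration only).
toy_alignment = ["ACD-", "ACE-", "GCDA", "GCEA"]
x = one_hot_alignment(toy_alignment)
cov = np.cov(x, rowvar=False)  # covariance across one-hot columns
cleaned = truncate_top_eigenmodes(cov, k=1)
```

Downweighting rather than removing the leading modes would be a variation of the same idea, for example by shrinking `top_vals` instead of subtracting them in full.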
power law, sequence covariance, phylogeny, protein, structure prediction
This work was supported by a Next Generation fellowship (to L.J.C.), a Marie Curie Career Integration Grant [Evo-Couplings, Grant 631609], and an Engineering and Physical Sciences Research Council PhD studentship (to C.Q.).
European Commission (631609)
External DOI: https://doi.org/10.1073/pnas.1711913115
This record's URL: https://www.repository.cam.ac.uk/handle/1810/270165