Power Law Tails in Phylogenetic Systems

Accepted version
Peer-reviewed

Type

Article

Authors

Colwell, LJ 
Qin, Chongli 

Abstract

Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power-law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology and depends on just two parameters: the sequence length and the average branch length. We demonstrate that these power-law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence-distal sites that control biological function, it is necessary to remove or downweight the eigenvectors of the covariance matrix with the largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
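The final step the abstract describes, removing or downweighting the eigenvectors of a covariance matrix with the largest eigenvalues, can be sketched as below. This is a minimal illustration assuming a generic real symmetric covariance matrix; the helper `truncate_top_eigenvectors`, its `k` and `weight` parameters, and the synthetic data are hypothetical and not taken from the paper. The abstract does not specify how the covariance matrix is built from an alignment or how many eigenvectors are truncated.

```python
import numpy as np


def truncate_top_eigenvectors(cov: np.ndarray, k: int, weight: float = 0.0) -> np.ndarray:
    """Rescale the k largest-eigenvalue components of a symmetric matrix.

    Hypothetical helper for illustration: weight=0.0 removes those
    components outright, while 0 < weight < 1 only downweights them.
    """
    cov = np.asarray(cov, dtype=float)
    if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
        raise ValueError("cov must be a square matrix")
    n = cov.shape[0]
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and the matrix dimension")
    # eigh assumes a symmetric matrix and returns eigenvalues in ascending
    # order, so the k largest sit at the end of the array.
    eigvals, eigvecs = np.linalg.eigh(cov)
    scaled = eigvals.copy()
    scaled[n - k:] *= weight
    # Rebuild the matrix as V diag(scaled) V^T.
    return (eigvecs * scaled) @ eigvecs.T


# Synthetic data, assumed for illustration only: one strong shared direction
# (a stand-in for a phylogenetic signal) plus independent noise per position.
rng = np.random.default_rng(0)
n_seq, n_pos = 500, 40
shared = 3.0 * rng.normal(size=(n_seq, 1)) @ rng.normal(size=(1, n_pos))
data = shared + rng.normal(size=(n_seq, n_pos))
cov = np.cov(data, rowvar=False)  # columns are positions

cleaned = truncate_top_eigenvectors(cov, k=1)
print(np.linalg.eigvalsh(cov)[-2:])      # two largest eigenvalues before
print(np.linalg.eigvalsh(cleaned)[-2:])  # the dominant one is now gone
```

Passing a `weight` between 0 and 1 gives the "downweight" variant instead of outright removal; the choice of `k` is left open in this sketch.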

Keywords

power law, sequence covariance, phylogeny, protein, structure prediction

Journal Title

Proceedings of the National Academy of Sciences of the United States of America

Journal ISSN

0027-8424
1091-6490

Publisher

National Academy of Sciences

Sponsorship
European Commission (631609)
This work was supported by a Next Generation fellowship (to L.J.C.), a Marie Curie Career Integration Grant [Evo-Couplings, Grant 631609], and an Engineering and Physical Sciences Research Council PhD studentship (to C.Q.).