Investigation of protein family relationships with deep learning.
Abstract
MOTIVATION: In this article, we propose a method for finding similarities between Pfam families based on the pre-trained neural network ProtENN2. We use ProtENN2's per-residue embeddings to produce new high-dimensional per-family embeddings, develop an approach for calculating inter-family similarity scores from these embeddings, and evaluate its predictions using structure comparison. RESULTS: We apply our method to Pfam annotation by refining clan membership for Pfam families, suggesting both new members of existing clans and potential new clans for future Pfam releases. We investigate some of the failure modes of our approach, which suggest directions for future improvements. Our method is relatively simple, has few parameters, and could be applied to other protein family classification models. Overall, our work suggests potential benefits of employing deep learning for improving our understanding of protein family relationships and of the functions of previously uncharacterized families. AVAILABILITY AND IMPLEMENTATION: github.com/iponamareva/ProtCNNSim, 10.5281/zenodo.10091909.
Description
Funder: EMBL; doi: https://doi.org/10.13039/100013060
Journal ISSN
2635-0041

