
Investigation of protein family relationships with deep learning.

Published version
Peer-reviewed


Abstract

MOTIVATION: In this article, we propose a method for finding similarities between Pfam families based on the pre-trained neural network ProtENN2. We use the per-residue embeddings of the ProtENN2 model to produce new high-dimensional per-family embeddings, develop an approach for calculating inter-family similarity scores from these embeddings, and evaluate the resulting predictions using structure comparison.

RESULTS: We apply our method to Pfam annotation by refining clan membership for Pfam families, suggesting both new members of existing clans and potential new clans for future Pfam releases. We investigate some of the failure modes of our approach, which suggest directions for future improvements. Our method is relatively simple, has few parameters, and could be applied to other protein family classification models. Overall, our work suggests potential benefits of employing deep learning to improve our understanding of protein family relationships and of the functions of previously uncharacterized families.

AVAILABILITY AND IMPLEMENTATION: Code is available at github.com/iponamareva/ProtCNNSim and archived at DOI 10.5281/zenodo.10091909.
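The abstract describes a pipeline that turns per-residue embeddings into per-family embeddings and then scores pairs of families. It does not state how the aggregation or the scoring is done, so the sketch below is only an illustration under assumed choices: mean pooling over residues and then over sequences, and cosine similarity as the score. The function names (`mean_vector`, `family_embedding`, `cosine_similarity`) are hypothetical and are not taken from the ProtCNNSim repository.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def family_embedding(sequences):
    """Assumed aggregation: average residues within each sequence,
    then average the per-sequence vectors across the family.

    `sequences` is a list of sequences; each sequence is a list of
    per-residue embedding vectors (lists of floats).
    """
    per_sequence = [mean_vector(residues) for residues in sequences]
    return mean_vector(per_sequence)


def cosine_similarity(a, b):
    """Assumed similarity score: cosine of the angle between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy data: two made-up families with 2-dimensional residue embeddings.
family_a = family_embedding([[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0]]])
family_b = family_embedding([[[0.0, 2.0]]])
score = cosine_similarity(family_a, family_b)
```

In practice the residue vectors would come from the model rather than from toy lists, and a different pooling or scoring rule could be substituted without changing the overall shape of the computation.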

Description

Journal Title

Bioinform Adv

Journal ISSN

2635-0041

Volume Title

4

Publisher

Oxford University Press (OUP)

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Sponsorship

Simons Foundation (598399)