SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins.
Authors
Ahmad, Saeed
Charoenkwan, Phasit
Quinn, Julian MW
Moni, Mohammad Ali
Hasan, Md Mehedi
Lio', Pietro
Shoombuatong, Watshara
Publication Date
2022-03-08
Journal Title
Sci Rep
ISSN
2045-2322
Publisher
Springer Science and Business Media LLC
Volume
12
Issue
1
Language
en
Type
Article
This Version
VoR
Citation
Ahmad, S., Charoenkwan, P., Quinn, J. M. W., Moni, M. A., Hasan, M. M., Lio', P., & Shoombuatong, W. (2022). SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins. Sci Rep, 12(1). https://doi.org/10.1038/s41598-022-08173-5
Description
Funder: Mahidol University
Funder: College of Arts, Media and Technology, Chiang Mai University
Funder: Chiang Mai University
Funder: Information Technology Service Center (ITSC) of Chiang Mai University
Abstract
Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several machine learning (ML) methods have been developed for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we comprehensively explored 13 different feature descriptors covering different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) in combination with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were combined into a new feature vector. Finally, we used a two-step feature selection strategy to determine the optimal PF feature vector and used this feature vector to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves better predictive performance than its constituent baseline models and existing methods. We anticipate that SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source code and datasets for this work are available for download from the GitHub repository (https://github.com/saeed344/SCORPION).
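The stacking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the GitHub repository above), and every specific choice here is an assumption made for illustration: the sequences are synthetic rather than real PVPs, only two simple descriptors are used instead of 13, a toy nearest-centroid scorer stands in for the 10 ML algorithms, a small gradient-descent logistic regression serves as the meta-learner, five folds are used instead of ten, and the two-step feature selection is omitted. All function and class names are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical coarse physicochemical grouping, chosen for illustration only.
GROUPS = ("AVLIMFWC", "STNQYGPH", "DEKR")  # hydrophobic, polar, charged

def aac(seq):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def group_composition(seq):
    """Fraction of residues falling in each coarse physicochemical group."""
    return [sum(seq.count(a) for a in g) / len(seq) for g in GROUPS]

class CentroidModel:
    """Toy baseline model: compares distances to the two class centroids."""
    def fit(self, X, y):
        self.centroids = []
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, x):
        """Score in [0, 1]; higher means closer to the positive centroid."""
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

def out_of_fold_pfs(seqs, y, descriptors, k=5):
    """One probabilistic feature (PF) per descriptor, scored on held-out folds."""
    n = len(seqs)
    pfs = [[0.0] * len(descriptors) for _ in range(n)]
    for j, describe in enumerate(descriptors):
        X = [describe(s) for s in seqs]
        for fold in range(k):
            train = [i for i in range(n) if i % k != fold]
            model = CentroidModel().fit([X[i] for i in train],
                                        [y[i] for i in train])
            for i in range(fold, n, k):  # indices held out in this fold
                pfs[i][j] = model.predict_proba(X[i])
    return pfs

def predict_logistic(w, x):
    """Logistic output for features x; the last entry of w is the bias."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1 / (1 + math.exp(-z))

def fit_logistic(X, y, epochs=500, lr=0.5):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict_logistic(w, x) - t
            for j, xi in enumerate(x):
                grad[j] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def toy_sequence(rng, enriched, length=60):
    """Synthetic sequence biased toward `enriched` residues (not real data)."""
    pool = AMINO_ACIDS + enriched * 4
    return "".join(rng.choice(pool) for _ in range(length))

rng = random.Random(0)
labels = [i % 2 for i in range(40)]
seqs = [toy_sequence(rng, "DEKR" if t else "AVLI") for t in labels]

# Level 0: each descriptor/model pair contributes one out-of-fold PF.
pfs = out_of_fold_pfs(seqs, labels, [aac, group_composition])
# Level 1: the meta-learner is trained on the PF vector, not on raw features.
weights = fit_logistic(pfs, labels)
predictions = [predict_logistic(weights, row) >= 0.5 for row in pfs]
accuracy = sum(p == bool(t) for p, t in zip(predictions, labels)) / len(labels)
print(f"meta-learner accuracy on out-of-fold PFs: {accuracy:.2f}")
```

The PFs are computed on held-out folds so that the meta-learner never sees a baseline model's score for a sequence that model was trained on, which is the usual way to limit leakage in stacked models.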
Keywords
Article, /631/114, /631/114/2397, /631/114/1305, article
Identifiers
s41598-022-08173-5, 8173
External DOI: https://doi.org/10.1038/s41598-022-08173-5
This record's URL: https://www.repository.cam.ac.uk/handle/1810/334814
Rights
Licence:
http://creativecommons.org/licenses/by/4.0/