dc.contributor.author Ahmad, Saeed
dc.contributor.author Charoenkwan, Phasit
dc.contributor.author Quinn, Julian MW
dc.contributor.author Moni, Mohammad Ali
dc.contributor.author Hasan, Md Mehedi
dc.contributor.author Lio', Pietro
dc.contributor.author Shoombuatong, Watshara
dc.date.accessioned 2022-03-09T16:02:15Z
dc.date.available 2022-03-09T16:02:15Z
dc.date.issued 2022-03-08
dc.date.submitted 2022-01-24
dc.identifier.issn 2045-2322
dc.identifier.other s41598-022-08173-5
dc.identifier.other 8173
dc.identifier.uri https://www.repository.cam.ac.uk/handle/1810/334814
dc.description Funder: Mahidol University
dc.description Funder: College of Arts, Media and Technology, Chiang Mai University
dc.description Funder: Chiang Mai University
dc.description Funder: Information Technology Service Center (ITSC) of Chiang Mai University
dc.description.abstract Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we explored 13 different feature descriptors covering different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were combined into a new feature vector. Finally, we used a two-step feature selection strategy to determine the optimal PF feature vector and used this feature vector to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves better predictive performance than its constituent baseline models and existing methods. We anticipate SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source code and datasets for this work are available for download from the GitHub repository (https://github.com/saeed344/SCORPION).
dc.language en
dc.publisher Springer Science and Business Media LLC
dc.subject Article
dc.subject /631/114
dc.subject /631/114/2397
dc.subject /631/114/1305
dc.subject article
dc.title SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins.
dc.type Article
dc.date.updated 2022-03-09T16:02:14Z
prism.issueIdentifier 1
prism.publicationName Sci Rep
prism.volume 12
dc.identifier.doi 10.17863/CAM.82246
dcterms.dateAccepted 2022-03-03
rioxxterms.versionofrecord 10.1038/s41598-022-08173-5
rioxxterms.version VoR
rioxxterms.licenseref.uri http://creativecommons.org/licenses/by/4.0/
dc.contributor.orcid Lio, Pietro [0000-0002-0540-5053]
dc.identifier.eissn 2045-2322
cam.issuedOnline 2022-03-08

