Target Fishing: A Single-Label or Multi-Label Problem?
MetadataShow full item record
Afzal, A., Mussa, H., Turner, R., Bender, A., & Glen, R. Target Fishing: A Single-Label or Multi-Label Problem?. arXiv https://arxiv.org/abs/1411.6285
According to Cobanoglu et al and Murphy, it is now widely acknowledged that the single target paradigm (one protein or target, one disease, one drug) that has been the dominant premise in drug development in the recent past is untenable. More often than not, a drug-like compound (ligand) can be promiscuous - that is, it can interact with more than one target protein. In recent years, in in silico target prediction methods the promiscuity issue has been approached computationally in different ways. In this study we confine attention to the so-called ligand-based target prediction machine learning approaches, commonly referred to as target-fishing. With a few exceptions, the target-fishing approaches that are currently ubiquitous in cheminformatics literature can be essentially viewed as single-label multi-classification schemes; these approaches inherently bank on the single target paradigm assumption that a ligand can home in on one specific target. In order to address the ligand promiscuity issue, one might be able to cast target-fishing as a multi-label multi-class classification problem. For illustrative and comparison purposes, single-label and multi-label Naive Bayes classification models (denoted here by SMM and MMM, respectively) for target-fishing were implemented. The models were constructed and tested on 65,587 compounds and 308 targets retrieved from the ChEMBL17 database. SMM and MMM performed differently: for 16,344 test compounds, the MMM model returned recall and precision values of 0.8058 and 0.6622, respectively; the corresponding recall and precision values yielded by the SMM model were 0.7805 and 0.7596, respectively. However, at a significance level of 0.05 and one degree of freedom McNemar test performed on the target prediction results returned by SMM and MMM for the 16,344 test ligands gave a chi-squared value of 15.656, in favour of the MMM approach.
q-bio.BM, q-bio.BM, cs.LG, stat.ML
External link: https://arxiv.org/abs/1411.6285
This record's URL: https://www.repository.cam.ac.uk/handle/1810/263062