dc.contributor.author | Allen, Chad | en
dc.contributor.author | Koutsoukas, Alexios | en
dc.contributor.author | Cortés-Ciriano, Isidro | en
dc.contributor.author | Murrell, Daniel S | en
dc.contributor.author | Malliavin, Thérèse E | en
dc.contributor.author | Glen, Robert C | en
dc.contributor.author | Bender, Andreas | en
dc.identifier.citation | C. H. G. Allen et al. Toxicology Research (2016). DOI: 10.1039/C5TX00406C | en
dc.description.abstract | Prediction of compound toxicity is essential because covering the vast chemical space requiring safety assessment using traditional experimentally based, resource-intensive techniques is impossible. However, such prediction is nontrivial due to the complex causal relationship between compound structure and in vivo harm. Protein target annotations and in vitro experimental outcomes encode relevant bioactivity information complementary to chemicals’ structures. This work tests the hypothesis that utilizing three complementary types of data will afford predictive models that outperform traditional models built using fewer data types. A tripartite, heterogeneous descriptor set for 367 compounds comprised (a) chemical descriptors, (b) protein target descriptors generated using an algorithm trained on 190 000 ligand–protein interactions from ChEMBL, and (c) descriptors derived from in vitro cell cytotoxicity dose–response data from a panel of human cell lines. One hundred random forest classification models for predicting rat LD₅₀ were built using every combination of descriptors. Successive integration of data types improved predictive performance; models built using the full dataset had an average external correct classification rate of 0.82, compared to 0.73–0.80 for models built using two data types and 0.67–0.78 for models built using one. Pairwise comparisons of models trained on the same data showed that including a third data domain on top of chemistry improved average correct classification rate by 1.4–2.4 points, with p-values <0.01. Additionally, the approach enhanced the models’ applicability domains and proved useful for generating novel mechanism hypotheses. The use of tripartite heterogeneous bioactivity datasets is a useful technique for improving toxicity prediction. Both protein target descriptors (which have the practical value of being derived in silico) and cytotoxicity descriptors derived from experiment are suitable contributors to such datasets.
dc.description.sponsorship | We thank Alexander Sedykh, Ivan Rusyn and Alexander Tropsha (University of North Carolina – Chapel Hill) for providing the chemical and qHTS data used in this study. We also thank the European Chemical Industry Council Long-range Research Initiative (CEFIC-LRI) for funding (via the LRI Innovative Science Award 2012 to AB). ICC thanks the Pasteur-Paris International PhD Programme for funding. ICC and TM thank Institut Pasteur for funding. AB and DSM thank Unilever and the European Research Council (Starting Grant ERC-2013-StG 336159 MIXTURE) for funding.
dc.publisher | Royal Society of Chemistry
dc.rights | Attribution 2.0 UK: England & Wales | *
dc.title | Improving the Prediction of Organism-level Toxicity through Integration of Chemical, Protein Target and Cytotoxicity qHTS Data | en
dc.description.version | This is the final version of the article. It first appeared from the Royal Society of Chemistry via
prism.publicationName | Toxicology Research | en
dc.contributor.orcid | Allen, Chad [0000-0001-7289-6529]
dc.contributor.orcid | Bender, Andreas [0000-0002-6683-7546]
rioxxterms.type | Journal Article/Review | en
pubs.funder-project-id | European Research Council (336159)

Except where otherwise noted, this item's licence is described as Attribution 2.0 UK: England & Wales