Towards quantifying the uncertainty in in silico predictions using Bayesian learning

Allen, TEH; Middleton, AM; Goodman, JM; Russell, PJ; Kukic, P; Gutsell, S

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/336947

Repository DOI

https://doi.org/10.17863/CAM.84370

Files

Accepted version (3.63 MB)

Type

Article

Authors

Allen, TEH

Middleton, AM

Goodman, JM

Russell, PJ

Kukic, P

Gutsell, S

Abstract

Next-generation risk assessment (NGRA) combines in vitro and in silico models to make human chemical safety assessment more human-relevant, ethical and sustainable. NGRA requires a quantitative mechanistic understanding of the effects of chemicals across human biology (be they molecular, cellular, organ-level or higher), coupled with a quantitative understanding of the uncertainty in any experimentally measured or predicted values. Each value and its uncertainty can then be treated as a probability distribution and compared with exposure estimates to establish whether a margin of safety exists. We have constructed Bayesian learning neural networks that provide such quantitative predictions and uncertainties for 20 pharmacologically important human molecular initiating events (MIEs). These models produce high-quality quantitative estimates of biochemical activity at an MIE (p(IC50), p(EC50), p(Ki), p(Kd)), with average mean absolute errors (in log units) of 0.625 ± 0.048 on test data and 0.941 ± 0.215 on external validation data. Their key advantage is that they also produce standard deviations and credible intervals (CIs) that quantify the uncertainty in these predictions. We show that these uncertainty measures distinguish between molecules structurally close to the training data, molecules less similar to the training data, and decoy compounds drawn from the wider ChEMBL database. A user can therefore judge how certain each prediction is, much as with a quantitative applicability domain, which makes the predictions more useful in NGRA. The ability of in silico methods to produce quantitative predictions with probability distributions of this kind will be vital to their further use in NGRA, and this work takes clear first steps in that direction.
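The abstract does not say how the networks are implemented, so the following is only a minimal sketch under stated assumptions. It assumes that, for each molecule, a Bayesian model yields a set of sampled predictions of a pActivity value in log units (for example, from repeated draws of the network weights). It shows how the summary statistics the abstract mentions can be computed from such samples: a mean, a standard deviation, an equal-tailed credible interval, and the mean absolute error used to score point predictions. The function names `summarise_predictions` and `mean_absolute_error` and the simulated sample values are hypothetical illustrations, not the authors' code or data.

```python
import math
import random
import statistics


def summarise_predictions(samples, level=0.95):
    """Summarise sampled predictions of a pActivity value (log units).

    Hypothetical helper: returns the sample mean, the sample standard
    deviation and an equal-tailed credible interval taken from the
    empirical quantiles of the samples.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0

    def quantile(p):
        # Linear interpolation between neighbouring order statistics.
        pos = p * (len(ordered) - 1)
        lo = math.floor(pos)
        hi = math.ceil(pos)
        frac = pos - lo
        return ordered[lo] + (ordered[hi] - ordered[lo]) * frac

    return {
        "mean": statistics.fmean(ordered),
        "sd": statistics.stdev(ordered),
        "ci": (quantile(tail), quantile(1.0 - tail)),
    }


def mean_absolute_error(predicted, observed):
    """Mean absolute error (log units) between point predictions and measurements."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return statistics.fmean(abs(p - o) for p, o in zip(predicted, observed))


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated, hypothetical samples: one molecule with a narrow predictive
    # spread (as might be expected near the training data) and one with a
    # wide spread (as might be expected for a dissimilar or decoy compound).
    near = [rng.gauss(6.5, 0.3) for _ in range(1000)]
    far = [rng.gauss(5.0, 1.2) for _ in range(1000)]
    for name, samples in (("near", near), ("far", far)):
        s = summarise_predictions(samples)
        lo, hi = s["ci"]
        print(f"{name}: mean={s['mean']:.2f} sd={s['sd']:.2f} "
              f"95% CI=({lo:.2f}, {hi:.2f})")
```

An equal-tailed interval from empirical quantiles is used here because it needs no assumption about the shape of the predictive distribution; a wider interval for the second simulated molecule is what signals lower certainty in its prediction.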

Keywords

3404 Medicinal and Biomolecular Chemistry, 34 Chemical Sciences, Clinical Research, Bioengineering, Generic health relevance

Journal Title

Computational Toxicology

Journal ISSN

2468-1113

Publisher

Elsevier BV

Publisher DOI

https://doi.org/10.1016/j.comtox.2022.100228

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Collections

Cambridge University Research Outputs