Errors & Clarifications

Introduction

This document corrects typographical errors and clarifies potentially misleading aspects of the text of the following PhD thesis.

"Computational approaches to predicting drug induced toxicity"
Richard Liam Marchese Robinson, King's College, University of Cambridge, 2013
http://www.dspace.cam.ac.uk/handle/1810/244242

Where appropriate, corrections are proposed to the existing text, as indicated in blue font. The issues documented in this report are subdivided into the sections listed in the Table of Contents.

N.B. This document has not undergone any form of peer review.

Richard Marchese Robinson

Table of Contents

Introduction
Typographical errors in the main text
Errors in citations
Errors in references
Errors in the analysis

Typographical errors in the main text

1. On p.49, the definition of the covariance matrix is incomplete. The following sentence should be replaced (a sketch illustrating the corrected definition is given after this list).
   a. Old sentence: “The principal components (PCs) are the M eigenvectors of the covariance matrix (XᵀX), computed from the N×M matrix (X) with elements Xᵢₘ denoting the value of the mth descriptor for the ith molecule.”
   b. New sentence: “The principal components (PCs) correspond to the M eigenvectors of the matrix XᵀX, computed from the N×M matrix (X) with elements Xᵢₘ corresponding to the mean centred value of the mth descriptor for the ith molecule.”
2. On p.49, “Shusko et al.” should read “Sushko et al.”
3. On p.68, “See Chapter 2, section 2.5” should read “See Chapter 2, section 2.3.”
4. On p.90, the following sentence should have read as follows, reflecting the fact that the results obtained with some modelling approaches were summarised as the mean performance, in terms of the MCC, across multiple models built using a given modelling approach with different random selections carried out during the model building phase. N.B. In some cases, as shorthand, the text may discuss the performance of “models” in terms of the mean MCC, whereas this should, strictly speaking, refer to the performance of “modelling approaches”.
   a. Old sentence: “The range of MCC values obtained on the ‘external’ validation sets, with either feature selection protocol, is comparable with those previously reported in the literature (Appendix B, Table B.1).”
   b. New sentence: “The range of (mean) MCC values obtained on the ‘external’ validation sets, with either feature selection protocol, is comparable with those previously reported in the literature (Appendix B, Table B.1).”
5. On p.93, the caption for Table 4.3 should have read as follows.
   a. Old: “MCC values obtained on ‘external’ test sets for all partitions of the Thai-313 and Dubus-203 datasets used to evaluate this author’s modelling procedures. See Table 4.2 for presentational details.”
   b. New: “(Mean) MCC values obtained on ‘external’ test sets for all partitions of the Thai-313 and Dubus-203 datasets used to evaluate this author’s modelling procedures. See Table 4.2 for presentational details.”
6. On p.130, “corresponding to the geometric mean of their descriptor vectors” should state “corresponding to the arithmetic mean of their descriptor vectors” (see the sketch below).
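The following minimal sketch (Python/NumPy) illustrates the corrected definitions in points 1 and 6 above: the principal components are obtained as the eigenvectors of XᵀX formed from a mean-centred descriptor matrix, and descriptor vectors are combined via their arithmetic mean. The descriptor matrix is an arbitrary, hypothetical example, and the code is not taken from the thesis.

import numpy as np

# Hypothetical descriptor matrix: N = 4 molecules (rows), M = 3 descriptors (columns).
X_raw = np.array([[1.0, 2.0, 0.5],
                  [2.0, 0.5, 1.5],
                  [0.0, 1.0, 2.5],
                  [3.0, 2.5, 0.0]])

# Mean centre each descriptor (column) before forming X^T X, per the corrected definition.
X = X_raw - X_raw.mean(axis=0)

# The M principal components are the eigenvectors of the M x M matrix X^T X.
eigenvalues, principal_components = np.linalg.eigh(X.T @ X)

# Order the principal components by decreasing eigenvalue (i.e. by explained variance).
order = np.argsort(eigenvalues)[::-1]
principal_components = principal_components[:, order]

# Arithmetic (not geometric) mean of two descriptor vectors, cf. the correction to p.130.
merged_descriptor_vector = X_raw[:2].mean(axis=0)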
Errors in citations

(1) The wrong reference is cited in the following footnote on p.42.
   a. Old: “Alternatively, this matrix being an example of a "contingency table",94 the "contingency matrix".230”
   b. New: “Alternatively, this matrix being an example of a "contingency table",233 the "contingency matrix".230”
(2) The wrong reference is cited for Toxtree on p.55.
   a. Old: “…freely available Toxtree (version 1.51)159–161”
   b. New: “…freely available Toxtree (version 1.51)162,163”
(3) The wrong reference is cited for Toxtree on p.183.
   a. Old: “In Chapter 3, a consensus model for mutagenicity/carcinogenicity was developed by combining the output generated by two commonly employed predictive toxicology programs: Toxtree160 and Derek for WindowsTM.271”
   b. New: “In Chapter 3, a consensus model for mutagenicity/carcinogenicity was developed by combining the output generated by two commonly employed predictive toxicology programs: Toxtree162 and Derek for WindowsTM.271”
(4) A reference was missing from the following sentence on p.152.
   a. Old: “A variety of Machine Learning methods, and feature selection strategies were considered in these studies - although, in most cases, the same descriptors were employed as per the original publications.124–127,199”
   b. New: “A variety of Machine Learning methods, and feature selection strategies were considered in these studies - although, in most cases, the same descriptors were employed as per the original publications.123–127,199”

Errors in references

1. Reference (405).
   a. Old: “Shanle, E. K.; Xu, W. Chem. Res. Toxicol. 2010, 24, 6–19.”
   b. New: “Shanle, E. K.; Xu, W. Chem. Res. Toxicol. 2011, 24, 6–19.”

Errors in the analysis

1. The explanation of the meaning of the p-values given in the footnote starting on p.90 and ending on p.91 is incorrect. However, this does not affect the p-values presented or the description of how they were calculated. In part, this footnote contains a typographical error (blue text) and, in part, the reasoning employed is problematic (red text).
   a. Old footnote: “The uncorrected p-values were calculated (see Chapter 2, section 2.6.4.1) from the MCC value (or mean value for the pseudo-stochastic methods – Winnow, RF, QuaSAR-Classify) obtained when training and testing the selected modelling approaches on a given train/test partition. They denote the conditional probability of obtaining a (mean) MCC value with at least the magnitude observed on a given test set, supposing a random predictor had been built on the corresponding training set.
The Bonferroni correction provides an upper bound, equal to the value below which the corrected p-values are deemed statistically significant, to the conditional probability, supposing an unknown number of approaches performed like random predictors on average, of erroneously declaring that any model (or random selection from a set of corresponding models for the pseudo-stochastic algorithms) built, in this work, on one of the training sets would perform no differently, in terms of the mean MCC across various test sets, to a random predictor - based on the (mean) MCC value observed on the single corresponding test set.”
   b. Typographical correction: “The uncorrected p-values were calculated (see Chapter 2, section 2.6.4.1) from the MCC value (or mean value for the pseudo-stochastic methods – Winnow, RF, QuaSAR-Classify) obtained when training and testing the selected modelling approaches on a given train/test partition. They denote the conditional probability of obtaining a (mean) MCC value with at least the magnitude observed on a given test set, supposing a random predictor had been built on the corresponding training set. The Bonferroni correction provides an upper bound, equal to the value below which the corrected p-values are deemed statistically significant, to the conditional probability, supposing an unknown number of approaches performed like random predictors on average, of erroneously declaring that any model (or random selection from a set of corresponding models for the pseudo-stochastic algorithms) built, in this work, on one of the training sets would perform differently, in terms of the mean MCC across various test sets, to a random predictor - based on the (mean) MCC value observed on the single corresponding test set.”
   c. Errors in this analysis:
      i. For a single MCC value corresponding to a given train/test partition, obtained using a single model rather than the average across multiple runs of a pseudo-stochastic modelling approach, the corresponding uncorrected p-value denotes the conditional probability of obtaining an MCC value - on any test set - with at least the magnitude observed on the given test set, supposing a random predictor had been built on the corresponding training set. If the p-value is less than a pre-defined limit, the observed performance may be declared to be statistically significantly different to a random predictor. However, it is suggested in this thesis that the p-value calculated from the mean MCC value obtained for a single train/test partition for a pseudo-stochastic modelling approach, which generates different models for a given training set, would allow the average performance of that modelling approach, applied to that training set, to be declared statistically significantly different to the performance expected with an approach which only generated random predictors on that training set. This reasoning may not be mathematically valid.
      ii. In general, for a collection of models, the Bonferroni correction provides an upper bound, equal to the value below which the corrected p-values are deemed statistically significant, to the conditional probability, supposing an unknown number of models were random predictors, of erroneously declaring that any model built on one of the training sets would perform differently to a random predictor - based on the single MCC value observed on the corresponding test set. In the current context, this upper bound was also supposed to hold for the conditional probability of erroneously declaring that one of the pseudo-stochastic methods – when trained repeatedly on a given training set – would perform differently to a method which only generated random predictors on that training set. Again, this reasoning may not be mathematically valid.
A sketch illustrating the type of uncorrected p-value calculation discussed in point 1, together with the application of a Bonferroni significance threshold, is given at the end of this section.
2. The statistical analysis carried out in Chapters 5 and 6 (see sections 5.3.5 and 6.4.3) was flawed, and this may affect the remarks regarding statistical significance and the lack of clear differences in method performance commented upon in the abstract of this thesis. The most appropriate means of testing the two null hypotheses presented in section 5.3.5 remains unclear, as was originally noted in section 5.3.5. However, in hindsight, the following additional comments may be made:
   a. The proposed ad hoc approaches for evaluating the statistical significance of (differences between) overall mean values of performance metrics, averaged across different possible train/test partitions and RNG seeds, are logically flawed. For example:
      i. The variability associated with different train/test partitions, obtained using a single RNG seed, should not have been evaluated, as the statistical significance associated with the overall mean performance was of interest.
      ii. Generating multiple p-values, in a stepwise approach, when nominally evaluating the statistical significance associated with a single overall mean performance metric value, or a single pairwise difference between two overall mean performance metric values, will have artificially inflated the number of p-values and hence skewed the corresponding p-value adjustments made according to Benjamini and Yekutieli's method (a sketch of this adjustment is given at the end of this section).
   b. The multiple hypothesis corrections for Chapter 5 should not have double counted pairwise comparisons between 2D descriptors on different hERG datasets that differed only in the conformations used.
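The following minimal sketch (Python) illustrates, for a single hypothetical test set, the type of calculation discussed in point 1 above: an uncorrected p-value for an observed (mean) MCC under the null hypothesis of a random predictor, followed by a Bonferroni significance threshold. This is an illustration only, not the procedure of section 2.6.4.1 of the thesis; the test-set labels, the observed MCC value, the number of comparisons and the definition of the random predictor (uniformly random class assignment) are all assumptions made for the example.

import numpy as np
from sklearn.metrics import matthews_corrcoef

rng = np.random.default_rng(0)
y_test = np.array([1] * 30 + [0] * 70)   # hypothetical binary test-set labels
observed_mcc = 0.45                      # hypothetical (mean) MCC for one train/test partition
n_comparisons = 12                       # hypothetical number of modelling approaches compared
alpha = 0.05

# Null distribution of the MCC for a random predictor, here taken to assign class
# labels uniformly at random; other definitions of a random predictor are possible.
null_mcc = np.array([
    matthews_corrcoef(y_test, rng.integers(0, 2, size=y_test.size))
    for _ in range(10000)
])

# Uncorrected p-value: probability of an MCC of at least the observed magnitude
# under the random-predictor null, as described in the quoted footnote.
p_uncorrected = np.mean(np.abs(null_mcc) >= abs(observed_mcc))

# Bonferroni: declare significance only if the uncorrected p-value falls below
# alpha / n_comparisons (equivalently, compare min(1, p * n_comparisons) with alpha).
significant = p_uncorrected < alpha / n_comparisons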
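In connection with point 2(a)(ii), the following minimal sketch shows how a collection of p-values might be adjusted using Benjamini and Yekutieli's false discovery rate procedure, here via the statsmodels implementation; the p-values are hypothetical and the call is not taken from the thesis code.

import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical uncorrected p-values, e.g. one per pairwise comparison of modelling approaches.
raw_p_values = np.array([0.001, 0.004, 0.019, 0.030, 0.047, 0.200, 0.620])

# method='fdr_by' applies the Benjamini-Yekutieli procedure, which controls the
# false discovery rate under arbitrary dependence between the tests.
reject, adjusted_p_values, _, _ = multipletests(raw_p_values, alpha=0.05, method='fdr_by')

Because the adjusted values depend on the total number of p-values supplied, artificially inflating that number, as described in point 2(a)(ii), distorts the adjustment.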