Department of Statistical Sciences, University of Cape Town, Rondebosch, 7701, South Africa
MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK
Abstract
Background
Meta-analysis typically involves combining the estimates from independent studies in order to estimate a parameter of interest across a population of studies. However, outliers often occur even under the random effects model. The presence of such outliers could substantially alter the conclusions in a meta-analysis. This paper proposes a methodology for identifying and, if desired, downweighting studies that do not appear representative of the population they are thought to represent under the random effects model.
Methods
An outlier is taken as an observation (study result) with an inflated random effect variance. We used the likelihood ratio test statistic as an objective measure for determining whether observations have inflated variance and are therefore considered outliers. A parametric bootstrap procedure was used to obtain the sampling distribution of the likelihood ratio test statistics and to account for multiple testing. Our methods were applied to three illustrative and contrasting meta-analytic data sets.
Results
For the three meta-analytic data sets, our methods gave robust inferences when the identified outliers were downweighted.
Conclusions
The proposed methodology provides a means to identify and, if desired, downweight outliers in meta-analysis. It does not eliminate them from the analysis, however, and we consider the proposed approach preferable to simply removing any or all apparently outlying results. We do not, however, propose that our methods in any way replace or diminish the standard random effects methodology that has proved so useful; rather, they are helpful when used in conjunction with the random effects model.
Background
Meta-analysis typically involves combining summary information from related but independent studies in order to estimate an overall treatment effect. One problem in meta-analysis concerns the presence of outlying studies whose results can excessively influence parameter estimates. If some results appear unusual then it is of course appropriate to check these for errors; in particular, if estimates and standard errors of treatment effects have been confused with other quantities then the results from the meta-analysis will be erroneous. We assume here that, despite careful cleaning of the data, some study results still appear unusual and are considered potential outliers. In some instances, identifying and understanding the reason for unusual study results can in itself lead to further understanding of the subject area. With the usual meta-analysis aim of estimating the overall treatment effect in mind, if a small proportion of studies are not in fact truly representative of the population of interest for some unknown reason then their inclusion in the analysis can have unfortunate implications for the resulting inferences. In order to address this type of issue, the Cochrane Collaboration have developed their Risk of Bias tool.
Despite the usefulness of such investigations, it is sometimes the case that some study results are apparent outliers whose cause is unclear. In such instances there is the natural concern that if these study results are included in the analysis but are not truly representative of the population of interest then misleading inference is almost inevitable. Excluding trial results based on their findings is another source of bias, however, and the presence of unexplained and unusual study results places the statistician in an uncomfortable situation. Outliers in regression analysis have been widely researched
In this paper we propose a random effects variance shift outlier model (RVSOM). This model initially allows the identification of any apparent outliers under the standard random effects model for meta-analysis. This approach is useful because the identification of outliers is in itself problematic. For example, a very large study may initially appear consistent with the rest of the data, but on closer inspection it might provide an estimate much further from the pooled estimate than we expect by chance. Under such circumstances, it may be that this study is an outlier, rather than a smaller study with a more extreme point estimate, which might at first glance be a more obvious candidate.
If any study or studies are identified as outliers, then the RVSOM further allows their downweighting. The extent of this downweighting depends on how unusual the outliers appear to be, and takes into account the studies' estimated treatment effect and the variance structure. This approach can be used to complement the more usual random effects analysis as a secondary or sensitivity analysis. We consider it more satisfactory than performing analyses that simply omit any outliers, although these analyses may also be performed if desired. The model underlying the RVSOM was also considered by
The paper is set out as follows. Firstly, we present a random effects variance shift outlier model (RVSOM) for the random effects model for meta-analytic data. Secondly, we show how this model may be used in practice. In particular, we propose a parametric bootstrap procedure to generate the empirical distribution of the resulting likelihood ratio test statistics and to account for multiple testing using these statistics. The proposed approach is then applied to three meta-analytic data sets, two of which come from the Cochrane Collaboration. Finally, we conclude the paper with a discussion.
Methods
A RVSOM in meta-analysis
The standard random effects model
We base our modelling on the standard random effects model
where
with the variance of the
We will obtain inferences using restricted maximum likelihood estimation (REML). This is a standard approach in the context of meta-analysis
where
The associated REML estimate of the between-study variance parameter from this model is denoted
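As a rough illustration of REML estimation under the standard random effects model, the following Python sketch maximises the restricted log-likelihood over the between-study variance with a golden-section search and then computes the pooled estimate. It is not the authors' implementation: the function names (`reml_loglik`, `fit_random_effects`), the crude search bound and the choice of optimiser are assumptions made for illustration, and the within-study variances are treated as known.

```python
import math

def reml_loglik(y, s2, tau2):
    """Restricted log-likelihood (up to a constant) with the mean profiled out."""
    v = [s + tau2 for s in s2]                  # marginal variances
    w = [1.0 / vi for vi in v]                  # inverse-variance weights
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    rss = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    return -0.5 * (sum(math.log(vi) for vi in v) + math.log(sum(w)) + rss)

def fit_random_effects(y, s2, tol=1e-8):
    """Return (mu_hat, se_mu, tau2_hat) from a golden-section search over tau2 >= 0."""
    # Assumed search bound: the squared range of the study estimates.
    a, b = 0.0, (max(y) - min(y)) ** 2 + 1e-8
    g = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if reml_loglik(y, s2, c) > reml_loglik(y, s2, d):
            b = d                               # maximum lies in [a, d]
        else:
            a = c                               # maximum lies in [c, b]
    tau2 = 0.5 * (a + b)
    # Respect the boundary: report exactly zero if that fits at least as well.
    if reml_loglik(y, s2, 0.0) >= reml_loglik(y, s2, tau2):
        tau2 = 0.0
    w = [1.0 / (s + tau2) for s in s2]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu, math.sqrt(1.0 / sum(w)), tau2
```

With equal within-study variances the REML estimate has a closed form (the sample variance of the estimates minus the common within-study variance, truncated at zero), which gives a convenient check of the search.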
Extending the random effects model to the RVSOM
The random effects variance shift outlier model (RVSOM) for the
which adds an extra term
If we further define
where
is the estimate of
The random effects model assumes that the studies' treatment effects are normally distributed and exchangeable. This exchangeability assumption is crucial for the identification of the RVSOM. If instead, for example, fixed effects were used for
A RVSOM for observation
An extension of model (4) which allows different inflated variances for more than one study can be written as
where
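For orientation, the models in this section can be summarised through their marginal distributions. The block below is a sketch in conventional notation, not a reproduction of the numbered equations: the symbols $y_i$ (study estimate), $\sigma_i^2$ (within-study variance, treated as known), $\tau^2$ (between-study variance), $\omega_j$ (variance shift) and $S$ (set of candidate outliers) are assumptions, as is the additive form of the variance shift.

```latex
\begin{align*}
\text{Random effects model:}\quad
  & y_i \sim N\!\left(\mu,\ \sigma_i^2 + \tau^2\right), \qquad i = 1,\dots,k, \\
\text{RVSOM for study } j:\quad
  & y_j \sim N\!\left(\mu,\ \sigma_j^2 + \tau^2 + \omega_j\right), \qquad
    y_i \sim N\!\left(\mu,\ \sigma_i^2 + \tau^2\right) \ \text{for } i \neq j, \\
\text{Extended RVSOM:}\quad
  & y_i \sim N\!\left(\mu,\ \sigma_i^2 + \tau^2 + \omega_i\right) \ \text{for } i \in S, \qquad
    \omega_i \ge 0 .
\end{align*}
```

Under this form, setting every variance shift to zero recovers the standard random effects model, so the two models share the same mean structure.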
Implementation of the RVSOM
Having fitted the random effects model and made the usual inferences, there may be the concern that outliers are present and that these might have unfortunate implications for the resulting inferences. We then suggest initially fitting the RVSOM to each observation in turn. If a large
We consider the use of the likelihood ratio test (LRT) to evaluate the null hypothesis
The LRT statistic is analytically intractable except in special cases, but readily obtained from standard mixed model software. Although the restricted log-likelihood is suitable for constructing likelihood ratio tests for variance components provided the mean structures of the null and alternative models are the same
Empirical distribution of the LRT statistic and multiple testing
We propose the following parametric bootstrap procedure to obtain the empirical distribution of the likelihood ratio test statistics under the null hypothesis that no outliers are present in the data.
Step 1: Fit the null model defined by (1) to the data to obtain estimates
Step 2a: Generate a new data vector
where
Step 2b: Compute the likelihood ratio test statistics
Step 3: Repeat steps 2a and 2b
Step 4: Calculate the 100(1
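The steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the variance shift is taken to be additive, the restricted likelihoods are maximised by nested golden-section searches with crude bounds, the percentile is a simple empirical quantile, and all function names (`reml_ll`, `golden_max`, `fit_null`, `lrt_statistics`, `bootstrap_thresholds`) are hypothetical.

```python
import math
import random

def reml_ll(y, v):
    """Restricted log-likelihood (up to a constant) for y_i ~ N(mu, v_i)."""
    w = [1.0 / vi for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    rss = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    return -0.5 * (sum(math.log(vi) for vi in v) + math.log(sum(w)) + rss)

def golden_max(f, lo, hi, iters=25):
    """Golden-section search for a maximiser of f on [lo, hi]; returns (x, f(x))."""
    g = (math.sqrt(5) - 1) / 2
    x0, f0 = lo, f(lo)                       # remember the lower boundary
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
    x = 0.5 * (lo + hi)
    fx = f(x)
    return (x0, f0) if f0 >= fx else (x, fx)

def search_bound(y):
    """Assumed upper bound for variance parameters: squared range of the data."""
    return (max(y) - min(y)) ** 2 + 1e-8

def fit_null(y, s2):
    """REML fit of the random effects model; returns (mu_hat, tau2_hat, loglik)."""
    tau2, ll0 = golden_max(lambda t: reml_ll(y, [s + t for s in s2]),
                           0.0, search_bound(y))
    w = [1.0 / (s + tau2) for s in s2]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu, tau2, ll0

def lrt_statistics(y, s2):
    """LRT statistic of the RVSOM for each study against the null model."""
    top = search_bound(y)
    _, _, ll0 = fit_null(y, s2)
    stats = []
    for j in range(len(y)):
        def shifted(t, om):
            return reml_ll(y, [s + t + (om if i == j else 0.0)
                               for i, s in enumerate(s2)])
        def profile(t):
            # Best restricted log-likelihood over the shift for a given tau2.
            return golden_max(lambda om: shifted(t, om), 0.0, 2 * top)[1]
        ll1 = golden_max(profile, 0.0, top)[1]
        stats.append(max(0.0, 2.0 * (ll1 - ll0)))
    return stats

def bootstrap_thresholds(y, s2, B=200, alpha=0.05, seed=1):
    """Empirical 100(1 - alpha) percentile of each LRT order statistic."""
    rng = random.Random(seed)
    mu, tau2, _ = fit_null(y, s2)                                  # Step 1
    ordered = []
    for _ in range(B):                                             # Step 3
        y_star = [rng.gauss(mu, math.sqrt(s + tau2)) for s in s2]  # Step 2a
        ordered.append(sorted(lrt_statistics(y_star, s2), reverse=True))  # Step 2b
    cut = min(B - 1, math.ceil((1 - alpha) * B) - 1)               # Step 4
    return [sorted(rep[r] for rep in ordered)[cut] for r in range(len(y))]
```

In use, the observed statistics from `lrt_statistics(y, s2)` would be sorted in descending order and compared, rank by rank, with the thresholds returned by `bootstrap_thresholds`. The number of bootstrap replicates `B` trades precision of the thresholds against run time.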
Identifying and downweighting outliers
Using any significance level alpha for the LRT order statistics, the presence of outliers can be formally assessed by placing the LRT statistics in descending order and regarding any set of
If there is no apparent reason to exclude these trials, perhaps from the Cochrane Risk of Bias tool for example, then the concern persists that they might be unduly influential but excludable on some unforeseen grounds. In such circumstances a simple approach is to consider sensitivity analyses where some or all of these trials are excluded. However, simply discarding entire trials is rather extreme, and various combinations of exclusions might be contemplated in this procedure. Instead of this deletion of observations we propose fitting an 'extended RVSOM' where separate and additional variance components
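To illustrate how an inflated variance downweights a study without deleting it, the short Python sketch below computes inverse-variance weights and the pooled estimate with and without a variance shift. The data, the values of the between-study variance and the shift, and the function name `pooled_estimate` are all made up for illustration; in a full extended RVSOM these variance parameters would be re-estimated jointly by REML rather than supplied by hand, and an additive shift is assumed.

```python
def pooled_estimate(y, s2, tau2, omega=None):
    """Pooled estimate and normalised weights.

    omega is a hypothetical mapping from study index to an additive
    variance shift; studies not listed receive no shift.
    """
    omega = omega or {}
    v = [s + tau2 + omega.get(i, 0.0) for i, s in enumerate(s2)]
    w = [1.0 / vi for vi in v]
    total = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / total
    return mu, [wi / total for wi in w]

# Made-up example: the fifth study looks outlying.
y = [0.1, 0.2, 0.0, 0.15, 3.0, 0.05]
s2 = [0.04] * 6
mu_re, w_re = pooled_estimate(y, s2, tau2=0.1)
mu_vs, w_vs = pooled_estimate(y, s2, tau2=0.1, omega={4: 8.0})
```

The shifted study keeps a small positive weight, so it still contributes to the pooled estimate, but far less than under the standard random effects weights.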
Results and Discussion
In this section we analyze three meta-analytic data sets, two of which come from the Cochrane Collaboration. The two data sets: CDP-choline for cognitive and behavioural disturbances, and Fluoride toothpaste for preventing dental caries have been previously analysed
CDP-choline for cognitive and behavioural disturbances
Fioravanti and Yanagi
Forest plot for the CDP study. Solid vertical line represents the value of the treatment effect under a random effects model. The centre of each circle in the confidence interval for each study represents the treatment effect from that study. The size of each circle is inversely proportional to the total variance under a random effects model.
RVSOM statistics plotted against study number for the CDP study. (a) Variance shift estimates,
Estimated parameters for models fitted to the CDP data: overall treatment effect (
Parameter                 Random effects model          Extended RVSOM
                          Estimate    95% CI            Estimate    95% CI
μ                         0.401       (0.08;0.72)       0.191       (0.058;0.324)
Between-study variance    0.192       -                 0.000       -
Variance shift            -           -                 3.951       -
Intravenous magnesium in acute myocardial infarction
These 16 trials are a well-known example where the results of a meta-analysis were contradicted by a single large trial
Forest plot for the magnesium study. The centre of each circle in the confidence interval for each study represents the treatment effect from that study. The size of each circle is inversely proportional to the total variance under a random effects model.
RVSOM statistics plotted against study number for the magnesium study. (a) Variance shift estimates,
Fluoride toothpaste for preventing dental caries
Marinho
Forest plot for the fluoride toothpaste study. The centre of each circle in the confidence interval for each study represents the treatment effect from that study. The size of each circle is inversely proportional to the total variance under a random effects model.
RVSOM statistics plotted against study number for the fluoride toothpaste study. (a) Variance shift estimates,
Estimated parameters for models fitted to the fluoride toothpaste data: overall treatment effect (
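Parameter                 Random effects model          Extended RVSOM
                          Estimate    95% CI            Estimate    95% CI
μ                         0.3008      (0.33;0.27)       0.284       (0.32;0.25)
Between-study variance    0.015       -                 0.009       -
Variance shift            -           -                 0.897       -
Variance shift            -           -                 2.082       -
Variance shift            -           -                 5.879       -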
Conclusions
The proposed RVSOM provides a means to identify and, if desired, downweight outliers in meta-analysis. It does not eliminate them from the analysis, however, and we consider the proposed approach preferable to simply removing any or all apparently outlying results. We do not, however, propose that our methods in any way replace or diminish the standard random effects methodology that has proved so useful; rather, they are helpful when used in conjunction with the random effects model. We note that statistical inferences based on modelling choices that were determined by the outcome of statistical tests, such as ours, are open to question and critique, and partly for this reason we present our methods only in the context of sensitivity or secondary analyses. Our methods cannot provide reasons for any apparent outliers but are useful when some findings seem unaccountably unusual and their presence is a cause for concern. We have focused our attention on the notion of outlying trial results, rather than those that are influential or have high leverage, but outliers very often exert alarming amounts of influence and these concepts are related. For the three examples considered here, the apparent outliers only had serious implications for the CDP-choline analysis, i.e. different estimated treatment effects result when outliers are downweighted. Hence our methods can either confirm or diminish any fears that inferences are driven by a handful of unusual results.
The likelihood ratio test gives an objective measure for detecting outliers in meta-analysis. Some may consider this objective measure in itself useful and use this part of the methodology alone rather than take the next step and downweight any apparent outliers. Determining which studies might be designated as outliers may be difficult from the visual inspection of plots and our methods could be used to inform this common but usually informal process. The methodology could be applied to aid the identification of any unusual findings and shortlist trials whose protocols and conduct should be examined especially carefully before being entered into the analysis.
We suggest that the results from the extended RVSOM, with inflated variances for any studies considered to be outliers, provide a useful sensitivity analysis. If the resulting inferences for the treatment effect are very different to those from the random effects model then all inferences should be very cautiously interpreted.
The fixed effects version of our model (with
The RVSOM is not especially appropriate if there are many apparent outliers or the collection of trial results is truly unusual; it is not helpful or meaningful to designate a large proportion of the studies as outliers. In such instances many would balk at the possibility of meta-analysis altogether, but heavy tailed or less usual models for the random effect may be useful, as shown by Baker and Jackson
Our procedure for the computation of thresholds for the likelihood ratio test statistics makes our proposal quite computationally intensive, but with modern computing power this presents little difficulty. Computationally intensive methods involving bootstrapping and permutation tests are in any case becoming more common in meta-analysis, and we anticipate that this trend will continue.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
Both FNG and DJ conceived the study, performed the statistical analysis and wrote the manuscript. Both authors read and approved the final manuscript.
Appendix: Computations
The analyses for the three examples given in the manuscript were conducted using Genstat
Acknowledgements
Freedom N. Gumedze is employed by University of Cape Town and would like to thank University of Cape Town Research Council for funding this research. Dan Jackson is employed by the UK Medical Research Council (grant code U.1052.00.006). We thank Prof. Jane Hutton for the helpful discussions in the earlier stages of this research. We also thank the two reviewers for their comments and suggestions, which led to substantial improvement of the manuscript.