MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge, CB2 2SR, UK

Cancer Research UK Department of Epidemiology, Mathematics and Statistics, Wolfson Institute of Preventive Medicine, Charterhouse Square, London, EC1M 6BQ, UK

Taunton and Somerset Hospital, Department of Obstetrics and Gynaecology, Musgrove Park, Taunton, TA1 5DN, UK

Abstract

Background

The Anglia Menorrhagia Education Study (AMES) is a randomized controlled trial testing the effectiveness of an education package applied to general practices. Binary data are available from two sources: general practitioner reported referrals to hospital, and referrals to hospital determined by independent audit of the general practices. The former may be regarded as a surrogate for the latter, which is regarded as the true endpoint. Data are only available for the true endpoint on a subset of the practices, but there are surrogate data for almost all of the audited practices and for most of the remaining practices.

Methods

The aim of this paper was to estimate the treatment effect using data from every practice in the study. Where the true endpoint was not available, it was estimated by three approaches: a regression method, multiple imputation and a full likelihood model.

Results

Including the surrogate data in the analysis yielded an estimate of the treatment effect which was more precise than an estimate gained from using the true end point data alone.

Conclusions

The full likelihood method provides a new imputation tool at the disposal of trials with surrogate data.

Background

The Anglia Menorrhagia Education Study (AMES)

The general practice was the unit of randomization and the primary outcome measure of interest was the proportion of women with menorrhagia who were referred to hospital. In this part of the trial, data were collected in two ways. Firstly, the doctors in the study practices were asked to keep a record of consultations for menorrhagia, with the outcome of each consultation, on supplied data sheets. We refer to this as the reported data. Secondly, an audit of 52% of the practices was performed after the trial was over, in order to have an objective measure of referral that did not depend on reporting by a busy practitioner. Having seen the reported data, auditing 52% of the practices was considered to give sufficient power. The reported data were recorded only for one year post-intervention, whereas one-year pre-intervention data were also available for the audited part of the trial. Total numbers of patients seen and patients referred for the reported and audited phases are given in the table below.

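Reported and audited outcome data

| Data source | Measure | Pre-intervention, Intervention | Pre-intervention, Control | Post-intervention, Intervention | Post-intervention, Control |
|---|---|---|---|---|---|
| Audited | Patients seen | 307 | 209 | 418 | 237 |
| Audited | Referrals | 56 | 39 | 80 | 63 |
| Audited | Number of practices | 27 | 25 | 27 | 25 |
| Reported | Patients seen | NA | NA | 381 | 215 |
| Reported | Referrals | NA | NA | 93 | 92 |
| Reported | Number of practices | NA | NA | 40 | 36 |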

In the analysis, one might simply exclude those practices which do not have audited data. On the other hand, it is reasonable to suppose that some information, albeit less reliable, is contained in the reported data. Surrogate endpoints have been used in a variety of studies, notably in trials of cancer screening.

Three approaches are considered: a regression method, multiple imputation and a full likelihood model. The regression method is a three-stage process. Firstly, the observed audited data are modelled as a function of the corresponding reported data and general practice characteristics. Secondly, the missing audited data are generated using the parameter estimates from this modelling. Thirdly, a random effects model is fitted to assess the effectiveness of the intervention. This model also includes a term that denotes whether the true endpoint was observed or estimated. Multiple imputation generates several realisations of the missing audited data, given the observed data. Each of these imputed data sets is then used to generate an estimate of the effectiveness of the intervention. Finally, these estimates are combined to give an overall estimate of the true outcome effect. The full likelihood model generates an imputation of the missing audited data from the reported and audited data, and performs the randomized trial comparison simultaneously. In all these approaches, we are assuming the audit data are missing at random (MAR).

General practice characteristics

| Practice characteristic | All audited patients, Intervention | All audited patients, Control | All patients with reported data, Intervention | All patients with reported data, Control |
|---|---|---|---|---|
| Mean list size | 6974 | 5167 | 6965 | 5314 |
| Fund-holding | 7/27 (26%) | 4/25 (16%) | 11/40 (28%) | 6/36 (17%) |
| Has branch surgeries | 15/27 (56%) | 16/25 (64%) | 17/40 (43%) | 24/36 (67%) |
| Rural | 10/27 (37%) | 7/25 (28%) | 16/36 (44%) | 9/36 (25%) |
| Has drug dispensing facilities^{1} | 0.34 | 0.42 | 0.32 | 0.44 |
| Male partners^{1} | 0.63 | 0.77 | 0.67 | 0.76 |
| Has trainees | 15/27 (56%) | 9/25 (36%) | 15/40 (38%) | 9/36 (25%) |
| Partners on obstetric list^{1} | 0.92 | 1.00 | 0.89 | 0.99 |

1 = characteristic is measured as a mean proportion

The aim of this paper is to estimate the treatment effect using data from every practice in the study. Where the true endpoint is not available, it is estimated via a surrogate by three approaches: a regression method, multiple imputation and a full likelihood model.

Methods

Since our endpoint was referral of individual patients, but the unit of randomization was the general practice, all models assessing the treatment effect incorporated a random effects component for practice, to take account of this cluster randomization.

Regression models

For the 26 practices which were not audited, but for which we had reported data, our aim was to predict what audit data (pre and post intervention numbers of patients seen and referred) would have come from these practices had they been audited. We used the post intervention reported data to predict both the pre and post intervention audited data. We also included practice characteristics in this prediction. Log linear regression models were fitted, where the audited data is a function of the reported data and the practice characteristics. Then estimated parameters from these models were used to estimate the missing values from the audited data. Finally, the overall effect of intervention was estimated by a random effects logistic regression model, where an extra random effect was included to add extra variance from the observations that were estimated and not observed.
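The first two stages described above can be sketched as follows. This is a minimal illustration only: it assumes an ordinary least-squares fit on the log scale as a stand-in for the log linear models, and the function names, the toy counts and the single covariate are all hypothetical rather than taken from the AMES analysis.

```python
import numpy as np

def design_matrix(reported, covariates):
    """Intercept, log(reported count) and practice covariates, one row per practice."""
    return np.column_stack([np.ones(len(reported)), np.log(reported), covariates])

def fit_log_linear(reported, audited, covariates):
    """Least-squares fit of log(audited) on log(reported) and covariates.

    Assumes all counts are positive; zero counts would need special handling.
    """
    coef, *_ = np.linalg.lstsq(
        design_matrix(reported, covariates), np.log(audited), rcond=None
    )
    return coef

def predict_audited(coef, reported, covariates):
    """Fitted audited counts for practices that have reported data only."""
    return np.exp(design_matrix(reported, covariates) @ coef)

# Hypothetical practices with both audited and reported counts of women seen.
reported_seen = np.array([10.0, 12.0, 8.0, 15.0, 9.0, 11.0])
audited_seen = np.array([14.0, 15.0, 10.0, 21.0, 12.0, 16.0])
rural = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])  # one illustrative covariate

coef = fit_log_linear(reported_seen, audited_seen, rural)

# Hypothetical unaudited practices: predict the audited counts they might have had.
fitted = predict_audited(coef, np.array([13.0, 7.0]), np.array([0.0, 1.0]))
print(fitted)
```

The fitted values produced this way lie exactly on the regression surface, which is why the final random effects model needs an extra variance term for the estimated observations.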

Firstly, we fitted log linear regression models of audit data on reported data and practice characteristics. Randomization group status was included in the practice characteristics vector in the pre-intervention regression, as reported data, the crucial independent variable, is only observed after the intervention, and the relationship between reported behaviour after intervention and true behaviour before intervention may be modified by the effect of the intervention (e.g. a reduction in referral rates after intervention in the groups receiving the intervention). On the other hand, in the post-intervention regression, the reported and audited data are observed post-intervention, so the effect of intervention is already included. The following models are fitted for the 50 practices that were audited and which returned at least one reported data form:

$n^{b}$ and $n^{a}$ denote the number of women presenting with menorrhagia before and after intervention from the audited data respectively; $n^{r}$ denotes the corresponding number from the post-intervention reported data. $r^{b}$, $r^{a}$ and $r^{r}$ are the corresponding numbers of referrals from the pre- and post-intervention audited and post-intervention reported data respectively. $X_{2}$ is the vector of the eight practice characteristics given in the table of general practice characteristics, and $X_{1}$ is this same vector, but also including the randomization group of the practice.

The estimated values of α, β and γ are used to generate fitted values for the 76 practices which have reported data. A full data set can now be constructed for all 78 practices with any data at all. For the 52 practices with observed audited data, the observed data are used and the fitted values are ignored. Plots of observed data versus fitted values for the 50 practices that supplied both audited and reported data are shown in the accompanying figure.

Plots of observed versus fitted values for the 50 practices that supplied audited and reported data. The observed values correspond to the audited information recorded, the fitted values correspond to the audited information that is predicted from the reported data via the regression model. Lines with a zero intercept and a gradient of one are plotted to gauge agreement between observed and fitted values.

Where $n_{ijkl}$ and $r_{ijkl}$ are the number of women presenting with menorrhagia and the number of women referred in practice $i$ (with $j$ indexing trial arm, $k$ the period and $l$ whether the audited values were observed or estimated), and $p_{ijkl}$ denotes the true underlying probability of being referred. The intervention indicator takes the values $x_{00} = x_{01} = x_{10} = 0$ (control group and intervention group pre-intervention) and $x_{11} = 1$ (intervention group post-intervention); the period indicator takes the values $t_{0} = 0$ (pre-intervention) and $t_{1} = 1$ (post-intervention); and the estimation indicator takes the values $z_{0} = 0$ (observed) and $z_{1} = 1$ (estimated). β is used to denote the log odds ratio of being referred in an intervention practice post-intervention compared to a control practice or an intervention practice pre-intervention. In this model we allow a variation in trend for each practice, γ_{i}, around an average trend for all practices, μ_{1}. There is a common intercept for each practice, within trial arm, at the point ($t_{k}$ - 0.5). δ_{i} is a random effect that is "switched off" for the practices that have observed audited values and "switched on" for the practices that use estimated audited values. In this way extra variability is allowed in the model for the practices that have estimated audit information.

Multiple imputation

Methodology

Multiple imputation using auxiliary variables can be used to strengthen the true endpoint.

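More formally, suppose we have a vector of discrete data $Y$ made up of observed values $Y_{obs}$ and missing values $Y_{mis}$. Each component of $Y$ takes one of the values $d_{1}, \ldots, d_{K}$ with probabilities θ = (θ_{1},..., θ_{K}) respectively. In the Bayesian bootstrap of Rubin, θ is drawn from its Dirichlet posterior distribution given $Y_{obs}$, and the components of $Y_{mis}$ are then independently drawn from $d_{1}, \ldots, d_{K}$, such that $\Pr(Y = d_{k}) = \theta_{k}$.

The Bayesian bootstrap imputation is complicated to implement as it requires sampling from a Dirichlet distribution, followed by taking a weighted sample from the possible values the components of $Y$ can take.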

Suppose $Y_{obs}$ is of length $n_{1}$ and $Y_{mis}$ is of length $n_{0}$. The approximate Bayesian bootstrap imputation is as follows:

• Draw $n_{1}$ components, with replacement, from $Y_{obs}$. Call this vector $Y^{*}_{obs}$.

• Draw $n_{0}$ components, with replacement, from $Y^{*}_{obs}$. These form the imputed values of $Y_{mis}$.

In this way the approximate Bayesian bootstrap method draws θ from a scaled multinomial distribution rather than a Dirichlet posterior as in the Bayesian bootstrap case.
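The two resampling steps can be sketched in a few lines of Python. This is an illustrative sketch, not the code used in the study; the function name `abb_impute` and the toy observed values are hypothetical.

```python
import numpy as np

def abb_impute(y_obs, n_missing, rng):
    """One approximate Bayesian bootstrap imputation of n_missing values."""
    y_obs = np.asarray(y_obs)
    n_obs = len(y_obs)
    # Step 1: resample n_obs observed components with replacement.
    y_star = y_obs[rng.integers(0, n_obs, size=n_obs)]
    # Step 2: draw the missing components, with replacement, from that resample.
    return y_star[rng.integers(0, n_obs, size=n_missing)]

rng = np.random.default_rng(0)
y_obs = np.array([3, 5, 5, 8, 13, 21])  # hypothetical observed values

# Five independent imputations of four missing values each.
imputations = [abb_impute(y_obs, 4, rng) for _ in range(5)]
for imputed in imputations:
    print(imputed)
```

Because the indexing acts on the first axis, the same function also works when each component is a whole row of data rather than a single number.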

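Suppose we wish to estimate β from the data. Each of the $m$ imputed data sets yields an estimate $\hat{\beta}_{i}$ with estimated variance $U_{i}$ ($i = 1, \ldots, m$).

The variance of the combined estimate has two components: the within-imputation variance,

$$\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_{i},$$

and the between-imputation variance, which is the variance of the estimates of β across the imputed data sets,

$$B = \frac{1}{m-1}\sum_{i=1}^{m} \left(\hat{\beta}_{i} - \bar{\beta}\right)^{2}.$$

The total variance is defined as:

$$T = \bar{U} + (1 + m^{-1})B.$$

Inferences about β can be gained from the approximation:

$$\frac{\beta - \bar{\beta}}{\sqrt{T}} \sim t_{\nu},$$

where the degrees of freedom of the t distribution are given by:

$$\nu = (m-1)\left[1 + \frac{\bar{U}}{(1 + m^{-1})B}\right]^{2}.$$

So β is estimated by $\bar{\beta}$, where

$$\bar{\beta} = \frac{1}{m}\sum_{i=1}^{m} \hat{\beta}_{i}.$$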
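As a small worked sketch, the standard combining rules for multiply imputed estimates (mean of the estimates, within- and between-imputation variance, total variance and t degrees of freedom) can be coded as below. The function name `pool_rubin` and the input values are hypothetical and are not results from the AMES data.

```python
import math

def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances from m imputed data sets."""
    m = len(estimates)
    beta_bar = sum(estimates) / m                      # pooled point estimate
    u_bar = sum(variances) / m                         # within-imputation variance
    b = sum((e - beta_bar) ** 2 for e in estimates) / (m - 1)  # between-imputation variance
    total = u_bar + (1 + 1 / m) * b                    # total variance
    # Degrees of freedom of the reference t distribution (infinite if b is zero).
    df = math.inf if b == 0 else (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return beta_bar, total, df

# Hypothetical log odds ratios and their variances from five imputations.
log_or = [-0.35, -0.30, -0.40, -0.20, -0.32]
var_log_or = [0.040, 0.038, 0.041, 0.039, 0.040]

beta_bar, total, df = pool_rubin(log_or, var_log_or)
print(f"pooled OR = {math.exp(beta_bar):.2f}, s.e. = {math.sqrt(total):.3f}, df = {df:.1f}")
```

With only five imputations the between-imputation term is inflated by the factor (1 + 1/5), which is what keeps the pooled standard error from understating the uncertainty due to the missing data.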

Application to the AMES data set

Let

In this case

We are only interested in imputing the missing audited data, as ultimately this is considered the most accurate, and the pre-intervention data can be used in the modelling. The missing audit data always comes in groups of four for each practice: the number of women presenting with menorrhagia and the number of referrals both pre and post intervention. The theory outlined above is for imputation of missing data in a vector. We identify the audited data for each practice (i.e. row of the data) with an element of a vector

The approximate Bayesian bootstrap imputation was then performed on the data. A random sample of 52 rows was taken with replacement from the 52 rows of complete data. From this a random sample of 26 rows was taken with replacement. This, along with the original 52 complete rows forms an imputed data set. This process was independently repeated five times, and these five data sets are each analysed.

As with the analysis of the audit data before, we wish to get an estimate of the odds of being referred in the intervention group compared to the control group. We fit the model:

Where the variable definitions are the same as those used in equation 2.

This imputation assumes that the missing audit data are missing completely at random (MCAR). A stratified imputation was also performed, with practices divided into three strata according to the proportion of patients reported to have been referred. The 33^{rd} and 66^{th} percentiles of this proportion were 0.15 and 0.4. Within each stratum the missing data were then imputed from the observed data.

52 practices have observed audited data. In the stratified imputation only 50 of these can be used to sample from, as two of these practices have no reported data on which to stratify.
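A stratified version of the approximate Bayesian bootstrap might look like the sketch below, which uses the cut points 0.15 and 0.4 quoted above. The function names, the column layout of the audited rows and the toy data are hypothetical; this is an illustration of the idea rather than the code used in the study.

```python
import numpy as np

def stratum_of(p_reported, cuts=(0.15, 0.4)):
    """Stratum index 0, 1 or 2 from the reported proportion referred."""
    return np.digitize(p_reported, cuts)

def stratified_abb(audited_rows, p_rep_donors, p_rep_missing, rng, cuts=(0.15, 0.4)):
    """Impute one audited row per unaudited practice from donors in the same stratum."""
    donor_strata = stratum_of(p_rep_donors, cuts)
    missing_strata = stratum_of(p_rep_missing, cuts)
    imputed = np.empty((len(p_rep_missing), audited_rows.shape[1]), dtype=audited_rows.dtype)
    for s in np.unique(missing_strata):
        pool = audited_rows[donor_strata == s]
        if len(pool) == 0:
            raise ValueError(f"no audited donor practices in stratum {s}")
        # Resample the donor pool, then draw the missing rows from the resample.
        star = pool[rng.integers(0, len(pool), size=len(pool))]
        need = np.flatnonzero(missing_strata == s)
        imputed[need] = star[rng.integers(0, len(star), size=len(need))]
    return imputed

# Hypothetical audited rows: seen pre, referred pre, seen post, referred post.
audited_rows = np.array([[10, 2, 12, 1], [8, 1, 9, 1], [20, 6, 22, 7],
                         [15, 4, 18, 5], [9, 5, 11, 6], [12, 7, 10, 6]])
p_rep_donors = np.array([0.05, 0.10, 0.20, 0.30, 0.50, 0.60])
p_rep_missing = np.array([0.12, 0.45, 0.25])

imputed = stratified_abb(audited_rows, p_rep_donors, p_rep_missing, np.random.default_rng(0))
print(imputed)
```

Practices without reported data cannot be assigned to a stratum, which is why they drop out of the donor pool in the stratified analysis.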

Full likelihood model

The previous two methods used a two-stage procedure where firstly missing data was imputed and then the treatment effect estimated. Here a method is proposed that performs both these stages simultaneously. Consider the following model:

Note that this model uses only the post-intervention data and does not use the practice characteristics data. Here

Directed Acyclic Graph of the full likelihood model

Directed Acyclic Graph of the full likelihood model. In the graph circles represent unknown parameters and rectangles represent observed data. Dashed arrows represent deterministic dependence and solid arrows represent stochastic dependence. A dashed rectangle represents data that is partially observed, and is imputed (so a parameter) for missing values.

Neither the audited data nor the reported data is complete. Of the 78 practices included in this model, 76 have reported data and 52 have audited data (50 have both). Because of the nature of the MCMC sampler used in the model fitting, at each iteration the observed values and the current imputed values of the audited and reported numbers of women presenting are used to estimate φ^{a} and φ^{r} respectively; in turn φ^{a} and φ^{r} are then used to impute another set of missing values of these numbers.

For each practice the logit of the true audited probability of referral, logit($p^{a}_{i}$), is modelled as a linear function of treatment group with a practice-level random effect; β_{1} is an estimate of the log odds of referral in the intervention group compared to the control. Each practice is allowed to have a different underlying probability of referral via the random effect γ_{i}, thus making adjustments for the cluster randomized nature of the design.

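The reported data was considered to be a surrogate for the audited data. In this model the logit of the surrogate probability of referral, logit($p^{r}_{i}$), is modelled as a linear function of the logit of the true probability of referral, logit($p^{a}_{i}$), centred on its mean across practices.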

In MCMC sampling, at every iteration an estimate of every parameter is obtained. This means that missing data imputation and the randomized trial comparison were performed simultaneously and not in a two stage process.
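For reference, the structure of the full likelihood model can be read off the BUGS code in the Appendix and written out as follows. The symbols $r$ (referrals), $n$ (women presenting) and $\sigma_{a}$ (random effect standard deviation) are notation assumed here for illustration; the superscripts $a$ and $r$ denote audited and reported data, and $N$ is the number of practices.

```latex
\begin{align*}
r^{a}_{i} &\sim \mathrm{Binomial}(p^{a}_{i},\, n^{a}_{i}), &
n^{a}_{i} &\sim \mathrm{Poisson}(\phi^{a}), \\
\operatorname{logit}(p^{a}_{i}) &= \alpha_{1} + \beta_{1}\,\mathrm{treat}_{i} + \gamma_{i}, &
\gamma_{i} &\sim N(0,\, \sigma_{a}^{2}), \\
r^{r}_{i} &\sim \mathrm{Binomial}(p^{r}_{i},\, n^{r}_{i}), &
n^{r}_{i} &\sim \mathrm{Poisson}(\phi^{r}), \\
\operatorname{logit}(p^{r}_{i}) &= \alpha_{2} + \beta_{2}\Bigl(\operatorname{logit}(p^{a}_{i})
  - \tfrac{1}{N}\textstyle\sum_{j=1}^{N}\operatorname{logit}(p^{a}_{j})\Bigr).
\end{align*}
```

Written this way, $\exp(\beta_{1})$ is the odds ratio for referral in the intervention group, and $\beta_{2}$ measures how strongly the surrogate tracks the true endpoint on the logit scale.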

Model fitting

The regression models given in equation 1, that are used to generate missing values, were fitted using Splus

Where BUGS was used to estimate parameters, prior distributions that are locally almost uniform were chosen, with variances at least two orders of magnitude larger than the posterior variances of the corresponding nodes. These priors are considered to be non informative. The model fitting for the full likelihood model was achieved with the BUGS code in the Appendix. Convergence was assessed by the methods of Geweke

Results

Table

Odds of being referred in an education practice compared to a control: comparison of the various modelling strategies used.

| Method | Point estimate | CI | s.e. (log OR) |
|---|---|---|---|
| Audited data only | 0.73 | (0.47,1.08) | 0.212 |
| Regression | 0.68 | (0.42,1.01) | 0.218 |
| Unstratified imputation | 0.74 | (0.45,1.02) | 0.203 |
| Stratified imputation | 0.75 | (0.45,1.05) | 0.212 |
| Full likelihood | 0.68 | (0.44,0.91) | 0.188 |

Table

The correlation between the audited and reported data for the number of referrals

0.17

0.36

0.30

Discussion

These results show reasonable agreement with regard to the point estimate. The educational package reduced the proportion of women who are referred to hospital by around 30%. Some of this benefit may be artificial, due to increased diagnostic activity in the intervention group.

These results differ from those previously reported

In all the modelling strategies used we have attempted to impute missing data from surrogate data and assess the effect of intervention. Each strategy has used different methods for imputing data, and for adjusting the variance of the outcome measure to allow for the fact that this data is estimated rather than observed.

The regression models attempt to account for extra variation, caused by using imputed values, by adding a random effect in the regression model which estimated the outcome measure. However, a weakness in this model is that the estimated values are artificially too good.

The fitted values will all lie on hyper-planes defined by the estimated parameters from model 1, whereas the observed values used when these are available will all lie around these planes, but will never lie exactly on them. The regression models used to estimate the missing values are therefore giving the exact values that one would expect and do not allow for random variation in the realised values. Extra variation is allowed in the model for these values by inclusion of an additional random effect. However, there is an element of a "self fulfilling prophecy" where the regression model 2 that estimates the outcome measure is based on data that will fit the model better at the estimated points.

A further problem with this method is that it is possible in general for _{1 }>_{3 }and _{2 }>_{4}. An alternative modelling strategy to protect against this could be to estimate the number of events

The imputation models do not have these problems, as the missing audited data is imputed from the observed audited data. The stratified method is to be preferred as this generates data which is more likely to have occurred for the practice that the missing data is being imputed for. The results from the imputation methods give estimates of the effect of intervention and s.e. of the log odds ratio in between the other two methods. It should be noted that the formulas used to estimate this standard error have been shown to be inconsistent in certain settings

The full likelihood model has the desirable property of performing the imputation and the randomized trial comparison simultaneously. Despite not making use of pre-intervention information, this model achieves the lowest standard error of the log odds ratio of all the models considered. The validity of the point estimate is unlikely to be impaired by the absence of pre-intervention data as the audited pre-intervention probabilities of referral were similar in the intervention and control groups. This model could be improved in principle by including pre-intervention data and practice characteristics. This was tried and imposed too heavy a burden on the estimation algorithm.

The standard error of the log odds ratio obtained from a random effects logistic regression on the audited data alone was 0.212. The unstratified imputation and the full likelihood model improved upon this and the stratified imputation matched it. The regression method was conservative, probably due to too much extra variation being added by the random effect for imputed values. The improvements are due to the added information from the auxiliary reported variable. The choice of parametric assumptions used in the generation of missing values would also influence this gain in precision.
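One way to express the gain in precision is as a ratio of variances relative to the audited-only analysis. The short calculation below uses the standard errors of the log odds ratio from the results table; treating the squared ratio of standard errors as a relative efficiency is an interpretation added here, not a quantity reported in the study.

```python
# Standard errors of the log odds ratio, as given in the results table.
se_log_or = {
    "Audited data only": 0.212,
    "Regression": 0.218,
    "Unstratified imputation": 0.203,
    "Stratified imputation": 0.212,
    "Full likelihood": 0.188,
}

baseline = se_log_or["Audited data only"]

# Variance ratio relative to the audited-only analysis; values above 1 mean more precise.
relative_efficiency = {method: (baseline / se) ** 2 for method, se in se_log_or.items()}

for method, eff in relative_efficiency.items():
    print(f"{method}: {eff:.2f}")
```

On this scale the full likelihood model behaves roughly as if about a quarter more audited information had been available, while the regression method falls slightly below the audited-only baseline.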

The reported data is strongly related to the audited data. The relationship of the logit reported probability with the logit audited probability is

$$\operatorname{logit}(p^{r}) = 5.44 + 1.09\,\operatorname{logit}(p^{a}) \qquad (13)$$

The 95% credible interval of the estimate of 1.09 is (0.17,2.02), indicating a significant (p = 0.02) dependency of surrogate on true endpoint. Thus, while this surrogate is unlikely to satisfy Prentice's criteria

Conclusion

Using reported data as a surrogate for audited data in the full likelihood model gives a point estimate that is accurate, and improves the precision of the estimate from that yielded using audited data alone. Regression type approaches and the Bayesian bootstrap imputation technique have already been used in other studies. The full likelihood approach provides an additional possible strategy in the case where only partial information is available on the true endpoint.

Competing interests

None declared.

Authors' contributions

RN developed the models, performed all the analysis and drafted the manuscript. SD aided in the development of the models and production of the final manuscript. GF designed and coordinated the AMES study.

Appendix: BUGS code for full likelihood model

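model{
  for(i in 1 : N){
    # Audited (true endpoint) data: referrals out of women presenting
    ref.a[i] ~ dbin(p.a[i], tot.a[i])
    tot.a[i] ~ dpois(phi.a)
    logit(p.a[i]) <- alpha1 + beta1 * treat[i] + gamma.a[i]
    gamma.a[i] ~ dnorm(0, tau.a)
    # Reported (surrogate) data, linked to the centred logit audited probability
    ref.rep[i] ~ dbin(p.rep[i], tot.rep[i])
    tot.rep[i] ~ dpois(phi.rep)
    lp.a[i] <- logit(p.a[i])
    logit(p.rep[i]) <- alpha2 + beta2 * (lp.a[i] - lp.a.bar)
  }
  lp.a.bar <- mean(lp.a[])
  # Precision of the practice random effect from its standard deviation
  tau.a <- 1/(s.a*s.a)
  s.a <- exp(ls.a)
  #PRIORS
  phi.a ~ dnorm(0,1.0E-6) I(0,)
  phi.rep ~ dnorm(0,1.0E-6) I(0,)
  ls.a ~ dunif(-6,6)
  # The printed listing placed these priors on nodes a, b, c and d, which the
  # model never uses; they are assumed to stand for alpha1, beta1, alpha2, beta2.
  alpha1 ~ dnorm(0.0,1.0E-6)
  beta1 ~ dnorm(0.0,1.0E-6)
  alpha2 ~ dnorm(0.0,1.0E-6)
  beta2 ~ dnorm(0.0,1.0E-6)
  #EXTRA VARIABLES
  # Odds ratio for referral, intervention versus control
  exp.b <- exp(beta1)
}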

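Odds of being referred in an education practice compared to a control: results from the individual multiple imputations.

| Method | Imputation | Point estimate of OR | s.e. |
|---|---|---|---|
| Unstratified | 1 | 0.70 | 0.116 |
| Unstratified | 2 | 0.73 | 0.115 |
| Unstratified | 3 | 0.67 | 0.108 |
| Unstratified | 4 | 0.85 | 0.123 |
| Unstratified | 5 | 0.72 | 0.116 |
| Stratified | 1 | 0.84 | 0.136 |
| Stratified | 2 | 0.67 | 0.108 |
| Stratified | 3 | 0.71 | 0.114 |
| Stratified | 4 | 0.72 | 0.117 |
| Stratified | 5 | 0.82 | 0.137 |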
