Composite responder endpoints feature frequently in rheumatology due to the multifaceted nature of many of these conditions. Current methods for analysing these endpoints discard much of the data used to classify patients as responders and are therefore highly inefficient, resulting in low power. We highlight a novel augmented methodology that uses more of the available information to improve the precision of reported treatment effects. Since these methods are more challenging to implement, we developed free, user-friendly software available through a web-based interface and as R packages. The software consists of two programs: one supports the analysis of responder endpoints and the other facilitates sample size estimation. We demonstrate the use of the software to conduct the analysis with both the augmented and standard methods using the MUSE study, a phase IIb trial in patients with systemic lupus erythematosus.

The software outputs similar point estimates with narrower confidence intervals for the odds ratio, risk ratio and risk difference estimators using the augmented approach. Based on the MUSE data, the sample size required in each arm for a future trial is 50 using the novel approach versus 135 for the standard method, a reduction of approximately 63%.

We encourage trialists to use the software demonstrated to implement the augmented methodology in future studies to improve efficiency.

The online version contains supplementary material available at

Composite endpoints combine a number of individual outcomes in order to assess the effectiveness or efficacy of a treatment. They are typically used in situations where it is difficult to identify a single relevant endpoint that sufficiently captures the change in disease status brought about by the treatment; however, they may be employed for multiple purposes [

List of rheumatic conditions where composite responder endpoints containing at least one continuous component are used

Condition | Endpoint | Response definition |
---|---|---|
Acute Gout | Proportion of patients who responded* | 1. sUA level of < 6.0 mg |
Ankylosing spondylitis | ASAS20 response | 1. 20% improvement and ≥ 10 units of change (on a 0–100 scale) in each of 3 domains 2. No worsening of a similar amount in the fourth domain (Components are physical function, pain, inflammation and patient’s global assessment) |
Idiopathic arthritis-associated uveitis | Best corrected visual acuity above threshold and no light perception | 1. Best-corrected visual acuity, thresholds ≤ 20/50, ≤ 20/200 2. No light perception 3. Estimate contribution of amblyopia, yes/no |
Juvenile arthritis | Response | 1. Improvement by 30% in at least 3 of: a. MD global assessment; b. parent or patient global assessment; c. functional ability; d. number of joints with active arthritis; e. number of joints with limited range of motion; f. erythrocyte sedimentation rate |
Juvenile dermatomyositis | Responder index | 1. ≥ 4 point reduction from baseline in safety of estrogen in lupus national assessment (SELENA) systemic lupus erythematosus disease activity index (SLEDAI) score 2. No worsening (increase of <0.30 points from baseline) in physician's global assessment (PGA) 3. No new British Isles Lupus Assessment Group of SLE clinics (BILAG) A organ domain score or 2 new BILAG B organ domain scores compared with baseline |
Prevention of fracture in high-risk populations | Response | 1. Bone mineral density increase 2. Occurrence of new vertebral fractures |
Proliferative and membranous lupus renal disease | Urinary protein levels within normal range* | 1. Between 6 and 8.3 g per deciliter (g/dL) |
Rheumatoid arthritis | ACR20 response | 1. ≥ 20% improvement in ACR score 2. Can be combined with additional requirements e.g. no additional medication |
Sarcopenia prevention | Occurrence of sarcopenia | Heterogeneity in precise definition, but severe sarcopenia defined by all of the following: 1. Low muscle strength (assessed with chair stand test or grip strength) 2. Low muscle quantity/quality 3. Low physical performance as assessed with gait speed test or short physical performance battery |
Sjogren's syndrome | Response | 1. > 30% reduction in analog scales evaluating dryness, pain and fatigue |
Systemic lupus erythematosus | SRI responder index | 1. SLEDAI change e.g. ≤ −4 2. PGA change e.g. <0.3 3. No Grade A or more than one Grade B in BILAG |
Systemic sclerosis | SCP in normal range, no renal crisis | E.g. 1. <3.0 mg/dl not drug related 2. No renal crisis |
Vasculitis disorders | Response/partial improvement* | 1. 50% improvement in disease activity score |

*Denotes a single dichotomized continuous variable

Employing composite endpoints as the primary outcome measure in a study has many advantages. Proponents of composite endpoints believe that they are appropriate as they estimate the net clinical benefit of an intervention by accounting for the multiple factors of interest in a given disease [

Additional criticisms arise from the analysis of these endpoints. The endpoints are typically treated as binary measures based on whether or not the patient responded, meaning the analysis is straightforward to implement. However, for composites containing continuous outcome measures, this comes at the expense of losing large amounts of the information contained in those components [

One limitation of these methods is that they are more difficult to implement. Therefore, in this paper we demonstrate the use of free, user-friendly online software for conducting analyses of composite responder endpoints using the augmented approach. We illustrate this using the MUSE trial (NCT01438489) [

In what follows we give a brief description of the methods. In Sect. 2 we summarise the MUSE trial data and demonstrate the capability of the software; in Sect. 3 we describe the software output from the application and in Sect. 4 we discuss the implications for practice.

We refer to the analysis method routinely applied to composite responder endpoints as the binary approach. This consists of collapsing the outcome information to form a binary response variable based on whether or not the patients meet the overall response criteria. This response variable is analysed using an appropriate binary analysis method, such as logistic regression. The treatment effect can then be reported in terms of odds ratios, risk ratios or risk differences along with confidence intervals and

The augmented approach uses a more sophisticated model that jointly models data from each of the components within a latent variable framework. The information contained in the continuous components is retained and used to weight patients differently in the analysis, based on how close their readings were to the response threshold. The probability of response in each arm is then obtained and used to form treatment effect estimates in terms of odds ratios, risk ratios or risk differences, as in the standard binary case. The increased efficiency compared to the binary approach comes from making inference on the probability of response without discarding any of the continuous data. In datasets where many patients’ continuous readings are close to the dichotomisation threshold, this may have a substantial impact on the precision of the estimate and hence on the conclusions reached. More technical detail on the specification and assumptions of the models used in the augmented approach for a range of outcome types is provided elsewhere [
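The source of the efficiency gain can be illustrated in a deliberately simplified setting. This setting is an assumption of the sketch, not the latent variable model fitted by the software: a single normally distributed continuous outcome, with response defined as falling below a threshold. The binary approach estimates the response probability by the observed proportion, whereas a model-based estimate plugs the sample mean and standard deviation into the normal distribution function. The function names are illustrative, and the variances are large-sample (delta-method) approximations.

```python
from statistics import NormalDist


def binary_variance(p: float, n: int) -> float:
    """Variance of the observed responder proportion from n patients."""
    return p * (1.0 - p) / n


def model_based_variance(p: float, n: int) -> float:
    """Delta-method variance of Phi((c - mean) / sd) for normal data.

    z is the standardised distance between threshold and mean, so p = Phi(z).
    For large n, Var(z_hat) is approximately (1 + z**2 / 2) / n.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(p)
    return std_normal.pdf(z) ** 2 * (1.0 + z * z / 2.0) / n


def relative_efficiency(p: float) -> float:
    """Model-based variance divided by binary variance (below 1 favours the model)."""
    return model_based_variance(p, 1) / binary_variance(p, 1)


if __name__ == "__main__":
    for p in (0.2, 0.36, 0.5):
        print(f"response probability {p:.2f}: variance ratio {relative_efficiency(p):.2f}")
```

Under these assumptions the ratio equals 2/π (about 0.64) when the threshold sits at the mean, so the dichotomised analysis would need roughly 1.6 times as many patients for the same precision. The gains from the full augmented model depend on the endpoint structure and need not match this figure.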

To illustrate how the analysis can be conducted using the software, we focus on the MUSE trial [


Observed response rates in each of the SRI + OCS components in the anifrolumab 300 mg arm and placebo arm of the MUSE trial

Components | Response criteria | Anifrolumab 300 mg arm | Placebo arm |
---|---|---|---|
SLEDAI | Improvement of at least 4 points (change from baseline ≤ −4) | 58/89 | 41/76 |
PGA | No flare/worsening of disease as measured by PGA (change from baseline <0.3) | 87/89 | 75/76 |
BILAG | No flare/worsening of disease as measured by BILAG (no new Grade A or more than one Grade B compared to baseline) | 86/89 | 72/76 |
OCS | Sustained reduction in oral corticosteroids | 53/95 | 37/87 |
Overall SRI + OCS response | Must respond in all four components | 34/95 | 18/87 |

SLE index is comprised of a continuous SLEDAI outcome, continuous PGA outcome, ordinal BILAG outcome and binary OCS measure
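As a rough illustration of the binary approach, the overall SRI + OCS response counts in the table (34/95 versus 18/87) can be turned into crude, unadjusted treatment effect estimates with Wald confidence intervals. This is a minimal sketch using textbook formulae; the function name is illustrative, and the estimates need not match the software output, which is based on a fitted model.

```python
import math
from statistics import NormalDist


def binary_effects(r1, n1, r0, n0, level=0.95):
    """Unadjusted risk difference, odds ratio and risk ratio with Wald intervals.

    r1/n1 and r0/n0 are responders/patients in the treatment and control arms.
    Assumes every cell of the 2x2 table is non-zero.
    Each entry of the result is (estimate, lower limit, upper limit).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p1, p0 = r1 / n1, r0 / n0

    # Risk difference: interval built on the natural scale.
    rd = p1 - p0
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)

    # Odds ratio and risk ratio: intervals built on the log scale.
    log_or = math.log((r1 / (n1 - r1)) / (r0 / (n0 - r0)))
    se_log_or = math.sqrt(1 / r1 + 1 / (n1 - r1) + 1 / r0 + 1 / (n0 - r0))
    log_rr = math.log(p1 / p0)
    se_log_rr = math.sqrt(1 / r1 - 1 / n1 + 1 / r0 - 1 / n0)

    return {
        "risk_difference": (rd, rd - z * se_rd, rd + z * se_rd),
        "odds_ratio": tuple(math.exp(log_or + k * z * se_log_or) for k in (0, -1, 1)),
        "risk_ratio": tuple(math.exp(log_rr + k * z * se_log_rr) for k in (0, -1, 1)),
    }


if __name__ == "__main__":
    # Overall SRI + OCS response: anifrolumab 300 mg versus placebo.
    for name, (est, low, high) in binary_effects(34, 95, 18, 87).items():
        print(f"{name}: {est:.3f} ({low:.3f}, {high:.3f})")
```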

The software to implement the analysis is a Shiny application, a Graphical User Interface (GUI) for the programming language R, which can be accessed at

The user begins by selecting the analysis tab and uploading the csv file using the ‘Upload Files’ panel. A table displaying the uploaded data will be shown on the right-hand side (see Fig.

MUSE trial data is uploaded in the left-hand panel where the user can indicate preferences such as whether the file includes column headers and whether to display some or all of the data. The raw data is viewed in the right-hand panel where users may also search for particular subjects

The raw data can be visualised using boxplots, histograms, density plots or bar graphs in the ‘Raw Data Plots’ panel. The user must then select the structure of the composite endpoint, which may consist of one or two continuous components and zero or one binary component. As the ordinal and binary outcomes have been combined, the SLE endpoint has two continuous components and one binary component. Details of the model fitted can be viewed by selecting ‘Generate model’. Both of these steps are demonstrated in the Additional file

The analysis is initiated in the ‘Analysis’ panel by selecting the response thresholds for the continuous outcomes. In this example the SLEDAI threshold is -4 and the PGA threshold is 0.3: patients with a SLEDAI change from baseline of -4 or lower and a PGA change from baseline below 0.3 are considered responders on these components, and are otherwise treated as non-responders in the analysis.
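The response rule for the SLE endpoint can be written out explicitly. The sketch below assumes the criteria listed in the table of SRI + OCS components (SLEDAI change from baseline ≤ -4, PGA change from baseline < 0.3) and a single binary indicator that combines the BILAG and OCS criteria. The class and field names are hypothetical and do not correspond to the column names expected by the software.

```python
from dataclasses import dataclass

SLEDAI_THRESHOLD = -4.0  # responder if change from baseline is <= -4
PGA_THRESHOLD = 0.3      # responder if change from baseline is < 0.3


@dataclass
class Patient:
    sledai_change: float  # continuous component 1
    pga_change: float     # continuous component 2
    binary_ok: bool       # hypothetical flag: BILAG and OCS criteria both met


def is_responder(patient: Patient) -> bool:
    """Overall response requires every component criterion to be met."""
    return (
        patient.sledai_change <= SLEDAI_THRESHOLD
        and patient.pga_change < PGA_THRESHOLD
        and patient.binary_ok
    )


if __name__ == "__main__":
    # Two illustrative patients who differ only slightly in SLEDAI change.
    print(is_responder(Patient(-4.0, 0.1, True)))  # meets all criteria
    print(is_responder(Patient(-3.9, 0.1, True)))  # narrowly misses SLEDAI
```

The two example patients have almost identical readings but fall on opposite sides of the SLEDAI threshold. The binary approach treats them as entirely different, which is the information loss the augmented approach is designed to avoid.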

A critical aspect of planning a future study using the augmented approach is determining the sample size so as to take advantage of the efficiency gains. The ‘MultSampSize’ Shiny application allows users to determine the required sample size using preliminary data to inform the estimates. These may be pilot trial data, trial data from earlier phase studies or data from another source. The sample size estimation app can be accessed at

The user should select the ‘Sample Size’ tab and choose the ‘Composite’ option to proceed. Note that the app also accommodates co-primary and multiple primary endpoints, which also feature in rheumatology [

‘MultSampSize’ app with sample size calculator for co-primary, multiple primary and composite endpoints. The interface for the composite endpoint is shown where the number of continuous and binary components and response thresholds for the continuous measures are selected in the ‘Endpoint’ panel. ‘Get Model’ generates the model summary of the latent variable model and the power function

The pilot data can be uploaded using the ‘Parameter Estimates’ panel, where the columns must be ordered as before. Further guidance is available at


Analysis of the SRI + OCS endpoint in the phase II MUSE trial, where the tables show the probability of response under each method, the treatment effects and 95% CIs using the latent variable method, and the treatment effects and 95% CIs using the standard binary method

The ‘Sample Size Estimation’ panel displays the power curve and highlights the number of patients needed per arm to attain a desired power and alpha level, which can be set by the user. These values are also provided assuming the standard binary method was used as shown in Fig.

The MUSE trial dataset is uploaded in the ‘Parameter Estimates’ panel, where the probability of response in each arm, treatment effect and variance is shown for both the augmented and binary approaches. The power curve for a future study based on MUSE trial results is shown in the ‘Sample Size Estimation’ panel
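The relationship between a power curve and the required number of patients per arm can be sketched with a normal approximation. This is not the power function implemented in the app: it assumes a two-sided Wald test, a treatment effect estimate whose variance shrinks in proportion to 1/n, and illustrative input values that are not taken from the MUSE analysis.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def power(n_per_arm, effect, unit_variance, alpha=0.05):
    """Approximate power of a two-sided Wald test (far tail ignored).

    unit_variance is n_per_arm times the variance of the effect estimate,
    so the standard error at a given size is sqrt(unit_variance / n_per_arm).
    """
    se = math.sqrt(unit_variance / n_per_arm)
    return _STD.cdf(abs(effect) / se - _STD.inv_cdf(1 - alpha / 2))


def required_n(effect, unit_variance, target_power=0.8, alpha=0.05):
    """Smallest number of patients per arm whose power reaches the target."""
    n = 2
    while power(n, effect, unit_variance, alpha) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical inputs: response probabilities 0.36 and 0.21 analysed as binary.
    binary_unit_variance = 0.36 * 0.64 + 0.21 * 0.79
    print(required_n(0.15, binary_unit_variance))
    # A hypothetical method with 40% of that variance needs far fewer patients.
    print(required_n(0.15, 0.4 * binary_unit_variance))
```

Because the required sample size is proportional to the variance of the estimator, any reduction in variance achieved by the analysis method translates directly into a proportional reduction in patients per arm.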

In this paper we highlight novel methods to address inefficiencies in the analysis of composite responder endpoints commonly used in rheumatology. These methods use more sophisticated models to retain the information provided by continuous components. As this approach is more difficult to implement, we developed user-friendly, free-to-use software. The software conducts the analysis using both methods and offers sample size determination for future studies using the augmented technique as the primary analysis method. We demonstrated the functionality of the apps using the MUSE trial dataset, a phase II study in patients with SLE using a composite comprising two continuous outcomes and one binary outcome. The analysis took approximately 5 min to complete, and the gains in efficiency resulted in a 37% reduction in confidence interval width for the log-odds estimate, equating to an approximately 60% reduction in required sample size. This means that 60% fewer patients could be recruited by using the augmented analysis method, without requiring any additional data to be collected. Using the MUSE trial to inform a future study in SLE indicated that the novel approach would require 50 patients per arm, versus 135 for the standard approach, to detect the risk difference of 0.14 estimated by the binary method.
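The two reductions quoted above follow from simple arithmetic. The sketch below assumes the usual large-sample relationship in which confidence interval width is proportional to one over the square root of the sample size; the function names are illustrative.

```python
def sample_size_reduction_from_ci(width_reduction: float) -> float:
    """CI width scales with 1/sqrt(n), so required n scales with width squared."""
    return 1.0 - (1.0 - width_reduction) ** 2


def sample_size_reduction_from_counts(n_new: int, n_standard: int) -> float:
    """Proportional saving when n_new patients replace n_standard."""
    return 1.0 - n_new / n_standard


if __name__ == "__main__":
    # 37% narrower interval for the log-odds estimate.
    print(round(sample_size_reduction_from_ci(0.37), 3))         # 0.603
    # 50 versus 135 patients per arm for the risk difference.
    print(round(sample_size_reduction_from_counts(50, 135), 3))  # 0.63
```

The first figure (about 60%) comes from the interval width on the log-odds scale, and the second (about 63%) from the per-arm sample sizes for the risk difference, which is why the two percentages differ slightly.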

As the structure of composite endpoints varies substantially and may be quite complex, the efficiency gains offered by this technique depend on many factors, in particular the number of continuous and binary components, the response probabilities in each arm, the responder thresholds and the correlation between components. Both Shiny applications therefore report the results for the standard and novel approaches. In the case of sample size estimation, the investigator has the option to recruit the number of patients dictated by the binary approach and benefit from the additional power instead.

The methods underpinning the apps allow for any number of continuous, ordinal and binary components to be included in the composite endpoint; however, the app currently implements this only for up to two continuous components along with up to one binary component. Each additional continuous component may add a substantial amount of efficiency, so future work will involve updating the software to allow for more continuous components and so accommodate endpoints such as those used in juvenile arthritis. In its current form the software may still be used for such endpoints, provided that the most informative continuous outcomes are retained and the remaining components are combined into a binary indicator. It is important to note, however, that responder endpoints with a more complex structure exist within rheumatology that cannot yet be accommodated by the software. In particular, a common endpoint in osteoarthritis is the OARSI/OMERACT responder criterion [

Novel methods to analyse composite responder endpoints can now be easily applied. We encourage trialists to use the software demonstrated in future studies in rheumatology to improve efficiency and reduce biases arising from measurement error.

Project name: AugBin. Project home page(s): Shiny apps:

We thank the reviewers for their feedback which helped to improve the article and software.

M.M.M. wrote the R code underlying the software; A.B. provided the trial dataset and data-specific interpretation for the application included; M.G. and J.W. tested and revised the software. All authors contributed to writing and reviewing drafts of the manuscript. All authors read and approved the final manuscript.

MMM is funded by the NIHR Cambridge Biomedical Research Centre and JW is supported by funding from the Medical Research Council (MRC), grant code MC_UU_00002/6. None of the funding bodies had a role in the design of the study, analysis, interpretation of data or writing the manuscript.

Simulated example datasets and the underlying R code for both web applications are available at

The authors were given consent to use the MUSE trial dataset in the application.

Not applicable.

The authors declare no conflicts of interest.

SLE: Systemic Lupus Erythematosus

SRI: SLE Responder Index

PGA: Physician's Global Assessment

SLEDAI: Systemic Lupus Erythematosus Disease Activity Index

BILAG: British Isles Lupus Assessment Group

GUI: Graphical User Interface
