Author manuscript; available in PMC: 2022 Oct 1.
Published in final edited form as: Arthritis Care Res (Hoboken). 2021 Mar 30;74(10):1623–1630. doi: 10.1002/acr.24606

Comparison of Responsiveness of British Isles Lupus Assessment Group 2004 Index, Systemic Lupus Erythematosus Disease Activity Index 2000, and British Isles Lupus Assessment Group 2004 Systems Tally

Chee-Seng Yee 1, Caroline Gordon 2, David A Isenberg 3, Bridget Griffiths 4, Lee-Suan Teh 5, Ian N Bruce 6, Yasmeen Ahmad 7, Anisur Rahman 3, Athiveeraramapandian Prabu 2, Mohammed Akil 8, Neil McHugh 9, Christopher J Edwards 10, David D’Cruz 11, Munther A Khamashta 12, Vernon T Farewell 13
PMCID: PMC7613658  EMSID: EMS146364  PMID: 33787088

Abstract

Objective

To compare the responsiveness of the British Isles Lupus Assessment Group 2004 index (BILAG-2004) and the Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2K) disease activity indices and to determine whether there was any added value in combining BILAG-2004, BILAG-2004 system tally (BST), or simplified BST (sBST) with SLEDAI-2K.

Methods

This was a multicenter longitudinal study of SLE patients. Data were collected on BILAG-2004, SLEDAI-2K, and therapy on consecutive assessments in routine practice. The external responsiveness of the indices was assessed by determining the relationship between change in disease activity and change in therapy between 2 consecutive visits. Comparison of indices and their derivatives was performed by assessing the main effects of the indices using logistic regression. Receiver operating characteristic curve analysis was used to describe the performance of these indices individually and in various combinations, and comparisons of area under the curve were performed.

Results

There were 1,414 observations from 347 patients. Both BILAG-2004 and SLEDAI-2K maintained an independent relationship with change in therapy when compared. There was some improvement in responsiveness when continuous SLEDAI-2K variables (change in score and score of previous visit) were combined with BILAG-2004 system scores. Dichotomization of BILAG-2004 or SLEDAI-2K resulted in poorer performance. BST and sBST had similar responsiveness to the combination of SLEDAI-2K variables and BILAG-2004 system scores. There was little benefit in combining SLEDAI-2K with BST or sBST.

Conclusion

The BILAG-2004 index had comparable responsiveness to SLEDAI-2K. There was some benefit in combining both indices. Dichotomization of BILAG-2004 and SLEDAI-2K leads to suboptimal performance. BST and sBST performed well on their own; sBST is recommended for its simplicity and clinical meaningfulness.

Introduction

Systemic lupus erythematosus (SLE) is a complex multisystem disease, and assessment of this disease is challenging, given the multiple outcome domains to be considered. The 2 commonly used disease activity indices that allow the results from different cohorts of SLE patients to be compared in clinical trials or observational studies are the British Isles Lupus Assessment Group 2004 index (BILAG-2004) (1–5) and the Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2K) (6–8).

A strong correlation between the classic BILAG index and the original SLEDAI was demonstrated using patient vignettes, but there has been no direct comparison of the performance of BILAG-2004 and SLEDAI-2K using real-world clinical data (9,10). Various attempts have been made to combine SLEDAI (or its derivatives) with BILAG-2004 or classic BILAG indices in clinical trials, in the belief that a combination might be superior to either index on its own (11–14). However, few data are available to support this presumption, and concerns exist about the impact of variable recording of the physician’s global assessment (PhGA) by different physicians (9) in composite responder indices such as the SLE Responder Index (SRI) and its derivatives (11,13–16) and the BILAG Composite Lupus Assessment (BICLA) (12,17,18). These composite clinical trial end points focus on changes, in particular on patients showing specific levels of improvement in 1 index at the final trial visit as compared to the baseline visit, and require no worsening in the alternative index and PhGA. Both SRI and BICLA are currently used as end points in clinical trials of SLE, but trial results have been inconsistent, including some with promising results in phase II studies but negative results in phase III or with disappointing results generally (12,15,17,19–21). One concern about the trials that failed was that the outcome measure used as the primary end point was not optimal (22). This study reports an analysis comparing BILAG-2004 and SLEDAI-2K and tries to determine the best way of using these indices without PhGA in longitudinal studies.

We have previously demonstrated the external responsiveness of BILAG-2004 and SLEDAI-2K (4,23). Employing similar robust methodology (24), the analyses presented here examined whether using both indices together improves on the responsiveness of either alone, using data from a large longitudinal study of SLE patients seen in routine practice. We also compared the performance of the BILAG-2004 systems tally (BST). BST is an alternative way of representing BILAG-2004 scores in a longitudinal assessment that combines the flexibility and simplification of overall numerical scoring of BILAG-2004 with the clinical intuitiveness of BILAG-2004 structure (25).

Patients and Methods

Data from a multicenter prospective longitudinal study in the UK, which was primarily designed to validate BILAG-2004, were used in this analysis (4). This same data set was used to demonstrate the external responsiveness of SLEDAI-2K and to develop BST and simplified BST (sBST) (23,25). All patients satisfied the revised American College of Rheumatology criteria for classification of SLE (26,27). This study received multicenter research ethical approval and was carried out in accordance with the Helsinki Declaration. Written consent was obtained from all patients.

This study has been described in detail previously (4). In summary, patients were followed up prospectively in routine clinical practice and data (BILAG-2004 index, SLEDAI-2K score, and treatment) were collected for all consecutive visits and physician encounters. Previously we demonstrated, based on receiver operating characteristic (ROC) curve analyses, that BST, sBST, and BILAG-2004 global numerical variables (combination of change in BILAG-2004 global numerical score [5] and the score from the previous visit), were comparably related to change in therapy and provided better discrimination than a model including variables for changes in all 9 BILAG-2004 system scores (25). In the analyses presented here, we included disease activity as assessed by SLEDAI-2K, BILAG-2004 individual system scores, and global BILAG-2004 numerical score.

Changes in disease activity and treatment between 2 consecutive visits were analyzed. Each observation for the analysis was derived from 2 consecutive visits. A robust definition for change in therapy between consecutive visits was used as the external reference for change in disease activity as described previously (see Supplementary Appendix A, available on the Arthritis Care & Research website at http://onlinelibrary.wiley.com/doi/10.1002/acr.24606) (3–5,23,25). Three categories of changes in therapy were defined: no change, increase in therapy, and decrease in therapy. All statistical analyses were performed using Stata for Windows, version 8, and R (28). Robust variance estimation was used to allow for correlation between multiple assessments from the same patients (29).

External responsiveness was used to compare the performance of the indices in this longitudinal study (24). It assessed the extent to which changes in the index over time relate to corresponding changes in therapy between 2 consecutive visits. Therefore, clinically meaningful change was assessed. Change in therapy was chosen as the external reference, as there was no better objective alternative, and this criterion has been used in multiple validation studies on BILAG-2004, SLEDAI-2K, and BST (3–5,23,25). The pros and cons of using change in therapy as the external reference were discussed previously (3).

Maximum-likelihood multinomial and binary logistic regression were used to assess external responsiveness, with change in therapy as the outcome variable and changes in disease activity (as determined by the indices) as the explanatory variables. For comparison purposes, the main effects of the indices were assessed within a common regression model. The baseline comparator for change in disease activity used in the analysis was minimal or no change in activity, while the baseline comparator for change in treatment was no change in therapy. The results were reported as odds ratios (ORs) with 95% confidence intervals (95% CIs), and Wald tests were used for model comparison where needed.

In the multinomial regression analyses, the baseline category of no change in therapy was compared with both increase in therapy and decrease in therapy. There was no direct comparison between increase in therapy and decrease in therapy. An OR value of >1 for a 1-unit increase in the variable defined by the index of interest, within the comparison between increase in therapy and no change in therapy, indicated that the increase in the index score was associated with an increase in therapy. Conversely, an OR of <1 for the same comparison implied that the increase in the index score was associated with no change in therapy (and not with a decrease in therapy) or equivalently an inverse association with an increase in therapy. Similar interpretation was applicable to the reported OR for the comparison between decrease in therapy and no change in therapy.
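The reading of an OR "for a 1-unit increase" can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the study's code: the function names are hypothetical, and it assumes the reported intervals are Wald-type intervals computed on the log-odds scale, which the text does not state. The input is the OR for change in SLEDAI-2K score in the increase-in-therapy comparison reported in Table 1, 1.17 (1.08–1.27).

```python
import math

def log_odds_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the log-odds coefficient and its standard error from an OR
    and its 95% CI, ASSUMING a Wald interval on the log scale."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

def odds_ratio_for_change(odds_ratio_per_unit, units):
    """OR implied by a change of `units` in the explanatory variable,
    holding the other variables in the model fixed."""
    return odds_ratio_per_unit ** units

# OR per 1-point change in SLEDAI-2K score (increase in therapy, Table 1).
beta, se = log_odds_and_se(1.17, 1.08, 1.27)
print(round(beta, 3), round(se, 3))

# Odds multiplier implied by a 4-point rise in SLEDAI-2K score.
print(round(odds_ratio_for_change(1.17, 4), 2))
```

Under these assumptions, a 4-point rise in the SLEDAI-2K score multiplies the odds of an increase in therapy (versus no change) by about 1.87, whereas an OR below 1 would shrink the odds with each additional point.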

Various combinations of SLEDAI-2K and BILAG-2004 (including BST and sBST), either dichotomized or treated as continuous variables, were examined and compared to determine whether there was added value in combining both of these indices. For some analyses, the BILAG-2004 global numerical score was calculated based on the system scores using a coding scheme of A = 12, B = 8, C = 1, D/E = 0 (5). ROC curve analysis was used to describe the performance of these indices and the various combinations (30). Logistic regression was used to estimate the sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC). The analyses were performed from 2 perspectives: deterioration in scores as a predictor of increase in therapy and improvement in scores as a predictor of decrease in therapy. Calculation of an asymptotic confidence interval for AUC and comparison of AUCs were performed using a nonparametric approach (31). AUC, with a value from 0 to 1, quantified the performance of the index, with the value of 1 corresponding to the index providing perfect discrimination.
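The coding scheme for the global numerical score can be written out directly. The following is a minimal sketch, assuming one grade per system and a simple sum over systems; the function name and the example grades are hypothetical and are not study data.

```python
# Coding scheme stated in the text: A = 12, B = 8, C = 1, D/E = 0.
GRADE_POINTS = {"A": 12, "B": 8, "C": 1, "D": 0, "E": 0}

def bilag_numerical_score(system_grades):
    """Sum the coded points over the system grades of one assessment."""
    return sum(GRADE_POINTS[grade] for grade in system_grades.values())

# Hypothetical assessment of the 9 systems (illustrative only).
visit = {
    "constitutional": "D", "mucocutaneous": "B", "neuropsychiatric": "E",
    "musculoskeletal": "A", "cardiorespiratory": "D", "gastrointestinal": "E",
    "ophthalmic": "E", "renal": "C", "hematologic": "C",
}
print(bilag_numerical_score(visit))  # 8 + 12 + 1 + 1 = 22
```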

Deterioration in the BILAG-2004 score was defined to have occurred if there was worsening in the score to grades A or B in any of the systems. The deteriorations were classified (in order of ranking) as: 1) major deterioration: change from C/D/E to A or from D/E to B, 2) minor A deterioration: change from B to A, and 3) minor B deterioration: change from C to B. A change from D/E to C was considered minor and not clinically significant. Therefore, such a change was excluded from the definition of deteriorations.
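The deterioration categories above amount to a small lookup on the pair of grades from 2 consecutive visits. The sketch below encodes them as stated; the function names are hypothetical, and summarizing a patient by the highest-ranked category across systems is an assumption made for illustration, not a rule given in the text.

```python
RANKING = ("major", "minor A", "minor B")  # highest rank first

def classify_deterioration(prev, curr):
    """Classify one system's change between 2 consecutive visits."""
    if curr == "A" and prev in ("C", "D", "E"):
        return "major"
    if curr == "B" and prev in ("D", "E"):
        return "major"
    if prev == "B" and curr == "A":
        return "minor A"
    if prev == "C" and curr == "B":
        return "minor B"
    return None  # includes D/E to C, which is not counted as deterioration

def worst_deterioration(prev_grades, curr_grades):
    """ASSUMPTION: report the highest-ranked category found in any system."""
    found = {classify_deterioration(prev_grades[s], curr_grades[s])
             for s in prev_grades}
    for level in RANKING:
        if level in found:
            return level
    return None

# Hypothetical grades for three systems (illustrative only).
prev = {"renal": "C", "musculoskeletal": "B", "hematologic": "D"}
curr = {"renal": "B", "musculoskeletal": "A", "hematologic": "C"}
print(worst_deterioration(prev, curr))  # minor A
```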

Improvement in the BILAG-2004 score was deemed to have occurred if there was reduction in the score in any system in the absence of any deterioration in the other systems. The improvements were classified (in order of ranking) as 1) major improvement: change from A to C/D or B to D, 2) minor A improvement: change from A to B, and 3) minor B/C improvement: change from B to C or C to D. These classifications were used to define BILAG-2004–based explanatory variables in regression analyses. The definitions and gradations above were based on the principle of intent-to-treat that underlay BILAG-2004 scoring, whereby active disease requiring therapy was graded A or B depending on the item, while grade C usually required symptomatic therapy (1). We accepted that at the individual patient and organ level, there may be variation in the severity of the disease items and the need for change in therapy within each grade.
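The improvement rule has two parts: a per-system category and the patient-level condition that no system has deteriorated. A minimal sketch follows, with hypothetical function names; as an assumption for illustration, the patient-level result is taken to be the highest-ranked improvement found in any system.

```python
SEVERITY = {"E": 0, "D": 0, "C": 1, "B": 2, "A": 3}
RANKING = ("major", "minor A", "minor B/C")  # highest rank first

def system_improvement(prev, curr):
    """Improvement category for one system, as defined in the text."""
    if (prev == "A" and curr in ("C", "D")) or (prev == "B" and curr == "D"):
        return "major"
    if prev == "A" and curr == "B":
        return "minor A"
    if (prev, curr) in (("B", "C"), ("C", "D")):
        return "minor B/C"
    return None

def system_deteriorated(prev, curr):
    """Worsening to grade A or B, as in the deterioration definition."""
    return curr in ("A", "B") and SEVERITY[curr] > SEVERITY[prev]

def overall_improvement(prev_grades, curr_grades):
    """None if any system deteriorated; otherwise the highest-ranked
    improvement across systems (ASSUMED summary rule)."""
    pairs = [(prev_grades[s], curr_grades[s]) for s in prev_grades]
    if any(system_deteriorated(p, c) for p, c in pairs):
        return None
    found = {system_improvement(p, c) for p, c in pairs}
    for level in RANKING:
        if level in found:
            return level
    return None

# Hypothetical grades (illustrative only).
print(overall_improvement({"msk": "A", "renal": "C"}, {"msk": "C", "renal": "C"}))  # major
print(overall_improvement({"msk": "A", "renal": "C"}, {"msk": "B", "renal": "B"}))  # None
```

In the second call the musculoskeletal system improves from A to B, but the renal system worsens from C to B, so no improvement is recorded.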

BST and its simplified version, sBST, were counts of systems with specified changes in scores between 2 assessments (25). BST comprised 6 components: 1) the number of systems with major deterioration (change of B/C/D/E to A, or D/E to B), 2) the number of systems with minor deterioration (change of C to B), 3) the number of systems with persistent significant activity (no change from A or B), 4) the number of systems with major improvement (change of A to C/D, or B to D), 5) the number of systems with minor improvement (change of A to B, or B to C), and 6) the number of systems with persistent minimal or no activity (change of C/D/E to C/D/E).
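The six BST components can be computed by mapping each system's grade transition to exactly one component and counting. The sketch below follows the definitions in the paragraph above; the names are hypothetical, the example grades are invented, and transitions not covered by the definitions (for example, A or B to E) raise an error rather than being guessed.

```python
from collections import Counter

GRADES = ("A", "B", "C", "D", "E")
COMPONENTS = (
    "major_deterioration", "minor_deterioration",
    "persistent_significant_activity", "major_improvement",
    "minor_improvement", "persistent_minimal_or_no_activity",
)

def bst_component(prev, curr):
    """Map one system's grade transition to a BST component."""
    if prev not in GRADES or curr not in GRADES:
        raise ValueError(f"unknown grade in transition {prev}->{curr}")
    low = ("C", "D", "E")
    if (curr == "A" and prev != "A") or (curr == "B" and prev in ("D", "E")):
        return "major_deterioration"
    if prev == "C" and curr == "B":
        return "minor_deterioration"
    if prev == curr and prev in ("A", "B"):
        return "persistent_significant_activity"
    if (prev == "A" and curr in ("C", "D")) or (prev == "B" and curr == "D"):
        return "major_improvement"
    if (prev, curr) in (("A", "B"), ("B", "C")):
        return "minor_improvement"
    if prev in low and curr in low:
        return "persistent_minimal_or_no_activity"
    raise ValueError(f"transition {prev}->{curr} is not covered by the definitions")

def bst(prev_grades, curr_grades):
    """Count systems in each of the 6 BST components."""
    counts = Counter(bst_component(prev_grades[s], curr_grades[s])
                     for s in prev_grades)
    return {name: counts[name] for name in COMPONENTS}

# Hypothetical grades at 2 consecutive visits (illustrative only).
prev = {"constitutional": "D", "mucocutaneous": "B", "neuropsychiatric": "E",
        "musculoskeletal": "C", "cardiorespiratory": "D", "gastrointestinal": "E",
        "ophthalmic": "E", "renal": "A", "hematologic": "C"}
curr = dict(prev, musculoskeletal="B", renal="C")
tally = bst(prev, curr)
print(tally)
print(sum(tally.values()))  # 9: every system falls in exactly one component
```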

Simplified BST had 3 components: 1) the number of systems with active/worsening disease (systems with major deterioration, minor deterioration, and persistent significant activity); 2) the number of systems with improving disease (systems with major improvement and minor improvement); and 3) the number of systems with persistent minimal or no activity.
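Because sBST only regroups the six BST counts, it can be derived from them by addition. A minimal sketch, assuming the BST counts are held in a dictionary with the hypothetical key names shown:

```python
def simplified_bst(bst_counts):
    """Collapse the 6 BST counts into the 3 sBST components."""
    return {
        "active_or_worsening": (
            bst_counts["major_deterioration"]
            + bst_counts["minor_deterioration"]
            + bst_counts["persistent_significant_activity"]
        ),
        "improving": (
            bst_counts["major_improvement"] + bst_counts["minor_improvement"]
        ),
        "persistent_minimal_or_no_activity": (
            bst_counts["persistent_minimal_or_no_activity"]
        ),
    }

# Hypothetical BST counts over 9 systems (illustrative only).
example = {
    "major_deterioration": 0, "minor_deterioration": 1,
    "persistent_significant_activity": 1, "major_improvement": 1,
    "minor_improvement": 0, "persistent_minimal_or_no_activity": 6,
}
sbst = simplified_bst(example)
print(sbst)  # 2 active/worsening, 1 improving, 6 minimal or none
```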

Results

There were 347 SLE patients with 1,761 assessments that contributed 1,414 observations for this analysis. There was an increase in treatment in 22.7% of the observations, while 37.3% had therapy decreased, and in 40.0%, there was no change in treatment, as previously reported (4). The demographic characteristics and distribution of change in disease activity for each system are summarized in Supplementary Tables 1 and 2, available on the Arthritis Care & Research website at http://onlinelibrary.wiley.com/doi/10.1002/acr.24606.
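The relationship between assessments and observations can be sketched as follows. This is an illustration with hypothetical patient identifiers and dates, assuming that each patient's assessments form one unbroken sequence, so that a patient with k assessments contributes k − 1 observations; that assumption is consistent with the reported totals (1,761 − 347 = 1,414).

```python
def consecutive_observations(visits_by_patient):
    """Pair each visit with the next visit of the same patient."""
    observations = []
    for patient_id, visit_dates in visits_by_patient.items():
        ordered = sorted(visit_dates)  # ISO dates sort chronologically
        for previous, current in zip(ordered, ordered[1:]):
            observations.append((patient_id, previous, current))
    return observations

# Hypothetical visit dates (illustrative only).
visits = {
    "P001": ["2006-01-10", "2006-04-12", "2006-07-19"],
    "P002": ["2006-02-01", "2006-05-03"],
}
obs = consecutive_observations(visits)
print(len(obs))  # 3 observations from 5 assessments in 2 patients

# The same arithmetic applied to the reported totals.
print(1761 - 347)  # 1414
```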

Comparison of BILAG-2004 with SLEDAI-2K

To examine the combined performance of SLEDAI-2K and BILAG-2004, we undertook multinomial logistic regression analysis of change in therapy, using the changes in both BILAG-2004 and SLEDAI-2K with and without their respective values at the previous visit. We had demonstrated previously that although change in the SLEDAI-2K score was significantly associated with changes in treatment, the strongest relationship was observed in a model that included both the change in the SLEDAI-2K score and the score at the previous visit as continuous variables (hereafter referred to as SLEDAI-2K variables) (23).

In the analysis of external responsiveness reported here, changes in the individual system scores of BILAG-2004 and SLEDAI-2K variables (as continuous variables) were included as explanatory variables for the outcome variable of change in therapy. Table 1 shows that SLEDAI-2K variables and individual BILAG-2004 system scores retained independent relationships with change in therapy. Consistent with our earlier work (23), if only the change in SLEDAI-2K score was included (i.e., the SLEDAI-2K score of the previous visit was omitted), the change in SLEDAI-2K score was no longer significantly associated with change in therapy (increase or decrease), while changes in BILAG-2004 system scores maintained their significant association with change in therapy (data not shown).

Table 1. External responsiveness of the combination of the BILAG-2004 and SLEDAI-2K indices with multinomial logistic regression (n = 1,414)*.

Change in score Increase in therapy Decrease in therapy
BILAG-2004 index system score
   Constitutional
      Increase 1.35 (0.87–2.08)
      Decrease 0.86 (0.27–2.71) 2.26 (1.25–4.06)
   Mucocutaneous
      Increase 7.52 (4.36–12.98) 0.63 (0.31–1.28)
      Decrease 0.88 (0.56–1.38) 1.49 (1.09–2.05)
   Neuropsychiatric
      Increase 1.85 (0.49–7.02)
      Decrease 0.98 (0.20–4.79) 1.96 (0.85–4.51)
   Musculoskeletal
      Increase 11.93 (5.32–26.76) 1.10 (0.42–2.88)
      Decrease 0.69 (0.44–1.08) 0.96 (0.69–1.33)
   Cardiorespiratory
      Increase 2.88 (0.96–8.60) 0.71 (0.24–2.05)
      Decrease 1.17 (0.50–2.75) 1.42 (0.82–2.47)
   Gastrointestinal
      Increase 7.74 (0.67–89.43) 0
      Decrease 0.64 (0.18–2.31) 1.14 (0.26–4.88)
   Ophthalmic
      Increase 1.32 (0.01–270.27) 0
      Decrease 4.25 (0.51–35.06) 1.47 (0.22–9.99)
   Renal
      Increase 1.08 (0.32–3.72) 3.14 (0.95–10.40)
      Decrease 0.63 (0.26–1.54) 1.76 (0.97–3.19)
   Hematologic
      Increase § §
      Decrease 0.95 (0.56–1.60) 0.90 (0.64–1.28)
Change in SLEDAI-2K score 1.17 (1.08–1.27) 0.93 (0.87–1.00)
Previous visit SLEDAI-2K score 1.18 (1.11–1.26) 0.96 (0.91–1.02)
* Values are the odds ratio (95% confidence interval). BILAG-2004 = British Isles Lupus Assessment Group 2004 index; SLEDAI-2K = Systemic Lupus Erythematosus Disease Activity Index 2000; ∞ = infinity.

Increase and decrease in therapy are each compared to no change in therapy.

Increase and decrease in system score are each compared to minimal change (including change of grade D/E to C) within each system.

§ No observation with increase in score.

When we undertook a multinomial logistic regression analysis of change in therapy using change in the numerical score of BILAG-2004 and in SLEDAI-2K, along with their respective values at the previous visit (see Supplementary Table 3, available on the Arthritis Care & Research website at http://onlinelibrary.wiley.com/doi/10.1002/acr.24606), we observed the expected relationships between the changes in the numerical scores and changes in therapy. Both pairs of variables, the 2 based on BILAG-2004 and the 2 based on SLEDAI-2K, added predictive power for an increase in therapy (P = 0.02 for the addition of SLEDAI-2K variables to BILAG-2004 numerical score variables, and P < 0.01 for the addition of BILAG-2004 numerical score variables to SLEDAI-2K variables by Wald test). For decrease in therapy, the SLEDAI-2K variables did not provide additional predictive power (P = 0.50 by Wald test).

As shown in Table 2, we observed that BST variables were related to changes in therapy in the expected manner, and that SLEDAI-2K variables provided additional predictive power for an increase in therapy (P = 0.007 based on the Wald test from separate logistic regression) but not for a decrease in therapy (P = 0.30 by Wald test). Similar results were obtained with sBST (see Supplementary Table 4, available on the Arthritis Care & Research website at http://onlinelibrary.wiley.com/doi/10.1002/acr.24606).

Table 2. External responsiveness of the combination of BILAG-2004 systems tally and SLEDAI-2K indices with multinomial logistic regression (n = 1,414)*.

Change in score                       Increase in therapy   Decrease in therapy
BILAG-2004 systems tally
    Major deterioration               14.35 (8.51–24.21)    0.85 (0.48–1.52)
    Minor deterioration               5.72 (2.76–11.86)     1.04 (0.55–1.94)
    Persistent significant activity   5.54 (3.26–9.43)      0.65 (0.41–1.05)
    Major improvement                 0.95 (0.58–1.56)      1.57 (1.11–2.23)
    Minor improvement                 1.47 (0.89–2.43)      1.27 (0.91–1.79)
Change in SLEDAI-2K score             1.08 (1.01–1.17)      0.96 (0.90–1.03)
Previous visit SLEDAI-2K score        0.99 (0.91–1.07)      1.00 (0.94–1.07)

* Values are the odds ratio (95% confidence interval). BILAG-2004 = British Isles Lupus Assessment Group 2004 index; SLEDAI-2K = Systemic Lupus Erythematosus Disease Activity Index 2000.

Increase and decrease in therapy are each compared to no change in therapy.

Each BILAG-2004 systems tally component is compared to persistent minimal or no activity (change of grade C/D/E to C/D/E).

Comparison of performance of combinations of BILAG-2004 and SLEDAI-2K

Table 3 summarizes the results of further analyses using various combinations of information from BILAG-2004 and SLEDAI-2K. The table shows the AUC measures based on ROC curves derived from binary regression analyses of both increase in therapy and decrease in therapy versus no change in therapy. For completeness, we performed similar analyses of increase in therapy versus no increase in therapy and decrease in therapy versus no decrease in therapy.
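AUC has a simple nonparametric reading: it is the proportion of (event, non-event) pairs in which the event has the higher predicted score, with ties counted as one half. The sketch below computes that quantity directly. It uses invented scores and a hypothetical function name, and it does not reproduce the confidence interval or the AUC comparison method cited in the Methods (31).

```python
def auc(event_scores, non_event_scores):
    """Proportion of (event, non-event) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(event_scores) * len(non_event_scores))

# Invented predicted probabilities of an increase in therapy.
increase = [0.9, 0.7, 0.6, 0.4]
no_change = [0.5, 0.3, 0.3, 0.2, 0.1]
print(auc(increase, no_change))  # 19 of the 20 pairs are ordered correctly
```

A value of 0.5 corresponds to scores that do not separate the 2 groups at all, and 1 to perfect separation.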

Table 3. Area under the curve values from receiver operating characteristic curve analysis of the BILAG-2004 index, SLEDAI-2K, and combination of the 2 indices*.

Model                                                          Increase in therapy                   Decrease in therapy
                                                               Versus no change   Versus no increase Versus no change   Versus no decrease
BILAG-2004 index system scores                                 0.75 (0.70–0.79)   0.75 (0.71–0.78)   0.59 (0.56–0.62)   0.65 (0.62–0.67)
BST                                                            0.82 (0.78–0.87)   0.83 (0.81–0.86)   0.57 (0.54–0.61)   0.66 (0.63–0.68)
Simplified BST                                                 0.81 (0.78–0.84)   0.81 (0.78–0.84)   0.57 (0.54–0.60)   0.65 (0.63–0.68)
BILAG-2004 numerical score variables                           0.84 (0.81–0.87)   0.85 (0.82–0.88)   0.58 (0.55–0.62)   0.67 (0.65–0.70)
SLEDAI-2K variables                                            0.75 (0.71–0.78)   0.76 (0.73–0.79)   0.56 (0.53–0.60)   0.63 (0.60–0.66)
BILAG-2004 index system scores plus SLEDAI-2K variables        0.80 (0.77–0.83)   0.81 (0.78–0.84)   0.60 (0.57–0.64)   0.67 (0.64–0.70)
BST plus SLEDAI-2K variables                                   0.84 (0.81–0.86)   0.84 (0.81–0.87)   0.59 (0.55–0.62)   0.67 (0.64–0.70)
Simplified BST plus SLEDAI-2K variables                        0.82 (0.79–0.85)   0.83 (0.80–0.86)   0.58 (0.55–0.62)   0.67 (0.64–0.69)
BILAG-2004 numerical score variables plus SLEDAI-2K variables  0.84 (0.82–0.87)   0.85 (0.83–0.88)   0.59 (0.56–0.62)   0.68 (0.65–0.71)

* Values are the area under the curve (95% confidence interval). BILAG-2004 = British Isles Lupus Assessment Group 2004 index; BST = BILAG-2004 systems tally; SLEDAI-2K = Systemic Lupus Erythematosus Disease Activity Index 2000.

BILAG-2004 index system scores: 9 separate changes in system scores.

BILAG-2004 numerical score variables: change in numerical score and previous visit numerical score.

SLEDAI-2K variables: change in SLEDAI-2K score and previous visit SLEDAI-2K score.

The comparisons of AUC measures from this exploratory analysis for increase in treatment versus no increase in treatment are summarized in Supplementary Table 5, available on the Arthritis Care & Research website at http://onlinelibrary.wiley.com/doi/10.1002/acr.24606, which provides the significance levels for the comparison of the various models. The P values should be regarded as illustrative, as no adjustment for multiplicity was performed. Similar results were obtained for the analysis of increase in treatment versus no change in treatment (see Supplementary Table 6, available on the Arthritis Care & Research website at http://onlinelibrary.wiley.com/doi/10.1002/acr.24606). The analysis showed no evidence that BILAG-2004 system scores and SLEDAI-2K variables differed in how well each, on its own, predicted changes in therapy (P = 0.89 by Wald test). There was some improvement in performance when BILAG-2004 system scores and SLEDAI-2K variables were combined (P < 0.001 for the addition of each to the other by Wald test). BST and sBST had comparable performance (P = 0.107 by Wald test) and were, respectively, similar to (P = 0.128 by Wald test) or slightly worse than (P < 0.001 by Wald test) BILAG-2004 numerical score variables (change in numerical score and previous visit numerical score). BST, sBST, and BILAG-2004 numerical score variables appeared to be more predictive of an increase in therapy than BILAG-2004 system scores (P < 0.001, P = 0.002, and P < 0.001, respectively, by Wald test) and SLEDAI-2K variables (P < 0.001, P = 0.013, and P < 0.001, respectively, by Wald test). Furthermore, BST, sBST, and BILAG-2004 numerical score variables were comparable to or slightly better than the combination of BILAG-2004 system scores and SLEDAI-2K variables (P = 0.63, P = 0.26, and P = 0.03, respectively, by Wald test). Finally, the addition of SLEDAI-2K variables provided little improvement to the performance of BST, sBST, or BILAG-2004 numerical score variables (P = 0.60, P = 0.16, and P = 0.22, respectively, by Wald test).

Dichotomization of indices

Dichotomized versions of the BILAG-2004 and the SLEDAI-2K have been used for a variety of purposes. Results for clinically relevant dichotomizations of the 2 indices, separately and in combination, are given in the supplementary material for deterioration in activity and improvement in activity (Supplementary Tables 7 and 8, respectively, available on the Arthritis Care & Research website at http://onlinelibrary.wiley.com/doi/10.1002/acr.24606). These were based on multinomial regressions with a single binary explanatory variable.

Two particular categorizations of changes in the combination of these measures that were of similar magnitude to those used in the definition of SRI (SLEDAI-2K score decrease of ≥4 and no BILAG-2004 deterioration) (11) and BICLA (all improvements in BILAG-2004 with no SLEDAI-2K score increase of ≥1) (12) were included in Supplementary Table 8, available on the Arthritis Care & Research website at http://onlinelibrary.wiley.com/doi/10.1002/acr.24606, examining improvement in disease activity, but without PhGA. The change was also between 2 consecutive visits (not between the start and end of study). The estimated sensitivities and specificities were 1.5% and 98.9%, respectively, for the SRI-like variable, and 48.2% and 70.0%, respectively, for the BICLA-like variable when used to predict a decrease in therapy (versus no decrease). The AUC values for these 2 variables were 0.50 and 0.59, respectively, compared with AUCs >0.65 for BST, sBST, and BILAG-2004 numerical variables (Table 3). Other dichotomized variables also did not perform as well as these numerical variables in relation to both decrease in therapy and increase in therapy.
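The two dichotomized variables described above, and the link between their sensitivity, specificity, and AUC, can be sketched as follows. The function names are hypothetical, and the responder rules encode only the simplified, PhGA-free definitions given in this paragraph. For a single binary predictor the ROC curve has one interior point, so its trapezoidal area equals (sensitivity + specificity) / 2; applying that identity to the reported estimates is consistent with the reported AUC values of 0.50 and 0.59.

```python
def sri_like(sledai_change, bilag_deterioration):
    """SLEDAI-2K score decrease of >= 4 with no BILAG-2004 deterioration."""
    return sledai_change <= -4 and not bilag_deterioration

def bicla_like(bilag_improvement, sledai_change):
    """BILAG-2004 improvement with no SLEDAI-2K score increase of >= 1."""
    return bilag_improvement and sledai_change < 1

def auc_single_binary(sensitivity, specificity):
    """AUC of the ROC curve defined by one binary predictor."""
    return (sensitivity + specificity) / 2

# Hypothetical changes between 2 consecutive visits (illustrative only).
print(sri_like(-5, False), bicla_like(True, 0))

# Reported sensitivity and specificity for predicting a decrease in therapy.
print(round(auc_single_binary(0.015, 0.989), 2))  # SRI-like variable
print(round(auc_single_binary(0.482, 0.700), 2))  # BICLA-like variable
```

The SRI-like variable is almost never positive (sensitivity 1.5%), so it carries next to no information about a decrease in therapy despite its high specificity.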

Discussion

This multicenter observational study directly compared the responsiveness of the BILAG-2004 index with the SLEDAI-2K in longitudinal fashion and assessed the potential value of combining the 2 indices using a comprehensive range of approaches. Our analyses showed that there was some nonoverlapping relationship with change in therapy when both BILAG-2004 and SLEDAI-2K were included in the model, confirming that both indices had similar responsiveness. Responsiveness was optimal if both the change in the SLEDAI-2K score and the SLEDAI-2K score of the previous visit were included in the model as continuous variables. The use of only change in the SLEDAI-2K score was associated with inferior performance.

Outcome in clinical trials is determined by 3 factors: efficacy of intervention, study design, and effectiveness of the outcome measure used. Our discussion is focused on properties of the outcome measure that would affect its ability to differentiate the efficacy of the different treatment arms. Discussing the other factors is beyond the scope of the present work.

Many clinical trials in SLE have reported their results using various combinations of the SLEDAI-2K or the Safety of Estrogens in Lupus Erythematosus National Assessment (SELENA)–SLEDAI (and its variants) with the BILAG-2004 or the classic BILAG index as the primary end points (22). In the belimumab phase III trials, the SRI was used in which a response was defined as an improvement in the SELENA-SLEDAI score of at least 4 points with no new grade A and ≤1 new grade B classic BILAG system score, and no deterioration of PhGA (13,14). This combination was selected using the data set from the phase II trial of belimumab to derive the best separation in efficacy between belimumab and placebo, with the presumption that belimumab was effective (11). Using a similar combination of improvement in the SLEDAI-2K score of at least 4 points with no worsening of the BILAG-2004 system score to grade A or B, we found that this combination performed poorly when assessed using the reference of change in therapy. This finding was surprising as we would have expected these 2 indices to exert a greater role than PhGA, which is subject to variable reporting due to individual physicians scoring lupus manifestations differently from each other in the absence of a glossary, particularly in patients with >1 system involved (9). The indices used in this study were different from the original SRI (BILAG-2004 instead of classic BILAG index, SLEDAI-2K instead of SELENA-SLEDAI and no PhGA). The modified SRI used in the analysis was very similar to the indices used successfully in the phase 2 trial of ustekinumab (16), but which failed as the primary end point in phase 3 trials of anifrolumab (15) and Lupuzor (19). These trials used a modification of the SRI in which response was driven by a 4-point reduction in the SLEDAI-2K score with ≤1 new B grade in BILAG-2004 and ≤10% worsening of PhGA.

A different combination (BICLA) was used in other clinical trials, in which a response was defined as an improvement in the BILAG-2004 system score (in the absence of new grade A or B score) with no worsening of the SLEDAI-2K score (≥1) and no worsening of PhGA (12,15,17,18). The results of our study, shown in Supplementary Table 8, available on the Arthritis Care & Research website at http://onlinelibrary.wiley.com/doi/10.1002/acr.24606, supported the use of this combination of BILAG-2004 and SLEDAI-2K indices, which, although not successful in the epratuzumab phase 3 trial (17), was successful in phase 3 trials of anifrolumab as primary (TULIP-2 trial) and secondary end points (TULIP-1 trial) (15,18).

Currently, the combination of the 2 indices (BILAG-2004 with SLEDAI-2K) used in clinical trials involves dichotomizations of the outcome variables. Our data suggested that the benefit was minimal when combining these 2 indices in this specific way, and the value of PhGA was debatable (9). Dichotomization involves using a cutoff to determine whether a response is achieved (yes/no response). However, dichotomization of variables may result in loss of efficiency, as it does not allow for a graded response, and a partial response might be considered lack of response if the cutoff is not achieved (32). We demonstrated that dichotomization of both BILAG-2004 and SLEDAI-2K resulted in poorer responsiveness in our longitudinal study. With better efficiency and performance of the outcome measure used, fewer patients would be required in a study to demonstrate differences between groups, which then facilitates target recruitment and reduces the cost of running the study. In comparison to the use of a continuous outcome, the size of a trial may need to be increased by a factor of 30% if a binary outcome with a uniform distribution is used with a median cutoff, with greater gains for a normal distribution (32). By using BST or sBST, which are based on counts of systems with specified transitions in BILAG-2004 scores, the problem of dichotomization could be avoided.

Although BILAG-2004 numerical score variables and the combination of the SLEDAI-2K variables with BST or sBST had a slightly better performance than BST or sBST alone, BST and sBST performed better than BILAG-2004 system scores and SLEDAI-2K. In addition, there was difficulty with interpretation of the clinical meaningfulness of BILAG-2004 numerical score variables and the combination of SLEDAI-2K variables with BST or sBST. Our analyses supported the use of BST or sBST alone and suggested minimal advantage of combining SLEDAI-2K with BST or sBST. Consequently, there could be simplification in study methodology by using only 1 disease activity index (BILAG-2004), which would avoid confusion and reduce errors due to differences in BILAG-2004 and SLEDAI-2K glossaries.

One limitation of this study that might affect the applicability of the results to clinical trials was the time reference used to define change in disease activity. This study looked at the changes between consecutive visits. In contrast, clinical trials generally compare the disease activity between the beginning and the end of the study (and not between consecutive visits), which might be 1 year apart. With a longer time interval, a larger effect is far more likely to occur. However, comparing the outcome measures at only 2 time points (the beginning and the end of study) ignores the level of disease activity between these 2 time points. The use of counts or a continuous variable over the study period (such as flare rate) could overcome this disadvantage. Another limitation was that BST and sBST were developed using the same data set, which might have provided an advantage. Validation of our result with an independent data set is needed.

In conclusion, BILAG-2004 and SLEDAI-2K have similar responsiveness longitudinally. There is some benefit in combining the 2 indices, but dichotomization of the indices leads to suboptimal performance. BST and sBST performed well on their own, and the addition of SLEDAI-2K variables resulted in only minimal improvement. There is no significant difference between the responsiveness of BST and that of sBST. Given that sBST has only 3 components, we would recommend the use of sBST in longitudinal analysis of disease activity for its simplicity and clinical meaningfulness.

Supplementary Material

Supplementary File

Significance & Innovations

  • Various ways of analyzing the British Isles Lupus Assessment Group 2004 index (BILAG-2004) and Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2K), and their derivatives, have been employed in longitudinal studies of systemic lupus erythematosus (SLE), especially in clinical trials. However, a direct comparison of these 2 indices and their various combinations has not been made to determine the best way of using them without the addition of a physician’s global assessment.

  • The results of this analysis provide guidance on the use of these indices as disease activity outcome measures in longitudinal studies of SLE. The key findings from this analysis are: 1) both the BILAG-2004 index and the SLEDAI-2K have similar responsiveness, and there is some improvement when they are combined; 2) dichotomization of the BILAG-2004 index and the SLEDAI-2K may reduce performance as an outcome measure; and 3) the simplified BILAG system tally may have an advantage due to its simplicity and clinical meaningfulness.

Acknowledgments

We would like to thank all the nurse specialists and doctors who contributed to this study at the participating centers.

Supported by Versus Arthritis (grant 16081), the Medical Research Council UK (grant U105261167), and Vifor Pharma/Aspreva Pharmaceuticals. The Birmingham SLE clinics were supported by Lupus UK. Some of the work for this study was carried out at the NIHR/Wellcome Trust Birmingham Clinical Research Facility. The work of Drs. Isenberg and Rahman was supported by the NIHR University College London Hospitals Biomedical Research Centre. Dr. Bruce’s work was supported by Versus Arthritis, NIHR Manchester Biomedical Research Unit, and NIHR Manchester Wellcome Trust Clinical Research Facility.

Role of the Study Sponsor

Vifor Pharma/Aspreva Pharmaceuticals had no role in the study design or in the collection, analysis, or interpretation of the data, the writing of the manuscript, or the decision to submit the manuscript for publication. Publication of this article was not contingent upon approval by Vifor Pharma/Aspreva Pharmaceuticals.

Footnotes

Author Contributions

All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Yee had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Yee, Gordon, Isenberg, Griffiths, Teh, Bruce, Ahmad, Rahman, Prabu, Akil, McHugh, Edwards, D’Cruz, Khamashta, Farewell.

Acquisition of data. Yee, Gordon, Isenberg, Griffiths, Teh, Bruce, Ahmad, Rahman, Prabu, Akil, McHugh, Edwards, D’Cruz, Khamashta, Farewell.

Analysis and interpretation of data. Yee, Gordon, Isenberg, Farewell.

References

  • 1. Isenberg DA, Rahman A, Allen E, Farewell V, Akil M, Bruce IN, et al. BILAG 2004: development and initial validation of an updated version of the British Isles Lupus Assessment Group’s disease activity index for patients with systemic lupus erythematosus. Rheumatology (Oxford) 2005;44:902–6. doi: 10.1093/rheumatology/keh624.
  • 2. Yee CS, Farewell V, Isenberg DA, Prabu A, Sokoll K, Teh LS, et al. Revised British Isles Lupus Assessment Group 2004 Index: a reliable tool for assessment of systemic lupus erythematosus activity. Arthritis Rheum. 2006;54:3300–5. doi: 10.1002/art.22162.
  • 3. Yee CS, Farewell V, Isenberg DA, Rahman A, Teh LS, Griffiths B, et al. British Isles Lupus Assessment Group 2004 Index is valid for assessment of disease activity in systemic lupus erythematosus. Arthritis Rheum. 2007;56:4113–9. doi: 10.1002/art.23130.
  • 4. Yee CS, Farewell V, Isenberg DA, Griffiths B, Teh LS, Bruce IN, et al. The BILAG-2004 index is sensitive to change for assessment of SLE disease activity. Rheumatology (Oxford) 2009;48:691–5. doi: 10.1093/rheumatology/kep064.
  • 5. Yee CS, Cresswell L, Farewell V, Rahman A, Teh LS, Griffiths B, et al. Numerical scoring for the BILAG-2004 index. Rheumatology (Oxford) 2010;49:1665–9. doi: 10.1093/rheumatology/keq026.
  • 6. Gladman DD, Ibañez D, Urowitz MB. Systemic lupus erythematosus disease activity index 2000. J Rheumatol. 2002;29:288–91.
  • 7. Touma Z, Urowitz MB, Gladman DD. SLEDAI-2K for a 30-day window. Lupus. 2010;19:49–51. doi: 10.1177/0961203309346505.
  • 8. Bombardier C, Gladman DD, Urowitz MB, Caron D, Chang CH, the Committee on Prognosis Studies in SLE. Derivation of the SLEDAI: a disease activity index for lupus patients. Arthritis Rheum. 1992;35:630–40. doi: 10.1002/art.1780350606.
  • 9. Wollaston SJ, Farewell VT, Isenberg DA, Gordon C, Merrill JT, Petri MA, et al. Defining response in systemic lupus erythematosus: a study by the Systemic Lupus International Collaborating Clinics group. J Rheumatol. 2004;31:2390–4.
  • 10. American College of Rheumatology Ad Hoc Committee on Systemic Lupus Erythematosus Response Criteria. The American College of Rheumatology response criteria for systemic lupus erythematosus clinical trials: measures of overall disease activity. Arthritis Rheum. 2004;50:3418–26. doi: 10.1002/art.20628.
  • 11. Furie RA, Petri MA, Wallace DJ, Ginzler EM, Merrill JT, Stohl W, et al. Novel evidence-based systemic lupus erythematosus responder index. Arthritis Care Res (Hoboken) 2009;61:1143–51. doi: 10.1002/art.24698.
  • 12. Wallace DJ, Kalunian K, Petri MA, Strand V, Houssiau FA, Pike M, et al. Efficacy and safety of epratuzumab in patients with moderate/severe active systemic lupus erythematosus: results from EMBLEM, a phase IIb, randomised, double-blind, placebo-controlled, multicentre study. Ann Rheum Dis. 2014;73:183–90. doi: 10.1136/annrheumdis-2012-202760.
  • 13. Furie R, Petri M, Zamani O, Cervera R, Wallace DJ, Tegzová D, et al. A phase III, randomized, placebo-controlled study of belimumab, a monoclonal antibody that inhibits B lymphocyte stimulator, in patients with systemic lupus erythematosus. Arthritis Rheum. 2011;63:3918–30. doi: 10.1002/art.30613.
  • 14. Navarra SV, Guzmán RM, Gallacher AE, Hall S, Levy RA, Jimenez RE, et al. Efficacy and safety of belimumab in patients with active systemic lupus erythematosus: a randomised, placebo-controlled, phase 3 trial. Lancet. 2011;377:721–31. doi: 10.1016/S0140-6736(10)61354-2.
  • 15. Furie RA, Morand EF, Bruce IN, Manzi S, Kalunian KC, Vital EM, et al. Type I interferon inhibitor anifrolumab in active systemic lupus erythematosus (TULIP-1): a randomised, controlled, phase 3 trial. Lancet Rheumatol. 2019;1:e208–19. doi: 10.1016/S2665-9913(19)30076-1.
  • 16. Van Vollenhoven RF, Hahn BH, Tsokos GC, Wagner CL, Lipsky P, Touma Z, et al. Efficacy and safety of ustekinumab, an IL-12 and IL-23 inhibitor, in patients with active systemic lupus erythematosus: results of a multicentre, double-blind, phase 2, randomised, controlled study. Lancet. 2018;392:1330–9. doi: 10.1016/S0140-6736(18)32167-6.
  • 17. Clowse ME, Wallace DJ, Furie RA, Petri MA, Pike MC, Leszczyński P, et al. Efficacy and safety of epratuzumab in moderately to severely active systemic lupus erythematosus: results from two phase III randomized, double-blind, placebo-controlled trials. Arthritis Rheumatol. 2017;69:362–75. doi: 10.1002/art.39856.
  • 18. Morand EF, Furie R, Tanaka Y, Bruce IN, Askanase AD, Richez C, et al. Trial of anifrolumab in active systemic lupus erythematosus. N Engl J Med. 2020;382:211–21. doi: 10.1056/NEJMoa1912196.
  • 19. National Institutes of Health. US National Library of Medicine: ClinicalTrials.gov. A 52-week, randomized, double-blind, parallel-group, placebo-controlled study to evaluate the efficacy and safety of a 200-mcg dose of IPP-201101 plus standard of care in patients with systemic lupus erythematosus: study results. 2019. URL: https://clinicaltrials.gov/ct2/show/results/NCT02504645?view=results
  • 20. Furie R, Khamashta M, Merrill JT, Werth VP, Kalunian K, Brohawn P, et al. Anifrolumab, an anti–interferon-α receptor monoclonal antibody, in moderate-to-severe systemic lupus erythematosus. Arthritis Rheumatol. 2017;69:376–86. doi: 10.1002/art.39962.
  • 21. Zimmer R, Scherbarth HR, Rillo OL, Gomez-Reino JJ, Muller S. Lupuzor/P140 peptide in patients with systemic lupus erythematosus: a randomised, double-blind, placebo-controlled phase IIb clinical trial. Ann Rheum Dis. 2013;72:1830–5. doi: 10.1136/annrheumdis-2012-202460.
  • 22. Merrill JT. For lupus trials, the answer might depend on the question. Lancet Rheumatol. 2019;1:e196–7. doi: 10.1016/S2665-9913(19)30098-0.
  • 23. Yee CS, Farewell VT, Isenberg DA, Griffiths B, Teh LS, Bruce IN, et al. The use of Systemic Lupus Erythematosus Disease Activity Index-2K to define active disease and minimal clinically meaningful change based on data from a large cohort of systemic lupus erythematosus patients. Rheumatology (Oxford) 2011;50:982–8. doi: 10.1093/rheumatology/keq376.
  • 24. Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: a critical review and recommendations. J Clin Epidemiol. 2000;53:459–68. doi: 10.1016/s0895-4356(99)00206-1.
  • 25. Yee CS, Gordon C, Isenberg DA, Griffiths B, Teh LS, Bruce IN, et al. The BILAG-2004 systems tally: a novel way of representing the BILAG-2004 index scores longitudinally. Rheumatology (Oxford) 2012;51:2099–105. doi: 10.1093/rheumatology/kes207.
  • 26. Tan EM, Cohen AS, Fries JF, Masi AT, McShane DJ, Rothfield NF, et al. The 1982 revised criteria for the classification of systemic lupus erythematosus. Arthritis Rheum. 1982;25:1271–7. doi: 10.1002/art.1780251101.
  • 27. Hochberg MC, for the Diagnostic and Therapeutic Criteria Committee of the American College of Rheumatology. Updating the American College of Rheumatology revised criteria for the classification of systemic lupus erythematosus [letter]. Arthritis Rheum. 1997;40:1725. doi: 10.1002/art.1780400928.
  • 28. Hornik K. R FAQ; frequently asked questions on R. 2020. URL: https://cran.r-project.org/doc/FAQ/R-FAQ.html
  • 29. Williams RL. A note on robust variance estimation for cluster-correlated data. Biometrics. 2000;56:645–6. doi: 10.1111/j.0006-341x.2000.00645.x.
  • 30. Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem. 1993;39:561–77.
  • 31. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837.
  • 32. Farewell VT, Tom BD, Royston P. The impact of dichotomization on the efficiency of testing for an interaction effect in exponential family models. J Am Stat Assoc. 2004;99:822–31.
