Author manuscript; available in PMC: 2023 Dec 26.
Published in final edited form as: Ann Rheum Dis. 2020 Nov 30;80(5):641–650. doi: 10.1136/annrheumdis-2020-219100

A New Composite Endpoint in Early Diffuse Cutaneous Systemic Sclerosis—Revisiting the Provisional American College of Rheumatology Composite Response Index in Systemic Sclerosis

Dinesh Khanna 1, Suiyuan Huang 1,2, Celia Lin 3, Cathie Spino 2
PMCID: PMC10750249  NIHMSID: NIHMS1950092  PMID: 33257497

Abstract

Objectives:

The American College of Rheumatology Composite Response Index in Systemic Sclerosis (ACR-CRISS) is a composite end point that assesses the likelihood of improvement in diffuse cutaneous systemic sclerosis (dcSSc). ACR-CRISS is a weighted score that includes five core set measures: modified Rodnan skin score (mRSS), percent predicted forced vital capacity (FVC%), health assessment questionnaire-disability index (HAQ-DI), and patient and clinician global assessments.

Methods:

We analyzed core set measures from 354 participants in three placebo-controlled trials. We generated ten development datasets, each a random selection of 2/3 of the participants, stratified by study and treatment group. The remaining participants (1/3 of the participants) formed the validation sets. Risk differences (RD) between active and placebo treatments were calculated by averaging over the replicate datasets, and bootstrap 95% CIs for the RD were used to estimate the magnitude of treatment effects.

Results:

In the development sets (N=237), a statistically higher proportion of participants in the active group than in the placebo group improved in ≥ one of five core set measures. For example, the proportion who improved by ≥ 20% in ≥ three core set measures was 49% in the active vs. 39% in the placebo group (RD: 10%, 95% CI: 5%–16%). In the validation sets (N=117), the proportion who improved by ≥ 20% in ≥ three core set measures was 50% in the active vs. 36% in the placebo group (RD: 15%, 95% CI: 3%–26%). Similar trends were seen with larger percentage cut-offs.

Conclusion:

Revised CRISS, as assessed by the proportion of participants who improved by a certain percentage in ≥ three of five core set measures, is a potential new composite outcome measure.

Keywords: Systemic Sclerosis, Outcome Measures, Composite End Point

INTRODUCTION

Systemic sclerosis (SSc, scleroderma) is an immune-mediated rheumatic disease characterized by autoimmunity, vasculopathy, and fibrosis in the skin and internal organs.1–3 It has the highest case fatality of any rheumatic disease. One subclassification of this disease, diffuse cutaneous SSc (dcSSc), has a 10-year mortality rate of 50%, and disease management is focused on organ-specific complications. The provisional American College of Rheumatology Composite Response Index in Systemic Sclerosis (ACR-CRISS) is a composite end point designed to capture a global or holistic evaluation of the likelihood of improvement in early SSc.4 It integrates worsening or incident cases of cardio-pulmonary-renal involvement and incorporates changes in five core set measures: modified Rodnan skin score (mRSS), percent predicted forced vital capacity (FVC%), health assessment questionnaire-disability index (HAQ-DI), and patient (PGA) and clinician (CGA) global assessments. ACR-CRISS was able to differentiate active therapies from placebo in recent trials,3 5–8 which showed statistically different and clinically important changes in the ACR-CRISS, highlighting the importance of global assessment in a multisystem heterogeneous disease. However, ACR-CRISS is derived from a two-step algorithm with probabilities of improvement based on the core set measures, which are weighted differently in each study. Thus, a single core set measure has the potential to drive the overall response without a clearly demonstrated treatment benefit on one or more of the other core set measures. In addition, recent top line data (not published in a medical journal) from lenabasum and autotaxin inhibitor trials showed a ceiling effect of ACR-CRISS in the placebo and active therapy groups.9

To address these concerns, we explored the performance of the 5 core set measures in the ACR-CRISS that were collected in 3 recent placebo-controlled RCTs. We used the concept first proposed by Paulus et al.,10 who developed a composite score based on statistical analysis to assess the activity of disease-modifying anti-rheumatic drugs (DMARDs) in rheumatoid arthritis (RA). These criteria, known as the Paulus criteria, required an improvement of 20% in at least 4 of 6 core set measures in RA, and the criteria were able to differentiate the DMARD therapies from placebo in available RCTs. This later led to the development of the ACR 20% criteria, which have been adopted as an acceptable endpoint for regulatory approval in RA.11

The objective of the current analysis is to assess whether a certain percentage of improvement in the 5 core set measures, as incorporated in the ACR-CRISS, can differentiate the active medication group from placebo. Our hypotheses were that a greater proportion of participants on active medication would show statistically significant improvement compared to placebo for each of the 5 core set measures, and that a statistically significantly greater proportion of participants in the active medication group would improve by a pre-defined percentage in ≥ one core set measure (e.g., three of five core set measures with 20% improvement). We tested our hypotheses in three RCTs that assessed abatacept vs. placebo (in a phase 2 trial) and tocilizumab vs. placebo (in phase 2 and phase 3 trials) in early dcSSc, using 95% confidence intervals (CIs) for the risk difference between treatments.

PATIENTS AND METHODS

Description of 3 trials

The three RCTs recruited patients with early dcSSc, and mRSS was the primary outcome measure for all trials. The RCTs were double-blind and placebo-controlled, with escape to immunosuppressive therapy allowed if dcSSc worsened.

The abatacept Phase 2 trial (ASSET) randomly assigned 88 participants 1:1 to receive abatacept 125 mg subcutaneous (SC) or matching placebo, stratified by duration of dcSSc, in a 52-week trial (clinicaltrials.gov NCT02161406).3 12 The primary endpoint was the change from baseline to week 52 in mRSS. Key inclusion criteria included a disease duration of ≤36 months (defined as time from the first non-Raynaud phenomenon manifestation). For disease duration of ≤18 months, an mRSS ≥10 and ≤35 was required at the screening visit. For disease duration of >18 to ≤36 months, an mRSS of ≥15 and ≤45 was required.

The tocilizumab (TCZ) Phase 2 trial randomly assigned 87 participants 1:1 to receive weekly tocilizumab 162 mg or placebo SC for 48 weeks (clinicaltrials.gov NCT01532869).5 8 The primary endpoint was the change from baseline to week 48 in mRSS. The key inclusion criteria included a disease duration of ≤60 months (from first non-Raynaud phenomenon manifestation) and an mRSS of 15–40 units at screening.

The tocilizumab Phase 3 trial randomly assigned 212 participants 1:1 to receive double-blind weekly tocilizumab 162 mg or placebo SC for 48 weeks (clinicaltrials.gov NCT02453256).6–8 The primary endpoint was the change from baseline to week 48 in mRSS. The key inclusion criteria included a disease duration of ≤60 months (from first non-Raynaud phenomenon manifestation) and an mRSS of 10–35 units at screening.

Methods

We pooled participants from the three RCTs to estimate the magnitude of treatment effects and the impact of the varying thresholds for improvement in various sets of the core set measures (i.e., revised CRISS outcomes) to differentiate active and placebo treatment groups.

Statistical analysis

We defined improvement for four core set measures (mRSS, HAQ-DI, PGA, CGA) as the relative improvement from baseline to one year, varying the threshold of improvement in 5% increments from 10%, 15%, 20%, …, 50%, 55%, and 60% (i.e., at 11 different cut points). We also defined improvement for the pulmonary core set measure – percent predicted FVC (FVC%) – as 5% and 10% relative improvement from baseline to one year (i.e., at two cut points). We assessed whether improvement was seen in at least one, two, three, four, or all five core set measures (i.e., five levels of improvement).
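The responder definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SAS code: the function names, the dictionary layout, and the example values are hypothetical, and the direction of improvement (a decrease for mRSS, HAQ-DI, PGA and CGA; an increase for FVC%) is an assumption that follows the usual scoring of these instruments rather than an explicit statement in the text.

```python
# Assumption: lower scores mean improvement for these four measures,
# while a higher FVC% means improvement.
LOWER_IS_BETTER = ("mRSS", "HAQ-DI", "PGA", "CGA")

# The 11 cut points (10%, 15%, ..., 60%) expressed as fractions.
CUTS = [c / 100 for c in range(10, 65, 5)]


def relative_improvement(baseline, follow_up, lower_is_better):
    """Relative improvement from baseline as a fraction (None if baseline is 0)."""
    if baseline == 0:
        return None  # relative change is undefined; treated as not improved here
    change = (baseline - follow_up) if lower_is_better else (follow_up - baseline)
    return change / baseline


def count_improved(baseline, follow_up, cut, fvc_cut=0.05):
    """Number of the five core set measures meeting their improvement threshold."""
    n = 0
    for measure in LOWER_IS_BETTER:
        imp = relative_improvement(baseline[measure], follow_up[measure], True)
        if imp is not None and imp >= cut:
            n += 1
    imp = relative_improvement(baseline["FVC%"], follow_up["FVC%"], False)
    if imp is not None and imp >= fvc_cut:
        n += 1
    return n


# Hypothetical participant, for illustration only.
baseline = {"mRSS": 20, "HAQ-DI": 1.0, "PGA": 5, "CGA": 6, "FVC%": 80}
one_year = {"mRSS": 15, "HAQ-DI": 0.9, "PGA": 5, "CGA": 4, "FVC%": 82}

n_at_20 = count_improved(baseline, one_year, cut=0.20)  # mRSS and CGA qualify
n_at_30 = count_improved(baseline, one_year, cut=0.30)  # only CGA qualifies
```

A participant would then count as improved at a given level (at least one, two, three, four, or five) when the returned count reaches that level.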

We summarized the proportion of participants who demonstrated improvements for the five core set measures by treatment group, based on their relative improvement in four of the five core set measures and FVC% (i.e., 11 × 2 or 22 combinations of cut points for the five core set measures). In addition, for each cut point (e.g., 10% improvement in mRSS, HAQ-DI, PGA and CGA and 5% improvement in FVC%), we calculated the proportion of participants with at least one, two, three, four or all five improvements by treatment group, resulting in an additional five summaries per cut point. We calculated the risk difference (RD; proportion of participants who improved in the active medication group minus proportion of participants who improved in the placebo group) as a measure of the sensitivity of each of the core set measures and each revised CRISS outcome to treatment differences.
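As a hedged sketch of the summaries described above, the following Python snippet enumerates the 22 cut-point combinations and computes the RD at each of the five levels of improvement. The per-participant counts are made-up illustrative values, and the helper names are hypothetical.

```python
# 11 cut points for mRSS/HAQ-DI/PGA/CGA crossed with 2 cut points for FVC%.
COMBOS = [(cut, fvc_cut) for cut in range(10, 65, 5) for fvc_cut in (5, 10)]


def proportion_with_at_least(counts, k):
    """Proportion of participants with at least k improved core set measures."""
    return sum(1 for c in counts if c >= k) / len(counts)


def risk_difference(active_counts, placebo_counts, k):
    """RD = proportion improved on active minus proportion improved on placebo."""
    return (proportion_with_at_least(active_counts, k)
            - proportion_with_at_least(placebo_counts, k))


# Hypothetical numbers of improved measures (0-5) per participant at one cut point.
active = [0, 1, 3, 3, 4, 5, 2, 3]
placebo = [0, 0, 1, 2, 3, 1, 2, 4]

# Five summaries per cut point: at least 1, 2, 3, 4, or all 5 improvements.
rd_by_level = {k: risk_difference(active, placebo, k) for k in range(1, 6)}
```

In this toy example, 5 of 8 active and 2 of 8 placebo participants improve in at least three measures, so the RD at that level is 0.625 − 0.25 = 0.375.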

Development and validation sets

To assess the validity of our estimates, we divided the pooled data from the three RCTs into development and validation sets, using random split-sample validation.13 14 Because of potential differences among the three trials with respect to treatments, demographic and baseline characteristics, we generated ten development sets, each a random selection of 2/3 of the pooled patients, stratified by study and treatment group. The remaining participants (1/3 of the pooled patients) formed the corresponding validation sets. For each type of set (development or validation), we did the following: (1) we calculated the proportion of participants who improved in each treatment group and the associated RD (resulting in ten estimates); (2) we averaged the proportion of participants who demonstrated improvements by treatment group (over the ten sets); and (3) we used bootstrapping methods to estimate the RD and its 95% CI (based on 100 bootstraps).13 14
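The split-sample procedure can be illustrated with a short Python sketch. Several choices here are assumptions not stated in the text: the percentile method for the bootstrap CI, resampling within treatment arm, the record layout, and the synthetic data. The original analysis was run in SAS 9.4, and the exact way the authors combined the bootstrap with the ten replicate sets is not specified.

```python
import random
from collections import defaultdict


def stratified_split(participants, frac=2 / 3, rng=None):
    """Randomly place `frac` of each (study, arm) stratum in the development set."""
    rng = rng if rng is not None else random.Random()
    strata = defaultdict(list)
    for p in participants:
        strata[(p["study"], p["arm"])].append(p)
    dev, val = [], []
    for members in strata.values():
        members = members[:]  # copy so the caller's order is untouched
        rng.shuffle(members)
        n_dev = round(len(members) * frac)
        dev.extend(members[:n_dev])
        val.extend(members[n_dev:])
    return dev, val


def rd(sample):
    """Risk difference: proportion improved on active minus placebo."""
    act = [p["improved"] for p in sample if p["arm"] == "active"]
    pbo = [p["improved"] for p in sample if p["arm"] == "placebo"]
    return sum(act) / len(act) - sum(pbo) / len(pbo)


def bootstrap_ci(sample, n_boot=100, rng=None):
    """Percentile 95% CI for the RD (assumed method), resampling within arm."""
    rng = rng if rng is not None else random.Random()
    arms = defaultdict(list)
    for p in sample:
        arms[p["arm"]].append(p)
    estimates = []
    for _ in range(n_boot):
        resample = []
        for members in arms.values():
            resample.extend(rng.choices(members, k=len(members)))
        estimates.append(rd(resample))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot)]


# Synthetic data: 3 hypothetical studies x 2 arms x 30 participants.
participants = [
    {"study": s, "arm": arm,
     "improved": int(i % 3 != 0) if arm == "active" else int(i % 3 == 0)}
    for s in ("A", "B", "C") for arm in ("active", "placebo") for i in range(30)
]

# Ten replicate development/validation splits, each with its own seed.
splits = [stratified_split(participants, rng=random.Random(seed)) for seed in range(10)]
mean_rd_dev = sum(rd(dev) for dev, _ in splits) / len(splits)
ci_low, ci_high = bootstrap_ci(splits[0][0], n_boot=100, rng=random.Random(0))
```

Stratifying on study and treatment group keeps the 2:1 ratio within every stratum, so each development set preserves the trial mix of the pooled data.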

We calculated the floor effect (defined as the proportion of patients achieving ACR-CRISS of <0.005) and the ceiling effect (defined as the proportion of patients achieving ACR-CRISS of >0.995) of ACR-CRISS. We also assessed the relationship between revised CRISS (at least 10%, 20%, 30%, 40%, 50%, and 60% improvement in at least three of five core set measures) and ACR-CRISS by displaying box plots of ACR-CRISS for the improved and not improved groups based on revised CRISS. We summarized these associations using point-biserial correlations.
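The floor and ceiling definitions and the point-biserial correlation can be sketched as follows. The scores and responder flags are invented for illustration, and the helper names are hypothetical; the point-biserial coefficient is computed as the Pearson correlation between a 0/1 group indicator and the continuous score, which is its standard definition.

```python
from math import sqrt


def floor_ceiling(scores, floor=0.005, ceiling=0.995):
    """Proportions of ACR-CRISS scores below the floor and above the ceiling."""
    n = len(scores)
    return (sum(s < floor for s in scores) / n,
            sum(s > ceiling for s in scores) / n)


def point_biserial(binary, values):
    """Pearson correlation between a 0/1 indicator and a continuous score."""
    n = len(values)
    mean_b = sum(binary) / n
    mean_v = sum(values) / n
    sxy = sum((b - mean_b) * (v - mean_v) for b, v in zip(binary, values))
    sxx = sum((b - mean_b) ** 2 for b in binary)
    syy = sum((v - mean_v) ** 2 for v in values)
    return sxy / sqrt(sxx * syy)


# Hypothetical ACR-CRISS probabilities and revised CRISS responder flags.
acr_criss = [0.0, 0.001, 0.02, 0.5, 0.9, 0.99, 0.999, 1.0]
revised_criss_improved = [0, 0, 0, 0, 1, 1, 1, 1]

floor_prop, ceiling_prop = floor_ceiling(acr_criss)
r_pb = point_biserial(revised_criss_improved, acr_criss)
```

Here two of the eight scores fall below 0.005 and two exceed 0.995, so both proportions are 0.25, and the correlation is positive because responders have the higher scores.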

All analyses were conducted in SAS 9.4 (SAS Institute Inc.).

RESULTS

There were 387 participants in the three RCTs. Eighty percent of the pooled participants were women, with mean (SD) baseline age of 48.7 (12.4) years, mRSS of 22.1 (7.3), HAQ-DI of 1.2 (0.7), PGA (on 0–10 scale) of 5.4 (2.4), CGA (on 0–10 scale) of 5.8 (1.8), and FVC% of 82.5 (14.6)% (Table 1). Of the 387 participants, 33 (8.5%) met Step 1 (cardio-pulmonary-renal worsening); they were considered not improved and were excluded from the subsequent analysis because our goal was to analyze the performance of the five core set measures. These included four in TCZ II (four in the placebo and none in the active medication group), 19 in TCZ III (13 in the placebo and six in the active medication group), and ten in ASSET (five in each treatment group; Table 2). We observed statistically significantly higher PGA (P = 0.047) and higher FVC % predicted (P = 0.049) in the placebo group at baseline. We also observed statistically significant heterogeneity among studies in race (P = 0.01) and in baseline mRSS (P <.0001), PGA (P <.0001), and CGA (P <.0001).

Table 1:

Baseline demographics of participants in 3 randomized controlled trials

Overall n = 387 Placebo n = 195 Active n = 192 P-Value ASSET n = 88 TCZ 2 n = 87 TCZ 3 n = 212 P-Value
Age, years
 N 386 195 191 0.68a 88 87 211 0.50b
 mean (SD) 48.7 (12.4) 48.9 (12.7) 48.5 (12.2) 49.4 (12.6) 49.6 (12.3) 48.1 (12.4)
Female sex, n (%) 305 (79.81%) 161 (82.56%) 144 (75.00%) 0.07c 66 (75.00%) 67 (77.01%) 172 (81.13%) 0.44c
Race, n (%)
 White 327 (84.50%) 168 (86.15%) 159 (82.81%) 0.68c 72 (81.82%) 78 (89.66%) 177 (83.49%) 0.01d
 African American 17 (4.39%) 8 (4.10%) 9 (4.69%) 6 (6.82%) 6 (6.90%) 5 (2.36%)
 Asian 32 (8.27%) 13 (6.67%) 19 (9.90%) 5 (5.68%) 2 (2.30%) 25 (11.79%)
 Other 11 (2.84%) 6 (3.08%) 5 (2.60%) 5 (5.68%) 1 (1.15%) 5 (2.36%)
Baseline Core Set Measures
 mRSS (0–51)
  n 386 195 191 0.67a 88 87 211 <.0001b
  mean (SD) 22.1 (7.3) 21.9 (7.1) 22.4 (7.5) 22.5 (7.7) 26.0 (6.5) 20.4 (6.9)
 HAQ-DI (0–3)
  n 382 192 190 0.25a 88 86 208 0.057b
  mean (SD) 1.2 (0.7) 1.2 (0.7) 1.1 (0.7) 1.1 (0.7) 1.3 (0.7) 1.2 (0.7)
 Patient Global Assessment (0–10 VAS)
  N 381 192 189 0.046a 85 87 209 <.0001b
  mean (SD) 5.4 (2.4) 5.7 (2.3) 5.2 (2.4) 4.1 (2.4) 6.1 (2.0) 5.7 (2.3)
 Clinician Global Assessment (0–10 VAS)
  n 372 187 185 0.65a 86 87 199 <.0001b
  mean (SD) 5.8 (1.8) 5.7 (1.7) 5.8 (1.9) 4.8 (1.7) 6.2 (1.5) 6.0 (1.8)
 FVC % Predicted
  n 384 193 191 0.0487e 88 86 210 0.0918f
  mean (SD) 82.5 (14.6) 84.0 (15.1) 81.1 (14.1) 85.3 (15.1) 80.7 (13.6) 82.1 (14.8)
a Wilcoxon Rank Sum test; b Kruskal-Wallis test; c Chi-Square test; d Fisher Exact test; e 2-sample T-test; f ANOVA test; mRSS = modified Rodnan skin score, HAQ-DI = Health assessment questionnaire-disability index, VAS = visual analog scale, FVC = Forced vital capacity

Table 2:

Performance of the ACR-CRISS in the 3 randomized controlled trials, including floor and ceiling effects

Active, n=192 Placebo, n=195
Trials Meeting Step 1 Median Score at 12 monthsa ACR-CRISS ≥ 0.6 at 12 monthsa % with Ceiling effecta,b % with Floor effecta,b Meeting Step 1 Median Score at 12 monthsa ACR-CRISS ≥ 0.6 at 12 monthsa % with Ceiling effecta,b % with Floor effecta,b
Overall, n=387 11 0.86 61.9% 32.1% 15.7% 22 0.19 46.5% 17.8% 32.6%
ASSET, n=88 5 0.84 64.3% 35.7% 21.4% 5 0.04 41.9% 6.5% 41.9%
TCZ Phase 2, n=87 0 0.25 44.4% 25.9% 29.6% 4 0.03 27.6% 10.3% 41.4%
TCZ Phase 3, n=212 6 0.90 88.6% 32.9% 8.9% 13 0.74 56.5% 26.1% 24.6%

a Step 1 participants not included; b Ceiling effect: ACR-CRISS > 0.995 and Floor effect: ACR-CRISS < 0.005

Performance of 5 core set measures: development sets

In the development sets (N=237), the proportion of participants who improved by ≥10% to ≥60% (in 5% increments) was numerically higher in the active therapy vs. placebo group the majority of the time for the four core set measures mRSS, HAQ-DI, PGA, and CGA, and for FVC% at 5% and 10% relative improvement (Table 3 and Figure 1). PGA was not numerically higher in active, compared to placebo, at the 10% and 20% thresholds, and mRSS was not numerically higher at the 60% threshold. When we assessed the proportion of participants who improved in ≥ one of five core set measures, the proportions were numerically higher favoring active therapy vs. placebo, except that the proportion of participants with all five improvements was not numerically higher in active, compared to placebo, at the 60% threshold. As an example, the proportion of participants who improved by ≥ 20% in ≥ one core set measure was 90% in active therapy vs. 81% in the placebo group, in ≥ two core set measures was 74% in active therapy vs. 62% in the placebo group, in ≥ three core set measures was 49% in active therapy vs. 39% in the placebo group, and in ≥ four core set measures was 25% in active therapy vs. 17% in the placebo group (Table 3 and Figure 1).

Table 3:

Proportion of participants who achieved a pre-defined percentage of improvement for each core set measure and ≥ 1 core set measures in Development Data Set

Improvement Measures PBO N = 116 Active N = 121 Risk Difference (95% CI) Improvement Measures PBO N = 116 Active N = 121 Risk Difference (95% CI)
10% mRSS 68.9 78.2 9.3 (5.2, 14) 15% mRSS 60.2 76.2 16 (11.6, 21.1)
HAQ-DI 38.3 52.1 13.7 (8.6, 18.5) HAQ-DI 35.7 46.0 10.2 (4.7, 17.5)
Patient GA 56.8 54.6 −2 (−10.2, 2.8) Patient GA 49.6 50.6 1.1 (−7.1, 5.6)
Clinician GA 64.5 78.1 13.7 (9, 20.1) Clinician GA 61.8 73.6 11.8 (5.8, 19.3)
FVC% 11.8 19.2 7.3 (4, 11.6) FVC% 11.8 19.2 7.3 (4, 11.6)
At least 1 improvement 87.7 96.5 8.9 (4, 11.7) At least 1 improvement 83.2 93.1 10.1 (5.9, 12.8)
At least 2 improvement 73.7 80.7 6.9 (2, 13.2) At least 2 improvement 64.4 78.0 13.5 (8.9, 19.1)
At least 3 improvement 46.6 56.4 9.7 (4.5, 15.9) At least 3 improvement 41.9 53.1 11.1 (5.7, 17)
At least 4 improvement 19.0 33.0 13.9 (9.9, 23.4) At least 4 improvement 17.8 28.3 10.6 (5, 20.6)
All 5 improvement 2.1 6.0 3.9 (2, 5.7) All 5 improvement 1.4 3.8 2.3 (0.9, 3.8)
20% mRSS 54.9 72.7 17.8 (11.9, 21.4) 25% mRSS 50.5 65.3 14.8 (9.3, 18.8)
HAQ-DI 35.2 40.5 5.3 (−0.2, 10.5) HAQ-DI 30.8 37.1 6.2 (1, 10.7)
Patient GA 47.4 46.9 −0.4 (−8.2, 3.6) Patient GA 42.3 43.6 1.3 (−5.1, 6.8)
Clinician GA 60.9 72.2 11.3 (4.7, 18.3) Clinician GA 54.0 68.3 14.3 (5.8, 23.1)
FVC% 11.8 19.2 7.3 (4, 11.6) FVC% 11.8 19.2 7.3 (4, 11.6)
At least 1 improvement 81.2 90.1 9 (4, 12.9) At least 1 improvement 75.1 87.2 12.2 (7.9, 15.9)
At least 2 improvement 61.6 74.4 12.8 (7.9, 19.1) At least 2 improvement 54.6 67.9 13.2 (7.9, 21.3)
At least 3 improvement 38.9 49.4 10.5 (4.9, 16.1) At least 3 improvement 34.2 45.1 10.8 (4.1, 15.9)
At least 4 improvement 17.1 24.9 7.8 (3, 14.9) At least 4 improvement 15.0 21.1 6 (0.2, 10.9)
All 5 improvement 1.4 3.8 2.3 (0.9, 3.8) All 5 improvement 1.4 3.8 2.3 (0.9, 3.8)
30% mRSS 45.2 53.7 8.5 (4, 13.1) 35% mRSS 39.7 47.8 8.1 (6.2, 11.2)
HAQ-DI 25.5 31.6 6.1 (2.4, 11.6) HAQ-DI 22.7 30.9 8.2 (3.5, 13.7)
Patient GA 35.5 39.8 4.4 (−0.6, 9.1) Patient GA 31.4 34.9 3.4 (−0.9, 7.3)
Clinician GA 50.8 61.4 10.6 (2.6, 19) Clinician GA 47.8 54.1 6.2 (0.9, 15.1)
FVC% 11.8 19.2 7.3 (4, 11.6) FVC% 11.8 19.2 7.3 (4, 11.6)
At least 1 improvement 70.3 85.6 15.4 (11.2, 20.7) At least 1 improvement 65.7 80.7 14.9 (10.9, 21.9)
At least 2 improvement 49.3 60.5 11.1 (5.2, 17.6) At least 2 improvement 46.4 54.4 7.9 (3.5, 14)
At least 3 improvement 29.2 34.5 5.2 (−1.4, 8.8) At least 3 improvement 23.5 29.5 5.9 (−1.2, 10.7)
At least 4 improvement 10.4 15.3 4.9 (2, 9.1) At least 4 improvement 8.7 13.3 4.5 (2.5, 8.2)
All 5 improvement 1.4 2.2 0.8 (−1.1, 2) All 5 improvement 1.4 2.2 0.8 (−1.1, 2)
40% mRSS 33.5 41.6 8.1 (4.6, 12.1) 45% mRSS 29.3 37.1 7.9 (4.2, 12.1)
HAQ-DI 22.1 26.6 4.5 (0.6, 9.9) HAQ-DI 18.8 23.1 4.3 (0.8, 6.7)
Patient GA 27.4 32.6 5.1 (−0.7, 10.3) Patient GA 26.5 29.9 3.4 (−3.8, 6.6)
Clinician GA 45.5 50.2 4.7 (−1, 14.2) Clinician GA 41.5 43.5 1.9 (−2.2, 8.1)
FVC% 11.8 19.2 7.3 (4, 11.6) FVC% 11.8 19.2 7.3 (4, 11.6)
At least 1 improvement 62.8 73.4 10.6 (6.5, 16.2) At least 1 improvement 60.2 70.9 10.7 (6.9, 16.3)
At least 2 improvement 41.9 48.9 6.9 (1.6, 11.7) At least 2 improvement 36.8 42.1 5.2 (0.2, 9.1)
At least 3 improvement 19.0 26.4 7.3 (2, 12) At least 3 improvement 15.9 21.8 5.8 (2.4, 9.2)
At least 4 improvement 7.9 12.9 5 (2.9, 7.3) At least 4 improvement 7.1 10.8 3.7 (1.9, 5.6)
All 5 improvement 1.4 2.2 0.8 (−1.1, 2) All 5 improvement 1.4 1.5 0.1 (−1.1, 1)
50% mRSS 26.5 30.8 4.3 (1.7, 8.3) 55% mRSS 21.5 24.4 2.8 (−1.9, 7)
HAQ-DI 18.1 22.5 4.5 (1.2, 7.3) HAQ-DI 16.4 18.5 2.1 (−1.8, 4.7)
Patient GA 24.0 27.0 2.9 (−3.8, 6.5) Patient GA 17.4 26.2 8.8 (3.3, 11.4)
Clinician GA 34.2 39.8 5.6 (2.5, 10.6) Clinician GA 29.1 32.9 3.9 (1.6, 7.7)
FVC% 11.8 19.2 7.3 (4, 11.6) FVC% 11.8 19.2 7.3 (4, 11.6)
At least 1 improvement 56.2 65.4 9.3 (5, 12.7) At least 1 improvement 49.0 60.9 12.1 (7.5, 16.4)
At least 2 improvement 32.1 37.6 5.4 (0.4, 10.3) At least 2 improvement 26.4 30.6 4.1 (−0.1, 9.5)
At least 3 improvement 12.7 21.1 8.4 (6.2, 11.9) At least 3 improvement 9.8 17.6 7.7 (3.4, 12)
At least 4 improvement 7.1 8.3 1.2 (−1.1, 3.7) At least 4 improvement 5.5 6.5 1 (−0.9, 2.8)
All 5 improvement 0.9 1.5 0.6 (−0.1, 1) All 5 improvement 0.9 1.0 0.1 (−1, 1)
60% mRSS 19.4 17.8 −1.7 (−6.7, 5.2)
HAQ-DI 15.6 17.8 2.3 (−1.9, 5.8)
Patient GA 15.2 20.2 5.1 (0.2, 8.6)
Clinician GA 27.3 29.4 2.1 (−1.9, 5.8)
FVC% 11.8 19.2 7.3 (4, 11.6)
At least 1 improvement 47.1 56.4 9.4 (5, 14.4)
At least 2 improvement 23.1 23.9 0.7 (−3.7, 7.8)
At least 3 improvement 8.4 13.6 5.1 (1.5, 10.2)
At least 4 improvement 5.5 5.8 0.4 (−1.9, 2)
All 5 improvement 0.9 0.5 −0.4 (−1, 0)

A 5% improvement threshold is used for FVC% at all improvement levels; Risk Difference = proportion of participants who improved in active medication group – proportion of participants who improved in placebo group

Figure 1:

Risk Difference (proportion of participants who improved in active medication group – proportion of participants who improved in placebo group) for improvement ≥ 10% in 5% increments in the three randomized controlled trials in the pooled data set (except for FVC% predicted, which is ≥ 5% improvement)

Performance of 5 core set measures: validation sets

We saw trends in the validation sets (N=117) similar to those in the development sets: the proportion of participants who improved by ≥10% to ≥60% (in 5% increments) was numerically higher in the active therapy vs. placebo group the majority of the time for all five core set measures (mRSS, HAQ-DI, PGA, CGA, and FVC%). In addition, the patterns were similar for the proportion of participants with at least one to all five core set measures improved, which was numerically larger on active therapy than in the placebo group. The magnitude of the effects was comparable between the development and validation sets; for example, the proportion of participants who improved by ≥ 20% in ≥ one core set measure was 93% in active therapy vs. 80% in the placebo group, in ≥ two core set measures was 76% in active therapy vs. 58% in the placebo group, in ≥ three core set measures was 50% in active therapy vs. 36% in the placebo group, and in ≥ four core set measures was 28% in active therapy vs. 14% in the placebo group (Table 4 and Supplementary Figure 1). We used the same method as described for the development sets to calculate the RD and 95% CI.

Table 4:

Proportion of participants who achieved a predefined percentage of improvement for each core set measure and ≥ 1 core set measures in the Validation Data Set

Improvement Measures PBO N = 57 Active N = 60 Risk Difference (95% CI) Improvement Measures PBO N = 57 Active N = 60 Risk Difference (95% CI)
10% mRSS 65.6 81.3 15.8 (6.1, 24.4) 15% mRSS 58.9 79.5 20.7 (10.2, 30.1)
HAQ-DI 33.7 54.5 21 (11.4, 30.8) HAQ-DI 30.9 45.3 14.6 (0.8, 25)
Patient GA 57.0 57.7 0.5 (−8.3, 16.2) Patient GA 48.8 53.9 4.9 (−3.7, 20.8)
Clinician GA 61.4 80.4 18.9 (5.7, 28.9) Clinician GA 57.8 76.8 18.9 (3.7, 31.3)
FVC% 11.1 20.4 9.5 (0.3, 15.7) FVC% 11.1 20.4 9.5 (0.3, 15.7)
At least 1 improvement 86.8 97.3 10.4 (4.3, 20) At least 1 improvement 82.1 94.4 12.1 (6.4, 20)
At least 2 improvement 69.1 80.7 11.8 (−1.7, 20.4) At least 2 improvement 59.9 78.4 18.6 (6.8, 27.3)
At least 3 improvement 44.0 59.5 15.7 (3, 25.8) At least 3 improvement 39.4 54.5 15.3 (3.1, 25.7)
At least 4 improvement 15.6 36.6 21 (1.5, 28.3) At least 4 improvement 14.2 32.4 18.1 (−2.5, 29.5)
All 5 improvement 1.8 9.3 7.5 (3.8, 11.1) All 5 improvement 1.2 6.0 4.9 (1.9, 7.7)
20% mRSS 53.4 76.8 23.5 (15.9, 35.7) 25% mRSS 48.1 70.2 22.3 (14.2, 33.3)
HAQ-DI 29.8 40.6 10.8 (0.9, 21.2) HAQ-DI 26.3 37.6 11.4 (2.9, 21.2)
Patient GA 46.9 49.3 2.2 (−5.5, 17) Patient GA 40.8 46.0 5.2 (−5.3, 17.6)
Clinician GA 57.3 75.3 17.9 (3.7, 31.6) Clinician GA 50.9 72.5 21.6 (3.8, 38.8)
FVC% 11.1 20.4 9.5 (0.3, 15.7) FVC% 11.1 20.4 9.5 (0.3, 15.7)
At least 1 improvement 80.1 92.7 12.4 (4.4, 22.1) At least 1 improvement 74.3 90.7 16.2 (8.6, 24.1)
At least 2 improvement 57.7 75.8 18.1 (4.8, 27.5) At least 2 improvement 49.8 71.4 21.8 (4.8, 31.9)
At least 3 improvement 35.6 50.3 14.8 (3.1, 25.7) At least 3 improvement 31.0 45.4 14.6 (4, 27.5)
At least 4 improvement 13.6 27.7 14.1 (−0.4, 23) At least 4 improvement 11.8 23.7 12 (2, 23.3)
All 5 improvement 1.2 6.0 4.9 (1.9, 7.7) All 5 improvement 1.2 6.0 4.9 (1.9, 7.7)
30% mRSS 42.6 60.3 17.7 (8.7, 26.8) 35% mRSS 37.5 52.6 15.2 (8.9, 19.3)
HAQ-DI 22.4 32.9 10.4 (−1.7, 17.3) HAQ-DI 20.0 32.3 12.3 (0.3, 21.2)
Patient GA 33.7 41.5 7.7 (−1.3, 17.9) Patient GA 27.6 37.6 10.1 (2, 18.5)
Clinician GA 48.4 65.3 16.9 (−0.4, 33) Clinician GA 45.5 58.7 13.3 (−4.6, 23.9)
FVC% 11.1 20.4 9.5 (0.3, 15.7) FVC% 11.1 20.4 9.5 (0.3, 15.7)
At least 1 improvement 70.1 88.1 17.9 (6.6, 25.9) At least 1 improvement 65.3 84.5 19.3 (4.6, 27.4)
At least 2 improvement 44.5 65.0 20.5 (7.1, 31.5) At least 2 improvement 40.4 57.8 17.5 (5, 25.8)
At least 3 improvement 27.0 37.5 10.7 (3.3, 23.6) At least 3 improvement 20.6 34.0 13.7 (4.4, 27.3)
At least 4 improvement 7.0 17.8 10.8 (2, 16.2) At least 4 improvement 6.4 14.2 7.7 (0, 12.1)
All 5 improvement 1.2 3.3 2.2 (−0.2, 5.8) All 5 improvement 1.2 3.3 2.2 (−0.2, 5.8)
40% mRSS 31.6 45.3 13.7 (5.1, 20.6) 45% mRSS 27.9 40.6 12.5 (3.6, 19.8)
HAQ-DI 19.2 29.2 10 (−1.7, 17.3) HAQ-DI 15.5 24.4 8.9 (3.8, 16.7)
Patient GA 23.3 34.2 11 (0, 23.1) Patient GA 23.1 29.6 6.5 (0, 21.2)
Clinician GA 41.2 55.9 14.8 (−4.4, 26) Clinician GA 38.0 50.5 12.6 (0.2, 19.7)
FVC% 11.1 20.4 9.5 (0.3, 15.7) FVC% 11.1 20.4 9.5 (0.3, 15.7)
At least 1 improvement 61.1 79.7 18.5 (6.8, 27.2) At least 1 improvement 58.5 76.9 18.4 (6.8, 25.3)
At least 2 improvement 35.4 53.1 18 (8, 29.5) At least 2 improvement 31.8 45.6 14 (6, 23.7)
At least 3 improvement 15.6 28.5 13.1 (3, 23.4) At least 3 improvement 12.0 24.3 12.4 (5.3, 19.3)
At least 4 improvement 6.0 13.0 6.9 (2, 10.9) At least 4 improvement 5.6 9.3 3.6 (−0.5, 7.2)
All 5 improvement 1.2 3.3 2.2 (−0.2, 5.8) All 5 improvement 1.2 2.7 1.6 (−0.2, 3.8)
50% mRSS 25.5 35.8 10.3 (1.7, 15.7) 55% mRSS 21.5 27.2 5.9 (−2.5, 15.2)
HAQ-DI 14.9 23.6 8.7 (2.2, 15.3) HAQ-DI 14.0 19.9 5.7 (0.2, 13.4)
Patient GA 21.8 27.6 5.7 (−1.9, 19.3) Patient GA 14.6 27.2 12.6 (7.4, 23.6)
Clinician GA 32.3 45.1 12.8 (2.7, 19.1) Clinician GA 26.8 37.7 10.8 (3, 15.5)
FVC% 11.1 20.4 9.5 (0.3, 15.7) FVC% 11.1 20.4 9.5 (0.3, 15.7)
At least 1 improvement 54.5 70.6 15.9 (8.8, 23.6) At least 1 improvement 47.1 66.0 18.8 (10.1, 28.8)
At least 2 improvement 29.2 41.0 12 (2, 22.7) At least 2 improvement 22.8 33.8 11.2 (0, 20.2)
At least 3 improvement 10.4 23.7 13.4 (6, 18.2) At least 3 improvement 8.2 19.0 10.9 (2, 19.4)
At least 4 improvement 5.6 8.5 2.8 (−2.6, 7.2) At least 4 improvement 5.0 6.4 1.3 (−2.6, 5)
All 5 improvement 0.2 2.7 2.5 (1.9, 4.1) All 5 improvement 0.2 2.0 1.8 (0, 3.6)
60% mRSS 19.6 21.0 1.5 (−12.5, 12.2)
HAQ-DI 13.6 19.3 5.5 (−1.5, 14.2)
Patient GA 12.7 19.2 6.3 (−1.2, 16.3)
Clinician GA 23.7 34.3 10.6 (3.1, 18.7)
FVC% 11.1 20.4 9.5 (0.3, 15.7)
At least 1 improvement 44.9 61.5 16.5 (6.5, 25.1)
At least 2 improvement 19.4 25.9 6.7 (−8, 15.7)
At least 3 improvement 7.0 15.3 8.4 (−2, 15.5)
At least 4 improvement 5.0 5.8 0.7 (−2.6, 5)
All 5 improvement 0.2 1.0 0.8 (−0.1, 2)

A 5% improvement threshold is used for FVC% at all improvement levels; Risk Difference = proportion of participants who improved in active medication group – proportion of participants who improved in placebo group

Development and validation sets using FVC cut off 10% improvement

The data were similar when we used an FVC% improvement of ≥ 10% instead of ≥ 5% [Supplementary Tables 1 and 2].

Performance of ACR-CRISS vs. Revised CRISS

ACR-CRISS showed a ceiling effect (defined as proportion of patients achieving ACR-CRISS of >0.995) of 17.8% in the placebo group and 32.1% in the active therapy group (Table 2). In addition, there was a high floor effect (defined as proportion of patients achieving ACR-CRISS of <0.005) of 15.7% and 32.6% in active and placebo groups, respectively.

Figure 2 displays boxplots of ACR-CRISS for the improved and non-improved groups based on revised CRISS with thresholds of ≥10%, ≥20%, …, ≥60%. At improvement thresholds of ≥ 10% and ≥ 20%, the median ACR-CRISS was 0.99 among those with improvement versus 0.01 among those without (correlation coefficients of 0.63 and 0.62, respectively; p<0.001 for each comparison). The magnitude of the difference is attenuated as the threshold increases (correlation coefficients from 0.38 to 0.59, p<0.001 for each comparison), but the differences remain statistically significant.

Figure 2:

Concordance between ACR-CRISS vs. Revised CRISS from 10% to 60%, in 10% increments

DISCUSSION

SSc is a multisystem heterogeneous disease with a variable course. Traditionally, clinical trials in dcSSc have used mRSS as the primary outcome measure because of its relationship to internal organ involvement in early disease and because it meets the OMERACT filters. However, recent trials have shown marked heterogeneity in early disease despite enriching the trial population by disease duration, biomarkers, and/or mRSS cut off.3 6 8 15 As an example, post-hoc analysis from the abatacept Phase 2 trial shows that skin gene expression predicted differential responses in mRSS, FVC% and HAQ-DI.

ACR-CRISS was developed to address the limitations of mRSS and other outcome measures using well-established consensus and evidence-based input. ACR-CRISS, a global measure, is based on a probability score of 0.0 (no improvement) to 1.0 (marked improvement) and includes two steps. The first step assesses for worsening or incident cases of cardio-pulmonary-renal involvement and gives a score of 0.0. For those who do not meet Step 1, a weighted probability score is calculated that incorporates changes in five physical or functional areas: mRSS (assessment of skin), FVC% predicted (assessment of lungs), HAQ-DI (measure of patient function), PGA, and CGA. The weights are derived from physician consensus on the ranking of real patient profiles. The ACR-CRISS has worked well in four recent prospective trials (lenabasum, abatacept, and tocilizumab phase 2 and 3), statistically favoring active therapy over placebo (p< 0.05 for all analyses). Apart from lenabasum, all three trials had incorporated mRSS as the primary outcome measure and showed non-significant trends favoring the active therapy in mRSS.15 16 Despite the non-significant results on the mRSS, tocilizumab had a robust influence on FVC% predicted, with preservation of lung function in two separate trials, and abatacept had a clinically meaningful impact on HAQ-DI.

The positive results with the ACR-CRISS have been discussed in the scleroderma community, where researchers have queried the interpretation of the probability score. A cut off of ≥ 0.60 has been proposed as a clinically meaningful cut point for the ACR-CRISS.4 However, there is concern that the ACR-CRISS score can be driven by one core set measure, especially mRSS, since it has the highest weight in the probability score. In addition, recent topline data from lenabasum showed an ACR-CRISS score of 0.887 in the placebo group at week 52, suggesting a ceiling effect, and a similar trend was seen in a 6-month double-blind trial of an autotaxin inhibitor vs. placebo;9 both trials allowed background immunosuppressive therapy as part of the trial design.

To address this, we followed the principles laid out by Paulus et al., which were later modified to develop the ACR 20% criteria (ACR20),10 the gold standard for approval of drugs by regulatory agencies for RA, later adopted for juvenile arthritis and psoriatic arthritis. Paulus et al. analyzed four RCTs in RA,10 which were conducted as part of a consortium with an agreed set of core set measures but no pre-specified primary outcome measure. One rationale for the composite index in RA was the difficulty of interpreting and comparing different trials that incorporated similar outcome measures, with certain measures showing statistically significant differences and others not. In addition, the data were presented as mean and median changes over time, and it was difficult for a clinician to assign clinical importance to these data. Trial design has improved over the last 30 years, and SSc trials now pre-specify the appropriate primary outcome measure and statistical testing. From a purely statistical interpretation of the three RCTs analyzed for this report, the trials are negative. However, these trials provide a platform to explore and develop new composite end points, as all three RCTs included five core set measures and had one or more core set measures that favored active medication over placebo, similar to the process proposed by Paulus et al. Using pooled data from three clinical trials (and reinforced by analyses to support internal validity using development and validation sets), we showed that the three active therapies had a higher proportion of patients who improved in ≥ one core set measure compared to placebo. The effect was consistent from 10% to 60% improvement in ≥ one core set measure.

We found that the active therapy groups had consistent results, but there were some differences in the placebo response between the three RCTs (data not shown). This can be explained by different inclusion and exclusion criteria and the geographic locations where the trials were conducted. The abatacept and tocilizumab phase 2 trials were conducted in North America and Europe, whereas the tocilizumab phase 3 trial was conducted in multiple countries throughout the world. The higher placebo response in the tocilizumab phase 3 trial may reflect expectations of the patients in different countries or other unexplained variables.

In our current analysis, the ACR-CRISS showed both ceiling and floor effects, which may impact responsiveness to change.17 We believe that recent trials conducted on background immunosuppressive therapy may increase the ceiling effect (although this needs to be analyzed). We also acknowledge that the proposed revised CRISS is a dichotomous index. Although it is well known that dichotomizing a continuous outcome variable reduces power and precision, the current analysis of the ACR-CRISS indicates a bimodal distribution with discontinuity in values, and we believe that the loss of power and precision in the revised CRISS may be balanced by the improvement in clinical interpretation.

Although the different cut-offs showed trends favoring active therapy, we propose two ways to consider incorporating the revised CRISS as a starting point for future trials (Supplementary Figure 2). First, we can consider ACR-CRISS 20 or 25%, which translates into at least 20% or 25% improvement in mRSS, HAQ-DI, PGA, and CGA (with 5% or 10% improvement in FVC). This is based on the minimal clinically important differences (MCIDs) that are published in different rheumatic diseases, including SSc, for the five core set measures. For mRSS, an improvement of 24% is considered the MCID.18 For HAQ-DI, the published MCID estimate is 0.22 and the mean HAQ-DI score in the three RCTs was 1.2, a relative change of 19%. A change of one unit (on a 0–10 scale) is considered the MCID estimate for global assessments, and the mean baseline scores for PGA and CGA in the three RCTs were 5.4 and 5.8, an 18–19% relative change. For the FVC%, we only evaluated 5% and 10% relative improvement, as an improvement above this is unreasonable in a progressive fibrotic lung disease.19 20 In the three RCTs presented here, 15.6% of participants improved by ≥ 5% and 7.8% improved by ≥ 10%. In the SENSCIS trial with established ILD, the percentages who improved by 5% and 10% were approximately 7.0% - 12.9%, respectively.21 In addition, the intra-observer variability of FVC% was 5% in SLS I and II, and improvement beyond this can be considered clinically important.20 The second option is to limit the proportion of patients who improve in the placebo group to < 20% in the composite end point, similar to the Paulus criteria. In the development and validation sets, a cut-off of 40% for at least three of five core set measures achieved a placebo response of < 20%.
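The second option can be illustrated with a minimal sketch that scans candidate cut-offs and returns the smallest one at which the placebo response falls below 20%. Everything here is an assumption for illustration: the function and field names are hypothetical, the FVC threshold is held fixed at 5% while the cut-off for the other four measures varies, and the six placebo records are toy values, not trial data.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical record: percentage improvement from baseline per core set measure.
Patient = Dict[str, float]
NON_FVC = ("mrss", "haq_di", "pga", "cga")


def is_responder(p: Patient, cutoff: float,
                 fvc_cutoff: float = 5.0, min_measures: int = 3) -> bool:
    """True if at least `min_measures` of the five measures meet their threshold."""
    met = sum(p[m] >= cutoff for m in NON_FVC)
    met += p["fvc"] >= fvc_cutoff  # assumption: FVC uses its own fixed threshold
    return met >= min_measures


def placebo_response(placebo: List[Patient], cutoff: float) -> float:
    """Proportion of placebo patients classified as responders at this cut-off."""
    return sum(is_responder(p, cutoff) for p in placebo) / len(placebo)


def smallest_cutoff_below(placebo: List[Patient], candidates: Sequence[float],
                          max_rate: float = 0.20) -> Optional[float]:
    """Smallest candidate cut-off whose placebo response is below `max_rate`."""
    for c in sorted(candidates):
        if placebo_response(placebo, c) < max_rate:
            return c
    return None


# Toy placebo data (illustrative only).
placebo = [
    {"mrss": 45, "haq_di": 50, "pga": 42, "cga": 10, "fvc": 0},
    {"mrss": 25, "haq_di": 22, "pga": 30, "cga": 5, "fvc": 6},
    {"mrss": 10, "haq_di": 0, "pga": 5, "cga": 0, "fvc": -2},
    {"mrss": 35, "haq_di": 5, "pga": 33, "cga": 31, "fvc": 1},
    {"mrss": 0, "haq_di": 0, "pga": 0, "cga": 0, "fvc": 0},
    {"mrss": 15, "haq_di": 21, "pga": 0, "cga": 12, "fvc": 7},
]

for c in (20, 30, 40):
    print(c, round(placebo_response(placebo, c), 3))
print(smallest_cutoff_below(placebo, [20, 30, 40, 50, 60]))
```

In a real trial the candidate cut-offs and the FVC threshold would be pre-specified, and the scan would be run on development data and then checked in a validation set.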

Based on our current analyses and review of the published and unpublished data from recent trials (such as lenabasum), we believe that the revised CRISS may provide an anchor for clinical meaningfulness and assure clinicians and regulators that improvement of ≥ 20% in three of five core set measures (as an example) is driven by more than one core set measure (and not by mRSS alone, which has the greatest weight in the ACR-CRISS). This, in turn, will improve the interpretation of the data (similar to RA RCTs). Finally, we show high floor and ceiling effects of the ACR-CRISS; the revised CRISS has the advantage of limiting the ceiling effect because different cut points can be chosen, as done in RA (such as the ACR 20%, 50% and 70% response criteria).

For incorporation in an RCT, we propose that researchers continue to include the Step 1 score in the assessment of the revised CRISS (Supplementary Figure 2). Step 1 consists of cardio-pulmonary-renal involvement, and consideration should be given to adding significant gastrointestinal dysmotility requiring parenteral or enteral nutrition and significant digital ischemia requiring hospitalization, gangrene or amputation (as they represent important end organ damage in early SSc). If a patient meets Step 1, they are considered not improved, are given a percentage change of 0 for each core set item, and are included in Step 2. For the remaining patients who do not meet Step 1, an appropriate cut-off should be proposed in Step 2 that may range from 20–40% for at least three of five core set measures (as discussed above) but should be driven by future trials, with and without background immunosuppressive therapies.
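The two-step logic described above can be sketched for a single patient as follows. This is an illustrative sketch, not the trial algorithm: the class and function names are hypothetical, percentage improvement is assumed to be relative change from baseline (a decrease for mRSS, HAQ-DI, PGA and CGA; an increase for FVC% predicted), and the default thresholds (20% for four measures, 5% for FVC, at least three of five) are one of the options discussed in the text.

```python
from dataclasses import dataclass
from typing import Dict

# Assumption: lower scores are better for these four measures.
LOWER_IS_BETTER = ("mrss", "haq_di", "pga", "cga")


@dataclass
class Visit:
    mrss: float
    haq_di: float
    pga: float
    cga: float
    fvc_pct: float  # FVC% predicted; higher is better


def pct_improvement(baseline: Visit, follow_up: Visit) -> Dict[str, float]:
    """Relative improvement from baseline (%), positive meaning better."""
    out: Dict[str, float] = {}
    for name in LOWER_IS_BETTER:
        b, f = getattr(baseline, name), getattr(follow_up, name)
        # A zero baseline cannot improve in relative terms; treated as 0 here.
        out[name] = 100.0 * (b - f) / b if b > 0 else 0.0
    out["fvc_pct"] = 100.0 * (follow_up.fvc_pct - baseline.fvc_pct) / baseline.fvc_pct
    return out


def revised_criss_responder(baseline: Visit, follow_up: Visit, step1_event: bool,
                            cutoff: float = 20.0, fvc_cutoff: float = 5.0,
                            min_measures: int = 3) -> bool:
    """Step 1: an organ-involvement event sets every change to 0 (not improved).
    Step 2: responder if at least `min_measures` measures meet their threshold."""
    if step1_event:
        changes = {name: 0.0 for name in (*LOWER_IS_BETTER, "fvc_pct")}
    else:
        changes = pct_improvement(baseline, follow_up)
    met = sum(changes[m] >= cutoff for m in LOWER_IS_BETTER)
    met += changes["fvc_pct"] >= fvc_cutoff
    return met >= min_measures


# Hypothetical patient: mRSS, HAQ-DI and CGA each improve by about 25%.
baseline = Visit(mrss=24, haq_di=1.2, pga=5.0, cga=6.0, fvc_pct=80.0)
follow_up = Visit(mrss=18, haq_di=0.9, pga=4.5, cga=4.5, fvc_pct=82.0)

print(revised_criss_responder(baseline, follow_up, step1_event=False))
print(revised_criss_responder(baseline, follow_up, step1_event=True))
```

Setting all changes to 0 for a Step 1 event, rather than returning early, mirrors the description that such patients are still carried into Step 2 as not improved.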

The strengths of the current analysis include careful evaluation of three RCTs with individual-level data. In addition, we estimated treatment differences for various definitions using separate development and validation sets based on patient-level data.
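As a simplified illustration of how a risk difference (RD) and a bootstrap 95% CI can be obtained from patient-level responder indicators, consider the sketch below. It is not the procedure used in this report, which averaged over replicate development and validation datasets; it is a plain percentile bootstrap on a single dataset. The function names are hypothetical, and the arm size of 100 per group is an assumption chosen only so that the toy proportions equal 49% and 39%.

```python
import random
from typing import List, Tuple


def risk_difference(active: List[int], placebo: List[int]) -> float:
    """Responder proportion in the active arm minus that in the placebo arm."""
    return sum(active) / len(active) - sum(placebo) / len(placebo)


def bootstrap_ci(active: List[int], placebo: List[int], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample each arm with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    reps = []
    for _ in range(n_boot):
        a = rng.choices(active, k=len(active))
        p = rng.choices(placebo, k=len(placebo))
        reps.append(risk_difference(a, p))
    reps.sort()
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy responder indicators (1 = responder); arm sizes are hypothetical.
active = [1] * 49 + [0] * 51
placebo = [1] * 39 + [0] * 61

rd = risk_difference(active, placebo)
lo, hi = bootstrap_ci(active, placebo)
print(f"RD = {rd:.2f}, 95% CI: {lo:.2f} to {hi:.2f}")
```

Because the toy arms are smaller than the pooled trial data and the resampling scheme differs, the interval printed here will not match the intervals reported in the Results.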

The limitations of this study include the analysis of trials with a negative primary end point of mRSS. However, the tocilizumab clinical trials showed a large favorable benefit on FVC, and the abatacept trial showed statistically significant improvements in HAQ-DI and CGA. In addition, all three RCTs showed trends favoring active therapy in other core set measures, and we considered them an appropriate database for this exercise. Furthermore, there was apparent heterogeneity in the RD between the development and validation sets (Tables 2 and 3) that stratified sampling did not completely address; thus, our results should be validated in an independent cohort. Finally, all three RCTs were performed without background immunosuppressive therapy, and the response may be different in those on background immunosuppressive therapy.

In conclusion, we show that the proportion of patients who achieved a predefined percentage of improvement in ≥ one core set measure was higher with active therapy than with placebo, and we propose a new composite outcome measure for early dcSSc that addresses certain limitations of the ACR-CRISS score. This composite measure should be considered preliminary and should be rigorously tested in recently completed and ongoing clinical trials on background immunosuppressive therapy to assess its performance vs. the ACR-CRISS.

Key Messages.

What is already known about this subject?

  • Systemic sclerosis is a multisystem heterogeneous disease. The American College of Rheumatology Composite Response Index in Systemic Sclerosis (ACR-CRISS) is a composite end point to assess the likelihood of improvement in diffuse cutaneous systemic sclerosis (dcSSc).

  • ACR-CRISS is a weighted score and includes 5 core set measures: modified Rodnan skin score, FVC% predicted, HAQ-DI, patient and clinician global assessments.

  • ACR-CRISS is a weighted score (making it difficult to interpret) and has high floor and ceiling effects.

  • We analyzed core set measures from 354 patients who participated in 3 placebo-controlled trials and assessed the proportion of participants who improved above a certain threshold, similar to the ACR 20% criteria for rheumatoid arthritis.

What does this study add?

  • Participants who were on active medication had statistically higher improvement in >1 of 5 core set measures vs. the placebo group.

  • The proportion who improved by ≥ 20% in ≥3 core set measures was 49% in the active group vs. 39% in the placebo group; risk difference: 10%, 95% CI: 5%–16%.

  • The same trends favoring the active medication group were seen for different cut points for ≥3 core set measures.

How might this impact on clinical practice or future developments?

  • The proposed new composite end point (Revised CRISS) may provide easy interpretation and reduce floor and ceiling effects in clinical trials of dcSSc.

Funding:

Dr. Khanna’s work was supported by the NIH/National Institute of Arthritis and Musculoskeletal and Skin Diseases (K24-AR-063120 & 1R01-AR070470-01A1).

Competing Interests:

Dinesh Khanna reports grants from NIH/NIAMS during the conduct of the study; grants from Immune Tolerance Network, grants and personal fees from Bayer, grants from Bristol Myers Squibb, grants from Horizon, grants from Pfizer, personal fees from Acceleron, personal fees from Actelion, personal fees from Amgen, personal fees from Blade Therapeutics, personal fees from Boehringer Ingelheim, personal fees from CSL Behring, personal fees from Corbus, personal fees from Cytori, personal fees from Galapagos, personal fees from Genentech/Roche, personal fees from GSK, personal fees from Horizon, personal fees from Merck, personal fees from Mitsubishi Tanabe Pharma, personal fees from Regeneron, personal fees from Sanofi-Aventis, personal fees from United Therapeutics, other from Impact PH, personal fees from Eicos Sciences, Inc, personal fees and other from CiviBioPharma/Eicos Sciences, Inc. outside the submitted work.

Suiyuan Huang has nothing to report.

Celia Lin reports other from Genentech during the conduct of the study; other from Genentech outside the submitted work and owns stock in Roche.

Cathie Spino reports statistical consulting from Eicos Sciences, Inc. outside the submitted work.

REFERENCES

  • 1. Denton CP, Khanna D. Systemic sclerosis. Lancet 2017;390(10103):1685–99. doi: 10.1016/S0140-6736(17)30933-9 [published Online First: 2017/04/18]
  • 2. Nagaraja V, Cerinic MM, Furst DE, et al. Current and future outlook on disease modification and defining low disease activity in systemic sclerosis. Arthritis & rheumatology (Hoboken, NJ) 2020. doi: 10.1002/art.41246 [published Online First: 2020/03/07]
  • 3. Khanna D, Spino C, Johnson S, et al. Abatacept in Early Diffuse Cutaneous Systemic Sclerosis: Results of a Phase II Investigator-Initiated, Multicenter, Double-Blind, Randomized, Placebo-Controlled Trial. Arthritis & rheumatology (Hoboken, NJ) 2020;72(1):125–36. doi: 10.1002/art.41055 [published Online First: 2019/07/26]
  • 4. Khanna D, Berrocal VJ, Giannini EH, et al. The American College of Rheumatology Provisional Composite Response Index for Clinical Trials in Early Diffuse Cutaneous Systemic Sclerosis. Arthritis & rheumatology (Hoboken, NJ) 2016;68(2):299–311. doi: 10.1002/art.39501 [published Online First: 2016/01/26]
  • 5. Khanna D, Denton CP, Lin CJF, et al. Safety and efficacy of subcutaneous tocilizumab in systemic sclerosis: results from the open-label period of a phase II randomised controlled trial (faSScinate). Ann Rheum Dis 2018;77(2):212–20. doi: 10.1136/annrheumdis-2017-211682 [published Online First: 2017/10/27]
  • 6. Khanna D, Lin CJF, Furst DE, et al. Tocilizumab in systemic sclerosis: a randomised, double-blind, placebo-controlled, phase 3 trial. Lancet Respir Med 2020;8(10):963–74. doi: 10.1016/S2213-2600(20)30318-0 [published Online First: 2020/09/01]
  • 7. Khanna D, Lin C, Furst DE, et al. A randomised placebo-controlled phase 3 trial of tocilizumab in systemic sclerosis. Lancet Respiratory Medicine 2020
  • 8. Khanna D, Denton CP, Jahreis A, et al. Safety and efficacy of subcutaneous tocilizumab in adults with systemic sclerosis (faSScinate): a phase 2, randomised, controlled trial. Lancet 2016;387(10038):2630–40. doi: 10.1016/S0140-6736(16)00232-4 [published Online First: 2016/05/10]
  • 9. Khanna D, Denton C, Furst D, et al. A Phase 2a Randomized, Double-blind, Placebo-controlled Study of Ziritaxestat in Early Diffuse Cutaneous Systemic Sclerosis (NOVESA) [abstract]. Arthritis & rheumatology (Hoboken, NJ) 2020;72(suppl 10)
  • 10. Paulus HE, Egger MJ, Ward JR, et al. Analysis of improvement in individual rheumatoid arthritis patients treated with disease-modifying antirheumatic drugs, based on the findings in patients treated with placebo. The Cooperative Systematic Studies of Rheumatic Diseases Group. Arthritis Rheum 1990;33(4):477–84. doi: 10.1002/art.1780330403 [published Online First: 1990/04/01]
  • 11. Felson DT, Anderson JJ, Boers M, et al. American College of Rheumatology. Preliminary definition of improvement in rheumatoid arthritis. Arthritis Rheum 1995;38(6):727–35. doi: 10.1002/art.1780380602 [published Online First: 1995/06/01]
  • 12. Chung L, Spino C, McLain R, et al. Safety and efficacy of abatacept in early diffuse cutaneous systemic sclerosis (ASSET): open-label extension of a phase 2 double-blind randomised trial. Lancet Rheumatol 2020;(in press)
  • 13. Steyerberg EW, Bleeker SE, Moll HA, et al. Internal and external validation of predictive models: a simulation study of bias and precision in small samples. J Clin Epidemiol 2003;56(5):441–7. doi: 10.1016/s0895-4356(03)00047-7 [published Online First: 2003/06/19]
  • 14. Harrington D, D’Agostino RB Sr., Gatsonis C, et al. New Guidelines for Statistical Reporting in the Journal. N Engl J Med 2019;381(3):285–86. doi: 10.1056/NEJMe1906559 [published Online First: 2019/07/18]
  • 15. Spiera R, Hummers L, Chung L, et al. Safety and Efficacy of Lenabasum in a Phase II, Randomized, Placebo-Controlled Trial in Adults With Systemic Sclerosis. Arthritis & rheumatology (Hoboken, NJ) 2020. doi: 10.1002/art.41294 [published Online First: 2020/04/27]
  • 16. Spiera R, Hummers L, Chung L, et al. OP0006 Safety and efficacy of lenabasum (JBT-101) in diffuse cutaneous systemic sclerosis subjects treated for one year in an open-label extension of trial jbt101-ssc-001. Ann Rheum Dis 2018;77(Suppl 2):52. doi: 10.1136/annrheumdis-2018-eular.3512
  • 17. Hays RD, Hadorn D. Responsiveness to change: an aspect of validity, not a separate dimension. Qual Life Res 1992;1(1):73–5. doi: 10.1007/BF00435438 [published Online First: 1992/02/01]
  • 18. Khanna D, Clements PJ, Volkmann ER, et al. Minimal Clinically Important Differences for the Modified Rodnan Skin Score: Results from the Scleroderma Lung Studies (SLS-I and SLS-II). Arthritis Res Ther 2019;21(1):23. doi: 10.1186/s13075-019-1809-y [published Online First: 2019/01/18]
  • 19. Distler O, Highland KB, Gahlemann M, et al. Nintedanib for Systemic Sclerosis-Associated Interstitial Lung Disease. N Engl J Med 2019;380(26):2518–28. doi: 10.1056/NEJMoa1903076 [published Online First: 2019/05/22]
  • 20. Kafaja S, Clements PJ, Wilhalme H, et al. Reliability and minimal clinically important differences of forced vital capacity: Results from the Scleroderma Lung Studies (SLS-I and SLS-II). Am J Respir Crit Care Med 2018;197(5):644–52. doi: 10.1164/rccm.201709-1845OC [published Online First: 2017/11/04]
  • 21. Committee USFDAAA. OFEV® (nintedanib) Capsules for Systemic Sclerosis-associated Interstitial Lung Disease (SSc-ILD) 2019. [Available from: https://www.fda.gov/media/129748/download; accessed 21 October 2020]
