Abstract
Aims
Composite indices for quantifying rheumatoid arthritis (RA) disease activity, such as the 28‐joint disease activity score (DAS28), are composed of single parameters (‘metrics’) in various combinations. Population modelling methods were used to evaluate single metrics for their ability to reflect changes in disease activity, with a view to understanding and improving composite indices.
Methods
A total of 11 single metrics of RA disease activity (tender and swollen joint counts, acute phase reactants and global health, pain and physical function assessments) were obtained from 203 patients with recent onset RA. Participants received combination disease‐modifying anti‐rheumatic drugs (DMARDs) according to a treat‐to‐target approach with a pre‐defined protocol for treatment intensification. Models describing each metric's magnitude and variability of change from baseline to a single ‘treated’ state in the population were developed using NONMEM®. Measures that displayed uniformly large changes between states across the population were ranked higher in terms of discriminatory capacity.
Results
Joint counts demonstrated a greater ability to discriminate changes in RA disease activity than other metrics. Correlations between metrics demonstrated that erythrocyte sedimentation rate (ESR) had limited relationships with other metrics for baseline scores and changes in RA disease activity (r generally < 0.2). However, ESR appeared to be important in describing changes for those individuals whose ESR levels were initially elevated.
Conclusion
It appears unlikely that a single group of metrics may be suitable to capture disease activity changes across all RA patients and defining the most appropriate metric(s) for individual patients will be an important area of future research.
Keywords: disease activity, disease modifying anti‐rheumatic drugs, population modelling, rheumatoid arthritis
What is Already Known about this Subject
Titrating doses of disease modifying anti‐rheumatic drugs (DMARDs) according to a patient's current level of disease activity is effective for achieving remission in rheumatoid arthritis (RA).
A single or composite measure that best reflects response to DMARD therapy (or lack thereof) is yet to be identified.
A previous population model of RA disease progression described high within‐patient random variability suspected to be due to fluctuations within the modelled composite measure itself.
What this Study Adds
Swollen and tender joint counts, physician's assessment of patient's global health and pain were found to distinguish the effect of DMARD therapy better than acute phase reactants or the patient's assessment of global health.
Correlations between RA disease activity metrics demonstrated that acute phase reactants have limited relationships with other metrics in terms of baseline and changes in disease activity.
Acute phase reactants may be important descriptors for changes in RA disease activity in those whose levels were elevated initially.
Introduction
The management of rheumatoid arthritis (RA) commonly employs combinations of disease modifying anti‐rheumatic drugs (DMARDs) and frequent assessments of disease activity to guide dose and regimen adjustments until a pre‐defined low disease activity state or remission has been achieved (the ‘treat‐to‐target’ approach) 1, 2, 3, 4.
The 28‐joint disease activity score (DAS28) 5, 6 is a composite index that incorporates the 28‐tender joint count (TJC28), 28‐swollen joint count (SJC28), erythrocyte sedimentation rate (ESR) and the patient's assessment of global health (PtGH) into a single summary score on a continuous scale 5. However, the parameters comprising the DAS28 have the potential to reflect changes in disease activity falsely, either because of patient subjectivity and lack of reproducibility of the assessment (such as PtGH) or because they can be elevated during bouts of acute illness unrelated to RA (such as ESR). These parameters and the like (such as the physician's assessments of patient's global health [PhGH], pain, fatigue, C‐reactive protein [CRP], etc.) have also been incorporated in other commonly used composite indices 5, 7, 8. While each has its strengths, all composite indices exhibit some degree of unexplained intra‐ and inter‐individual variability owing to the parameters used to construct them. Unwarranted intensification may occur on the basis of a single elevated parameter or, conversely, a needed intensification may be overlooked if a single parameter decreases by a large enough amount. Therefore, there is a need for robust clinical tools that accurately reflect current and predict future disease activity to guide an acceptable balance between reduction of disease activity and adverse drug events as a result of unnecessary regimen intensification.
Population models have the potential to provide platforms for predicting future disease activity in individual patients, thus offering clinicians tools to guide titration of therapy 9. The trajectory of DAS28 during combination DMARD therapy with the treat‐to‐target approach has previously been described using an empirical population model based on data from early RA patients over 12 months follow‐up 10. However, large and unexplained intra‐individual fluctuations in the DAS28 impaired the model's ability to forecast future disease activity, limiting its clinical utility 10. The aim of the present research was to revisit the single parameters comprising composite indices, identify those with a higher potential to falsely reflect changes in disease activity resulting from DMARD therapy, and evaluate the impact these parameters have on the ability of the DAS28 to discriminate changes in RA disease activity.
Methods
A series of models describing the magnitude and direction of change in scores for a number of single RA disease activity parameters from a ‘baseline’ clinic visit (just prior to the initiation of DMARD therapy) to a single follow‐up ‘treated’ clinic visit were developed. It was assumed that across the population the DMARD regimen prescribed was sufficiently effective to result in a decrease from high disease activity at baseline to lower disease activity at the treated clinic visit. We use the term ‘metric’ hereafter to refer to a measurable item. The term ‘parameter’ is used in the strict modelling sense, although we recognise that in general use, parameter is also interchangeable with metric.
Study population
Data were obtained from early RA patients who attended the Early Arthritis Clinic (EAC) at the Royal Adelaide Hospital (RAH) between September 1998 and June 2014. Inclusion criteria required an age older than 18 years, a diagnosis of RA according to the revised American College of Rheumatology Criteria 11 and no prior use of DMARDs 12. Treatment with a standardised combination DMARD regimen was initiated according to a treat‐to‐target approach with pre‐defined triggers for treatment intensification as described by Proudman et al. which is depicted in Supplementary Figure 1 12. Ethics approval was obtained from the RAH Human Research and Ethics Committee (RAH Protocol Number 120 618) and the University of South Australia Human Research and Ethics Committee (UniSA Protocol Number 33 610) and all patients gave informed consent for inclusion in the cohort.
Disease activity and patient data
A total of 11 single metrics were assessed at baseline and subsequent follow‐up visits. As patients did not have the same follow‐up patterns, may have been lost to follow‐up or were still in the process of being followed up, the treated visit was defined as the patient's final clinic visit after at least 26 weeks of combination DMARD therapy within a 2 year period. Metrics used in the study were those included in the DAS28 [TJC28, SJC28, ESR (mm h–1) and PtGH (100 mm visual analogue scale)], the 56‐tender joint count (TJC56), 44‐swollen joint count (SJC44), CRP (mg l–1), PhGH, pain, fatigue (all measured on a 100 mm visual analogue scale) and modified health assessment questionnaire (mHAQ; 0–3). At baseline the patient's age, gender, height, smoking status, presence of the shared epitope, anti‐cyclic citrullinated peptide antibody (ACPA) and rheumatoid factor (RF) titres were recorded, and at each clinic visit patient weight, body mass index (BMI), systemic corticosteroid use and DMARDs administered (and their doses) were also recorded.
Base model development of single metrics
Model development used NONMEM® Version VII Level 2.0 (ICON Dev. Soln, Ellicott City, MD, USA) 13 with the Wings for NONMEM (Version 720) interface (http://wfn.sourceforge.net/) and the G95 Fortran compiler. Population parameter estimation used the first order conditional estimation with interaction (FOCE‐I) method and individual parameter estimates (IPRE) were obtained using the Bayesian POSTHOC functionality of NONMEM®. Statistical and graphical output was generated using the R programming and statistical language (version 3.1.1) 14 and the doBy, ggplot2, grid, Hmisc, overlap, plyr, reshape2 and sfsmisc R packages 14, 15, 16, 17, 18, 19, 20, 21.
Structural models
Each metric was initially modelled independently. Baseline and treated scores were differentiated by a treatment variable, TRT (0 for baseline, and 1 for treated scores), in the input dataset and not by their actual assessment time. Structural models described the population treated score as the typical population value for baseline modified by the typical population change (denoted as ‘effect’ parameters), which was additive, proportional or a combination of the two. For example, predicted scores described by an additive structural model were represented by:
$$\mathrm{SCORET}_{TRT} = \mathrm{BASE}_{TRT} + \mathrm{EFFADD} \tag{1}$$
where SCORETTRT is the population predicted score at treatment state, TRT, BASETRT is the parameter describing the typical baseline population value, and EFFADD is an effect parameter describing the typical population change between baseline and treated scores. For TRT = 0, effect parameters were set to 0. For TRT = 1, effect parameters took the estimated values to calculate the treated score as a function of the BASETRT parameter.
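The additive structural model can be illustrated with a minimal Python sketch. The function name is hypothetical and is not part of the original analysis, which was implemented in NONMEM; the numerical inputs are the logit-domain TJC28 estimates from Table 2.

```python
def predicted_score_logit(base: float, eff_add: float, trt: int) -> float:
    """Population predicted score in the logit domain for treatment state trt."""
    if trt not in (0, 1):
        raise ValueError("TRT must be 0 (baseline) or 1 (treated)")
    # The effect parameter contributes nothing at baseline (TRT = 0)
    # and takes its estimated value in the treated state (TRT = 1).
    return base + eff_add * trt


# Logit-domain TJC28 estimates from Table 2 (base and additive effect).
tjc28_base, tjc28_eff = -0.616, -1.97
baseline_logit = predicted_score_logit(tjc28_base, tjc28_eff, trt=0)
treated_logit = predicted_score_logit(tjc28_base, tjc28_eff, trt=1)
print(baseline_logit, round(treated_logit, 3))
```

Coding the treatment state as a 0/1 switch mirrors the use of TRT in the input dataset rather than the actual assessment time.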
Random effect models
Population parameter variability (PPV) for parameters was assumed to be normally distributed:
$$\mathrm{EFFADD}_{j,TRT} = \theta_{\mathrm{EFFADD}} + \eta_{\mathrm{EFFADD},j} \tag{2}$$
where EFFADDj,TRT is the effect parameter describing the change in score from baseline to the treated state for the jth individual at treatment phase, TRT, θEFFADD is the typical population value, and ηEFFADD,j is an independent random variable describing the variability in θEFFADD among individuals, with a mean of 0 and variance ω².
Models with and without covariance for random effects were also investigated by estimating the off‐diagonal elements of the variance–covariance matrix. A fixed additive residual error term with a variance of 0.00001 was incorporated, thus amalgamating unexplained variability with between‐subject variability, as these are not separable when individuals have only one observation per treatment state.
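As an illustration of normally distributed, correlated random effects, the sketch below draws individual BASE and EFFADD values in the logit domain. It is not the NONMEM implementation: the function name is hypothetical, the typical values and SDs are the TJC28 estimates from Table 2, and the correlation of -0.3 is an invented value used only for demonstration.

```python
import math
import random


def sample_individual_parameters(theta_base, theta_eff, sd_base, sd_eff, corr, rng):
    """Draw one individual's BASE and EFFADD (logit domain) with correlated random effects."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    eta_base = sd_base * z1
    # Cholesky factor of a 2x2 correlation matrix: eta_eff has SD sd_eff
    # and correlation corr with eta_base.
    eta_eff = sd_eff * (corr * z1 + math.sqrt(1.0 - corr ** 2) * z2)
    return theta_base + eta_base, theta_eff + eta_eff


rng = random.Random(2017)  # arbitrary seed so the sketch is repeatable
# Typical values and SDs: TJC28 rows of Table 2. corr = -0.3 is a made-up value.
draws = [
    sample_individual_parameters(-0.616, -1.97, 1.43, 1.72, -0.3, rng)
    for _ in range(5000)
]
mean_eff = sum(eff for _, eff in draws) / len(draws)
print(round(mean_eff, 2))
```

With one observation per treatment state, draws of this kind carry both between-subject and residual variability, which is why the residual error term was fixed to a negligible variance.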
Transformations
A logit transform was employed to constrain predicted scores within the defined upper and lower boundaries (UBT and LBT) as inherent in the metric design, i.e., 0 and 28 for TJC28. The logit transform turned normally distributed SCORETRT values into bounded skewed distributions for each score as appropriate (Equation (3)). An empirical method for scaling the transform's boundaries was employed to handle observations at the metric's boundaries and to improve model convergence (more detail available in the Supplementary Methods).
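$$\mathrm{SCORE}_{TRT} = \mathrm{LB}_{M} + \frac{\mathrm{UB}_{M} - \mathrm{LB}_{M}}{1 + e^{-\mathrm{SCORET}_{TRT}}} \tag{3}$$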
where SCORETRT is the score at treatment state, TRT, SCORETTRT is the predicted score in the logit domain, and UBM and LBM are the model's scaled upper and lower logit boundaries, respectively.
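The following sketch shows a scaled logit transform and its inverse, assuming the standard logistic form; the exact boundary-scaling procedure is described in the Supplementary Methods and is not reproduced here. The function names are hypothetical, and the bounds in the example are the SJC28 model bounds from Table 2.

```python
import math


def from_logit(score_logit, lb_model, ub_model):
    """Map a logit-domain score onto the bounded scale (lb_model, ub_model)."""
    return lb_model + (ub_model - lb_model) / (1.0 + math.exp(-score_logit))


def to_logit(score, lb_model, ub_model):
    """Inverse of from_logit; requires lb_model < score < ub_model."""
    p = (score - lb_model) / (ub_model - lb_model)
    return math.log(p / (1.0 - p))


# SJC28 model bounds from Table 2; the true bounds are 0 and 28.
lb_m, ub_m = -0.5, 28.5
# An observed count of 0 sits on the true lower bound but inside the model
# bounds, so its logit is finite.
print(round(to_logit(0.0, lb_m, ub_m), 3))
print(from_logit(0.0, lb_m, ub_m))
```

Widening the model bounds slightly beyond the true bounds is what allows observations of exactly 0 or 28 to be transformed without producing infinite values.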
Covariate analyses
Baseline patient characteristics investigated for covariate relationships included weight, BMI, age, gender, presence of RF and ACPA, smoking status and carriage of the shared epitope. Owing to the temporal relationship in which DMARD doses are titrated in the treat‐to‐target approach and the tapering of oral corticosteroids, their relationships with the two treatment states were not tested. However, the effects of a physician's decision to administer a dose of intra‐articular (i.a.) or intra‐muscular (i.m.) corticosteroids (or both), at the time of clinic visit due to perceived disease activity flare‐ups or insufficient response to combination DMARD therapy were explored.
The effect of a categorical covariate on a parameter was represented as a binomial relationship. For example, the effect of gender (SEX) on the effect parameter at treatment phase, TRT, was described as:
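$$\mathrm{EFFADD}_{j,TRT} = \theta_{\mathrm{EFFADD}} \cdot (1 + \mathrm{SEX} \cdot \theta_{\mathrm{SEX}}) + \eta_{\mathrm{EFFADD},j} \tag{4}$$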
where SEX has a value of 0 for males and 1 for females, and θSEX is the estimable parameter for the effect of female gender on EFFADDj , TRT when TRT = 1. When TRT = 0, EFFADDj , TRT = 0.
The effect of a continuous time‐dependent covariate on a parameter was represented as a power function referenced to the median of the observed data. For example, the effect of BMI on the BASETRT parameter was described as:
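$$\mathrm{BASE}_{j,TRT} = \theta_{\mathrm{BASE}} \cdot \left(\frac{\mathrm{BMI}_{j,TRT}}{\mathrm{BMI}_{S}}\right)^{\theta_{\mathrm{BMI}}} + \eta_{\mathrm{BASE},j,TRT} \tag{5}$$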
where BMIj , TRT is the measured BMI in the jth individual at treatment phase, TRT, θBASE is the typical population value for the BASETRT parameter, BMIS is the median BMI in the observed population and θBMI is the estimable parameter for the effect of BMI on BASEj, TRT.
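The two covariate forms described above can be sketched as follows. The categorical form follows the multiplicative structure given for the corticosteroid covariate in the Table 2 caption; the power form is the conventional median-referenced expression. Function names and all numerical inputs are hypothetical and chosen only to show the behaviour at the reference values.

```python
def categorical_effect(theta, indicator, theta_cov):
    """Typical value modified by a 0/1 covariate (e.g. SEX acting on EFFADD)."""
    return theta * (1.0 + indicator * theta_cov)


def power_effect(theta, covariate, reference, theta_cov):
    """Typical value scaled by a power function referenced to the median covariate value."""
    return theta * (covariate / reference) ** theta_cov


# Hypothetical values for illustration only.
theta_effadd, theta_sex = -2.0, 0.5
print(categorical_effect(theta_effadd, indicator=0, theta_cov=theta_sex))  # male: unchanged
print(categorical_effect(theta_effadd, indicator=1, theta_cov=theta_sex))  # female: scaled

theta_base, theta_bmi, bmi_median = 1.5, 0.8, 27.0
print(power_effect(theta_base, covariate=27.0, reference=bmi_median, theta_cov=theta_bmi))
```

Referencing the power function to the median means the typical value is returned unchanged for an individual whose covariate equals the median, which keeps the typical value directly interpretable.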
All covariates were empirically tested on all parameters. All covariates shown to be significant from univariate analyses were combined to form the full covariate model. Selection of the final covariate model was conducted by eliminating covariates from the full covariate model in order of least to most significance, where a covariate remained in the model if its removal resulted in a significant worsening of the model's goodness‐of‐fit (Supplementary Methods).
Model selection and evaluation
Base and covariate model selection was based on a standard composite of numerical and graphical criteria (Supplementary Methods). Models were evaluated by simulation for their ability to represent the observed baseline and treated distributions of scores for their respective disease activity metric. The final parameter estimates for each model and the R programming language were used to simulate 1000 datasets based on the patients in the index dataset. Candidate models were evaluated on their ability to encompass the observed baseline and treated distribution densities within the 95% prediction intervals.
Estimating correlation between metrics
The final models for each metric were combined into one model – the ‘combined model’ – in order to estimate the correlation between each of the BASETRT and effect parameters. Values for population parameters and the residual error term were fixed to those of the final individual models during the estimation process. Relative standard errors for covariance terms were determined by 200 bootstraps of the index dataset.
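The idea of a bootstrap relative standard error can be sketched in a simplified setting. Note the assumptions: the study re-estimated the combined model on each of 200 bootstrap datasets, whereas this sketch simply recomputes a Pearson correlation on resampled synthetic pairs; the function names, seeds and data are all hypothetical.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_rse(pairs, n_boot=200, seed=1):
    """Relative standard error (%) of the correlation from n_boot resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample individuals with replacement, keeping each pair together.
        sample = [rng.choice(pairs) for _ in pairs]
        xs, ys = zip(*sample)
        estimates.append(pearson(xs, ys))
    return 100.0 * statistics.stdev(estimates) / abs(statistics.fmean(estimates))


# Synthetic stand-in for 203 individuals with correlated parameter values.
rng = random.Random(42)
pairs = []
for _ in range(203):
    x = rng.gauss(0.0, 1.0)
    pairs.append((x, 0.8 * x + 0.6 * rng.gauss(0.0, 1.0)))
rse = bootstrap_rse(pairs)
print(f"RSE of correlation: {rse:.1f}%")
```

Resampling whole individuals preserves the within-individual pairing of baseline and effect values, which is what the correlation estimate depends on.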
Discriminatory capacity of disease activity metrics
Two summary variables were used to quantify key aspects of the transition between baseline and treated states in order to identify metrics possessing higher discriminatory properties. A plot was constructed of treatment effect variability versus treatment effect magnitude.
Treatment effect variability
The variability of the effect parameter unexplained by covariates (the model estimate for ω) was expressed as the coefficient of variation (%CV) relative to the population typical value of the effect parameter(s).
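A short worked example of the %CV calculation, assuming it is the ratio of the PPV standard deviation to the absolute typical value of the effect parameter, both in the logit domain. The function name is hypothetical; the inputs are the additive effect rows for mHAQ and fatigue in Table 2, and under this assumption the results agree with the values reported for those metrics in the Results.

```python
def percent_cv(ppv_sd: float, typical_value: float) -> float:
    """Coefficient of variation (%) of an effect parameter."""
    return 100.0 * ppv_sd / abs(typical_value)


# (PPV SD, typical additive effect) in the logit domain, from Table 2.
print(round(percent_cv(1.456, -1.46), 1))  # mHAQ    -> 99.7
print(round(percent_cv(2.017, -1.29), 1))  # fatigue -> 156.4
```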
Treatment effect magnitude
Model parameter values in the logit domain (due to the applied logit transform) are difficult to interpret intuitively. Therefore, both baseline and the treated population scores for each metric were converted from the logit to the linear domain in order to determine a metric's treatment effect magnitude.
It was considered that the difference between zero and the population baseline value was the maximum possible magnitude of change for the population typical patient between the two treatment states for a given metric. The estimated magnitude of change for the population typical patient was then calculated as the difference in the converted population typical scores for baseline and the treated state expressed as a percentage of the metric's maximum (i.e. 100% was when the population treated value equalled zero [maximum treatment effect] and 0% when it equalled its population baseline value [no treatment effect]).
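The magnitude calculation described above can be sketched as follows. The function name is hypothetical; the inputs are the linear-scale baseline value and additive effect for pain from Table 2, and the result agrees with the 78.8% reported for pain in the Results.

```python
def treatment_effect_magnitude(baseline: float, treated: float) -> float:
    """Change from baseline as a percentage of the maximum possible change (baseline to zero)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive on the linear scale")
    return 100.0 * (baseline - treated) / baseline


# Linear-scale pain values from Table 2: baseline 50.4, additive effect -39.7.
pain_baseline = 50.4
pain_treated = pain_baseline + (-39.7)
print(round(treatment_effect_magnitude(pain_baseline, pain_treated), 1))  # -> 78.8
```

A treated value equal to baseline gives 0% (no treatment effect) and a treated value of zero gives 100% (maximum treatment effect).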
Simulations of composite indices
The impact of removing a single metric from the DAS28 5 on its discriminatory capacity between treatment states was explored by simulation. This was considered as proof of concept for the utility of the single metric models to assess the performance of composite indices.
Original DAS28 scores (equation (6)) were calculated in each individual for both treatment states of the observed dataset and of the combined model's 1000 simulated datasets.
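$$\mathrm{DAS28} = 0.56\sqrt{\mathrm{TJC28}} + 0.28\sqrt{\mathrm{SJC28}} + 0.70\ln(\mathrm{ESR}) + 0.014\,\mathrm{PtGH} \tag{6}$$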
The combined model was evaluated on its ability to encompass the observed baseline and treated distribution densities for original DAS28 within its simulated 95% prediction intervals. The method was repeated for a reduced version of the original DAS28 (‘reduced DAS28’) with a poor discriminatory variable removed from the original DAS28 equation (as judged by our two summary variables described earlier).
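For reference, the composite score calculations can be sketched in Python. The original DAS28 uses the widely published DAS28‐ESR weights. The reduced version below is an assumption: it simply omits the ESR term (the variable removed later in the Results) and leaves the remaining weights unchanged. Function names and the example inputs are hypothetical.

```python
import math


def das28_esr(tjc28: float, sjc28: float, esr: float, ptgh: float) -> float:
    """Original DAS28 from tender/swollen joint counts, ESR (mm/h) and PtGH (0-100 mm)."""
    if esr <= 0:
        raise ValueError("ESR must be positive for the logarithm")
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr) + 0.014 * ptgh)


def das28_reduced(tjc28: float, sjc28: float, ptgh: float) -> float:
    """Reduced DAS28: ESR term omitted, other weights unchanged (assumed form)."""
    return 0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28) + 0.014 * ptgh


# Hypothetical patient with values near the typical baseline scores.
original = das28_esr(tjc28=10, sjc28=8, esr=22, ptgh=37)
reduced = das28_reduced(tjc28=10, sjc28=8, ptgh=37)
print(round(original, 2), round(reduced, 2))
```

Under this assumed form the two scores differ by exactly the ESR contribution, 0.70 ln(ESR), so the reduced score is on a shifted scale and is compared through distribution overlap rather than absolute value.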
The total area under the combined baseline and treated distribution densities (P total), and the area under the line of the baseline‐treated intersection (P intersection) for each simulation for each of the original and reduced DAS28 were determined using the sfsmisc package for R 16. The shared area was calculated as:
$$\text{Shared area (\%)} = \frac{P_{\mathrm{intersection}}}{P_{\mathrm{total}}} \times 100 \tag{7}$$
A decrease in the median shared area of the 1000 simulations from the original to reduced DAS28 suggested that removing the single metric improved the DAS28's ability to discriminate changes in disease activity.
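The shared area calculation can be sketched with numerical integration. Two assumptions are made: P total is taken as the area under the upper envelope of the two densities (their union), and P intersection as the area under their pointwise minimum. The study used the sfsmisc package for R on simulated densities; the normal curves below are hypothetical stand-ins and the function name is invented.

```python
from statistics import NormalDist


def shared_area(density_a, density_b, grid):
    """Percentage of the combined area that lies under both density curves."""
    def trapezoid(values):
        return sum(
            (grid[i + 1] - grid[i]) * (values[i] + values[i + 1]) / 2.0
            for i in range(len(grid) - 1)
        )

    a = [density_a(x) for x in grid]
    b = [density_b(x) for x in grid]
    p_intersection = trapezoid([min(u, v) for u, v in zip(a, b)])
    p_total = trapezoid([max(u, v) for u, v in zip(a, b)])
    return 100.0 * p_intersection / p_total


# Hypothetical normal stand-ins for baseline and treated DAS28 distributions.
baseline = NormalDist(mu=5.4, sigma=1.2).pdf
treated = NormalDist(mu=2.8, sigma=1.5).pdf
grid = [i * 0.01 for i in range(941)]  # DAS28 range 0 to 9.4
print(round(shared_area(baseline, treated, grid), 1))
```

Identical distributions give a shared area of 100% and fully separated distributions approach 0%, so a smaller value indicates better discrimination between treatment states.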
Results
Model development
The overall dataset consisted of 203 early RA patients and the population's characteristics are summarised in Table 1. The distributions for baseline and treated scores for all metrics were described by a single effect parameter additive to the BASETRT parameter (equation (1)), with covariance between all random effects (Table 2).
Table 1.
Summary of population characteristics (n = 203)
| Characteristic | Category | Baseline* | Treated* | Missing data (%) |
|---|---|---|---|---|
| Follow‐up (weeks) | | ‐ | 104 (97–105) | 0 |
| DAS28‐ESR (0–9.4) | | 5.4 (4.6–6.3) | 2.8 (1.9–4.1) | 0 |
| Age (years) | | 55 (44–65) | 57 (46–67) | 4 |
| Weight (kg) | | 73 (64–89) | 74 (65–91) | 13 |
| BMI† (kg m–2) | | 27 (24–31) | 27 (24–32) | 13 |
| Gender | Female | 144 (71%) | ‐ | 0 |
| Smoking status | Never | 88 (43%) | ‐ | 0 |
| | Current | 40 (20%) | ‐ | |
| | Past | 75 (37%) | ‐ | |
| ACPA† | Positive | 124 (61%) | ‐ | 2.5 |
| Rheumatoid factor | Positive | 134 (66%) | ‐ | 1 |
| Shared epitope | Positive | 120 (59%) | ‐ | 6 |
| ≥ 2 DMARDs | | ‐ | 182 (90%) | 0 |
| Methotrexate | | ‐ | 173 (85%) | 0 |
| Sulfasalazine | | ‐ | 138 (68%) | 0 |
| Hydroxychloroquine | | ‐ | 174 (86%) | 0 |
| Leflunomide | | ‐ | 49 (24%) | 0 |
| Biological DMARDs† | | ‐ | 8 (4%) | 0 |
| Other DMARDs† | | ‐ | 13 (6%) | 0 |
| i.a./i.m. corticosteroids‡ | | 43 (21%) | 27 (13%) | 0 |
| Oral corticosteroids§ | ≥ 1 course | ‐ | 177 (87%) | 0 |
* Continuous variables are represented as median (interquartile range), and categorical variables are represented as the number of individuals (percentage) with that characteristic.
† BMI, body mass index; ACPA, anti‐cyclic‐citrullinated‐peptide antibodies; DMARD, disease modifying anti‐rheumatic drug.
‡ i.a., intra‐articular; i.m., intra‐muscular. i.a./i.m. corticosteroid doses administered were converted to oral prednisolone equivalent with a median of 150 mg.
§ Oral corticosteroid doses administered were converted to oral prednisolone equivalent with a median of 0.14 mg day–1 over the entire 2 year period.
Table 2.
Final parameter estimates. Values are converted out of the logit domain into a linear scale and are substitutable into the following equations. Structural equation: SCORETRT = BASETRT + EFFADD, where EFFADD = 0 when TRT = 0 (baseline) and EFFADD takes the model estimated values when TRT = 1 (treated). Models for TJC28, TJC56, SJC28, SJC44, PhGH, mHAQ and pain incorporated the covariate effect on the EFFADD parameter: BASEj , TRT = θBASE + ηBASE , j , TRT, EFFADDj = θEFFADD ⋅ (1 + PREDEQDOSE ⋅ θPREDEQDOSE) + ηEFFADD , j. Models for ESR, CRP, PtGH and fatigue incorporated the same effect on the BASETRT parameter: BASEj , TRT = θBASE ⋅ (1 + PREDEQDOSE ⋅ θPREDEQDOSE) + ηBASE , j , TRT, EFFADDj = θEFFADD + ηEFFADD,j. PREDEQDOSE has a value of 0 for treatment states that did not result in the administration of i.a./i.m. corticosteroids and 1 for those that did. For ESR, CRP, PtGH and PhGH, the magnitude of the covariate effect is multifactorial
| Metric | Parameter | Parameter value (Linear scale) | Precision (%RSE) | PPV (SD) | Precision (%RSE) | Model bounds (LBM, UBM) | True bounds (LBT, UBT) |
|---|---|---|---|---|---|---|---|
| TJC28 | Base (POPTJC28BASE) | −0.616 (10.2) | 16.2 | 1.43 | 10.5 | −0.5, 30.5 | 0, 28 |
| | Additive effect (POPTJC28EFFADD) | −1.97 (−8.6) | 6.5 | 1.72 | 10.1 | | |
| | Corticosteroid effect on POPTJC28EFFADD | −0.862 (−0.789) | 16.7 | | | | |
| TJC56 | Base (POPTJC56BASE) | −0.834 (16.6) | 10.9 | 1.3 | 12.4 | −0.5, 56.5 | 0, 56 |
| | Additive effect (POPTJC56EFFADD) | −1.83 (−13.4) | 7.2 | 1.7 | 10 | | |
| | Corticosteroid effect on POPTJC56EFFADD | −0.873 (−0.804) | 16.5 | | | | |
| SJC28 | Base (POPSJC28BASE) | −0.826 (8.2) | 10.3 | 1.212 | 9.5 | −0.5, 28.5 | 0, 28 |
| | Additive effect (POPSJC28EFFADD) | −2.03 (−7.1) | 6.1 | 1.640 | 8.2 | | |
| | Corticosteroid effect on POPSJC28EFFADD | −0.711 (−0.565) | 17.9 | | | | |
| SJC44 | Base (POPSJC44BASE) | −1.02 (11.3) | 7.4 | 1.082 | 10.9 | −0.5, 44.5 | 0, 44 |
| | Additive effect (POPSJC44EFFADD) | −1.98 (−9.7) | 6.3 | 1.661 | 8.4 | | |
| | Corticosteroid effect on POPSJC44EFFADD | −0.733 (−0.588) | 17.7 | | | | |
| ESR | Base (POPESRBASE) | −1.53 (22.2) | 5.7 | 1.158 | 13.4 | 0.5, 122 | 0, 120 |
| | Corticosteroid effect on POPESRBASE (TRT = 0) | −0.412 (0.610) | 24.2 | | | | |
| | Additive effect (POPESREFFADD) | −1.02 (−12.8) | 9.8 | 1.411 | 13.3 | | |
| CRP | Base (POPCRPBASE) | −3.63 (7.3) | 2.9 | 1.559 | 11.5 | 0, 282.5 | 0, 280 |
| | Corticosteroid effect on POPCRPBASE (TRT = 0) | −0.251 (1.395) | 21.4 | | | | |
| | Additive effect (POPCRPEFFADD) | −1.17 (−5.2) | 9.8 | 1.655 | 10.3 | | |
| PtGH | Base (POPPTGHBASE) | −0.526 (37.2) | 22.4 | 1.493 | 14 | −0.5, 101.5 | 0, 100 |
| | Corticosteroid effect on POPPTGHBASE (TRT = 0) | −1.87 (0.640) | 21.6 | | | | |
| | Additive effect (POPPTGHEFFADD) | −1.42 (−24.8) | 9.3 | 1.881 | 10 | | |
| PhGH | Base (POPPHGHBASE) | −0.0579 (48.3) | 116.8 | 0.963 | 9.4 | −0.5, 100.5 | 0, 100 |
| | Additive effect (POPPHGHEFFADD) | −2.24 (−39.6) | 5.4 | 1.637 | 9 | | |
| | Corticosteroid effect on POPPHGHEFFADD | −0.891 (0.847) | 14 | | | | |
| mHAQ | Base (POPMHAQBASE) | −1.46 (0.53) | 6.2 | 1.281 | 9.3 | −0.05, 3.05 | 0, 3 |
| | Additive effect (POPMHAQEFFADD) | −1.46 (−0.43) | 7.2 | 1.456 | 9.9 | | |
| | Corticosteroid effect on POPMHAQEFFADD | −1.01 (1.016) | 18.3 | | | | |
| Pain | Base (POPPAINBASE) | −0.00524 (50.4) | 2213.7 | 1.649 | 11.2 | −0.5, 102 | 0, 100 |
| | Additive effect (POPPAINEFFADD) | −2.09 (−39.7) | 7.1 | 2.047 | 10.6 | | |
| | Corticosteroid effect on POPPAINEFFADD | −0.897 (0.862) | 20 | | | | |
| Fatigue | Base (POPFATBASE) | −0.451 (39.0) | 39.5 | 1.679 | 12 | −0.5, 101.5 | 0, 100 |
| | Corticosteroid effect on POPFATBASE (TRT = 0) | −2.13 (0.613) | 26.9 | | | | |
| | Additive effect (POPFATEFFADD) | −1.29 (−24.4) | 11 | 2.017 | 10.3 | | |
The final models for all metrics included a physician's decision at any time to administer i.a. and/or i.m. corticosteroids as a covariate effect (PREDEQDOSE, which had a value of 0 for treatment states that did not result in the administration of corticosteroids and 1 for those that did). This covariate significantly improved the fit of each of the base models (P < 0.0001). Models for TJC28, TJC56, SJC28, SJC44, PhGH, mHAQ and pain incorporated the covariate effect on the EFFADD parameter. Models for ESR, CRP, PtGH and fatigue incorporated the same effect on the BASETRT parameter (Table 2).
Description of the final models
The final parameter estimates and their precision for the 11 disease activity metric models are presented in Table 2. All metrics were described as decreasing in score from baseline to the treated state for the population. PPV in both BASETRT and EFFADD parameters for all metrics was high.
Acute phase reactants such as ESR and CRP demonstrated little relationship with other RA disease activity metrics in terms of baseline scores or changes in scores (r < 0.3), except with each other (0.3 < r < 0.7, Figure 1). Comprehensive and reduced joint counts (i.e. TJC56 vs. TJC28) demonstrated strong correlations between their scores at baseline and the degree/direction of change (r > 0.7). Relative standard errors of the correlation estimates from 200 bootstraps are provided in Supplementary Figure 2.
Figure 1.

Correlation between individual metric parameters. Shown are the model estimated correlations (off‐diagonal elements of the variance–covariance matrix of the combined model) between BASETRT and EFFADD parameters describing the distributions of baseline and treated scores for all metrics. Squares in the grid are coloured, green for correlation > 0.7 or < –0.7, yellow within the range of 0.3 to 0.7 and –0.3 to –0.7 (inclusive), orange within 0.1 to 0.3 and –0.1 to –0.3 and red for between –0.1 and 0.1 (inclusive). Highly discriminatory metrics were correlated with others with similar properties (such as SJC28 and SJC44) and show little relationship with those that performed poorly (SJC28 and CRP)
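The colour coding in the caption can be written as a small classification rule, which makes the handling of the boundary values explicit. This is a sketch of the stated thresholds only; the function name is hypothetical and the figure itself was produced in R.

```python
def correlation_colour(r: float) -> str:
    """Colour category for a correlation estimate, following the Figure 1 caption."""
    size = abs(r)
    if size > 0.7:
        return "green"
    if size >= 0.3:      # 0.3 to 0.7, inclusive
        return "yellow"
    if size > 0.1:       # between 0.1 and 0.3
        return "orange"
    return "red"         # -0.1 to 0.1, inclusive


for r in (0.85, -0.7, 0.3, 0.2, -0.1, 0.0):
    print(r, correlation_colour(r))
```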
Clinic visits from the treated state that resulted in the administration of i.a. and/or i.m. corticosteroids were associated with significantly higher scores, as measured by all 11 metrics, than those clinic visits that did not (described by PREDEQDOSE covariate effect on BASETRT [when TRT = 1] or on the EFFADD parameter). In this cohort, corticosteroids were administered to individuals experiencing a flare‐up in RA disease activity at the physician's discretion. As it was assumed that the prescribed DMARD protocol would sufficiently decrease the population's disease activity between treatment states, incorporation of PREDEQDOSE identified those individuals and clinic visits that did not fit this assumption. It was considered that individuals receiving corticosteroids confounded population estimates for changes in scores between treatment states, and therefore population typical values for not receiving corticosteroids at either treatment state were used in determining a metric's discriminatory properties.
The single metric models (Figure 2) displayed acceptable predictive performance when assessed by prediction‐corrected visual predictive checks 22. For the majority of the models, the observed distribution densities for both baseline and the treated state consistently lay within their respective 95% prediction intervals for the simulated data.
Figure 2.

Single metric simulations. Solid lines and shaded 95% prediction intervals represent the prediction‐corrected (‘Pred. corrected’) observed and model simulated distribution densities (n = 1000 simulations), respectively, for the baseline (red) and treated states (blue) for each of the 11 RA metrics. Both observed and simulated CRP values were log10‐transformed for visual comparison. The model predictions overlay the observed data with good agreement of both baseline and treated states for the majority of the single metrics
Discriminatory capacity of disease activity metrics
Plotting estimated treatment effect variability vs. treatment effect magnitude allowed for the identification of metrics that demonstrated a higher capacity to distinguish between treatment states (Figure 3). For all metrics, the shaded clouds around the estimates of treatment effect magnitude (depicting precision in parameter estimates) do not cross the point of ‘no effect’, i.e. where treatment effect magnitude equals zero. Therefore, all metrics demonstrated a significant change in score from baseline to the treated state. PhGH, SJC28, SJC44, TJC28, TJC56, pain and mHAQ had a greater ability to achieve scores of zero in the treated state with less variability. The treatment effect magnitude for these metrics ranged from 78.8% (pain) to 87.2% (SJC28), with their treatment effect variability ranging from 78.9% (PhGH) to 99.7% (mHAQ). When compared with ESR, CRP, PtGH and fatigue (treatment effect magnitude 58% [ESR] to 68.4% [CRP] and treatment effect variability 132.5% [PtGH] to 156.4% [fatigue]), the former group was suspected to have a greater capacity to discriminate between high and low disease activity in the population.
Figure 3.

Discriminatory capacity of disease activity metrics. Treatment effect variability is the coefficient of variation (%CV) for the population's change in score. Treatment effect magnitude is the population's change in score expressed as a percentage of the population typical score for baseline. Shaded bubbles are the 95% confidence intervals for each metric's point on the treatment effect variability‐magnitude plane, and black circles are the point estimate. Metrics with lower values for treatment effect variability (PhGH, SJC28, SJC44) suggest higher uniformity in the population's change from baseline to the treated state. All metrics demonstrated a significant change in score for the population. Metrics approaching the maximum treatment effect of 100% (i.e. SJC28, SJC44, TJC28) have scores that are most responsive to DMARD therapy and time
Simulations of composite indices
ESR was identified as a variable with a lower ability to discriminate between treatment states (Figures 1, 3), and as such, was removed from the original DAS28 to form the reduced DAS28 (equation (8)).
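$$\text{Reduced DAS28} = 0.56\sqrt{\mathrm{TJC28}} + 0.28\sqrt{\mathrm{SJC28}} + 0.014\,\mathrm{PtGH} \tag{8}$$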
Composite index scores calculated by simulated single metrics appropriately represented the distributions of scores in both treatment states as determined by the observed data (Figure 4). The median shared area of the original DAS28 baseline and treated distributions for the simulated population was 56.4%. Simulations of the reduced DAS28 exhibited an increase in the area shared by the distributions by 5.6%. Simulations for both composite indices were facetted for different baseline ESR levels, < 25 mm h–1 (‘normal’), 25–50 mm h–1 or >50 mm h–1, to understand if the increase in the median shared area from the original DAS28 to the reduced DAS28 was common to all groups. The ability of the reduced DAS28 to distinguish between the states was more impaired in subpopulations with elevated baseline ESR (Figure 4).
Figure 4.

Comparison of original DAS28 and reduced DAS28. A) Original DAS28, B) original DAS28 facetted for baseline ESR, C) reduced DAS28 [ESR removed from original DAS28 equation] and D) reduced DAS28 facetted for baseline ESR. The proportions of simulated individuals in each facet are: 0.51 (baseline ESR < 25 mm h–1), 0.19 (baseline ESR 25–50 mm h–1) and 0.30 (baseline ESR > 50 mm h–1). Solid lines and shaded 95% prediction intervals represent the observed and model simulated distribution densities (n = 1000 simulations), respectively, for the baseline (red) and treated states (blue). MSA = median shared area (%). The model predictions overlay the observed data with good agreement. The ability of the reduced DAS28 to distinguish between treatment states is most impaired in subpopulations with elevated baseline ESR
Discussion
This population modelling‐based evaluation has demonstrated a novel method for describing the relationships between 11 RA disease activity metrics of varying continuous scales and skewed distributions. Acute phase reactants (ESR and CRP) were identified as the metrics that were least informative for describing changes in RA disease activity, while joint counts and the PhGH demonstrated a greater ability to discriminate changes. Simulations showed that the impact of removing ESR on the DAS28's ability to describe changes in RA disease activity depended on the baseline ESR level.
Previous evaluations of single RA disease activity metrics were often based on their ability to reflect outcomes such as radiographic progression or their correlation with other reference measures such as composite indices or the (modified) health assessment questionnaire 23. However, irreversible radiographic progression has become less prevalent in patients with more intensive contemporary treatment strategies 3, 12, 24. Previous literature has also been limited to intra‐subset comparisons, such as solely between acute phase reactants (ESR vs. CRP) or between different patient reported outcomes. Therefore, the present study focused on evaluating a single metric's ability to reflect response, or lack thereof, to DMARD therapy, and its relationship with 10 other metrics of RA disease activity.
Acute phase reactants have been shown to be important predictors of radiographic progression 25, 26, but appeared to be less discriminatory when distinguishing changes in RA disease activity compared with tender and swollen joint counts, global health, pain and physical function assessments in this analysis (Figure 3). Whilst correlations between metrics, for both baseline scores and changes in scores, in most cases did not reach an exceptionally high level (i.e. r > 0.8), ESR and CRP had the most limited correlational relationships with other metrics (Figure 1). Previous work supports this discordance between acute phase reactants and clinical disease activity metrics (TJC28, SJC28, PtGH and PhGH as part of the clinical disease activity index, CDAI 8), where approximately 50% of patients with active RA (CDAI > 2.8) did not have ESR or CRP elevated at baseline. However, those with much higher RA disease activity (CDAI > 22) were more likely to have both acute phase reactants elevated 27. Notably, ESR demonstrated the smallest treatment effect magnitude, and the degree and direction of the change in ESR were highly discordant among individuals in the study cohort compared with other metrics (Figure 3). It is important to recognise that a common target for ESR in RA management is less than the upper limit of the age‐ and gender‐dependent normal range (up to 25 mm h⁻¹) and not zero. As a result, the small treatment effect magnitude of ESR was consistent with changes within the normal range (Table 2, baseline levels are in the normal range).
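The CDAI thresholds quoted above (> 2.8 for active disease, > 22 for high disease activity) follow from the index being a plain sum of its four clinical components, with no acute phase reactant term. A minimal Python sketch is given below, assuming both global health assessments are expressed on a 0–10 cm scale and using the conventional CDAI category boundaries; the function names are hypothetical.

```python
def cdai(tjc28, sjc28, ptgh_cm, phgh_cm):
    """Clinical disease activity index: sum of joint counts and global assessments.

    Patient (PtGH) and physician (PhGH) global health are assumed to be
    recorded on 0-10 cm visual analogue scales.
    """
    return tjc28 + sjc28 + ptgh_cm + phgh_cm


def cdai_category(score):
    """Conventional CDAI disease activity categories."""
    if score <= 2.8:
        return "remission"
    if score <= 10:
        return "low"
    if score <= 22:
        return "moderate"
    return "high"
```

Because no laboratory value enters the sum, a patient can be classified as having active disease by the CDAI while ESR and CRP remain within their normal ranges, which is the discordance described in the paragraph above.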
Simulations using the combined model demonstrated that individuals with elevated baseline ESR levels (> 25 mm h⁻¹) had more substantial ESR changes compared with those with baseline ESR < 25 mm h⁻¹, and thus required more than just the clinical metrics (i.e. reduced DAS28) to discriminate changes in their RA disease activity (Figure 4). Individuals with normal baseline ESR levels were more likely to have the baseline original DAS28 dominated by scores from clinical metrics, and any change in ESR was likely to be of small magnitude. Assuming that baseline scores represent a higher level of disease activity, ESR may only be important in discriminating changes in disease activity for the subset of individuals in whom it was elevated at baseline and not for those in whom it was initially normal.
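The reasoning above can be illustrated with a short Python sketch. It groups individuals into the three baseline ESR facets used in Figure 4 and computes how much a change in ESR alone moves the original DAS28 through its 0.70 ln(ESR) term. The handling of values exactly equal to 25 or 50 mm h⁻¹ is an assumption, the function names are hypothetical, and the ESR values in the usage note are illustrative rather than study data.

```python
import math
from collections import Counter

BANDS = ("<25", "25-50", ">50")


def esr_band(baseline_esr):
    """Assign a baseline ESR (mm/h) to a facet; boundary handling is assumed."""
    if baseline_esr < 25:
        return "<25"
    if baseline_esr <= 50:
        return "25-50"
    return ">50"


def facet_proportions(baseline_esrs):
    """Proportion of individuals falling in each baseline ESR facet."""
    if not baseline_esrs:
        raise ValueError("at least one baseline ESR value is required")
    counts = Counter(esr_band(v) for v in baseline_esrs)
    return {band: counts[band] / len(baseline_esrs) for band in BANDS}


def esr_term_change(esr_baseline, esr_treated):
    """Decrease in the original DAS28 attributable to the ESR term alone."""
    return 0.70 * (math.log(esr_baseline) - math.log(esr_treated))
```

For example, `esr_term_change(60, 20)` is larger than `esr_term_change(15, 12)`, because the ESR term depends on the ratio of baseline to treated ESR. A fall from an elevated baseline therefore contributes more to the change in the original DAS28 than a small shift within the normal range, and that contribution is lost when ESR is removed.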
The application of population modelling methods for describing relationships between outcome metrics is unique and could be extended to other therapeutic areas that report multiple outcomes or wish to distinguish between prognostic and predictive markers. Final parameter estimates for describing baseline scores and the change between the two treatment states obtained from population modelling (Table 2) were derived from maximum likelihood estimation rather than conventional statistical methods (such as calculating the mean and SE). This method allowed for the identification of an integrative set of parameter values and covariate effects that most likely gave rise to the population's observed scores. The modelling approach also provided a quantitative description of the differences in scores observed at clinic visits where corticosteroids were administered. The empirical method for handling continuous bounded data was sufficient, as shown by the model's visual descriptions in Figure 2. This suggests that the process was transparent for the various degrees of skewed distributions and scales exhibited by the metrics. Disease progression models would have captured the population's (and an individual's) time course of improvement or fluctuation over the 2 year period. However, the body of work required to test a large number of models in order to identify 11 final models (one for each of the single metrics) was not considered feasible for addressing this study's aims.
Despite being less discriminatory compared with the other single metrics, ESR appeared to be important in describing changes in RA disease activity as determined by the DAS28 for those individuals where ESR levels were initially elevated. Considering that RA is a heterogeneous disease, it is likely that there will not be a single composite index that can concisely capture changes in disease activity for all RA patients and defining the most appropriate metric(s) for individual patients will be an important area of future research.
Competing Interests
All authors have completed the Unified Competing Interest form at http://www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare no support from any organization for the submitted work, no financial relationships with any organizations that might have an interest in the submitted work in the previous 3 years and no other relationships or activities that could appear to have influenced the submitted work.
The authors acknowledge that the Australian Centre for Pharmacometrics is an initiative of the Australian Government as part of the National Collaborative Research Infrastructure Strategy.
Author Contributions
All individuals listed as authors contributed substantially to the concept and preparation of this manuscript. JW, MDW, SMP, DJRF and RNU were primarily responsible for study design. Data collection was primarily performed by SMP. Data analysis was primarily performed by JW, DJRF and RNU. The manuscript was primarily drafted by JW and revised by JW, MDW, SMP, DJRF and RNU.
Supporting information
Wojciechowski, J. , Wiese, M. D. , Proudman, S. M. , Foster, D. J. R. , and Upton, R. N. (2016) A model‐based evaluation of single metrics for discriminating changes in rheumatoid arthritis disease activity. Br J Clin Pharmacol, 81: 1046–1057. doi: 10.1111/bcp.12891.
References
- 1. Singh JA, Furst DE, Bharat A, Curtis JR, Kavanaugh AF, Kremer JM, Moreland LW, O'Dell J, Winthrop KL, Beukelman T, Bridges SL Jr, Chatham WW, Paulus HE, Suarez‐Almazor M, Bombardier C, Dougados M, Khanna D, King CM, Leong AL, Matteson EL, Schousboe JT, Moynihan E, Kolba KS, Jain A, Volkmann ER, Agrawal H, Bae S, Mudano AS, Patkar NM, Saag KG. 2012 update of the 2008 American College of Rheumatology recommendations for the use of disease‐modifying antirheumatic drugs and biologic agents in the treatment of rheumatoid arthritis. Arthritis Care Res 2012; 64: 625–39.
- 2. Verstappen SMM, Jacobs JWG, van der Veen MJ, Heurkens AHM, Schenk Y, ter Borg EJ, Blaauw AAM, Bijlsma JWJ, Utrecht Rheumatoid Arthritis Cohort Study Group. Intensive treatment with methotrexate in early rheumatoid arthritis: aiming for remission. Computer Assisted Management in Early Rheumatoid Arthritis (CAMERA, an open‐label strategy trial). Ann Rheum Dis 2007; 66: 1443–9.
- 3. Grigor C, Capell H, Stirling A, McMahon AD, Lock P, Vallance R, Kincaid W, Porter D. Effect of a treatment strategy of tight control for rheumatoid arthritis (the TICORA study): a single‐blind randomised controlled trial. Lancet 2004; 364: 263–9.
- 4. Smolen JS, Aletaha D, Bijlsma JW, Breedveld FC, Boumpas D, Burmester G, Combe B, Cutolo M, de Wit M, Dougados M, Emery P, Gibofsky A, Gomez‐Reino JJ, Haraoui B, Kalden J, Keystone EC, Kvien TK, McInnes I, Martin‐Mola E, Montecucco C, Schoels M, van der Heijde D, Committee TTE. Treating rheumatoid arthritis to target: recommendations of an international task force. Ann Rheum Dis 2010; 69: 631–7.
- 5. Prevoo MLL, van't Hof MA, Kuper HH, van Leeuwen MA, van de Putte LBA, van Riel PLCM. Modified disease activity scores that include twenty‐eight‐joint counts: development and validation in a prospective longitudinal study of patients with rheumatoid arthritis. Arthritis Rheum 1995; 38: 44–8.
- 6. Smolen JS, Landewe R, Breedveld FC, Dougados M, Emery P, Gaujoux‐Viala C, Gorter S, Knevel R, Nam J, Schoels M, Aletaha D, Buch M, Gossec L, Huizinga T, Bijlsma JW, Burmester G, Combe B, Cutolo M, Gabay C, Gomez‐Reino J, Kouloumas M, Kvien TK, Martin‐Mola E, McInnes I, Pavelka K, van Riel P, Scholte M, Scott DL, Sokka T, Valesini G, van Vollenhoven R, Winthrop KL, Wong J, Zink A, van der Heijde D. EULAR recommendations for the management of rheumatoid arthritis with synthetic and biological disease‐modifying antirheumatic drugs. Ann Rheum Dis 2010; 69: 964–75.
- 7. Smolen JS, Breedveld FC, Schiff MH, Kalden J, Emery P, Eberl G, van Riel PL, Tugwell P. A simplified disease activity index for rheumatoid arthritis for use in clinical practice. Rheumatology 2003; 42: 244–57.
- 8. Aletaha D, Nell VP, Stamm T, Uffmann M, Pflugbeil S, Machold K, Smolen JS. Acute phase reactants add little to composite disease activity indices for rheumatoid arthritis: validation of a clinical activity score. Arthritis Res Ther 2005; 7: R796–806.
- 9. Mould DR, Upton RN, Wojciechowski J. Dashboard systems: implementing pharmacometrics from bench to bedside. AAPS J 2014; 16: 925–37.
- 10. Wojciechowski J, Wiese MD, Proudman SM, Foster DJ, Upton RN. A population model of early rheumatoid arthritis disease activity during treatment with methotrexate, sulfasalazine and hydroxychloroquine. Br J Clin Pharmacol 2015; 79: 777–88.
- 11. Aletaha D, Neogi T, Silman AJ, Funovits J, Felson DT, Bingham CO 3rd, Birnbaum NS, Burmester GR, Bykerk VP, Cohen MD, Combe B, Costenbader KH, Dougados M, Emery P, Ferraccioli G, Hazes JM, Hobbs K, Huizinga TW, Kavanaugh A, Kay J, Kvien TK, Laing T, Mease P, Menard HA, Moreland LW, Naden RL, Pincus T, Smolen JS, Stanislawska‐Biernat E, Symmons D, Tak PP, Upchurch KS, Vencovsky J, Wolfe F, Hawker G. 2010 Rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum 2010; 62: 2569–81.
- 12. Proudman SM, Keen HI, Stamp LK, Lee AT, Goldblatt F, Ayres OC, Rischmueller M, James MJ, Hill CL, Caughey GE, Cleland LG. Response‐driven combination therapy with conventional disease‐modifying antirheumatic drugs can achieve high response rates in early rheumatoid arthritis with minimal glucocorticoid and nonsteroidal anti‐inflammatory drug use. Semin Arthritis Rheum 2007; 37: 99–111.
- 13. Beal S, Sheiner LB, Boeckmann A, Bauer RJ. NONMEM User's Guides (1989–2009). Ellicott City, MD, USA: Icon Development Solutions; 2009.
- 14. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2014.
- 15. Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York: Springer; 2009.
- 16. Maechler M. sfsmisc: Utilities from Seminar fuer Statistik ETH Zurich. R package version 1.0-27; 2015.
- 17. Højsgaard S, Halekoh U, Robinson‐Cox J, Wright K, Leidi AA. doBy: Groupwise statistics, LSmeans, linear contrasts, utilities. R package version 4.5-13; 2014.
- 18. Harrell FE Jr, Dupont C. Hmisc: Harrell Miscellaneous. R package version 3.14-6; 2014.
- 19. Wickham H. The split‐apply‐combine strategy for data analysis. J Stat Soft 2011; 40: 1–29.
- 20. Meredith M, Ridout M. overlap: Estimates of coefficient of overlapping for animal activity patterns. R package version 0.2.4; 2014.
- 21. Wickham H. Reshaping data with the reshape package. J Stat Soft 2007; 21: 1–20.
- 22. Bergstrand M, Hooker AC, Wallin JE, Karlsson MO. Prediction‐corrected visual predictive checks for diagnosing nonlinear mixed‐effects models. AAPS J 2011; 13: 143–51.
- 23. van der Heijde D, van't Hof MA, van Riel PLCM, van Leeuwen MA, van Rijswijk MH, van de Putte LBA. Validity of single variables and composite indices for measuring disease activity in rheumatoid arthritis. Ann Rheum Dis 1992; 51: 177–81.
- 24. Stenger A, Van Leeuwen M, Houtman P, Bruyn G, Speerstra F, Barendsen B, Velthuysen E, Van Rijswijk M. Early effective suppression of inflammation in rheumatoid arthritis reduces radiographic progression. Rheumatology 1998; 37: 1157–63.
- 25. van Leeuwen MA, van Rijswijk MH, van der Heijde DMFM, Meerman GJT, van Riel PLCM, Houtman PM, van de Putte LBA, Limburg PC. The acute‐phase response in relation to radiographic progression in early rheumatoid arthritis: a prospective study during the first three years of the disease. Rheumatology 1993; 32 (suppl 3): 9–13.
- 26. Navarro‐Compan V, Gherghe AM, Smolen JS, Aletaha D, Landewe R, van der Heijde D. Relationship between disease activity indices and their individual components and radiographic progression in RA: a systematic literature review. Rheumatology 2015; 54: 994–1007.
- 27. Kay J, Morgacheva O, Messing SP, Kremer JM, Greenberg JD, Reed GW, Gravallese EM, Furst DE. Clinical disease activity and acute phase reactant levels are discordant among patients with active rheumatoid arthritis: acute phase reactant levels contribute separately to predicting outcome at one year. Arthritis Res Ther 2014; 16: R40.
