1. INTRODUCTION
Non-surgical interventional procedures for conditions associated with back pain are commonly used in the United States (US).[12; 13; 22; 30; 34] Lumbar spinal stenosis is a specific spine syndrome often associated with back pain.[16] Four of 5 US clinical practice guidelines recommend the use of lumbar epidural steroid injections (LESIs) for symptoms due to lumbar spinal stenosis.[1] However, the average treatment effect size of LESIs for lumbar spinal stenosis in randomized controlled trials (RCTs) is modest.[15]
One approach to obtaining larger magnitude treatment effects in lumbar spinal stenosis and other conditions associated with back pain may be to leverage the nonrandom variation in treatment effect in groups of treated patients, also referred to as “heterogeneity of treatment effect”.[19] While RCTs estimate average treatment effects, contemporary predictive approaches to heterogeneity of treatment effect estimate individualized treatment effects, reflecting the average treatment effect within a subgroup of patients.[20] If subgroups can be identified in which large-magnitude individualized treatment effects are expected, a treatment can be preferentially offered to those who are most likely to benefit from it, while others can be offered an alternative. The conventional approach to identifying such subgroups in RCTs is to conduct secondary analyses that examine effect modification by one factor at a time; i.e., “1-variable-at-a-time” analyses.[19; 33] These are often underpowered and prone to false positive results due to multiple statistical comparisons.[6; 19] Also, such analyses ignore the fact that patients have multiple interrelated characteristics that can affect treatment outcomes simultaneously and that ideally would be taken into consideration together.
The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement was recently developed by a multidisciplinary expert panel and provides a framework for applying predictive heterogeneity of treatment effect approaches to RCT data.[19] “Risk-modeling” approaches suggested by the PATH Statement use multivariable models accounting for various patient characteristics to estimate predicted risk for an outcome and stratify patients in RCTs into subgroups defined by different individualized treatment effects.[19; 20] By approaching effect modification from a perspective where many factors are accounted for simultaneously, risk modeling approaches account for the diverse characteristics of patients and can personalize care.
A recent evidence synthesis recommends that LESI for chronic lumbar radiculopathy be considered for limited use in selected patients, but no recommendations are provided on how to select patients.[11] Furthermore, no recommendations are provided regarding the use of LESI for lumbar spinal stenosis, a related spine condition. [4; 11] Ideally, patients with lumbar spinal stenosis and their providers would have empirical data with which to make informed decisions, based on patient-centered estimates of benefits versus risks. The overarching goal of this study was to use a risk-modeling approach to provide individualized estimates of the effect of LESI on back-related functional limitations in subgroups of patients with lumbar spinal stenosis defined by predicted risk. The first aim was to develop and validate a predictive model for back-related functional limitations. The second aim was to use the model to examine risk-based heterogeneity of treatment effect among patients receiving LESIs for lumbar spinal stenosis.
2. METHODS
2.1. Study overview and design.
Risk-modeling approaches to predictive heterogeneity of treatment effect generally involve 2 stages: (1) selecting a multivariable regression model that predicts risk of an outcome while ignoring treatment assignment; and (2) using the model to stratify patients in an RCT, estimate treatment effects within strata, and examine variation in treatment effects across strata.[19] Because there was no widely-accepted model for predicting future back-related functional measures in older adults using the variables available in our existing datasets, we undertook new model development. For Aim 1, we developed and validated a model that used baseline patient characteristics at the time of a new episode of care for back pain among older adults in primary care to predict back-related functional limitations at follow-up, as measured by the Roland-Morris Disability Questionnaire (RMDQ), in the Back pain Outcomes using Longitudinal Data (BOLD) cohort study (Figure 1, steps A-C).[18] To remain consistent with the PATH framework guidance that risk-modeling approaches use models that are “developed ‘blinded’ to treatment assignment”,[19] we ignored whether participants received LESI in the model development phase; 4.5% of patients in BOLD received LESI during the first 3 months of follow-up. For Aim 2, we used the validated predictive model in an independent sample of patients from the Lumbar Epidural Steroid injections for spinal Stenosis (LESS) RCT to stratify patients into subgroups (quartiles) defined by their predicted back-related functional limitations following LESI (Figure 1, step D).[15] Treatment effects of LESI vs. control on RMDQ scores were estimated within quartiles of predicted RMDQ scores at follow-up, and the statistical significance of effect modification according to the predicted RMDQ scores was evaluated. We hypothesized that there would be significant treatment effect heterogeneity among patients receiving LESIs for lumbar spinal stenosis. 
We followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines for model development and reporting.[7] The institutional review boards at the University of Washington and all sites approved the BOLD and LESS studies, and all participants provided informed consent.
Figure 1.

(A) A predictive model was developed in the Back pain Outcomes using Longitudinal Data (BOLD) cohort study training set. (B) The model developed in A was then applied in the BOLD testing set. (C) The model developed in A was then applied in the control group of the Lumbar Epidural Steroid injections for spinal Stenosis (LESS) randomized controlled trial (RCT). (D) The model developed in A was then used to stratify patients from the LESS trial into subgroups defined by quartiles of predicted risk scores.
2.2. Study samples.
The BOLD cohort comprised 5239 patients age 65 years or older who presented to primary care or the emergency department for a new episode of care for back pain between September 2010 and March 2013 at 3 large integrated health care systems in the US: Harvard Vanguard (Boston), Henry Ford Health System (Detroit), and Kaiser Permanente (Northern California). New episodes were defined as those in which the patient had no health care visits for back pain in the preceding 6 months. BOLD participants with back pain included those with or without lower extremity pain due to various causes; 21% had clinical diagnoses of “sciatica”, lumbosacral radiculopathy, and/or lumbar disc herniation; and 6% had a diagnosis of lumbar spinal stenosis.[8] Participants were excluded for prior lumbar surgery and “red flag” spine conditions such as spondyloarthropathy, infection, malignancy, etc.[18] LESS was a double-blind RCT involving 400 patients age 50 years or older with moderate-to-severe leg pain, symptoms consistent with neurogenic claudication, and lumbar central canal spinal stenosis on magnetic resonance imaging or computed tomography who presented to interventional pain clinics for LESI between April 2011 and August 2013 at 16 sites in the US. Other inclusion criteria were pain intensity greater than 4 on a 0 to 10 numerical rating scale (NRS), predominant lower extremity pain, and an RMDQ score of 7 or more. Exclusion criteria included red flag conditions, prior lumbar surgery, or epidural steroid injections within the prior 6 months. Extensive details of BOLD[17; 18] and LESS[14; 15] recruitment, participants, and treatments received have been reported previously; the BOLD and LESS samples do not overlap.
2.3. Data collection.
In the BOLD study, patient-reported measures were collected using phone interviews or mailed questionnaires within 3 weeks of a patient’s initiation of a new episode of care for back pain. Assessments were conducted at baseline and 3, 6, 12, and 24 months later. In the LESS study, patient-reported measures were collected using phone, in-person interviews or mailed questionnaires after LESI for symptoms of lumbar spinal stenosis. Assessments were conducted at baseline, 3 weeks, 6 weeks, and 3-, 6-, and 12-months post-randomization.
2.4. Baseline measures.
Baseline characteristics assessed in both studies included sociodemographics (age, sex, self-reported ethnicity [Hispanic or Latino] and race [American Indian or Alaska Native, Asian, black or African American, Native Hawaiian or Other Pacific Islander, and white], current employment [working vs. not], educational attainment [high school or less, some college, four-year college graduate or more advanced education], and marital status [married/living with partner vs. not]); clinical factors (body mass index [BMI, kg/m2, calculated using height and weight derived from the patients’ electronic medical records] and smoking status [never, past smoker, current smoker]); and having a pending legal claim pertaining to back pain or health. Back-related functional limitations were evaluated using the RMDQ, which was modified in both the BOLD and LESS studies to indicate limitations related either to back pain or lower extremity pain. RMDQ scores range from 0 to 24, with higher scores indicating greater functional limitations.[15; 18] General pain-related interference was assessed using the validated Brief Pain Inventory (BPI) Interference scale.[29] The scale consists of 7 ratings (0–10) of how much back pain interferes with the following: general activity, mood, ability to walk, normal work, relations with other people, sleep, and enjoyment of life. Low back pain intensity and leg pain intensity were assessed using 0–10 numeric rating scales (NRSs), in which higher values indicate greater pain intensity.[10] Quality of life was measured using the European Quality of Life 5 Dimension index (EQ5D-Index; 0–1, with 0 being death and 1 being perfect health).[24] The EQ5D visual analog scale (EQ5D-VAS; 0–100, with 0 being “a health state worse than death” and 100 being “perfect health”) was also evaluated.
The occurrence of self-reported falls in the past 3 weeks was assessed using items from the Behavioral Risk Factor Surveillance System (BRFSS) survey.[35] Psychological distress was evaluated using the Patient Health Questionnaire-4 (PHQ-4) measure of anxiety and depressive symptoms.[23] Patient recovery expectations were evaluated using a 0 to 10 NRS in which patients rated their confidence that their back and/or leg pain would be completely gone or much better in 3 months, on a scale from 0 = ‘not at all confident’ to 10 = ‘extremely confident’.[18]
2.5. Outcome measures.
The study outcome was back-related functional limitations at follow-up as measured by the RMDQ. Because the overall goal of the study was to provide personalized estimates of the effect of LESI in lumbar spinal stenosis, where the average treatment effect of LESI (vs. epidural anesthetic injections) is modest,[15] we decided a priori to examine effects on the RMDQ at the earliest post-randomization time point in the LESS RCT, when the benefits of LESI over control were expected to be greatest. The time point chosen was 3 weeks post-randomization, by which time immediate pain relief due to local anesthetic epidural injection should have abated, but short-term pain relief from LESI may be maintained. Additionally, because the design of the LESS trial allowed participants to elect to receive a repeat epidural injection after 3 weeks post-randomization, only the 3-week time point represents a truly randomized comparison unconfounded by interim improvements and subsequent patient decisions to have a repeat LESI. This approach is consistent with recommendations from the PATH Statement to examine effect modification in situations (treatment, target population, post-treatment time point of assessment, etc.) where a significant treatment effect is expected.[20] Because follow-up data at 3 weeks were not available in the BOLD cohort, model development and validation were conducted using the RMDQ measured at 3-month follow-up, the earliest post-baseline assessment. Although this created a difference between the post-baseline time points of RMDQ assessment used in the model development/validation conducted in the BOLD cohort (3-month follow-up) and the treatment effect estimation conducted in the LESS RCT (3-week follow-up), we expected minimal detrimental consequences from this disparity due to the fact that most improvements in back pain studies occur very early in the follow-up period, irrespective of pain chronicity.[2]
2.6. Statistical analysis: Model development and validation (Aim 1).
We used the BOLD cohort to develop and validate multivariable predictive models to predict RMDQ scores at 3-month follow-up (Figure 1). Because pain NRS and RMDQ scores were higher in the LESS RCT than in the BOLD cohort, we excluded participants with baseline RMDQ scores ≤2 and back pain NRS scores ≤2 to make the BOLD and LESS participants more comparable. After this exclusion, we randomly assigned BOLD participants to a training set or a testing set with a 4:1 ratio. We selected the 18 candidate predictor variables for model inclusion (listed in Section 2.4 and Table 1) based on their consistent associations with outcomes in the spine and low back pain literature,[5; 9; 21; 25–28; 31] the clinical experience of the study team members, and their availability in both the BOLD and LESS studies. Dummy variables were used for categorical variables, resulting in 25 variables included in the model (including more than 1 level for some predictors). We developed predictive models using linear regression with least absolute shrinkage and selection operator (LASSO) regularization. LASSO is a machine learning method that penalizes the absolute value of the model coefficients and selects a reduced set of the known covariates for use in a model. We conducted a 5-fold cross validation on the training set to select the optimal regularization hyperparameter (lambda). We evaluated predictive performance and how well the model explained variation in outcomes using the coefficient of determination (R2). We used complete case analysis and excluded patients who had any data missing from selected baseline predictors and pain intensity measurements. We then assessed the performance of the predictive model using the BOLD testing set, comparing R2 in the BOLD training set with that in the testing set. As a further test of the external validity of the predictive model, we then assessed the performance of the model among LESS participants who did not receive LESI.
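The model-development steps described above (a 4:1 training/testing split, LASSO-penalized linear regression, and 5-fold cross validation to choose lambda) can be illustrated with a minimal, standard-library sketch. This is not the study's code: the coordinate-descent solver, the synthetic data, the lambda grid, and all function names are assumptions made for illustration, and the sketch assumes predictors are already on comparable scales.

```python
import random

def soft_threshold(z, g):
    """Shrink z toward zero by g; values within [-g, g] become exactly 0."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def fit_lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam * sum(|b_j|)."""
    n, p = len(X), len(X[0])
    b0 = sum(y) / n
    beta = [0.0] * p
    resid = [yi - b0 for yi in y]
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of predictor j with the residual that excludes j's own term
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_b
        shift = sum(resid) / n  # re-center the unpenalized intercept
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, beta

def predict(model, X):
    b0, beta = model
    return [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]

def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the lowest mean held-out MSE across k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best = None
    for lam in lambdas:
        errs = []
        for held in folds:
            held_set = set(held)
            tr = [i for i in idx if i not in held_set]
            model = fit_lasso([X[i] for i in tr], [y[i] for i in tr], lam)
            errs.append(mse([y[i] for i in held], predict(model, [X[i] for i in held])))
        avg = sum(errs) / k
        if best is None or avg < best[0]:
            best = (avg, lam)
    return best[1]

# Synthetic data (hypothetical): only the first two of six predictors matter.
rng = random.Random(42)
n, p = 160, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * r[0] - 2.0 * r[1] + rng.gauss(0, 1) for r in X]

# 4:1 random split into training and testing sets
order = list(range(n))
rng.shuffle(order)
cut = int(0.8 * n)
X_tr, y_tr = [X[i] for i in order[:cut]], [y[i] for i in order[:cut]]
X_te, y_te = [X[i] for i in order[cut:]], [y[i] for i in order[cut:]]

lambdas = [0.01, 0.1, 0.5, 1.0]
lam = cv_select_lambda(X_tr, y_tr, lambdas)
b0, beta = fit_lasso(X_tr, y_tr, lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("lambda:", lam, "selected predictors:", selected)
print("test MSE:", mse(y_te, predict((b0, beta), X_te)))
```

The soft-thresholding step is what sets some coefficients exactly to zero, which is how LASSO "selects" a reduced set of covariates. In practice an established implementation (for example, a cross-validated LASSO routine in a statistics library) would normally be preferred over a hand-written solver.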
Table 1.
Characteristics of Participants in the Study Samples
| | BOLD Trial Training Set (N=2543) | BOLD Trial Testing Set (N=651) | LESS Trial (N=183) |
|---|---|---|---|
| Age | |||
| Mean (SD) | 73.9 (6.77) | 73.9 (6.99) | 67.2 (9.54) |
| Median [Min, Max] | 73.0 [65.0, 101] | 73.0 [65.0, 96.0] | 68.0 [50.0, 87.0] |
| Sex | |||
| Female | 1716 (67.5%) | 410 (63.0%) | 118 (64.5%) |
| Male | 827 (32.5%) | 224 (34.4%) | 65 (35.5%) |
| Body Mass Index | |||
| Mean (SD) | 29.4 (6.31) | 29.7 (6.27) | 31.2 (7.07) |
| Median [Min, Max] | 28.5 [14.6, 59.1] | 28.8 [16.9, 64.2] | 30.3 [18.7, 61.5] |
| Hispanic | |||
| No | 2404 (94.5%) | 588 (90.3%) | 174 (95.1%) |
| Yes | 139 (5.5%) | 46 (7.1%) | 9 (4.9%) |
| Race | |||
| Black or African American | 430 (16.9%) | 102 (15.7%) | 73 (39.9%) |
| White | 1829 (71.9%) | 458 (70.4%) | 100 (54.6%) |
| Other | 284 (11.2%) | 74 (11.4%) | 10 (5.5%) |
| Working status | |||
| No | 2305 (90.6%) | 568 (87.3%) | 132 (72.1%) |
| Working | 238 (9.4%) | 66 (10.1%) | 51 (27.9%) |
| Education | |||
| High School or less | 805 (31.7%) | 190 (29.2%) | 69 (37.7%) |
| Some college | 811 (31.9%) | 190 (29.2%) | 58 (31.7%) |
| Four year college graduate or more | 927 (36.5%) | 254 (39.0%) | 56 (30.6%) |
| Marital status | |||
| No | 1064 (41.8%) | 261 (40.1%) | 92 (50.3%) |
| Married/Living with partner | 1479 (58.2%) | 373 (57.3%) | 91 (49.7%) |
| Pending legal claim | |||
| No | 2530 (99.5%) | 629 (96.6%) | 178 (97.3%) |
| Yes | 13 (0.5%) | 5 (0.8%) | 5 (2.7%) |
| Smoking status | |||
| Never Smoked | 1365 (53.7%) | 339 (52.1%) | 72 (39.3%) |
| Quit smoking over a year ago | 1003 (39.4%) | 255 (39.2%) | 80 (43.7%) |
| Current smoker, or quit less than a year ago | 175 (6.9%) | 40 (6.1%) | 31 (16.9%) |
| RMDQ score (baseline) | |||
| Mean (SD) | 11.8 (5.22) | 11.9 (5.38) | 16.3 (3.90) |
| Median [Min, Max] | 12.0 [3.00, 24.0] | 12.0 [3.00, 24.0] | 17.0 [7.00, 23.0] |
| NRS back pain (baseline) | |||
| Mean (SD) | 6.01 (2.19) | 5.89 (2.21) | 7.36 (2.09) |
| Median [Min, Max] | 6.00 [2.00, 10.0] | 6.00 [2.00, 10.0] | 8.00 [2.00, 10.0] |
| NRS leg pain (baseline) | |||
| Mean (SD) | 4.01 (3.30) | 3.99 (3.39) | 7.58 (1.67) |
| Median [Min, Max] | 4.00 [0, 10.0] | 4.00 [0, 10.0] | 8.00 [3.00, 10.0] |
| How confident is the patient that the back or leg pain will be completely gone or will be much better 3 mo from now? | |||
| Mean (SD) | 5.09 (3.64) | 5.12 (3.65) | 7.61 (1.80) |
| Median [Min, Max] | 5.00 [0, 10.0] | 5.00 [0, 10.0] | 8.00 [3.00, 10.0] |
| Patient Health Questionnaire-4 (baseline) | |||
| Mean (SD) | 1.88 (2.63) | 1.89 (2.55) | 2.52 (2.51) |
| Median [Min, Max] | 1.00 [0, 12.0] | 1.00 [0, 12.0] | 2.00 [0, 11.0] |
| Falls in past 3 weeks (baseline) | |||
| No | 2332 (91.7%) | 578 (88.8%) | 173 (94.5%) |
| One or more falls in the past 3 wk | 211 (8.3%) | 56 (8.6%) | 10 (5.5%) |
| Brief Pain Inventory Interference score (baseline) | |||
| Mean (SD) | 4.02 (2.23) | 4.00 (2.32) | 6.39 (2.08) |
| Median [Min, Max] | 3.86 [0, 10.0] | 4.00 [0, 10.0] | 6.50 [0, 10.0] |
| EQ5D-VAS score, 0–100 (baseline) | |||
| Mean (SD) | 71.8 (18.0) | 71.3 (18.0) | 70.6 (19.0) |
| Median [Min, Max] | 75.0 [0, 100] | 75.0 [0, 100] | 75.0 [5.00, 100] |
| EQ5D-Index score, 0–1 (baseline) | |||
| Mean (SD) | 0.716 (0.166) | 0.715 (0.166) | 0.556 (0.199) |
| Median [Min, Max] | 0.778 [−0.0384, 1.00] | 0.778 [0.0402, 1.00] | 0.597 [0.165, 0.827] |
Secondary analyses were also conducted after using multiple imputation to account for missing data in the LESS study with respect to the variables included in the model, as an alternative approach to complete-case analysis. Multiple imputation was done using the “mice” package in the R statistical environment, and the RMDQ outcome was kept as part of the variable set that was used to impute missing covariates. First, we examined patterns of missing data, and based on this, we assumed a missing completely at random (MCAR) missingness structure. Then, we imputed the missing data and used the imputed data for analyses in LESS. No multiple imputation was performed for missing data in the BOLD study with respect to the variables included in the model, because missing data in this instance would only be expected to worsen the performance of the predictive model developed in BOLD, which would bias towards the null when testing the predictive model in the LESS sample; this would be a more conservative approach, which seemed appropriate in this context where discovery is the goal.
2.7. Statistical analysis: Model application in the LESS RCT (Aim 2)
Next, we applied the fitted model that was developed in the BOLD training set in the LESS sample to generate risk scores, dividing patients into 4 strata defined by quartiles of risk scores (a stratification approach that was chosen a priori) (Figure 1). This approach is recommended by the PATH statement as the first approach to be used for evaluating treatment effect heterogeneity.[20] We evaluated the average treatment effect of LESI on 3-week RMDQ scores within each quartile of predicted 3-month risk using the model below, which includes an interaction term for quartile of predicted risk × treatment group. Consistent with the analytic approach used in the LESS primary results,[15] baseline RMDQ and study site were adjusted for as covariates.
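A model consistent with this description can be sketched as follows. The symbols are illustrative assumptions rather than notation taken from the original analysis: $Y_i$ is the 3-week RMDQ score for participant $i$, $T_i$ indicates randomization to LESI, $Q_i$ is the quartile of predicted risk (quartile 1 as reference), $B_i$ is the baseline RMDQ score, and $\alpha_{s(i)}$ is an effect for the participant's study site.

```latex
Y_i = \beta_0
    + \sum_{k=2}^{4} \beta_k \, \mathbb{1}[Q_i = k]
    + \gamma \, T_i
    + \sum_{k=2}^{4} \delta_k \, \mathbb{1}[Q_i = k] \, T_i
    + \theta \, B_i
    + \alpha_{s(i)}
    + \varepsilon_i
```

Under this parameterization the treatment effect is $\gamma$ in quartile 1 and $\gamma + \delta_k$ in quartile $k$, and a test of overall effect modification corresponds to the joint null hypothesis $\delta_2 = \delta_3 = \delta_4 = 0$.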
We calculated 95% confidence intervals for the treatment effect in each quartile of predicted risk. We used a Wald test and the (“robust”) Huber sandwich estimator to evaluate the statistical significance of the overall interaction term between risk quartile and treatment. To examine treatment effect heterogeneity in the LESS RCT data, we used the predicted RMDQ scores resulting from the original variables and weights from the model developed in the BOLD training dataset. Although the model (developed and validated in BOLD) was subsequently applied among the LESS control group participants as a form of external validation, no updating of the model was conducted in the LESS sample. Consequently, application of the model for examination of treatment effect heterogeneity in LESS did not represent a second use of the same dataset (i.e., there is no potential circularity).
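The stratification step can be illustrated with a short sketch that assigns participants to quartiles of predicted score and estimates a treatment effect with a 95% confidence interval within each quartile. This is a deliberate simplification and an assumption on our part: it uses an unadjusted difference in means with a normal-approximation interval, whereas the analysis described here adjusts for baseline RMDQ and study site and uses a robust variance estimator. The data and function names are hypothetical.

```python
import math
import statistics

def quartile_labels(scores):
    """Label each score 1-4 using the three quartile cut points of the sample."""
    q1, q2, q3 = statistics.quantiles(scores, n=4)
    return [1 + (s > q1) + (s > q2) + (s > q3) for s in scores]

def effect_by_quartile(pred, treated, outcome):
    """Unadjusted treated-minus-control mean difference and 95% CI per quartile."""
    labels = quartile_labels(pred)
    results = {}
    for q in (1, 2, 3, 4):
        t = [o for l, a, o in zip(labels, treated, outcome) if l == q and a == 1]
        c = [o for l, a, o in zip(labels, treated, outcome) if l == q and a == 0]
        diff = statistics.mean(t) - statistics.mean(c)
        se = math.sqrt(statistics.variance(t) / len(t) + statistics.variance(c) / len(c))
        results[q] = (diff, diff - 1.96 * se, diff + 1.96 * se)
    return results

# Hypothetical data: 40 participants, alternating assignment, and a 3-point
# benefit (lower score) that appears only in the upper half of predicted risk.
pred = [0.5 * i for i in range(40)]
treated = [i % 2 for i in range(40)]
outcome = [
    10.0 + 0.2 * ((2 * i) % 5 - 2) - (3.0 if treated[i] and i >= 20 else 0.0)
    for i in range(40)
]

results = effect_by_quartile(pred, treated, outcome)
for q, (diff, lo, hi) in results.items():
    print(f"quartile {q}: effect {diff:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because the quartile cut points are computed from predicted scores alone, the grouping does not depend on treatment assignment or on the observed outcome.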
We also evaluated whether the treatment effect of LESI on 3-week RMDQ scores varied linearly across levels of predicted RMDQ scores using the model below, an alternative approach recommended by the PATH statement,[20] which includes an interaction term for risk score × treatment group. The baseline value of the outcome variable (RMDQ) and study site were included as covariates to maintain consistency with the modeling approach used in the primary analyses for the LESS RCT.
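A model consistent with this description can be sketched as follows. The symbols are illustrative assumptions rather than notation taken from the original analysis: $Y_i$ is the 3-week RMDQ score, $R_i$ is the continuous predicted RMDQ (risk) score, $T_i$ indicates randomization to LESI, $B_i$ is the baseline RMDQ score, and $\alpha_{s(i)}$ is an effect for study site.

```latex
Y_i = \beta_0
    + \beta_1 \, R_i
    + \gamma \, T_i
    + \delta \, R_i \, T_i
    + \theta \, B_i
    + \alpha_{s(i)}
    + \varepsilon_i
```

Here the treatment effect at a given risk score is $\gamma + \delta R_i$, so $\delta$ is the change in treatment effect per 1-point increase in predicted RMDQ score, and the null hypothesis of no linear effect modification is $\delta = 0$.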
We used a Wald test to evaluate the statistical significance of the interaction term between risk scores and treatment group.
We conducted post hoc sensitivity analyses to examine whether applying greater restrictions to make the BOLD and LESS participants more comparable would have improved model performance for risk stratification and yielded different results, by excluding from model development those BOLD participants with baseline RMDQ scores ≤2 and NRS scores ≤4.
No sample size calculations were conducted, as this analysis used the largest cohort study of older adults with back pain and the largest RCT of LESI using patient-reported outcomes conducted to date.
3. RESULTS
3.1. Baseline characteristics of the study samples.
After exclusions to make the BOLD and LESS samples more comparable (see Section 2.6), the BOLD sample included 3908 of 5239 patients and the LESS sample included 381 of 400 randomized patients. After excluding additional patients due to missing data for the predictors included in the models, our final study samples were 3256 patients in the analyses of the BOLD cohort and 328 in analyses of the LESS trial. Table 1 summarizes baseline characteristics of the BOLD training, BOLD testing, and LESS samples, and Supplemental Tables 1 and 2 characterize missingness. Patients in the BOLD training and BOLD testing samples were generally comparable for most baseline characteristics. Compared with patients from BOLD, those in the LESS trial were younger, had higher BMI, and were more likely to be male and working (Table 1). Additionally, patients in the LESS trial had higher values for the RMDQ, pain NRS, recovery expectations, PHQ, and BPI, and lower values for EQ5D scores, reflecting greater baseline severity of functional limitations, pain, and the other symptom-related baseline measures.
3.2. Model development, validation, and testing in an external sample
Predictive models using LASSO regularization for 3-month RMDQ outcomes in the BOLD training set selected 16 of the 25 candidate variables, including age, sex, BMI, race (white), being disabled, employment status other than being employed or disabled, “other” employment, college education/graduate degree, ever smoking, recovery expectations, and baseline scores for the RMDQ, BPI, back pain NRS, leg pain NRS, EQ5D-VAS, EQ5D-Index, and the PHQ4. The variables not selected were Hispanic ethnicity, race (other), some college education, full-time employment, part-time employment, marital status, having a pending legal claim, and prior smoking. The full predictive model, including the variables, regression coefficients, and intercept, is provided in Supplemental Table 3. The multivariable model indicated that higher RMDQ scores at 3-month follow-up (greater disability) were associated with greater age and BMI, female sex, factors related to race and employment, not having a college or graduate degree, current smoking, lower expectations of recovery, higher baseline RMDQ scores, and other indicators of higher baseline symptom burden (depression, pain, etc.). R2 values for the model in the BOLD training set, the BOLD testing set, and LESS patients receiving the control treatment were 0.38, 0.32, and 0.34, respectively, indicating that the model had consistent performance across samples with good explained variation and an R2 value that was actually larger in LESS (0.34) than in the BOLD testing set (0.32). As mentioned previously, while the outcome modeled in the training and testing sets was 3-month RMDQ scores, that modeled in the LESS sample was 3-week RMDQ scores.
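To make the validation metric concrete, the sketch below applies a fixed ("frozen") linear model to a new sample and computes R2 as 1 minus the ratio of the residual sum of squares to the total sum of squares. The intercept, coefficient, and data values are hypothetical placeholders and are not the coefficients reported in Supplemental Table 3.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def apply_frozen_model(intercept, coefs, rows):
    """Predict with fixed coefficients; nothing is re-estimated in the new sample."""
    return [intercept + sum(coefs[name] * row[name] for name in coefs) for row in rows]

# Hypothetical coefficients and a tiny hypothetical external sample
intercept = 1.0
coefs = {"baseline_rmdq": 0.6}
external_rows = [{"baseline_rmdq": 10}, {"baseline_rmdq": 15}, {"baseline_rmdq": 20}]
observed_followup = [8.0, 9.0, 14.0]

predicted = apply_frozen_model(intercept, coefs, external_rows)
r2 = r_squared(observed_followup, predicted)
print(f"external R2 = {r2:.3f}")
```

When computed this way in a sample the model was not fitted to, R2 can be lower than in the training data and can even be negative if the frozen model predicts worse than the sample mean.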
3.3. Risk-modeling across quartiles of predicted RMDQ scores.
The model was then used to generate risk scores among all LESS participants, ignoring LESI treatment status (Figure 1). Participants were stratified into quartiles of predicted 3-week RMDQ scores, ranging from quartile 1 (lowest predicted risk) to quartile 4 (highest predicted risk), and within-quartile treatment effects of LESI were estimated (Figure 2). The treatment effect of LESI on 3-week RMDQ scores was minimal to absent in quartile 1 (average treatment effect [ATE]: −0.8 RMDQ points; 95% CI, −2.5 to 1.0) and quartile 2 (ATE: 0.03 RMDQ points; 95% CI, −1.9 to 2.0). Average treatment effects on 3-week RMDQ scores were larger in quartile 3 (ATE: −3.7 RMDQ points; 95% CI, −6.1 to −1.4) and quartile 4 (ATE: −3.3 RMDQ points; 95% CI, −5.4 to −1.2) and were statistically significant, indicating benefit from LESI for participants in these quartiles of predicted risk. The Wald test indicated a statistically significant difference in treatment effect (p=0.03) across risk quartiles. Characteristics of patients by quartiles of predicted RMDQ scores are presented in Supplemental Table 4.
Figure 2.

Between-group treatment effects of LESI on RMDQ scores at 3-week follow-up, according to quartiles of predicted RMDQ scores.*
*Quartile 1 includes participants with the lowest predicted RMDQ scores at 3-week follow-up (lower levels of disability), and Quartile 4 includes participants with the highest predicted RMDQ scores at 3-week follow-up (higher levels of disability).
3.4. Risk-modeling assuming linear interactions between predicted RMDQ scores and treatment effect.
In analyses of linear interactions between risk scores and treatment effect (Supplemental Figure 1), the magnitude of the estimated LESI treatment effect on 3-week RMDQ scores was larger as risk scores increased (−0.27 RMDQ points for those randomized to LESI at 3 weeks per 1-point greater predicted RMDQ score, 95% CI, −0.55 to 0.01) suggesting greater benefit of LESI for patients with higher risk scores. However, the linear interaction term was not statistically significant (p = 0.12).
3.5. Secondary analyses using multiple imputation
When using multiple imputation as an alternative to complete-case analysis, the R2 value for the predictive model in LESS patients receiving the control treatment was 0.29, smaller than that with complete-case analysis (0.34). In the imputed dataset, the Wald test no longer indicated a statistically significant difference in treatment effect (p=0.07) across risk quartiles, although a similar pattern of effects was seen as in the primary analyses, with minimal to absent effects in quartiles 1 and 2, and larger effects in quartiles 3 and 4 (data not shown).
3.6. Post hoc analyses
When applying an additional restriction to the subset of the BOLD cohort included in model development (excluding those with NRS scores ≤4), R2 values for the model in the BOLD training set, the BOLD testing set, and LESS patients receiving the control treatment were 0.37, 0.32, and 0.34, respectively, indicating that the model performed similarly across samples to the model in the primary analyses. When using this model to estimate LESI treatment effects across quartiles of predicted 3-week RMDQ scores, results were very similar to the primary analyses, with a Wald test indicating a statistically significant difference in treatment effect (p=0.02) across quartiles of predicted risk.
4. DISCUSSION
In this study, we developed and validated a model for predicting functional limitations at follow-up among older adults with back pain. This predictive model showed good performance in an external validation sample of patients in an RCT of LESI for patients with symptomatic lumbar spinal stenosis (R2=0.34) despite differences between the development and validation samples with respect to the pain conditions studied (non-specific back pain with or without neuropathic lower extremity pain vs. symptomatic lumbar spinal stenosis) and the time point of follow-up (3 months vs. 3 weeks). We then used the model in a risk-modeling approach as recommended by the PATH statement to examine heterogeneity of the LESI treatment effect in the LESS RCT, estimating individualized treatment effects across quartiles of predicted risk informed by various patient characteristics at the same time (in contrast to conventional 1-variable-at-a-time analyses). A risk-modeling approach found statistically significant heterogeneity of the LESI treatment effect across quartiles of predicted RMDQ scores (p=0.03), but not when assuming linear interactions between predicted risk and LESI treatment effect.
To our knowledge, this is the first study of patients with back pain or lumbar spinal disorders to use a multivariable risk-modeling approach to examine heterogeneity of treatment effect guided by the PATH Statement framework.[19] In an earlier examination of effect modification in the LESS RCT using a 1-variable-at-a-time approach, we found that only 1 baseline variable (quality of life) among 21 candidate variables was significantly associated with greater benefit of LESI vs. lidocaine-only injections at 3-week follow-up.[33] A limitation of this previous study is that statistically significant associations may have emerged by chance given the multiple statistical comparisons made.[19] One interpretation of the current study’s findings is that a contemporary multivariable approach to estimating individualized treatment effects identified a subgroup with LESI treatment effects within the range of commonly used definitions for clinically relevant between-group effect sizes of 2.5–4.0 RMDQ points,[3; 32] represented by quartiles 3 and 4, in which the average treatment effects of LESI were 3.7 and 3.3 RMDQ points at 3-week follow-up, respectively. Conversely, a subgroup was identified in which LESI effects were minimal to absent, represented by quartiles 1 and 2, in which the average treatment effects of LESI were 0.8 and 0 RMDQ points, respectively. In other words, treatment effects of LESI on RMDQ scores were largest in those patients with the highest baseline risk for worse outcomes on the RMDQ. Clinical differences in these patient subgroups exist with regard to multiple patient characteristics, as shown analytically in Supplemental Table 3, and descriptively in Supplemental Table 4.
While prior studies have identified many of these individual characteristics as potentially important effect modifiers of various treatments for back pain or lumbar spinal disorders, the current study illustrates how a risk-modeling approach can be used to account for multiple patient characteristics at the same time and identify subgroups of patients with large-magnitude treatment effects.
An alternate interpretation, however, is that the current study’s findings did not show convincing heterogeneity of effect, because no significant effect of the linear interaction of predicted risk with LESI treatment (p=0.12) was found. Figure 2 provides a possible explanation for why analyses assuming a linear interaction of predicted risk with treatment did not find a statistically significant interaction: it does not show a monotonic increase in the size of the estimated treatment effect of LESI by quartile (i.e., there is no stepwise increase in the size of the estimated treatment effect as one proceeds from left to right in the figure). Instead, Figure 2 shows no treatment effects in patients with low predicted disability (quartiles 1 and 2) and clinically relevant treatment effects in patients with high predicted disability (quartiles 3 and 4), suggesting that there may be a specific threshold of relevance. If that is the case, assuming a linear interaction of predicted risk with LESI treatment may be inappropriate. Future studies in other samples are needed to clarify whether the interaction between LESI treatment effect and predicted risk has a specific threshold of relevance; if so, this threshold might be leveraged to offer LESI to those patients who are most likely to benefit. Our secondary analyses using multiple imputation did not show statistically significant heterogeneity of treatment effect across quartiles of predicted RMDQ scores (p=0.07). However, this result differed only modestly from that of the complete-case analysis (p=0.03), a difference likely explained by the lower explained variance of the predictive model under multiple imputation (R2=0.29) than in the complete-case analysis (R2=0.34).
Further studies in other samples will be needed to fully understand the utility of multivariable risk-modeling approaches to back pain treatments using the PATH framework, and modifications or updates to our model (Supplemental Table 3) that increase R2 in new samples may have higher yield. Nevertheless, the current findings suggest that future models developed through a multivariable risk-modeling approach may have clinical utility.
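To make the quartile-based reasoning above concrete, the following is a minimal sketch of stratifying trial participants by quartile of a predicted follow-up score and estimating the treatment effect within each quartile. It is illustrative only: the data are synthetic, the function names (`simulate_trial`, `effects_by_quartile`) are hypothetical, the built-in threshold effect of 3 points is an assumption chosen for illustration, and the within-quartile estimate is a simple difference in arm means, which is not necessarily the analytic model used in this study.

```python
import random
import statistics


def simulate_trial(n=400, threshold_effect=3.0, noise_sd=2.0, seed=0):
    """Build a synthetic two-arm trial (hypothetical data, not from LESS).

    Predicted follow-up scores rise evenly from 5 to 20. Treatment lowers the
    outcome by `threshold_effect` points only in the upper half of predicted
    scores, mimicking a threshold rather than a linear interaction.
    """
    rng = random.Random(seed)
    records = []
    for i in range(n):
        predicted = 5.0 + 15.0 * i / (n - 1)
        treated = i % 2 == 0  # alternate arms so each quartile is balanced
        benefit = threshold_effect if (treated and i >= n // 2) else 0.0
        outcome = predicted + rng.gauss(0.0, noise_sd) - benefit
        records.append(
            {"predicted": predicted, "treated": treated, "outcome": outcome}
        )
    return records


def assign_quartile(value, cutpoints):
    """Map a predicted score to quartile 1-4 given three cutpoints."""
    return 1 + sum(value > c for c in cutpoints)


def effects_by_quartile(records):
    """Estimate the treatment effect within each quartile of predicted score.

    The effect is mean(control outcome) - mean(treated outcome), so a positive
    value favours treatment when lower scores indicate less disability.
    """
    cutpoints = statistics.quantiles([r["predicted"] for r in records], n=4)
    arms = {q: {True: [], False: []} for q in (1, 2, 3, 4)}
    for r in records:
        quartile = assign_quartile(r["predicted"], cutpoints)
        arms[quartile][r["treated"]].append(r["outcome"])
    return {
        q: statistics.mean(by_arm[False]) - statistics.mean(by_arm[True])
        for q, by_arm in arms.items()
    }


if __name__ == "__main__":
    for quartile, effect in effects_by_quartile(simulate_trial()).items():
        print(f"Quartile {quartile}: estimated effect {effect:.1f} points")
```

With a threshold-shaped effect such as the one simulated here, the per-quartile estimates would be expected to cluster near zero in the lower two quartiles and near the assumed effect in the upper two, a step pattern that a single linear interaction term may describe poorly.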
The predictive model we developed and validated using data from an observational study of older adults with non-specific back pain (R2=0.38 and 0.32, respectively) performed comparably well using data from an independent RCT of LESI for patients with symptomatic lumbar spinal stenosis (R2=0.34), even though the latter study had a different design and follow-up timepoint, and the patients had a different spine condition and greater severity with respect to all pain-related outcomes. The model’s external validity may have been due to the rigorous approach used in development, which included cross-validation during model training, LASSO penalization to minimize overfitting, testing in an independent sample, and the use of a large sample of 3256 individuals across the training and testing sets. Additionally, performance of the model across different conditions may have been facilitated by the fact that the model’s 16 variables were not specific to any one condition or treatment. For this reason, it might also perform well in studies of other back-related conditions and treatments, and we suggest that it be tested in other research contexts. On the other hand, it is possible that a model incorporating variables specific to lumbar spinal stenosis (clinical features, imaging, etc.), that was both developed and validated in patients with lumbar spinal stenosis, might have achieved greater predictive performance when applied to examinations of treatment effect heterogeneity in the LESS RCT. We note that, although the model developed in the current study may have utility for non-specific prediction of RMDQ outcomes irrespective of treatment, its true novelty pertains to its ability to identify subgroups of patients who will respond better to one treatment as compared to another, and this was the primary purpose for which it was developed. 
Accordingly, we suggest that the model’s ability to identify a high-risk subgroup of patients likely to benefit from back pain treatments be tested in future confirmatory RCTs. If the model is shown in other contexts to be replicable with respect to non-specific prediction or identifying patients who are likely to benefit from treatment, it may have eventual clinical utility in patient care. Although this study examined treatment effect heterogeneity in the largest RCT to date of LESI for lumbar spinal stenosis, a limitation is that a pooled dataset of large, well-conducted RCTs of this treatment was not available for examining effect modification; such pooling has been suggested to increase the likelihood that risk-modeling approaches are worthwhile.[20] Pooling was not possible because multiple large RCTs of LESI for lumbar spinal stenosis do not yet exist. Another limitation of this study is that only a subset of all BOLD and LESS participants was analyzed, because data were missing for 1 or more variables and a complete-case analytic approach was used. This also underscores the need for attempted replication of the model in other contexts.
In conclusion, this study developed and validated a predictive model for back-related functional limitations at follow-up and used this model to estimate individualized treatment effects of LESI (vs. lidocaine-only epidural injections) in an RCT. This risk-modeling approach identified subgroups of patients in which there was a clinically relevant treatment effect of LESI on back-related functional limitations, and subgroups in which there was no meaningful effect. Multivariable risk-modeling approaches may have promise for identifying subgroups of patients with large-magnitude treatment effects.
Supplementary Material
ACKNOWLEDGEMENTS
Dr. Suri is a Staff Physician at the VA Puget Sound Health Care System in Seattle, Washington. Research reported in this publication was supported by the University of Washington Clinical Learning, Evidence And Research (CLEAR) Center for Musculoskeletal Research Methodologic and Resource Cores. CLEAR is a Core Center for Clinical Research (CCCR) funded by P30AR072572 from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) of the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
CONFLICT OF INTEREST
Dr Jarvik reported receiving royalties from Springer Publishing and Wolters Kluwer/UpToDate and receiving travel reimbursement from GE–Association of University Radiologists outside the submitted work. Dr Friedly reported receiving grants from the Department of Defense and salary support from the American Academy of Physical Medicine and Rehabilitation for serving as editor in chief outside the submitted work. Dr Suri reported receiving salary support from the American Academy of Physical Medicine and Rehabilitation for serving as deputy editor outside the submitted work. None of the other authors has potential conflicts of interest to report.
DATA AVAILABILITY
Deidentified versions of the BOLD and LESS study datasets can be made available for approved research purposes by contacting the CLEAR Center (https://theclearcenter.org/about/resource-core/). Additionally, external investigators can ask the CLEAR Center Resource Core to re-run the model development and validation stages using the variables available in the investigators’ own datasets; the resulting summary results are then shared, which avoids the need to share individual-level data.
REFERENCES
- [1] Anderson DB, Luca K, Jensen RK, Eyles JP, Van Gelder JM, Friedly JL, Maher CG, Ferreira ML. A critical appraisal of clinical practice guidelines for the treatment of lumbar spinal stenosis. Spine J 2021;21(3):455–464.
- [2] Artus M, van der Windt DA, Jordan KP, Hay EM. Low back pain symptoms show a similar pattern of improvement following a wide range of primary care treatments: a systematic review of randomized clinical trials. Rheumatology (Oxford) 2010;49(12):2346–2356.
- [3] Braten LCH, Rolfsen MP, Espeland A, Wigemyr M, Assmus J, Froholdt A, Haugen AJ, Marchand GH, Kristoffersen PM, Lutro O, Randen S, Wilhelmsen M, Winsvold BS, Kadar TI, Holmgard TE, Vigeland MD, Vetti N, Nygaard OP, Lie BA, Hellum C, Anke A, Grotle M, Schistad EI, Skouen JS, Grovle L, Brox JI, Zwart JA, Storheim K, group AIMs. Efficacy of antibiotic treatment in patients with chronic low back pain and Modic changes (the AIM study): double blind, randomised, placebo controlled, multicentre trial. BMJ (Clinical research ed) 2019;367:l5654.
- [4] Chou R, Loeser JD, Owens DK, Rosenquist RW, Atlas SJ, Baisden J, Carragee EJ, Grabois M, Murphy DR, Resnick DK, Stanos SP, Shaffer WO, Wall EM. Interventional therapies, surgery, and interdisciplinary rehabilitation for low back pain: an evidence-based clinical practice guideline from the American Pain Society. Spine (Phila Pa 1976) 2009;34(10):1066–1077.
- [5] Chou R, Shekelle P. Will this patient develop persistent disabling low back pain? JAMA 2010;303(13):1295–1302.
- [6] Christley RC. Power and Error: Increased Risk of False Positive Results in Underpowered Studies. Open Epidemiology Journal 2010;3:16–19.
- [7] Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162(1):55–63.
- [8] Deyo RA, Bryan M, Comstock BA, Turner JA, Heagerty P, Friedly J, Avins AL, Nedeljkovic SS, Nerenz DR, Jarvik JG. Trajectories of symptoms and function in older adults with low back disorders. Spine (Phila Pa 1976) 2015;40(17):1352–1362.
- [9] Dionne CE, Dunn KM, Croft PR. Does back pain prevalence really decrease with increasing age? A systematic review. Age Ageing 2006;35(3):229–234.
- [10] Farrar JT, Young JP Jr., LaMoreaux L, Werth JL, Poole RM. Clinical importance of changes in chronic pain intensity measured on an 11-point numerical pain rating scale. Pain 2001;94(2):149–158.
- [11] Foster NE, Anema JR, Cherkin D, Chou R, Cohen SP, Gross DP, Ferreira PH, Fritz JM, Koes BW, Peul W, Turner JA, Maher CG, Lancet Low Back Pain Series Working G. Prevention and treatment of low back pain: evidence, challenges, and promising directions. Lancet 2018.
- [12] Friedly J, Chan L, Deyo R. Increases in lumbosacral injections in the Medicare population: 1994 to 2001. Spine 2007;32(16):1754–1760.
- [13] Friedly J, Chan L, Deyo R. Geographic variation in epidural steroid injection use in medicare patients. J Bone Joint Surg Am 2008;90(8):1730–1737.
- [14] Friedly JL, Bresnahan BW, Comstock B, Turner JA, Deyo RA, Sullivan SD, Heagerty P, Bauer Z, Nedeljkovic SS, Avins AL, Nerenz D, Jarvik JG. Study protocol- Lumbar Epidural steroid injections for Spinal Stenosis (LESS): a double-blind randomized controlled trial of epidural steroid injections for lumbar spinal stenosis among older adults. BMC musculoskeletal disorders 2012;13:48.
- [15] Friedly JL, Comstock BA, Turner JA, Heagerty PJ, Deyo RA, Sullivan SD, Bauer Z, Bresnahan BW, Avins AL, Nedeljkovic SS, Nerenz DR, Standaert C, Kessler L, Akuthota V, Annaswamy T, Chen A, Diehn F, Firtch W, Gerges FJ, Gilligan C, Goldberg H, Kennedy DJ, Mandel S, Tyburski M, Sanders W, Sibell D, Smuck M, Wasan A, Won L, Jarvik JG. A randomized trial of epidural glucocorticoid injections for spinal stenosis. N Engl J Med 2014;371(1):11–21.
- [16] Hartvigsen J, Hancock MJ, Kongsted A, Louw Q, Ferreira ML, Genevay S, Hoy D, Karppinen J, Pransky G, Sieper J, Smeets RJ, Underwood M, Lancet Low Back Pain Series Working G. What low back pain is and why we need to pay attention. Lancet 2018.
- [17] Jarvik JG, Comstock BA, Bresnahan BW, Nedeljkovic SS, Nerenz DR, Bauer Z, Avins AL, James K, Turner JA, Heagerty P, Kessler L, Friedly JL, Sullivan SD, Deyo RA. Study protocol: the Back Pain Outcomes using Longitudinal Data (BOLD) registry. BMC musculoskeletal disorders 2012;13:64.
- [18] Jarvik JG, Gold LS, Comstock BA, Heagerty PJ, Rundell SD, Turner JA, Avins AL, Bauer Z, Bresnahan BW, Friedly JL, James K, Kessler L, Nedeljkovic SS, Nerenz DR, Shi X, Sullivan SD, Chan L, Schwalb JM, Deyo RA. Association of early imaging for back pain with clinical outcomes in older adults. JAMA 2015;313(11):1143–1153.
- [19] Kent DM, Paulus JK, van Klaveren D, D’Agostino R, Goodman S, Hayward R, Ioannidis JPA, Patrick-Lake B, Morton S, Pencina M, Raman G, Ross JS, Selker HP, Varadhan R, Vickers A, Wong JB, Steyerberg EW. The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement. Ann Intern Med 2020;172(1):35–45.
- [20] Kent DM, van Klaveren D, Paulus JK, D’Agostino R, Goodman S, Hayward R, Ioannidis JPA, Patrick-Lake B, Morton S, Pencina M, Raman G, Ross JS, Selker HP, Varadhan R, Vickers A, Wong JB, Steyerberg EW. The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement: Explanation and Elaboration. Ann Intern Med 2020;172(1):W1–W25.
- [21] Kent PM, Keating JL. Can we predict poor recovery from recent-onset nonspecific low back pain? A systematic review. Man Ther 2008;13(1):12–28.
- [22] Kim LH, Vail D, Azad TD, Bentley JP, Zhang Y, Ho AL, Fatemi P, Feng A, Varshneya K, Desai M, Veeravagu A, Ratliff JK. Expenditures and Health Care Utilization Among Adults With Newly Diagnosed Low Back and Lower Extremity Pain. JAMA Netw Open 2019;2(5):e193676.
- [23] Kroenke K, Spitzer RL, Williams JB, Lowe B. The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: a systematic review. General hospital psychiatry 2010;32(4):345–359.
- [24] Obradovic M, Lal A, Liedgens H. Validity and responsiveness of EuroQol-5 dimension (EQ-5D) versus Short Form-6 dimension (SF-6D) questionnaire in chronic pain. Health Qual Life Outcomes 2013;11:110.
- [25] Parreira P, Maher CG, Steffens D, Hancock MJ, Ferreira ML. Risk factors for low back pain and sciatica: an umbrella review. Spine J 2018;18(9):1715–1721.
- [26] Peul WC, Brand R, Thomeer RT, Koes BW. Influence of gender and other prognostic factors on outcome of sciatica. Pain 2008;138(1):180–191.
- [27] Pincus T, Vogel S, Burton AK, Santos R, Field AP. Fear avoidance and prognosis in back pain: a systematic review and synthesis of current evidence. Arthritis and rheumatism 2006;54(12):3999–4010.
- [28] Rundell SD, Sherman KJ, Heagerty PJ, Mock CN, Dettori NJ, Comstock BA, Avins AL, Nedeljkovic SS, Nerenz DR, Jarvik JG. Predictors of Persistent Disability and Back Pain in Older Adults with a New Episode of Care for Back Pain. Pain medicine (Malden, Mass) 2016.
- [29] Stanhope J. Brief Pain Inventory review. Occup Med (Lond) 2016;66(6):496–497.
- [30] Starr JB, Gold L, McCormick Z, Suri P, Friedly J. Trends in lumbar radiofrequency ablation utilization from 2007 to 2016. Spine J 2019;19(6):1019–1028.
- [31] Taylor JB, Goode AP, George SZ, Cook CE. Incidence and risk factors for first-time incident low back pain: a systematic review and meta-analysis. The spine journal : official journal of the North American Spine Society 2014;14(10):2299–2319.
- [32] Team UBT. United Kingdom back pain exercise and manipulation (UK BEAM) randomised trial: effectiveness of physical treatments for back pain in primary care. BMJ (Clinical research ed) 2004;329(7479):1377.
- [33] Turner JA, Comstock BA, Standaert C, Heagerty PJ, Jarvik JG, Deyo RA, Wasan AD, Nedeljkovic SS, Friedly JL. Can Patient Characteristics Predict Benefit from Epidural Corticosteroid Injections for Lumbar Spinal Stenosis Symptoms? Spine J 2015.
- [34] Virk SS, Phillips FM, Khan SN. Factors Affecting Utilization of Steroid Injections in the Treatment of Lumbosacral Degenerative Conditions in the United States. Int J Spine Surg 2018;12(2):139–148.
- [35] Yore MM, Ham SA, Ainsworth BE, Kruger J, Reis JP, Kohl HW 3rd, Macera CA. Reliability and validity of the instrument used in BRFSS to assess physical activity. Medicine and science in sports and exercise 2007;39(8):1267–1274.
