Abstract
Introduction
High-impact chronic pain (HICP) significantly affects the quality of life for millions of US adults, imposing substantial economic/healthcare burdens. Disproportionate effects are observed among racial/ethnic minorities and older adults.
Methods
We leveraged the National Health Interview Survey (NHIS) from 2016 (n=32 980), 2017 (n=26 700) and 2021 (n=28 740) to develop and validate analytical models for HICP. Initial models (2016 NHIS data) identified correlates associated with HICP, including hospital stays, diagnosis of specific diseases, psychological symptoms and employment status. We assessed the models’ generalisability and drew comparisons across time. We constructed five validation scenarios to account for variations in the availability of predictor variables across datasets and different time frames for pain assessment questions. We used logistic regression with Least Absolute Shrinkage and Selection Operator (LASSO) and random forest techniques. We assessed model discrimination, calibration and overall performance using metrics such as area under the curve (AUC), calibration slope and Brier score.
Results
Scenario 1, validating the NHIS 2016 model against 2017 data, demonstrated excellent discrimination with an AUC of 0.89 (95% CI 0.88 to 0.90) for both LASSO and random forest models. Subgroup-specific performance varied, with the lowest AUC among adults aged ≥65 years (0.81, 95% CI 0.78 to 0.82) and the highest among Hispanic respondents (0.91, 95% CI 0.88 to 0.94). Model calibration was generally robust, although underfitting was observed for Hispanic respondents (calibration slope: 1.31). Scenario 3, testing the NHIS 2016 model on 2021 data, showed reduced discrimination (AUC: 0.82, 95% CI 0.81 to 0.83) and overfitting (calibration slopes <1). De novo models based on 2021 data showed comparable discrimination (AUC: 0.86, 95% CI 0.85 to 0.87) but poorer calibration when validated against older datasets.
Conclusion
These findings underscore the potential of these models to guide personalised medicine strategies for HICP, aiming for more preventive rather than reactive healthcare. However, the model’s broader applicability requires further validation in varied settings and global populations.
Keywords: Public Health, Community Health, Cross-Sectional Studies
WHAT IS ALREADY KNOWN ON THIS TOPIC
High-impact chronic pain (HICP) significantly affects adults in the USA, with over 80% unable to work and one-third having difficulty with self-care activities. Previously developed models using the 2016 National Health Interview Survey (NHIS) have identified key factors associated with HICP, but these models lacked external validation to confirm their generalisability across different populations and time periods.
WHAT THIS STUDY ADDS
This study validates and extends previous models by testing them on NHIS data from 2017 and 2021. The models, including de novo models developed with 2021 data, show good predictive performance across different datasets and time periods, confirming the robustness of the key factors associated with HICP. The study also highlights the impact of changes in case definitions and demographic shifts on model performance.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
The validated models provide healthcare professionals with reliable tools to identify individuals at risk for HICP, allowing for targeted interventions. By recognising modifiable factors such as socioeconomic status, emergency department visits and educational attainment, these models can guide the development of preventive and personalised medicine strategies. This approach can improve chronic pain management and reduce the burden on vulnerable populations.
Introduction
An estimated 11 million adults in the USA live with high-impact chronic pain (HICP) or chronic pain associated with functional limitations in work, life and social activities.1 More than 80% of people with HICP are unable to work for a living, and one in three has trouble with self-care activities such as getting dressed and bathing.2 Recently, we developed models for identifying patients with HICP using the 2016 National Health Interview Survey (NHIS),3 a valuable resource that provides comprehensive health information about the civilian non-institutionalised population in the USA. Our models capture various health-related variables, making the NHIS a potential source for deriving insights into HICP. We evaluated the accuracy and other performance metrics of the models overall and by sex, age and race/ethnicity using NHIS data. The most important factors associated with HICP were the number of hospital day stays in the previous year, diagnosis of arthritis and rheumatism, psychological symptoms, employment status and the number of visits to the emergency department in the previous year. While previous models of HICP have demonstrated strong predictive performance within their development sample, they have not undergone validation across different time periods or independent populations. Our study aims to assess the temporal validity of these models by testing their predictive performance using NHIS datasets from later years (2017 and 2021), thus examining their robustness and generalisability over time. We acknowledge that true external validation in independent data sources remains an important next step to further confirm the models’ applicability across broader populations and settings.
Model validation studies in independent populations are essential to test generalisability.4 This is because model accuracy and calibration may be overestimated in the development population and model validation in independent, external datasets is necessary to identify a model’s utility in diverse settings and contexts. The National Academy of Medicine report on pain and the Department of Health and Human Services National Pain Strategy emphasised the need for epidemiological studies of chronic pain and HICP in the USA, particularly in subgroups susceptible to under management of pain—and to make that data actionable.5 6 In addition to helping better identify those with HICP, by elucidating the factors contributing to our models, we can further identify those which are modifiable. The modifiable factors can serve as targets for intervention which can then be tested for impact on HICP.
Our current study aims to validate and extend previously developed models of HICP using the NHIS data. Specifically, we hypothesise that the original HICP models based on NHIS 2016 data will demonstrate robust predictive performance when applied to later datasets from NHIS 2017 and 2021, despite variations in predictor availability and population characteristics. Additionally, we hypothesise that a de novo model developed with NHIS 2021 data—using an updated HICP case definition with a shorter recall period—will exhibit comparable or improved discrimination and calibration metrics relative to the 2016 models. To rigorously assess model performance, we developed five validation scenarios, each testing distinct aspects of model robustness: generalisability across different time periods, handling of missing and imputed variables and the impact of differing case definitions. These scenarios allow for a comprehensive evaluation of the models’ external validity, providing insights into their adaptability across varying datasets and potential utility in diverse clinical and public health contexts.
Methods
Conceptual framework
Figure 1 illustrates the conceptual framework proposing five generalisability scenarios for HICP models. We tested a variety of scenarios to determine how the HICP model would perform under different conditions, such as different time periods, missing variables, imputed variables, varying case definitions and de novo models. Specifically, we asked a few questions that could aid the generalisability of future models for HICP. (1) How well does the development model validate when a variable is missing in the validation dataset and the validation dataset comprises respondents from a later time period (scenario 1)? (2) How well does the development model perform when a missing variable is imputed in the validation population (scenario 2)? (3) How well does the development model perform when several variables are missing in the validation population (scenario 3)? (4) How well does a model developed using data from a more recent cross-sectional sample perform when applied to an earlier cross-sectional sample (scenarios 4 and 5)?
Figure 1. Conceptual framework illustrating five generalisability scenarios for high-impact chronic pain (HICP) models. The scenarios explore the use of temporal validity, handling of missing variables, variable imputation, updated case definitions and the creation of de novo models across development and validation datasets derived from the National Health Interview Survey (NHIS) datasets of different years. Detailed annotations clarify dataset-specific considerations such as the inclusion/exclusion of variables and recall periods.
Data sources and outcomes
The data sources used in this analysis were the 2016, 2017 and 2021 NHIS datasets, which are cross-sectional surveys with multistage area probability design.7 The NHIS surveys are conducted annually among the civilian non-institutionalised US population. The HICP question in the NHIS survey (the primary outcome variable) was collected from adults aged ≥18 years. In the NHIS 2016 and 2017, respondents were asked the following question: ‘In the past 6 months, how often did you have pain?’. In the NHIS 2021, the question was: ‘In the past 3 months, how often did you have pain?’. The responses were never, some days, most days, every day, refused, not ascertained and don’t know. ‘Refused’, ‘not ascertained’ and ‘don’t know’ responses were considered invalid and dropped from the final analytic sample. Participants who responded ‘never’ were classified as ‘no HICP’. Participants who responded ‘some days, most days or every day’ were further asked the following question in the NHIS 2016 and 2017: ‘over the past 6 months, how often did pain limit your life or work activities?’. In the NHIS 2021, the question was: ‘over the past 3 months, how often did pain limit your life or work activities?’. The responses were: ‘never’, ‘some days’, ‘most days’, ‘every day’, ‘refused’, ‘not ascertained’ and ‘don’t know’. Again, refused, not ascertained and don’t know responses were considered invalid and dropped from the final analytic sample. Participants who responded ‘never’ or ‘some days’ were classified as ‘no HICP’. Participants who responded ‘most days’ or ‘every day’ were classified as ‘HICP’.1 8
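The two-step case definition above can be sketched as a small helper function (the function name and string categories are illustrative; the NHIS public-use files encode these responses numerically):

```python
def classify_hicp(pain_freq, limit_freq=None):
    """Classify a respondent using the two-step NHIS case definition.

    pain_freq:  response to 'how often did you have pain?'
    limit_freq: response to 'how often did pain limit your life or work
                activities?' (asked only when pain_freq is not 'never')
    Returns 'HICP', 'no HICP', or None for invalid responses that are
    dropped from the analytic sample.
    """
    invalid = {"refused", "not ascertained", "don't know"}
    if pain_freq in invalid:
        return None                       # dropped from analytic sample
    if pain_freq == "never":
        return "no HICP"                  # no follow-up question asked
    if limit_freq is None or limit_freq in invalid:
        return None
    if limit_freq in {"most days", "every day"}:
        return "HICP"
    return "no HICP"                      # limitation 'never' or 'some days'
```

The only difference between survey years is the recall period in the question wording (6 months in 2016/2017 vs 3 months in 2021); the classification logic itself is unchanged.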
Study variables
We considered the same list of variables used in developing the 2016 models. Online supplemental table 1 shows the availability and completeness of the variables across the three datasets. The 2016 dataset included 43 variables, and the 2017 dataset included 42 variables coded exactly as the 2016 variables. The 2021 dataset included 31 variables coded the same as the 2016 dataset. However, 11 variables from 2016 are not available in 2021, and three variables are coded differently from 2016.
Descriptive analyses
We assessed the distribution of variables using descriptive statistics and plots. We reported the mean (SD) for continuous variables and frequency (percentage) for categorical variables.
Missing values in covariates
The NHIS 2016, 2017 and 2021 datasets were missing 13.6%, 13.7% and 7.2% of the values in the variables, respectively. The variable ‘Crohn’s disease’ was unavailable in the NHIS 2017 dataset. In addition to the Crohn’s disease variable, the following variables were unavailable in the 2021 NHIS: good neighbourhood, number of surgeries in past 12 months, ulcer, emphysema, heart condition/disease, trouble falling asleep, heavy drinker, hospital stays days, better health status and physical activity. We imputed missing values of the covariates using multiple imputation techniques under the missing at random assumption. In scenario 1, the NHIS 2016 and 2017 datasets were merged, and predictors containing missing values, including Crohn’s disease, were imputed. In scenario 2, we excluded Crohn’s disease from the NHIS 2016 dataset, and predictors containing missing values were imputed. In scenario 3, we excluded Crohn’s disease from the NHIS 2017 dataset, and predictors containing missing values were imputed. In scenario 4, in addition to the Crohn’s disease variable, the following variables contained 100% missing values in the 2021 NHIS dataset: good neighbourhood, number of surgeries in past 12 months, ulcer, emphysema, heart condition/disease, trouble falling asleep, heavy drinker, hospital stays days, better health status and physical activity. These predictors were excluded, and other predictors containing missing values were imputed. Patterns of missingness are shown in online supplemental figures 1–3.
Dealing with missing values for the models using the 2017 dataset
We considered two scenarios: imputing missing covariates with or without Crohn’s disease. First, we imputed missing values for all variables including Crohn’s disease. We combined the NHIS 2016 and 2017 datasets and imputed 20 datasets using multiple imputation techniques.9 Variables used to build the imputation model included only those selected as relevant predictors for HICP, along with HICP outcome status and geographical strata information, rather than all variables available in the dataset. This approach ensured that only pertinent variables contributed to the imputation process, aligning with the study’s focus on targeted predictors. We followed the multiple imputation with prediction averaging approach, that is, we made predictions for each validation/testing dataset and then averaged the values over 20 datasets.10 11 Second, we imputed missing values in covariates without imputing Crohn’s disease. For that, we imputed 20 datasets using multiple imputation techniques, separately for the NHIS 2016 and NHIS 2017 datasets.
Dealing with missing values for the models on the 2021 dataset
We imputed missing values for all variables, excluding the 11 variables that were 100% missing (Crohn’s disease, good neighbourhood, number of surgeries in past 12 months, ulcer, emphysema, heart condition/disease, trouble falling asleep, heavy drinker, hospital stays days, better health status and physical activity). We imputed 20 datasets using multiple imputation techniques, separately for the 2016 and 2021 datasets.9 Variables used to build the imputation model included all variables, HICP outcome status and the geographical strata information. Again, we followed the multiple imputation with prediction averaging approach,10 11 in which individual predictions from the imputed datasets are averaged to produce a single prediction per case that is then used to report model performance (eg, area under the curve (AUC), calibration slope and Brier score in our study).12 13
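The prediction-averaging step itself is simple to state in code. A minimal sketch, assuming each of the M imputed copies of the test data has already produced one predicted probability per case (the function name and list-of-lists layout are illustrative, not the study's actual pipeline):

```python
def average_predictions(prediction_sets):
    """Combine predictions from M imputed datasets by averaging.

    prediction_sets: list of M lists, each holding one predicted
    probability per case (cases in the same order in every imputed
    copy). Returns a single averaged probability per case, which is
    then scored (AUC, calibration slope, Brier score) as usual.
    """
    m = len(prediction_sets)
    n = len(prediction_sets[0])
    return [sum(preds[i] for preds in prediction_sets) / m
            for i in range(n)]
```

Averaging the predictions (rather than pooling coefficients) is what allows the same combining rule to be used for both the LASSO and random forest models.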
The imputation approach differed between the 2017 and 2021 datasets due to differences in the extent of missing data. In the 2017 dataset, only one predictor variable (Crohn’s disease) was unavailable, and we included it in the imputation to maintain consistency with the 2016 dataset. However, in the 2021 dataset, 11 predictor variables were unavailable. Including so many missing variables in the imputation process could have compromised imputation accuracy and introduced potential bias. Therefore, we opted to exclude these 11 variables from the imputation in the 2021 dataset to preserve the integrity of the model.
External validation
For external validation of the prediction models, we considered the models developed on the NHIS 2016 dataset,3 with the NHIS 2017 as the first validation/testing dataset and the NHIS 2021 as the second validation/testing dataset. First, we built the model and obtained the prediction probabilities for the first imputed testing dataset(s). Next, we repeated the process for each of the 20 imputed datasets. Finally, we averaged the prediction probabilities over the 20 datasets to obtain a single column of prediction probabilities. We considered five scenarios as shown in figure 1.
Model building
We used the same modelling strategies that we used to build the internally validated models.3 Briefly, we fitted separate logistic regression with Least Absolute Shrinkage and Selection Operator (LASSO) (a parametric model) and random forest (a nonparametric model) for predicting HICP. We accounted for survey weights as design weights in both models to obtain population-level predictions.14 The hyperparameters of the LASSO model were chosen using fivefold cross-validation, while the hyperparameters of the random forest model were tuned using the out-of-sample error technique.15 We included the optimism-corrected odds in online supplemental figures 4–7.
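To illustrate the parametric arm, survey-weighted L1-penalised logistic regression can be sketched with a plain proximal-gradient loop. This is a minimal pure-Python sketch on hypothetical toy data, not the study's implementation, which used survey-design-aware software with cross-validated hyperparameters:

```python
import math

def fit_lasso_logistic(X, y, w, lam=0.1, lr=0.1, iters=2000):
    """Weighted L1-penalised logistic regression via proximal gradient
    descent. X: feature rows, y: 0/1 outcomes, w: design weights.
    Returns (intercept, coefficients). L1 shrinkage zeroes out weak
    predictors, which is how LASSO performs variable selection."""
    p = len(X[0])
    total_w = sum(w)
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for xi, yi, wi in zip(X, y, w):
            z = b0 + sum(bj * xj for bj, xj in zip(beta, xi))
            pr = 1.0 / (1.0 + math.exp(-z))
            err = wi * (pr - yi)          # survey weight scales the error
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        b0 -= lr * g0 / total_w           # intercept is not penalised
        for j in range(p):
            bj = beta[j] - lr * g[j] / total_w
            # soft-thresholding: the proximal step for the L1 penalty
            beta[j] = math.copysign(max(abs(bj) - lr * lam, 0.0), bj)
    return b0, beta

# Toy example: feature 0 predicts the outcome, feature 1 is pure noise,
# so the L1 penalty should keep its coefficient at (or near) zero.
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
y = [0, 0, 1, 1]
w = [1.0, 1.0, 1.0, 1.0]
b0, beta = fit_lasso_logistic(X, y, w)
```

In the paper's setting the design weights `w` would be the NHIS survey weights, so the fitted model targets population-level rather than sample-level predictions.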
Model evaluation
We evaluated the performance of the models for the overall sample and within the sociodemographic subgroups in terms of discrimination (how well the method can separate individuals with or without HICP) and calibration (agreement between the observed and predicted risk of HICP).16 We calculated the AUC as a measure of discrimination and the calibration slope as the calibration measure.16 We also calculated the Brier score for overall model performance, where the Brier score is composed of discrimination and calibration.16 A higher AUC value indicates good discrimination, a calibration slope of approximately one indicates a well-calibrated model (less than one indicates overfitting; greater than one indicates underfitting) and a smaller Brier score indicates a better model. We also estimated the sensitivity or true-positive rate for the global and subgroup-specific thresholds calculated at the various percentiles of predictions in the datasets. We also estimated the positive predictive value at different percentiles of predictions. We reported the optimism-corrected log-odds ratios from the best LASSO models with respect to AUC and calibration (global and subgroup-specific). We used the fivefold cross-validation with the prediction averaging approach described above, where hyperparameters of the model were also chosen using fivefold cross-validation.3
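Two of these metrics can be computed directly from the averaged predictions. A minimal sketch (the calibration slope, which requires refitting a logistic model of the outcome on the logit of the predictions, is omitted for brevity):

```python
def auc(y, p):
    """Probability that a randomly chosen case with HICP receives a
    higher predicted risk than a randomly chosen case without HICP
    (ties count as half a win); 0.5 is chance, 1.0 is perfect."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def brier(y, p):
    """Mean squared difference between predicted risk and the 0/1
    outcome; lower is better (0.25 for an uninformative p = 0.5)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)
```

Because HICP prevalence is low (~7–9%), a small Brier score partly reflects the rare outcome itself, which is why the study reports it alongside, rather than instead of, discrimination and calibration.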
Results
The flow chart depicting the selection of the study participants in the NHIS 2016, 2017 and 2021 datasets is shown in figure 2. After excluding respondents with missing pain information, our analytic sample size was 32 980 for NHIS 2016, 26 700 for NHIS 2017 and 28 740 for NHIS 2021 (online supplemental table 2). Scenario 1 was the best-performing scenario regarding model discrimination and calibration. In scenario 1, the NHIS 2016 model without the Crohn’s disease variable was validated in the NHIS 2017 data without the Crohn’s disease variable (figure 1). HICP prevalence in NHIS 2016 was 9.42% compared with 8.81% in NHIS 2017 (online supplemental table 3). The distribution of age and sex was comparable in both samples, with those aged ≥65 years comprising ~24% and females comprising ~53–54%. Non-Whites comprised ~32% of both samples. There were no differences in other sociodemographic factors, psychosocial distribution, comorbidities, health utilisation and behavioural variables. The global and subgroup-specific models performed well. The AUC of the global LASSO model was 0.89 (95% CI 0.88 to 0.90), and that of the random forest model was also 0.89 (95% CI 0.88 to 0.89) (table 1). The AUCs did not vary by subgroup, except that older respondents had a lower AUC compared with younger respondents (~0.81). The AUCs were highest among Hispanic respondents (0.91–0.92). These findings were consistent with the other calibration metrics. The random forest models were mostly underfitted (slope >1), while the LASSO models were mostly well calibrated (slope ~1) (online supplemental table 4). However, the models with Hispanic respondents were underfitted, with calibration slopes as high as 1.31. All models had comparable Brier scores except for adults aged ≥65 years (online supplemental table 5). The optimism-corrected log-odds for the global and subgroup-specific models are shown in online supplemental table 6.
Figure 2. Flowchart depicting the selection process for study participants across the NHIS datasets from 2016, 2017 and 2021. It includes the total population for each year, the exclusion of individuals due to missing information on pain questions, and the final numbers analysed, categorised into those with and without HICP. The chart demonstrates the attrition at each step and the distribution of participants with HICP within the analysed datasets. HICP, high-impact chronic pain; NHIS, National Health Interview Survey.
Table 1. Area under the curve (95% CI) for the overall and within sociodemographic subgroups for scenario one where models were developed on the imputed NHIS 2016 datasets (excluding the Crohn’s disease variable) and tested on the imputed NHIS 2017 datasets (excluding the Crohn’s disease variable).
|  | Models on all data |  | Subgroup-specific models |  |
| --- | --- | --- | --- | --- |
|  | LASSO | Random forest | LASSO | Random forest |
| All data | 0.888 (0.879–0.897) | 0.885 (0.876–0.894) | N/A | N/A |
| Sex |  |  |  |  |
| Male | 0.889 (0.875–0.904) | 0.884 (0.870–0.899) | 0.887 (0.873–0.902) | 0.883 (0.869–0.898) |
| Female | 0.885 (0.874–0.897) | 0.885 (0.873–0.896) | 0.886 (0.874–0.898) | 0.882 (0.870–0.894) |
| Age |  |  |  |  |
| <65 years | 0.909 (0.899–0.920) | 0.908 (0.897–0.918) | 0.911 (0.900–0.921) | 0.907 (0.897–0.917) |
| ≥65 years | 0.808 (0.789–0.826) | 0.801 (0.783–0.820) | 0.810 (0.792–0.828) | 0.796 (0.777–0.815) |
| Race/ethnicity |  |  |  |  |
| White | 0.881 (0.870–0.892) | 0.878 (0.867–0.890) | 0.880 (0.869–0.891) | 0.878 (0.867–0.889) |
| Black | 0.886 (0.861–0.910) | 0.883 (0.858–0.908) | 0.883 (0.858–0.909) | 0.878 (0.853–0.904) |
| Hispanic | 0.919 (0.895–0.942) | 0.910 (0.886–0.935) | 0.907 (0.881–0.932) | 0.910 (0.885–0.935) |

LASSO, Least Absolute Shrinkage and Selection Operator; NHIS, National Health Interview Survey.
In scenario 2, the NHIS 2016 model with the Crohn’s disease variable was validated in the NHIS 2017 data with the Crohn’s disease variable (figure 1). The AUCs for the global and subgroup specific models ranged between 0.80 and 0.92. As above, model discrimination was lowest in respondents aged ≥65 years (0.80, 95% CI 0.78 to 0.82) (online supplemental table 7). The models were underfitted among Hispanic respondents (online supplemental table 8). The models had similar Brier scores, except for the model for adults aged ≥65 years (online supplemental table 9).
In scenario 3, the NHIS 2016 model without 11 variables was validated in the NHIS 2021 data without 11 variables (figure 1). The NHIS 2021 dataset had a higher proportion of male respondents than the NHIS 2016 dataset (51.6% vs 46.0%) and a higher proportion of US-born respondents (92.4% vs 84.4%) (online supplemental table 10). In addition, the NHIS 2021 dataset had a higher proportion of high school/General Educational Development graduates compared with the NHIS 2016 dataset (30.2% vs 24.2%) and a higher proportion of hourly employees (50.0% vs 33.1%). We observed a lower prevalence of psychological symptoms in the NHIS 2021 respondents (11.2% vs 16.8%). The 2021 NHIS respondents had a higher number of visits (2+ visits) to the emergency department in the previous year (18.6% vs 8.1%). Otherwise, the two datasets were similar in terms of race/ethnicity, marital status, geographical distribution, health insurance coverage, sexual orientation and comorbidities. The NHIS 2016 model showed good discrimination in the NHIS 2021 datasets, although these metrics were lower than those observed in scenario 1: the AUC of the global LASSO model was 0.82 (95% CI 0.81 to 0.83), and that of the random forest model was 0.81 (95% CI 0.80 to 0.82) (table 2). As above, the AUC for adults ≥65 years was lowest (0.76). The model was overfitted in the combined and subgroup-specific datasets (calibration slopes <1) (online supplemental table 11). However, the Brier scores were generally low and close to 0.05, indicating good overall performance (online supplemental table 12).
Table 2. Area under the curve (95% CI) for the overall and within sociodemographic subgroups for scenario 3 where models were developed on the imputed NHIS 2016 datasets (excluding 11 predictors) and tested on the imputed NHIS 2021 datasets (excluding 11 predictors).
|  | Models on all data |  | Subgroup-specific models |  |
| --- | --- | --- | --- | --- |
|  | LASSO | Random forest | LASSO | Random forest |
| All data | 0.819 (0.808–0.830) | 0.811 (0.800–0.822) | N/A | N/A |
| Sex |  |  |  |  |
| Male | 0.834 (0.818–0.850) | 0.824 (0.807–0.840) | 0.829 (0.813–0.845) | 0.823 (0.806–0.839) |
| Female | 0.805 (0.790–0.819) | 0.800 (0.785–0.814) | 0.802 (0.787–0.817) | 0.797 (0.782–0.812) |
| Age |  |  |  |  |
| <65 years | 0.828 (0.815–0.841) | 0.821 (0.808–0.834) | 0.828 (0.816–0.841) | 0.821 (0.808–0.835) |
| ≥65 years | 0.756 (0.735–0.777) | 0.746 (0.725–0.766) | 0.756 (0.735–0.777) | 0.737 (0.716–0.758) |
| Race/ethnicity |  |  |  |  |
| White | 0.819 (0.806–0.832) | 0.811 (0.798–0.824) | 0.820 (0.807–0.833) | 0.811 (0.797–0.824) |
| Black | 0.838 (0.809–0.866) | 0.830 (0.801–0.859) | 0.833 (0.804–0.862) | 0.822 (0.793–0.852) |
| Hispanic | 0.793 (0.763–0.823) | 0.785 (0.754–0.815) | 0.787 (0.757–0.818) | 0.779 (0.748–0.810) |

LASSO, Least Absolute Shrinkage and Selection Operator; NHIS, National Health Interview Survey.
In scenario 4, a de novo model was created using the NHIS 2021 data without the 11 variables. This model was validated on the NHIS 2016 data without the same 11 variables (figure 1). The de novo NHIS 2021 model performed well in terms of discrimination but poorly in terms of calibration when validated on the NHIS 2016 datasets. The models showed good discrimination, with AUCs ranging from 0.78 to 0.89 (online supplemental table 13). Arthritis and rheumatism, poverty status, number of times in the emergency department, region and educational attainment were the top-ranked variables in the global de novo model using NHIS 2021 data (figure 3). However, the model was underfitted in the combined and subgroup-specific datasets (calibration slopes >1) (online supplemental table 14). Conversely, the Brier scores were comparable with scenario 3 (online supplemental table 15).
Figure 3. Ranking of variables associated with HICP for NHIS 2021, stratified by demographic and clinical subgroups. The variables are ranked globally and across categories such as sex, age groups and race/ethnicity. Arthritis and rheumatism, poverty status and emergency room visits consistently rank highly, with subgroup-specific variations highlighted. The table provides a comprehensive view of the most influential predictors of HICP across diverse populations. COPD, chronic obstructive pulmonary disease; ED, emergency department; ER, emergency room; HICP, high-impact chronic pain; NHIS, National Health Interview Survey.
In scenario 5, the de novo 2021 model without the 11 variables was validated on the NHIS 2017 data without the same 11 variables (figure 1). The de novo NHIS 2021 model performed well in terms of discrimination when validated on the NHIS 2017 datasets. The AUC was 0.86 for the overall model and ranged from 0.77 to 0.89 for the subgroup-specific models (online supplemental table 16). However, the models were poorly calibrated, either underfitted or overfitted (online supplemental table 17). The Brier scores were comparable with scenario 4 (online supplemental table 18).
Discussion
In our cross-sectional study, we report on the external validation of prediction models for HICP using the NHIS data. We found that the demographic composition of the three datasets and their associations with HICP were comparable. Future studies are needed to confirm the replicability of the identified correlates of HICP in longitudinal studies.17 In clinical and public health settings, identifying key correlates such as arthritis, poverty status, number of emergency department visits, educational attainment and health insurance status in HICP models would provide a crucial foundation for targeted interventions. Clinicians can adapt these models to proactively identify individuals at elevated risk, allowing customised treatment plans. For arthritis-related pain, tailored interventions may include adopting a biopsychosocial model of pain management and rehabilitation strategies.18 Addressing poverty status involves implementing socioeconomic support programmes to alleviate financial barriers to healthcare access, ensuring that all individuals have the means to manage their chronic pain effectively.19 Strategies to reduce emergency department visits could involve community-based initiatives and preventive care measures, including individualised care plans and pain management policies in acute care settings.20 Recognising the influence of educational attainment on HICP requires tailored educational initiatives to empower individuals with the knowledge necessary for effective pain management.21 By integrating these findings into clinical and public health practices, stakeholders can work collaboratively to develop holistic approaches that address the multifaceted nature of HICP. As noted in the initial derivation paper, the models demonstrated reduced reliability among participants aged ≥65 years.3 This trend persisted in the current study. 
This discrepancy suggests that age-related factors, such as comorbidities or differences in pain perception and reporting, might influence the predictive accuracy of HICP models. Future research could explore whether age-specific adaptations to HICP case definitions or predictive factors might improve model performance for older populations. Clinically, this finding highlights the importance of considering demographic variability when applying HICP models in practice, as older adults may require tailored approaches to more accurately identify and manage chronic pain.
In the original 2016 NHIS analysis, the most important correlates of HICP were number of hospital stay days, presence of arthritis and rheumatism, psychological symptoms, number of times in the emergency department, and poverty status.3 These correlates are associated with ill-health. In the de novo models created using the 2021 NHIS dataset, no variable for the number of hospital stays existed. However (with some notable exceptions), most of the other variables that ranked in the top ten in the 2016 NHIS dataset were also ranked highest in 2021. Surprisingly, psychological symptoms only ranked 15th in the 2021 NHIS dataset, a time that reflects the post-COVID-19 pandemic era. We observed a lower prevalence of psychological symptoms in 2021 compared with 2016 (11.2% vs 16.8%, respectively). Emerging trends indicate that the COVID-19 pandemic has shaped mental health. In the USA, trends in average anxiety and depression scores showed a 15% rise and peak during the early months of the pandemic (August–December 2020) and a decrease of 25% between December and June 2022.22 In addition, acute and post-acute symptoms of the infection often include fatigue, anxiety, depression and other neuropsychiatric sequelae.23 In a recent longitudinal study, Ziadni et al described the course of pandemic-related stressors and patient-reported outcomes among patients receiving care at tertiary pain clinics between May 2020 and June 2022, noting a lack of adverse physical and mental health outcomes during the study period.24 However, the impact of pandemic-related stressors on HICP is currently unknown and needs to be a topic of future inquiry.
Poverty status and region ranked higher (second and fourth, respectively) in the 2021 NHIS dataset compared with the 2016 NHIS dataset (sixth and 10th, respectively). We observed a lower prevalence of respondents living below 100% of the federal poverty level (FPL) in 2021 compared with 2016 (9.8% vs 15.2%, respectively). In addition, there was an increase in respondents who were hourly employees in 2021 compared with 2016 (50.0% vs 33.1%, respectively). As expected, we found a dose-response association between poverty level and HICP; the prevalence increased from 3.9% among respondents living above 400% of the FPL to 14.2% among respondents living below 100% of the FPL in 2021.25 One of the lasting legacies of the COVID-19 pandemic is its disproportionate impact on the economically disadvantaged and ‘forgotten vulnerable’ who are often living in overcrowded housing and employed in occupations that may not allow remote work, increasing their risk of exposure to COVID-19.26 Employment losses were vast during the pandemic, with more than 30 million jobs eliminated at the beginning of the pandemic; in addition, preventive and chronic disease care may have been involuntarily disrupted, for example, cancelled or delayed.27 28 Taken together, the direct impact of socioeconomic factors on HICP, in particular in the post-COVID era, is still unknown. Future studies are needed to disentangle how these factors impact the risk of chronic pain and disability.
To our knowledge, this is the first empirical comparison of the construct validity of HICP case definitions across differing recall periods. The change in the HICP case definition did not impair the performance of the NHIS 2016 model in the 2021 NHIS dataset: in scenario 3, we observed good discriminative performance even though the old definition (6-month recall period in the 2016 NHIS) was validated in a more recent period (2021 NHIS). However, the models were overfitted in scenario 3. This could be due to differences between the characteristics of the variables in the development and validation datasets and to the different HICP definitions. Because the characteristics of most of the variables and the HICP definition in the 2021 NHIS differ from those of the 2016 NHIS, the associations between the variables in the two datasets are expected to differ. As such, we observed overfitting even though we used methods that yield sparse models (eg, LASSO) and optimism-corrected predictions (eg, cross-validation). Conversely, in scenario 4, the new definition (3-month recall period in the 2021 NHIS) was validated in the older period (2016 NHIS), and here the models were underfitted. Underfitting can occur when a model is too simple, especially when only a few variables are applied to a large dataset. Like overfitted models, underfitted models are unreliable for prediction because they cannot capture the predominant trend in the data. The case definition for HICP changed from a 6-month recall period in 2016 to a 3-month recall period in 2019, when the estimated prevalence was 7.4%. The estimated prevalence of HICP was even lower in the 2021 NHIS (6.9%) than in 2016 (8.0%). Whether this reduction is due to the change in case definition is unknown and should be the subject of future research.
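The overfitting and underfitting described above are read off the calibration slope: regress the observed outcomes on the logit of the predicted probabilities, where a slope below 1 signals overfitting (predictions too extreme) and a slope above 1 signals underfitting (predictions too conservative). A minimal sketch in Python with scikit-learn, using synthetic data rather than the NHIS variables or our fitted models:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibration_slope(y_true, p_pred, eps=1e-6):
    """Slope from regressing observed outcomes on the logit of the
    predicted probabilities; ~1 is ideal, <1 suggests overfitting,
    >1 suggests underfitting."""
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    logit = np.log(p / (1 - p)).reshape(-1, 1)
    # A very large C makes the default L2 penalty negligible,
    # giving an effectively unpenalised recalibration fit.
    model = LogisticRegression(C=1e6).fit(logit, np.asarray(y_true))
    return float(model.coef_[0][0])

# Synthetic check: outcomes generated from a known logistic model.
rng = np.random.default_rng(7)
x = rng.normal(size=5000)
y = rng.binomial(1, 1 / (1 + np.exp(-x)))
slope_ok = calibration_slope(y, 1 / (1 + np.exp(-x)))        # close to 1
slope_over = calibration_slope(y, 1 / (1 + np.exp(-2 * x)))  # well below 1
```

Doubling the linear predictor makes the predictions too extreme relative to the outcomes, so the recalibration slope drops towards 0.5, mirroring the shrinkage we observed in scenario 3.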
This study has a few limitations. Despite the strengths and generalisability of our models, there are limitations inherent to the data. First, a few variables were missing from the 2017 and 2021 NHIS datasets. We were able to impute the one unmeasured variable in 2017, but we could not impute the 11 unmeasured variables in 2021, notably the highest-ranked variable in the original 2016 model (ie, number of hospitalisations). Despite the absence of these variables, the model showed good discrimination and decent calibration on validation. Second, NHIS followed a cross-sectional study design, which limits inference about temporality between the variables and HICP. Future follow-up studies should build on the evidence accumulated by the current study. Third, HICP and most of the variables are self-reported; thus, there could be over- or under-reporting of HICP and misclassification of the variables. Fourth, we did not incorporate any interaction terms in our models. However, our LASSO models without interaction terms performed as well as the random forest models, which can automatically capture complex interactions between variables. Fifth, some important risk factors for HICP may be missing, for example, history of violent injury and genetic risk factors. Sixth, 11 variables were absent from the 2021 NHIS dataset, although the variables of HICP in the original model also ranked high in the 2017 and 2021 NHIS datasets. Seventh, Rubin’s rule could not be applied to LASSO-based prediction models with multiply imputed datasets, as LASSO selects different predictors for each imputation, nor to random forest models, which lack coefficients to pool. Instead, we used the prediction-average approach, a recommended method for machine-learning-based prediction models with multiply imputed datasets; this effectively addresses these challenges while acknowledging potential limitations in capturing variability across imputations.
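The prediction-average approach can be sketched as follows: fit the model separately on each of the m imputed datasets and average the predicted probabilities, rather than attempting to pool coefficients across fits. This is an illustrative sketch with scikit-learn on toy data; the function name and the jittered "imputations" are hypothetical stand-ins, not our actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def averaged_predictions(train_imputations, test_imputations):
    """Fit an L1-penalised (LASSO-type) logistic model on each imputed
    training set, then average predicted probabilities over the
    corresponding imputed test sets. Avoids pooling coefficients,
    which Rubin's rule cannot do when LASSO selects different
    predictors in each imputation."""
    probs = []
    for (X_tr, y_tr), X_te in zip(train_imputations, test_imputations):
        model = LogisticRegression(penalty="l1", solver="liblinear")
        model.fit(X_tr, y_tr)
        probs.append(model.predict_proba(X_te)[:, 1])
    return np.mean(probs, axis=0)

# Toy illustration: 3 "imputed" copies of the same synthetic data,
# each differing only in small imputation noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
train = [(X + rng.normal(scale=0.05, size=X.shape), y) for _ in range(3)]
test = [X[:50] + rng.normal(scale=0.05, size=(50, 5)) for _ in range(3)]
p_avg = averaged_predictions(train, test)
```

Averaging on the probability scale sidesteps the mismatch in selected predictors across imputations, at the cost of not propagating between-imputation variability into the final risk estimates.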
Lastly, the generalisability of the models could be limited to the US population. Future research should use data from other countries to externally validate the models.
Despite these limitations, our study’s five validation scenarios underscore the importance of evaluating model performance under different conditions, highlighting the adaptability and generalisability of our HICP models across diverse datasets and temporal contexts. Scenario 1, which tested the 2016 model with the 2017 dataset, established the model’s baseline generalisability with minimal missing data, resulting in high discrimination and calibration scores. Scenarios 2 and 3 emphasised the model’s robustness when handling datasets with varying levels of missing variables, allowing us to understand how imputed values affect predictive accuracy. Scenarios 4 and 5, involving a de novo model based on the 2021 dataset, provided insights into model performance with more recent data and a shorter recall period for HICP, revealing areas of underfitting and overfitting. Together, these scenarios illustrate that the model’s performance remains consistent across sociodemographic subgroups, though lower discrimination was noted among older adults. Using multiple scenarios allows a more comprehensive assessment of model applicability in real-world settings, reinforcing the model’s potential utility for identifying high-risk individuals in diverse patient populations. The medical and research community continues its pursuit of improving patient care through personalised medicine. Predictive tools, combined with a clinician’s expertise, can prove invaluable in mitigating the adverse impacts of HICP and meeting the HHS Healthy People 2030 goal of reducing HICP in society.29
To translate the model into clinical practice, further testing is essential to validate its performance in real-world healthcare settings. One approach could involve conducting pilot studies in pain clinics or specialty clinics with a high prevalence of HICP, such as rheumatology clinics, to evaluate the model’s predictive accuracy and its ease of integration into existing electronic health record (EHR) systems. Additionally, incorporating clinician feedback during testing could refine the model’s usability and ensure that predictions align with clinical insights. For implementation, the model could be embedded in EHR platforms to flag patients at higher risk of HICP, enabling early, personalised interventions. Training healthcare providers to interpret model outputs would further support practical application. Ultimately, widespread clinical adoption would require larger-scale validation and assessment of how model-informed interventions affect patient outcomes, particularly in reducing pain and improving quality of life.
Footnotes
Funding: Dr. Sean Mackey (K24-NS-126781 and R61-NS-11865) and Dr. Titilola Falasinnu (T32-DA-035165) received funding from the NIH National Institute on Drug Abuse. Dr. Sean Mackey also acknowledges support from the Redlich Pain Research Endowment (Grant/Award Number: Not Applicable). Dr. Titilola Falasinnu (K01-AR-079039) received funding from the NIH National Institute of Arthritis and Musculoskeletal and Skin Diseases. Dr. Kenneth Weber II (K23-NS-104211) received funding from the NIH National Institute of Neurological Disorders and Stroke. The authors report no conflicts of interest.
Provenance and peer review: Not commissioned; externally peer-reviewed.
Patient consent for publication: Not applicable.
Ethics approval: Not applicable.
Data availability free text: Data obtained from NHIS: https://www.cdc.gov/nchs/nhis/index.html.
Patient and public involvement: Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.
Data availability statement
Data are available in a public, open access repository.
References
- 1. Dahlhamer J, Lucas J, Zelaya C, et al. Prevalence of Chronic Pain and High-Impact Chronic Pain Among Adults - United States, 2016. MMWR Morb Mortal Wkly Rep 2018;67:1001–6. doi:10.15585/mmwr.mm6736a2
- 2. Pitcher MH, Von Korff M, Bushnell MC, et al. Prevalence and Profile of High-Impact Chronic Pain in the United States. J Pain 2019;20:146–60. doi:10.1016/j.jpain.2018.07.006
- 3. Falasinnu T, Hossain MB, Weber KA II, et al. The Problem of Pain in the United States: A Population-Based Characterization of Biopsychosocial Correlates of High Impact Chronic Pain Using the National Health Interview Survey. J Pain 2023;24:1094–103. doi:10.1016/j.jpain.2023.03.008
- 4. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J 2014;35:1925–31. doi:10.1093/eurheartj/ehu207
- 5. Institute of Medicine, Board on Health Sciences Policy, Committee on Advancing Pain Research, Care, and Education. Relieving Pain in America: A Blueprint for Transforming Prevention, Care, Education, and Research. National Academies Press; 2011.
- 6. National Pain Strategy: A Comprehensive Population Health-Level Strategy for Pain. 2015.
- 7. CDC. NHIS questionnaires, datasets, and documentation. National Health Interview Survey; 2024. Available: https://www.cdc.gov/nchs/nhis/documentation/index.html (accessed 6 Dec 2024).
- 8. Zelaya CE, Dahlhamer JM, Lucas JW, et al. Chronic Pain and High-impact Chronic Pain Among U.S. Adults, 2019. NCHS Data Brief 2020:1–8.
- 9. Hoogland J, van Barreveld M, Debray TPA, et al. Handling missing predictor values when validating and applying a prediction model to new patients. Stat Med 2020;39:3591–607. doi:10.1002/sim.8682
- 10. Rodgers DM, Jacobucci R, Grimm KJ. A Multiple Imputation Approach for Handling Missing Data in Classification and Regression Trees. J Behav Data Sci 2021;1:127–53. doi:10.35566/jbds/v1n1/p6
- 11. Twala B. An empirical comparison of techniques for handling incomplete data using decision trees. Appl Artif Intell 2009;23:373–405. doi:10.1080/08839510902872223
- 12. Gunn HJ, Hayati Rezvan P, Fernández MI, et al. How to apply variable selection machine learning algorithms with multiply imputed data: A missing discussion. Psychol Methods 2023;28:452–71. doi:10.1037/met0000478
- 13. Wahl S, Boulesteix A-L, Zierer A, et al. Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation. BMC Med Res Methodol 2016;16:144. doi:10.1186/s12874-016-0239-7
- 14. Best H. The SAGE Handbook of Regression Analysis and Causal Inference. SAGE Publications.
- 15. Bartz-Beielstein T, Chandrasekaran S, Rehbach F, et al. Case study I: tuning random forest (ranger). In: Bartz E, Bartz-Beielstein T, Zaefferer M, eds. Hyperparameter Tuning for Machine and Deep Learning with R: A Practical Guide. Singapore: Springer Nature Singapore; 2023:187–220.
- 16. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 2010;21:128–38. doi:10.1097/EDE.0b013e3181c30fb2
- 17. Cure-Cure C, Cure P. Validation of cross-sectional studies with long-term longitudinal studies. Osteoporos Int 2012;23:399. doi:10.1007/s00198-011-1784-x
- 18. Kudrina I, Shir Y, Fitzcharles M-A. Multidisciplinary treatment for rheumatic pain. Best Pract Res Clin Rheumatol 2015;29:156–63. doi:10.1016/j.berh.2015.04.029
- 19. Quiton RL, Leibel DK, Boyd EL, et al. Sociodemographic patterns of pain in an urban community sample: an examination of intersectional effects of sex, race, age, and poverty status. Pain 2020;161:1044–51. doi:10.1097/j.pain.0000000000001793
- 20. Wong CK, O’Rielly CM, Teitge BD, et al. The Characteristics and Effectiveness of Interventions for Frequent Emergency Department Utilizing Patients With Chronic Noncancer Pain: A Systematic Review. Acad Emerg Med 2020;27:742–52. doi:10.1111/acem.13934
- 21. Newman AK, Van Dyke BP, Torres CA, et al. The relationship of sociodemographic and psychological variables with chronic pain variables in a low-income population. Pain 2017;158:1687–96. doi:10.1097/j.pain.0000000000000964
- 22. Jia H, Guerin RJ, Barile JP, et al. National and State Trends in Anxiety and Depression Severity Scores Among Adults During the COVID-19 Pandemic - United States, 2020-2021. MMWR Morb Mortal Wkly Rep 2021;70:1427–32. doi:10.15585/mmwr.mm7040e3
- 23. Penninx B, Benros ME, Klein RS, et al. How COVID-19 shaped mental health: from infection to pandemic effects. Nat Med 2022;28:2027–37. doi:10.1038/s41591-022-02028-2
- 24. Ziadni MS, Jaros S, Anderson SR, et al. A Longitudinal Investigation of the Impact of COVID-19 on Patients With Chronic Pain. J Pain 2023;24:1830–42. doi:10.1016/j.jpain.2023.05.010
- 25. Rikard SM, Strahan AE, Schmit KM, et al. Chronic Pain Among Adults - United States, 2019-2021. MMWR Morb Mortal Wkly Rep 2023;72:379–85. doi:10.15585/mmwr.mm7215a1
- 26. Patel JA, Nielsen FBH, Badiani AA, et al. Poverty, inequality and COVID-19: the forgotten vulnerable. Public Health 2020;183:110–1. doi:10.1016/j.puhe.2020.05.006
- 27. Berkowitz SA, Basu S. Unemployment Insurance, Health-Related Social Needs, Health Care Access, and Mental Health During the COVID-19 Pandemic. JAMA Intern Med 2021;181:699–702. doi:10.1001/jamainternmed.2020.7048
- 28. Callison K, Ward J. Associations Between Individual Demographic Characteristics And Involuntary Health Care Delays As A Result Of COVID-19. Health Aff (Millwood) 2021;40:837–43. doi:10.1377/hlthaff.2021.00101
- 29. Healthy People 2030. Reduce the proportion of adults with chronic pain that frequently limits life or work activities - CP-01. Available: https://health.gov/healthypeople/objectives-and-data/browse-objectives/chronic-pain/reduce-proportion-adults-chronic-pain-frequently-limits-life-or-work-activities-cp-01 (accessed 20 Oct 2023).