Author manuscript; available in PMC: 2022 Jun 27.
Published in final edited form as: Lancet Digit Health. 2022 Jun;4(6):e455–e465. doi: 10.1016/S2589-7500(22)00062-0

Developing and validating a machine-learning algorithm to predict opioid overdose in Medicaid beneficiaries in two US states: a prognostic modelling study

Wei-Hsuan Lo-Ciganic, Julie M Donohue, Qingnan Yang, James L Huang, Ching-Yuan Chang, Jeremy C Weiss, Jingchuan Guo, Hao H Zhang, Gerald Cochran, Adam J Gordon, Daniel C Malone, Chian K Kwoh, Debbie L Wilson, Courtney C Kuza, Walid F Gellad
PMCID: PMC9236281  NIHMSID: NIHMS1810681  PMID: 35623798

Summary

Background

Little is known about whether machine-learning algorithms developed to predict opioid overdose using data from earlier years and from a single state will perform as well when applied to other populations. We aimed to develop a machine-learning algorithm to predict 3-month risk of opioid overdose using Pennsylvania Medicaid data and to validate it externally in two data sources (ie, later years of Pennsylvania Medicaid data and data from a different state).

Methods

This prognostic modelling study developed and validated a machine-learning algorithm to predict overdose in Medicaid beneficiaries with one or more opioid prescription in Pennsylvania and Arizona, USA. To predict risk of hospital or emergency department visits for overdose in the subsequent 3 months, we measured 284 potential predictors from pharmaceutical and health-care encounter claims data in 3-month periods, starting 3 months before the first opioid prescription and continuing until loss to follow-up or study end. We developed and internally validated a gradient-boosting machine algorithm to predict overdose using 2013–16 Pennsylvania Medicaid data (n=639 693). We externally validated the model using (1) 2017–18 Pennsylvania Medicaid data (n=318 585) and (2) 2015–17 Arizona Medicaid data (n=391 959). We reported several prediction performance metrics (eg, C-statistic, positive predictive value). Beneficiaries were stratified into risk-score subgroups to support clinical use.

Findings

A total of 8641 (1·35%) 2013–16 Pennsylvania Medicaid beneficiaries, 2705 (0·85%) 2017–18 Pennsylvania Medicaid beneficiaries, and 2410 (0·61%) 2015–17 Arizona beneficiaries had one or more overdose during the study period. C-statistics for the algorithm predicting 3-month overdoses developed from the 2013–16 Pennsylvania training dataset and validated on the 2013–16 Pennsylvania internal validation dataset, 2017–18 Pennsylvania external validation dataset, and 2015–17 Arizona external validation dataset were 0·841 (95% CI 0·835–0·847), 0·828 (0·822–0·834), and 0·817 (0·807–0·826), respectively. In external validation datasets, 71 361 (22·4%) of 318 585 2017–18 Pennsylvania beneficiaries were in high-risk subgroups (positive predictive value of 0·38–4·08%; capturing 73% of overdoses in the subsequent 3 months) and 40 041 (10%) of 391 959 2015–17 Arizona beneficiaries were in high-risk subgroups (positive predictive value of 0·19–1·97%; capturing 55% of overdoses). Lower risk subgroups in both validation datasets had few individuals (≤0·2%) with an overdose.

Interpretation

A machine-learning algorithm predicting opioid overdose derived from Pennsylvania Medicaid data performed well in external validation with more recent Pennsylvania data and with Arizona Medicaid data. The algorithm might be valuable for overdose risk prediction and stratification in Medicaid beneficiaries.

Funding

National Institutes of Health, National Institute on Drug Abuse, and National Institute on Aging.

Introduction

The USA continues to grapple with an opioid epidemic, with an estimated 75 673 opioid overdose deaths in the 12-month period ending in April 2021.1 Health systems, payers, and policy makers have implemented various policies and programmes to mitigate the crisis. The President’s Commission on Combating Drug Addiction and the Opioid Crisis in 2017 recommended applying advanced data analytics to improve identification of individuals at high risk of opioid overdose (hereafter overdose).2 Within the last 5 years, studies identified shortcomings of current opioid risk prediction tools and called for the development of more advanced models to improve identification of individuals at risk (or at no risk) of overdose.2

Our previous work showed that machine-learning approaches can improve risk prediction and stratification for incident opioid use disorder and subsequent overdose in Medicare beneficiaries.3,4 Medicaid is one of the largest US health-care payers. Medicaid beneficiaries have low incomes and substantial physical and mental comorbidities, and more often have substance use disorders, placing them at greater risk of opioid misuse and overdose. However, few studies have developed prediction algorithms to identify Medicaid beneficiaries at high risk of overdose.5–7 Furthermore, little is known about whether prediction algorithms developed using earlier years of data from one US state will perform as well using more recent data or when applied to another state.8

To fill these knowledge gaps, we studied two state Medicaid programmes, Pennsylvania and Arizona, with different population characteristics and overdose rates.9 In 2018, Pennsylvania ranked fourth and Arizona ranked 21st in US drug overdose mortality.9 We first used Pennsylvania Medicaid claims data from 2013–16 to develop a machine-learning algorithm to predict overdose in the subsequent 3 months. Second, we externally validated our prediction algorithm using more recent years (ie, 2017–18) of Pennsylvania Medicaid data and externally validated it using 2015–17 Arizona Medicaid data.

Methods

Study design and data

In this prognostic modelling study we used administrative claims data of Medicaid beneficiaries in Pennsylvania from Jan 1, 2013, to Dec 31, 2016, to develop a machine-learning algorithm for overdose prediction. To evaluate whether the algorithm performs well using recent years of data and whether it translates to another state Medicaid programme, we conducted two validations using: (1) Pennsylvania Medicaid data from Jan 1, 2017, to Dec 31, 2018 (ie, the 2017–18 Pennsylvania external validation dataset), and (2) Arizona Medicaid data from Jan 1, 2015, to Dec 31, 2017 (ie, the 2015–17 Arizona external validation dataset). Pennsylvania and Arizona have different geographical locations, population characteristics, and overdose rates with different drugs involved.10 Pennsylvania Medicaid ranks as the fourth largest of the 50 states in Medicaid expenditures11 and fifth in enrolment (approximately 3 million beneficiaries annually),10 while Arizona Medicaid ranks 14th in total expenditures11 and 12th in enrolment (approximately 1·7 million beneficiaries annually). Pennsylvania and Arizona implemented Medicaid eligibility expansion from the Affordable Care Act in different years (Arizona on Jan 1, 2014; Pennsylvania on Jan 1, 2015).

The Pennsylvania and Arizona Medicaid datasets captured demographic, eligibility, and enrolment information; outpatient, inpatient, and professional services; and prescription drugs. Prescription data contained all prescriptions reimbursed by Medicaid and included national drug codes, dates of prescription fills, quantities dispensed, and days of supply. Prescriber information (eg, specialty) was available in the Pennsylvania Medicaid dataset but not in the Arizona Medicaid dataset. We were able to link the Arizona Medicaid data, but not the Pennsylvania data, with death certificate data, allowing identification of fatal opioid overdoses not present in Medicaid claims. We constructed study cohorts and created candidate predictors for the 2017–18 Pennsylvania and 2015–17 Arizona datasets as we did for the 2013–16 Pennsylvania dataset. We then applied the prediction algorithm to the external datasets. The study complied with the Standards for Reporting of Diagnostic Accuracy and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis reporting guidelines (appendix pp 25–28).12,13 It was approved by the University of Pittsburgh and University of Florida Institutional Review Boards (human research ethics committees).

We identified Medicaid beneficiaries aged 18–64 years who filled one or more opioid prescriptions (excluding buprenorphine formulations approved for pain by the US Food and Drug Administration). An index date was defined as the date of an individual’s first opioid prescription during the study period. We excluded beneficiaries who: (1) only filled parenteral opioids, cough or cold medications containing opioids, or both; (2) had malignant cancer diagnoses;3 (3) received hospice care; (4) were dually eligible for Medicare or enrolled in special Medicaid programmes in which we could not completely observe their health services or prescription drug use; (5) had their first opioid prescription during the study period’s last 3 months, or did not have at least a 3-month look-back period before their first opioid prescription, to allow measurement of candidate predictors; or (6) were not enrolled for 3 months after the first opioid fill (appendix p 9). Beneficiaries remained in the cohort once eligible, regardless of whether they continued to receive opioids or had an occurrence of overdose, until they died or disenrolled from Medicaid. The same criteria were applied to datasets from both states.
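
To make the cohort logic concrete, the following is a minimal pandas sketch of how inclusion and exclusion rules of this kind might be applied to a beneficiary-level extract. It is an illustration under assumed inputs, not the study's pipeline (which was implemented in SAS), and every column name (age, cancer_dx, hospice, dual_eligible, enrol_start, enrol_end, index_date) is a hypothetical stand-in.

```python
import pandas as pd

def build_cohort(benes: pd.DataFrame, study_end: pd.Timestamp) -> pd.DataFrame:
    """Illustrative inclusion/exclusion filter for a beneficiary-level extract.
    All column names are hypothetical stand-ins, not the study's data dictionary."""
    three_months = pd.DateOffset(months=3)
    return benes[
        benes["age"].between(18, 64)      # adults aged 18-64 years
        & ~benes["cancer_dx"]             # no malignant cancer diagnosis
        & ~benes["hospice"]               # no hospice care
        & ~benes["dual_eligible"]         # not Medicare dual-eligible
        # require a 3-month look-back before the index opioid fill ...
        & (benes["index_date"] >= benes["enrol_start"] + three_months)
        # ... and 3 months of study time and enrolment after the index fill
        & (benes["index_date"] <= study_end - three_months)
        & (benes["enrol_end"] >= benes["index_date"] + three_months)
    ]
```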

We used International Classification of Diseases (ICD) codes, versions 9 and 10 (appendix p 3), to identify any occurrence of opioid overdose (including prescription opioids and heroin) from inpatient or emergency department visits in 3-month periods from the first index opioid prescription.3 Overdose was defined as an opioid overdose code as the primary diagnosis, or another drug overdose or substance use disorder as the primary diagnosis (appendix p 4) with opioid overdose as a non-primary diagnosis.3
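
A minimal sketch of this two-part outcome definition follows. The study's actual ICD-9 and ICD-10 code lists are in its appendix, so the two code sets below hold only illustrative examples, and the claim-level column names are hypothetical.

```python
import pandas as pd

# Illustrative examples only; the study's full ICD-9/ICD-10 lists are in its appendix.
OPIOID_OVERDOSE = {"965.00", "T40.1X1A"}  # eg, opium poisoning (ICD-9), heroin poisoning (ICD-10)
OTHER_OD_OR_SUD = {"F19.10"}              # eg, other substance use disorder codes

def flag_overdose_visit(claims: pd.DataFrame) -> pd.Series:
    """Flag inpatient/ED visits meeting the definition: opioid overdose as the
    primary diagnosis, or another drug overdose/substance use disorder as the
    primary diagnosis with opioid overdose as a non-primary diagnosis."""
    primary_opioid = claims["primary_dx"].isin(OPIOID_OVERDOSE)
    nonprimary_opioid = claims["other_dxs"].apply(
        lambda dxs: any(dx in OPIOID_OVERDOSE for dx in dxs)  # list-valued column
    )
    primary_other = claims["primary_dx"].isin(OTHER_OD_OR_SUD)
    return primary_opioid | (primary_other & nonprimary_opioid)
```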

To be consistent with previous literature and the quarterly evaluation periods commonly used by prescription drug monitoring programmes and health plans, we chose 3 months for the predictors’ and outcomes’ measurement windows.14,15 Candidate predictors of overdose (n=284) included sociodemographics, patient health status, use patterns of opioid and other non-opioid prescriptions, and provider-level and regional-level factors measured at baseline (during the 3-month period before the first opioid fill) and in 3-month windows after initiating prescription opioids (appendix pp 5–6). We updated the predictors measured in each 3-month period to account for changes over time when predicting overdose risk in each subsequent period (appendix p 10). This time-updating approach mimics the active surveillance that health systems might adopt.3
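
As an illustration of this time-updated design, a sketch of rolling claims up into consecutive 3-month periods per beneficiary might look as follows; a quarter is approximated as 91 days, and the aggregated columns are hypothetical stand-ins for a handful of the 284 predictors.

```python
import pandas as pd

def quarterly_features(claims: pd.DataFrame) -> pd.DataFrame:
    """Aggregate claims into consecutive 3-month periods per beneficiary.
    Expects bene_id, service_date, index_date plus event columns; all names
    are hypothetical. Period -1 is the 3-month baseline before the index fill."""
    days_from_index = (claims["service_date"] - claims["index_date"]).dt.days
    claims = claims.assign(period=days_from_index // 91)
    return (
        claims.groupby(["bene_id", "period"])
        .agg(
            opioid_fills=("is_opioid_fill", "sum"),
            ed_visits=("is_ed_visit", "sum"),
            total_mme=("mme", "sum"),
        )
        .reset_index()
    )

# Each (bene_id, period) row then pairs with the outcome measured in the
# following 3-month period, mimicking the quarterly surveillance described above.
```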

Statistical analysis

Our machine-learning analysis using 2013–16 Pennsylvania Medicaid data comprised two steps: (1) developing a prediction model and creating overdose risk prediction scores for each individual, and (2) stratifying individuals into subgroups with similar overdose risks. We then conducted external validation of the developed prediction algorithm using the 2017–18 Pennsylvania and 2015–17 Arizona Medicaid datasets.

First, we randomly and equally divided the 2013–16 Pennsylvania Medicaid cohort into training (developing algorithms), testing (refining algorithms), and internal validation (evaluating algorithms’ prediction performance) datasets. We compared beneficiaries’ characteristics in the training, testing, and internal validation datasets using two-tailed Student’s t tests, χ2 tests, and analysis of variance, or corresponding non-parametric tests. We applied several commonly used methods, such as multivariate logistic regression, penalised regression, random forests, and gradient-boosting machines (GBM), to develop and test overdose prediction algorithms. Consistent with previous studies,3,5 GBM yielded the best prediction results (C-statistic of 0·841 for GBM vs up to 0·820 for other methods; appendix p 11) with an ability to handle complex interactions between predictors and outcomes. The study’s objective was to externally validate the best-performing algorithm; thus, we focused on reporting the GBM model (appendix pp 1–2). Using the internal validation dataset of the 2013–16 Pennsylvania Medicaid cohort to assess the prediction algorithm’s discrimination performance (ie, the extent to which predicted high-risk patients exhibit higher overdose rates than those predicted as low risk), we report C-statistics (0·700–0·800=good; >0·800=very good)16 and precision-recall curves.17 However, because C-statistics do not account for outcome prevalence, which is important given the rarity of overdose events, we also report sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio, number needed to evaluate to identify one overdose, and estimated rate of alerts (appendix pp 12–13).18,19 Given that beneficiaries could have multiple 3-month periods until occurrence of a censoring event (disenrolment or death), we present episode-level performance as the main result. We conducted sensitivity analyses that iteratively and randomly selected patient-level subsets from the internal validation data to ensure consistency of prediction performance.
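
The study fitted its models in SAS and the Salford Predictive Modeler suite; purely as a sketch of the same workflow, scikit-learn's gradient-boosting implementation could stand in. Here load_episodes is a hypothetical loader for the claims-derived episode matrix, not part of the study's code.

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import precision_recall_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# X: episode-level predictors; y: overdose in the next 3 months (0/1).
X, y = load_episodes()  # hypothetical loader, for illustration only

# Mirror the three-way split: training, testing, and internal validation.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=2 / 3, stratify=y, random_state=0
)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

# Gradient boosting handles missing values and predictor interactions natively.
model = HistGradientBoostingClassifier(max_iter=500, learning_rate=0.05)
model.fit(X_train, y_train)

risk_scores = model.predict_proba(X_val)[:, 1]
print("C-statistic:", roc_auc_score(y_val, risk_scores))           # discrimination
precision, recall, _ = precision_recall_curve(y_val, risk_scores)  # for PR curve
```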

No single prediction probability threshold for defining high risk suits every purpose, as the choice depends on the outcome’s risk and benefit profile, the type of intervention, and resource availability. Therefore, our main analysis classified validation dataset beneficiaries into subgroups using decile thresholds of predicted overdose risk scores from the training algorithm (ie, fixed thresholds) to allow comparison of risk profiles across validation datasets. We further split the highest decile into three strata based on the top first, second to fifth, and sixth to tenth percentiles to allow closer examination of patients at highest risk of experiencing an overdose, creating 12 risk subgroups in total. As an alternative, we conducted secondary analyses using decile risk score thresholds derived from each validation dataset to stratify beneficiaries into 12 risk subgroups. We created calibration plots (composed of 20 population bins of equal size) to examine the extent to which predicted overdose risks agree with observed risks by risk subgroup. We also present different thresholds along with other metrics at multiple levels of sensitivity and specificity (eg, arbitrarily choosing 90% sensitivity, or a threshold with balanced sensitivity and specificity identified by the Youden index, as an anchor).20
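
A sketch of this stratification step under the definitions above: fixed cutpoints come from the training scores (deciles, with the top decile split at the top first and fifth percentiles), and a Youden-balanced threshold can be read off the ROC curve. This is an illustration, not the study's code.

```python
import numpy as np
from sklearn.metrics import roc_curve

def subgroup_cutpoints(train_scores: np.ndarray) -> np.ndarray:
    """Fixed thresholds from training data: decile bounds, with the top decile
    further split at the 99th and 95th percentiles (12 subgroups in total)."""
    pct = [99, 95, 90, 80, 70, 60, 50, 40, 30, 20, 10]  # highest risk first
    return np.percentile(train_scores, pct)

def assign_subgroup(scores: np.ndarray, cuts: np.ndarray) -> np.ndarray:
    """Subgroup 1 = top first percentile ... subgroup 12 = lowest decile."""
    return 1 + np.searchsorted(-cuts, -scores, side="right")

def youden_threshold(y_true: np.ndarray, scores: np.ndarray) -> float:
    """Threshold maximising sensitivity + specificity - 1 (Youden's J)."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    return thresholds[np.argmax(tpr - fpr)]
```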

In external validation analyses, we applied the developed GBM algorithm to the 2017–18 Pennsylvania and 2015–17 Arizona datasets. Because the original prediction algorithm included two prescriber-level variables that were unavailable in the Arizona Medicaid dataset, the model automatically imputed these two variables for the Arizona dataset using the median values from the 2013–16 Pennsylvania dataset. We evaluated prediction performance as in the 2013–16 Pennsylvania Medicaid dataset, and used risk score thresholds derived from the 2013–16 Pennsylvania training dataset to classify beneficiaries in these validation cohorts into the same 12 risk subgroups.
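
A sketch of how missing predictors can be back-filled with training medians before scoring an external cohort; the two prescriber-level column names below are hypothetical placeholders.

```python
import pandas as pd

PRESCRIBER_COLS = ["prescriber_specialty_group", "prescriber_volume"]  # hypothetical names

def prepare_external(external: pd.DataFrame, train_X: pd.DataFrame) -> pd.DataFrame:
    """Add any predictors missing from an external dataset (eg, the two
    prescriber-level variables absent in Arizona) using training medians,
    then align column order with the training design matrix."""
    out = external.copy()
    for col in PRESCRIBER_COLS:
        if col not in out.columns:
            out[col] = train_X[col].median()
    return out[train_X.columns]

# eg, risk_scores_az = model.predict_proba(prepare_external(az_X, pa_train_X))[:, 1]
```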

In secondary analyses, we first reported the top 25 most important predictors from the GBM model. Second, we compared our prediction performance over a 12-month period with any of the opioid measures included in the Core Set of Adult Health Care Quality Measures for Medicaid21 or Medicaid Section 1115 Substance Use Disorder Demonstrations (hereafter Medicaid opioid measures), used by US states to identify high-risk individuals or substance use behaviour in Medicaid. These comprise three metrics: high-dose use, defined as more than 120 morphine milligram equivalents for 90 continuous days or longer; four or more opioid prescribers and four or more pharmacies; and concurrent opioid and benzodiazepine use for 30 days or longer. Third, to determine whether the prediction algorithm could also predict fatal overdoses that did not receive medical attention, we used Arizona Medicaid beneficiaries’ death certificate data, identifying fatal opioid overdoses by ICD-10 underlying cause-of-death codes X42, X44, Y12, and Y14 (accidental and undetermined overdose) together with multiple cause-of-death codes T40.1 (heroin), T40.2 (natural and semisynthetic opioids), T40.3 (methadone), and T40.4 (synthetic opioids other than methadone).22
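
The fatal-overdose definition cited above translates directly into a small predicate. The cause-of-death codes are those listed in the text, while the function and argument names are merely illustrative.

```python
# Underlying and multiple cause-of-death codes as cited in the text.
UNDERLYING_CAUSES = {"X42", "X44", "Y12", "Y14"}       # accidental and undetermined overdose
OPIOID_T_CODES = {"T40.1", "T40.2", "T40.3", "T40.4"}  # heroin; natural/semisynthetic opioids;
                                                       # methadone; other synthetic opioids

def is_fatal_opioid_overdose(underlying_cause: str, multiple_causes: list[str]) -> bool:
    """Apply the death-certificate definition used in the Arizona analysis."""
    return underlying_cause in UNDERLYING_CAUSES and any(
        code in OPIOID_T_CODES for code in multiple_causes
    )

# eg, is_fatal_opioid_overdose("X42", ["T40.2", "J96.0"]) -> True
```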

Analyses were performed using SAS 9.4 and Salford Predictive Modeler software suite version 8.2.

Role of the funding source

The funder had no role in the study design and conduct; data collection, management, analysis, and interpretation; manuscript preparation, review, or approval; or the decision to submit the manuscript for publication.

Results

Beneficiaries in the training, testing, and internal validation datasets of the 2013–16 Pennsylvania dataset (n=213 231 in each dataset used for model development) had similar characteristics and outcome distributions (table 1). Compared with Pennsylvania Medicaid beneficiaries in the 2013–16 dataset, beneficiaries in the 2017–18 Pennsylvania external validation dataset (n=318 585) were older on average (mean age 39·2 vs 36·2–36·3 years) and more likely to be newly eligible for Medicaid (42·9% vs 22·0–22·1%) and to have opioid use disorder diagnoses (5·9% vs 4·5–4·6%), whereas Medicaid beneficiaries in the Arizona dataset (n=391 959) were more likely to be of other or unknown race (36·2% vs 12·7–12·8%) and less likely to have opioid use disorder diagnoses (2·8% vs 4·5–4·6%). Rates of one or more opioid-overdose episodes during the study period were lower in the 2017–18 Pennsylvania (0·8%) and 2015–17 Arizona (0·6%) validation datasets than in the 2013–16 Pennsylvania algorithm-development dataset (1·3–1·4%).

Table 1:

Selected characteristics of Pennsylvania and Arizona Medicaid beneficiaries

Columns (left to right): Training, Testing, and Internal validation subsets of 2013–16 Pennsylvania Medicaid (n=213 231 each; total n=639 693), followed by the external validation datasets: 2017–18 Pennsylvania Medicaid (n=318 585) and 2015–17 Arizona Medicaid (n=391 959)

Had ≥1 opioid overdose episode 2894 (1·4%) 2899 (1·4%) 2848 (1·3%) 2705 (0·8%) 2410 (0·6%)
Mean age, years 36·3 (12·2) 36·3 (12·3) 36·2 (12·2) 39·2 (12·5) 37·3 (12·8)
Age group, years
 18–30 87 767 (41·2%) 88 033 (41·3%) 88 294 (41·4%) 101 438 (31·8%) 150 595 (38·4%)
 31–40 52 403 (24·6%) 51 767 (24·3%) 52 074 (24·4%) 81 832 (25·7%) 94 507 (24·1%)
 41–50 38 348 (18·0%) 38 513 (18·1%) 38 390 (18·0%) 63 958 (20·1%) 71 377 (18·2%)
 51–64 34 499 (16·2%) 34 647 (16·2%) 34 246 (16·1%) 71 357 (22·4%) 75 480 (19·3%)
Sex
 Female 141 207 (66·2%) 141 084 (66·2%) 141 166 (66·2%) 209 397 (65·7%) 248 551 (63·4%)
 Male 72 024 (33·8%) 72 147 (33·8%) 72 065 (33·8%) 109 188 (34·3%) 143 408 (36·6%)
Race
 White 129 585 (60·8%) 129 668 (60·8%) 129 779 (60·9%) 198 166 (62·2%) 215 020 (54·9%)
 Black 56 318 (26·4%) 56 264 (26·4%) 56 476 (26·5%) 79 482 (24·9%) 34 954 (8·9%)
 Other or unknown 27 328 (12·8%) 27 299 (12·8%) 26 976 (12·7%) 40 937 (12·8%) 141 985 (36·2%)
Metropolitan residence 186 161 (87·3%) 186 316 (87·4%) 186 479 (87·5%) 276 640 (86·8%) 358 433 (91·4%)
Medicaid eligibility group at index
 Disabled 62 742 (29·4%) 62 609 (29·4%) 62 592 (29·4%) 70 025 (22·0%) 22 018 (5·6%)
 Newly eligible 46 819 (22·0%) 47 034 (22·1%) 46 956 (22·0%) 136 536 (42·9%) 190 911 (48·7%)
 Non-disabled adults 103 670 (48·6%) 103 588 (48·6%) 103 683 (48·6%) 112 024 (35·2%) 179 030 (45·7%)
Number of opioid fills 2·0 (1·7) 2·0 (1·7) 2·0 (1·7) 2·0 (1·7) 1·8 (1·5)
Average daily MME 37·1 (60·8) 37·0 (48·7) 37·0 (80·4) 37·7 (229·4) 39·3 (42·6)
Cumulative days of concurrent opioid and benzodiazepine use 3·0 (12·6) 2·9 (12·5) 3·0 (12·7) 3·9 (15·0) 2·7 (12·1)
Number of gabapentinoid fills 0·19 (0·7) 0·19 (0·7) 0·19 (0·7) 0·39 (1·1) 0·22 (0·8)
Number of hospitalisations 0·07 (0·3) 0·07 (0·4) 0·07 (0·4) 0·08 (0·3) 0·07 (0·4)
Number of emergency department visits 0·65 (1·2) 0·65 (1·2) 0·65 (1·3) 0·64 (1·3) 0·79 (1·4)
Opioid use disorder 9694 (4·5%) 9729 (4·6%) 9523 (4·5%) 18 730 (5·9%) 10 884 (2·8%)
Alcohol use disorder 5864 (2·8%) 5814 (2·7%) 5900 (2·8%) 9431 (3·0%) 11 554 (2·9%)
Anxiety disorders 26 579 (12·5%) 26 436 (12·4%) 26 368 (12·4%) 60 225 (18·9%) 47 132 (12·0%)
Mood disorders 38 809 (18·2%) 38 539 (18·1%) 38 325 (18·0%) 67 635 (21·2%) 48 132 (12·3%)

Data are n (%) or mean (SD). MME=morphine milligram equivalent.

Figure 1 summarises four prediction performance measures for the GBM models using the internal validation dataset from the 2013–16 Pennsylvania dataset and the external validation datasets. At the episode level, the prediction algorithm performed well in all three validation datasets (C-statistic of 0·841 [95% CI 0·835–0·847] for the 2013–16 Pennsylvania dataset, 0·828 [0·822–0·834] for the 2017–18 Pennsylvania dataset, and 0·817 [0·807–0·826] for the 2015–17 Arizona dataset; figure 1A). The better performance in the internal validation and external validation Pennsylvania datasets compared with the Arizona Medicaid dataset in precision-recall curves (figure 1B), number needed to evaluate (figure 1C), and positive alerts per 100 beneficiaries (figure 1D) was mainly driven by higher opioid overdose rates in Pennsylvania (table 1).

Figure 1: Performance matrix for predicting opioid overdose using GBM in Pennsylvania and Arizona Medicaid beneficiaries.


(A) Areas under the receiver operating characteristic curves (or C-statistics). (B) Precision-recall curves (precision=positive predictive value and recall=sensitivity); precision-recall curves that are closer to the upper right corner or have a larger AUC than another method indicate better performance. (C) Number needed to evaluate (by different cutoffs of sensitivity). (D) Alerts per 100 patients (by different cutoffs of sensitivity). Arizona Medicaid 2015–17=2015–17 Arizona external validation dataset (391 959 beneficiaries with 2 549 039 non-overdose episodes and 2172 overdose episodes). AUC=area under the curve. GBM=gradient boosting machine. Pennsylvania Medicaid 2013–16=2013–16 Pennsylvania internal validation dataset (213 231 beneficiaries with 1 745 919 non-overdose episodes and 3377 overdose episodes). Pennsylvania Medicaid 2017–18=2017–18 Pennsylvania external validation dataset (318 585 beneficiaries with 1 825 672 non-overdose episodes and 3032 overdose episodes).

Prediction performance measures at varying sensitivity and specificity levels (90–100%) are shown in the appendix (pp 7–8). In the 2013–16 Pennsylvania internal validation dataset, at the balanced threshold using the Youden index, the GBM model had 75·3% sensitivity, 78·5% specificity, a 0·7% positive predictive value, a 99·9% negative predictive value, a number needed to evaluate of 149 to identify one opioid overdose, and approximately 22 positive alerts per 100 beneficiaries. In the 2017–18 Pennsylvania external validation dataset, at the balanced threshold, the GBM model had 71·4% sensitivity, 79·9% specificity, a 0·6% positive predictive value, a 99·9% negative predictive value, a number needed to evaluate of 171, and approximately 20 positive alerts per 100 beneficiaries. In the 2015–17 Arizona external validation dataset, at the balanced threshold, the GBM model had 67·2% sensitivity, 84·0% specificity, a 0·4% positive predictive value, a 100% negative predictive value, a number needed to evaluate of 281, and approximately 16·1 positive alerts per 100 beneficiaries. Sensitivity analyses using randomly and iteratively selected patient-level data yielded results similar overall to those using episode-level data (appendix pp 14–15).

Figure 2 depicts the actual overdose rate for individuals in the internal validation dataset using risk score thresholds derived from the 2013–16 Pennsylvania training dataset. The highest-risk subgroup (risk scores in the top first percentile; 1·3% [n=2666]) had a positive predictive value of 2·2%, a negative predictive value of 97·8%, and a number needed to evaluate of 45. Among 343 individuals with overdose in the 2013–16 Pennsylvania internal validation dataset, 253 (73·8%) were in the top two deciles of risk scores (shown in figure 2 as the top four risk subgroups). The third to tenth decile subgroups had minimal overdose rates (0–15 per 10 000). In external validation analyses (figure 3), the overall baseline overdose rate in 3-month windows was higher in the 2017–18 than in the 2013–16 Pennsylvania Medicaid dataset (0·29% vs 0·17%), and lower in the 2015–17 Arizona external validation dataset (0·09%). In the 2017–18 Pennsylvania external validation dataset (figure 3A), the highest-risk subgroup (risk subgroup 1; 0·23% [n=736]) had a positive predictive value of 4·1%, a negative predictive value of 95·9%, and a number needed to evaluate of 25. Of 912 individuals with overdose in the 2017–18 Pennsylvania external validation dataset, 661 (72·5%) were in the top four risk subgroups. The fifth to 12th risk subgroups had minimal overdose rates (ranging from 0·03% to 0·21%). In the 2015–17 Arizona external validation dataset (figure 3B), the highest-risk subgroup (risk subgroup 1; 0·10% [n=407]) had a positive predictive value of 1·97%, a negative predictive value of 98·0%, and a number needed to evaluate of 51. Of 342 individuals with overdose in the 2015–17 Arizona external validation dataset, 187 (54·7%) were in the top four risk subgroups. Consistent with the Pennsylvania external validation dataset, the fifth to 12th risk subgroups had minimal overdose rates (2–14 per 10 000). Similar magnitudes were found using risk score thresholds derived from each corresponding validation dataset (appendix pp 16–17). Additional calibration curves are shown in the appendix (pp 18–19).

Figure 2: Opioid overdose identified by risk subgroup in the 2013–16 Pennsylvania Medicaid internal validation dataset (n=213 231) using GBM.


Based on the individual’s predicted probability of an opioid overdose (fatal or non-fatal) event, we classified 203 179 beneficiaries in the validation datasets into modified decile risk subgroups, with the highest decile further split into three additional strata based on the top first, second–fifth, and sixth–tenth percentiles to allow closer examination of beneficiaries at highest risk of experiencing an overdose. We used the thresholds of the risk scores derived from the 2013–16 Pennsylvania training dataset to identify a beneficiary’s risk subgroup: top first percentile (risk score≥98·3); second–fifth percentile (96·6≤risk score<98·3); sixth–tenth percentile (64·9≤risk score<96·6); decile 2 (47·6≤risk score<64·9); decile 3 (38·4≤risk score<47·6); decile 4 (32·2≤risk score<38·4); decile 5 (27·5≤risk score<32·2); decile 6 (23·8≤risk score<27·5); decile 7 (20·4≤risk score<23·8); decile 8 (18·8≤risk score<20·4); decile 9 (14·2≤risk score<18·8); decile 10 (risk score<14·2). GBM=gradient boosting machine.

Figure 3: Opioid overdose identified by risk subgroup in the 2017–18 Pennsylvania (n=318 585) and 2015–17 Arizona Medicaid (n=391 959) external validation datasets using GBM.


Based on the individual’s predicted probability of an opioid overdose (fatal or non-fatal) event, we classified beneficiaries in the two validation datasets into risk subgroups using the modified decile thresholds of the risk scores derived from the 2013–16 Pennsylvania training dataset, with the highest-risk decile further split into three additional strata based on the top first, second–fifth, and sixth–tenth percentiles to allow closer examination of beneficiaries at highest risk of experiencing an overdose. The thresholds derived from the 2013–16 Pennsylvania training dataset to identify a beneficiary’s risk subgroup are: top first percentile (risk score≥98·3); second–fifth percentile (96·6≤risk score<98·3); sixth–tenth percentile (64·9≤risk score<96·6); decile 2 (47·6≤risk score<64·9); decile 3 (38·4≤risk score<47·6); decile 4 (32·2≤risk score<38·4); decile 5 (27·5≤risk score<32·2); decile 6 (23·8≤risk score<27·5); decile 7 (20·4≤risk score<23·8); decile 8 (18·8≤risk score<20·4); decile 9 (14·2≤risk score<18·8); decile 10 (risk score<14·2). GBM=gradient boosting machine.

The top 25 most important predictors identified by the GBM model, such as having a diagnosis of opioid use disorder, total number of emergency department visits, race, gender, and age, are shown in the appendix (p 20). Table 2 compares prediction performance with existing Medicaid opioid measures over a 12-month period in the 2013–16 Pennsylvania Medicaid dataset. Using existing Medicaid opioid measures to identify high-risk individuals (9·3% of the sample) captured 24·2% of all actual overdose cases (number needed to evaluate of 62) over a 12-month period. When using the top fifth percentile of our risk scores to identify high risk, our GBM algorithm captured 66·1% of all actual opioid overdose cases (number needed to evaluate of 19), despite a similar number of high-risk individuals being identified by both approaches. Our analysis predicting fatal overdose in the Arizona external validation dataset (84% of which were captured in claims data) yielded similar findings (eg, C-statistic of 0·814; 95% CI 0·796–0·831; appendix pp 21–22).

Table 2:

Comparison of prediction performance using any of the Medicaid opioid quality measures versus GBM in the 2013–16 Pennsylvania internal validation sample (n=135 106) over a 12-month period*

Columns (left to right): Any Medicaid core set opioid measure: Low risk (n=122 538, 90·7%) and High risk (n=12 568, 9·3%); High risk in GBM using different thresholds: Top first percentile (n=4570, 3·4%), Top fifth percentile (n=11 053, 8·2%), and Top tenth percentile (n=23 158, 17·1%)

Number of actual overdoses (% of each subgroup) 639 (0·5%) 204 (1·6%) 299 (6·5%) 557 (5·0%) 713 (3·1%)
Number of actual non-overdoses (% of each subgroup) 121 899 (99·5%) 12 364 (98·4%) 4271 (93·5%) 10 496 (95·0%) 22 445 (96·9%)
Number needed to evaluate NA 62 15 19 32
% of all overdoses over 12 months (n=843) captured 75·8% 24·2% 35·5% 66·1% 84·6%

MME=morphine milligram equivalent. GBM=gradient boosting machine.

* To compare with the Medicaid opioid measures, beneficiaries were required to have at least a 12-month period of follow-up, so the resulting sample size was smaller than in the main analysis. Beneficiaries meeting any of the Medicaid high-risk opioid use measures were classified as predicted overdoses; all remaining beneficiaries were considered predicted non-overdoses.

The Medicaid opioid quality measures are those included in the Core Set of Adult Health Care Quality Measures for Medicaid or Medicaid Section 1115 Substance Use Disorder Demonstrations to identify high-risk individuals or substance use behaviour in Medicaid. These simple measures comprise three metrics: (1) high-dose use, defined as >120 MME for ≥90 continuous days; (2) ≥4 opioid prescribers and ≥4 pharmacies; and (3) concurrent opioid and benzodiazepine use for ≥30 days.

For GBM, we present high-risk groups using different cutoff thresholds of predicted probability: individuals with a predicted score in (1) the top first percentile (≥98·3); (2) the top fifth percentile (≥96·6); and (3) the top tenth percentile (≥64·9). Beneficiaries in the high-risk group were classified as predicted overdoses; all remaining beneficiaries were considered predicted non-overdoses.

Discussion

We developed and externally validated a machine-learning algorithm with strong performance for predicting 3-month risk of opioid overdose in Medicaid beneficiaries. Our study shows that an opioid overdose prediction algorithm developed in one state’s Medicaid programme can effectively translate to later time periods in the same US state and to a different state, addressing a major concern about the generalisability of opioid overdose prediction models. In addition, the algorithm represents an improvement on less accurate opioid risk measures currently tracked by state Medicaid programmes.21,23,24

To our knowledge, this is the first study predicting 3-month opioid overdose risk in Medicaid populations with external validation after initiation of prescription opioids. We identified only two previous studies that used Medicaid data to predict risk of opioid use disorder development within 1 or 5 years after initiating opioid prescriptions,6,7 but neither examined overdose risk, nor did they include external validation. Studies using advanced methods to more accurately identify individuals at risk of overdose are needed. This study expanded our previous work using machine-learning approaches to improve the accuracy of predicting overdose in the subsequent 3 months in a large state Medicaid dataset and broadened the applicability of these models across state Medicaid programmes.5 Our best-performing GBM has several advantages, including handling missing data automatically, requiring no separate feature-selection step before modelling, offering greater flexibility in hyperparameter tuning to include complex interactions between predictors and outcomes, and often providing better performance than other approaches.3,5 We acknowledge, however, that this flexibility during model tuning can be time-consuming and computationally expensive.

Our Arizona Medicaid external validation analysis showed the feasibility of applying our prediction model to other state Medicaid programmes with very different racial and ethnic composition, geography, and overdose rates compared with Pennsylvania Medicaid. When using fixed risk score thresholds identified from the Pennsylvania dataset, fewer Arizona Medicaid beneficiaries were classified into the top four high-risk subgroups (eg, 10% for the Arizona dataset vs 22% for the 2017–18 Pennsylvania dataset), with 55% of all overdoses captured. Using risk score thresholds identified from each validation dataset, over 75% of overdoses were captured in the top three decile groups in all validation datasets (appendix pp 16–17). The model showed good performance across states, suggesting potential clinical and policy use of different risk stratification approaches without any major change or adaptation of models developed from one US state Medicaid programme. Future work should further validate the applicability of our model to other state Medicaid programmes. To maintain generalisability, the model might need to be recalibrated for states in other regions, such as the Midwest and South.

Although our model had good discrimination (C-statistics >0·80), rare overdose outcomes led to low positive predictive values, which could increase false positives and lead to overestimating the benefits or underestimating the costs and resources of intervention. In such cases, reporting additional measures, including sensitivity, number needed to evaluate, and estimated alert rate, provides a more thorough evaluation of a clinical prediction model’s performance.18 For example, number needed to evaluate is the number of patients one must evaluate to detect one outcome using an early warning tool (ie, the machine-learning algorithm in our study) versus if no tool existed.18 When predicting a rare outcome like overdose, the number needed to evaluate closely approximates the number needed to treat or number needed to screen, because very low baseline overdose rates make the absolute risk reduction close to the post-screening incidence. Despite low positive predictive values, our number needed to evaluate (149–602 in the Pennsylvania datasets; 281–1170 in the Arizona dataset, varying by risk threshold) using GBM algorithms is similar to the number needed to screen for commonly used cancer screening tests, such as annual mammography screening to prevent one breast cancer death (number needed to screen of 233–1316, varying by subgroups with different underlying risks).25
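
Concretely, the number needed to evaluate is the reciprocal of the positive predictive value, as the small sketch below shows; the worked figure uses the rounded PPV reported above and is illustrative only.

```python
def number_needed_to_evaluate(true_positives: int, false_positives: int) -> float:
    """NNE = flagged patients per true case found = 1 / positive predictive value."""
    return (true_positives + false_positives) / true_positives

# At the Youden-balanced threshold in the 2013-16 Pennsylvania internal
# validation data, the PPV was about 0.7%, implying roughly 1/0.007 ~= 143
# evaluations per detected overdose, of the same order as the reported 149.
print(number_needed_to_evaluate(true_positives=7, false_positives=993))  # ~142.9
```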

Our machine-learning model represents an advance on strategies that Medicaid programmes are using to predict overdose risk, allowing them to better target time-sensitive or so-called just-in-time interventions. Our study showed how current rules (eg, Medicaid Core Set opioid measures) for identifying so-called high-risk patients might not accurately predict risk of overdose compared with using risk score thresholds (eg, the top fifth percentile of risk scores) to identify high-risk patients. Although not perfect, our risk classification with 12 subgroups allows those implementing the algorithm to determine the risk threshold at which to intervene, based on the costs and intensity of interventions and on resource availability. Resource-intensive and burdensome interventions (eg, pharmacy lock-in programmes) could be limited to the small number of individuals in the highest-risk subgroup. Lower-cost, less burdensome, or less risky interventions (eg, naloxone distribution)26 could be targeted towards more individuals in moderate-to-high-risk subgroups.27,28 Nonetheless, additional screening and assessment are needed to avoid unintended consequences resulting from false positives.

Our study has limitations. First, we could not capture prescriptions paid out of pocket or patients with only illicit opioid use. For example, only 40% of fatal overdoses in Arizona were captured using our cohort definition (ie, having one or more opioid fills in the study period). Second, claims data do not capture overdoses that do not receive medical attention. However, we applied a previously validated algorithm using ICD codes to identify opioid-overdose events reported in medical claims (positive predictive value of 81–84%).29 Our algorithm also performed well in a sensitivity analysis using Arizona death certificate data to predict fatal overdose. Third, the Arizona dataset had neither prescriber information nor a separate variable for ethnicity, which could introduce misclassification biases. Fourth, we were unable to capture other potential predictors, including laboratory results in clinical data and sociobehavioural information, that might improve the model. Fifth, our prediction algorithm might not be generalisable to other populations or states. Sixth, as expected, the positive predictive value from our model was low because overdose is a rare outcome; nonetheless, given the serious consequences of overdose, our risk-stratified approach appeared to be effective in both Pennsylvania and Arizona. Finally, although our algorithm has many potential uses, there are key barriers to overcome before implementation (eg, building infrastructure to automatically generate risk scores, and limits to the algorithm’s usability and effectiveness from claims data lags). Furthermore, the current algorithm included race and ethnicity and requires comprehensive bias evaluations to identify approaches for ensuring algorithmic fairness, so that interventions are targeted and health services provided equitably.

In conclusion, a machine-learning algorithm predicting opioid overdose derived from Pennsylvania Medicaid data performed well in external validation with data from more recent years and data from another state with different characteristics. The algorithm could be a valuable and feasible tool to predict and stratify risk of opioid overdose in Medicaid beneficiaries.


Research in context.

Evidence before this study

In previous work, we developed machine-learning approaches to improve risk prediction and stratification for development of opioid use disorder and subsequent overdose in Medicare beneficiaries. We are not aware of studies applying these methods in a Medicaid population and then externally validating them. We therefore searched PubMed for research articles, with no language restrictions, published from database inception up to June 1, 2021, using the following search terms: “Analgesics, Opioid / therapeutic use* AND Algorithms*” AND “Risk Assessment / methods*”. After excluding eight studies that included narrow and specific patient populations (ie, opioid-naive, paediatric, oncology, and anaesthesia patients), we identified two previous relevant studies that included Medicaid data for predicting the risk of opioid use disorder development within 1 or 5 years after initiating opioid prescriptions. Neither study examined overdose risk, nor did they include external validation.

Added value of this study

This study developed a machine-learning algorithm capable of predicting 3-month risk of opioid overdose using Pennsylvania Medicaid claims data (2013–16) and validated it in two data sources: more recent years of Medicaid data from Pennsylvania (2017–18) and claims data from Arizona’s Medicaid programme (2015–17). The algorithm was robust at predicting Medicaid beneficiaries’ 3-month risk of opioid overdose, without major changes or adaptations to the model, despite the difference in calendar years and the different characteristics of the two US states. Our prediction algorithm has the potential to be applied to Medicaid populations in other states and could be useful for guiding clinical decisions and targeting interventions based on the degree of an individual’s risk.

Implications of all the available evidence

Machine-learning algorithms that more accurately predict patients’ risk of opioid overdose over short time intervals, and that provide better risk stratification than currently used tools, can be valuable in data-informed decisions about the allocation of interventions and resources. The model derived from historical data in one large state scaled to more recent data and to data from another state, and provides an opportunity to improve on existing Medicaid programme strategies for addressing opioid overdose risk.

Acknowledgments

This work was supported by grant R01DA044985 from the NIH–NIDA and grant R21 AG060308 from the NIH–NIA. The views presented here are those of the authors alone and do not necessarily represent the views of the Department of Veterans Affairs or the Pennsylvania Department of Human Services.

Footnotes

Declaration of interests

W-HL-C and WFG are named as inventors in one preliminary patent (U1195.70174US00) filing from the University of Florida and University of Pittsburgh for use of the machine learning algorithm for opioid risk prediction in Medicare described in this Article. W-HL-C, WFG, DLW, and C-YC are recipients of a grant from the National Institute on Aging (NIA; R21 AG060308). W-HL-C, WFG, JMD, AJG, GC, JLH, CCK, DCM, QY, JW, and HHZ are recipients of a grant from the US National Institute on Drug Abuse (NIDA; R01DA044985). W-HL-C declares grants from the Richard King Mellon Foundation–University of Pittsburgh, University of Florida Clinical and Translational Science Institute, the US National Institute of Mental Health (1R03MH114503-01 and R01MH121907), Pharmaceutical Research and Manufacturers of America Foundation, NIDA (1R01DA050676-01A1 and R01DA044985), Veterans Affairs (VA) Merit 1 (I01HX002191-01A2), and Merck, Sharp & Dohme and Bristol Myers Squibb. CKK declares grants from Lilly, Pfizer, GSK, Cumberland Pharmaceuticals, AbbVie, and EMD Serono; consultant fees from EMD Serono, Express Scripts, and Regeneron; an advisory board role with EMD Serono, Thusane, Regeneron, Taiwan Liposome Company, Amzell, LG Chem, and Novartis; payment or honoraria from Focus Communications and PRIME Education; and being on the Data and Safety Monitoring Committee for Kolon Tissue Gene and on the board of directors for the International Chinese Osteoarthritis Research Society. WFG declares a grant or contract to his institution from the Richard King Mellon Foundation. JMD declares salary support from the Pennsylvania Department of Human Services, Richard King Mellon Foundation, and the US National Institutes of Health (NIH)–NIDA (R01DA048019). AJG declares grants or contracts from NIH and the VA; royalties from UpToDate; and other financial and non-financial interests from the American Society of Addiction Medicine, Association for Multidisciplinary Education and Research in Substance Use and Addiction, and the International Society of Addiction Journal Editors. JCW declares grants or contracts from Allegheny Health Network, Carnegie Mellon University, and University of Pittsburgh Medical Center; payment or honoraria from St Jude’s; and receipt of equipment, materials, or other services from Amazon Web Services and Azure. DLW declares grant funding from Merck, Sharp & Dohme, and NIH–NIDA (1R01DA050676-01A1). All other authors declare no competing interests.

Data sharing

National Drug Codes can be provided by request to the corresponding author. Modelling codes can be provided by request to the corresponding author under proper use agreement. Access to the Pennsylvania Medicaid data was made possible through an intergovernmental agreement between the University of Pittsburgh and the Pennsylvania Department of Human Services. Access to the Arizona Medicaid data was made possible through an inter-institutional data use agreement between the University of Florida and the Center for Health Information and Research at Arizona State University. Per the data use agreements, the relevant limited datasets used in this study contained some patient health information variables (eg, dates of services) and thus cannot be made publicly available. This study was approved by the University of Pittsburgh and University of Florida Institutional Review Boards (IRBs; human research ethics committees). Patient consent was waived for this study due to the use of existing secondary data sources per the IRBs’ policies.

Contributor Information

Wei-Hsuan Lo-Ciganic, Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA; Center for Drug Evaluation and Safety, College of Pharmacy, University of Florida, Gainesville, FL, USA.

Julie M Donohue, Department of Health Policy and Management, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA; Center for Pharmaceutical Policy and Prescribing, Health Policy Institute, University of Pittsburgh, Pittsburgh, PA, USA.

Qingnan Yang, Center for Pharmaceutical Policy and Prescribing, Health Policy Institute, University of Pittsburgh, Pittsburgh, PA, USA.

James L Huang, Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA.

Ching-Yuan Chang, Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA.

Jeremy C Weiss, Heinz College of Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, PA, USA.

Jingchuan Guo, Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA; Center for Drug Evaluation and Safety, College of Pharmacy, University of Florida, Gainesville, FL, USA; Center for Pharmaceutical Policy and Prescribing, Health Policy Institute, University of Pittsburgh, Pittsburgh, PA, USA.

Hao H Zhang, Department of Mathematics, University of Arizona, Tucson, AZ, USA.

Gerald Cochran, Program for Addiction Research, Clinical Care, Knowledge, and Advocacy, Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA.

Adam J Gordon, Program for Addiction Research, Clinical Care, Knowledge, and Advocacy, Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA; Informatics, Decision-Enhancement, and Analytic Sciences Center, Veterans Affairs Salt Lake City Health Care System, Salt Lake City, UT, USA.

Daniel C Malone, Department of Pharmacotherapy, College of Pharmacy, University of Utah, Salt Lake City, UT, USA.

Chian K Kwoh, Division of Rheumatology, Department of Medicine, and the University of Arizona Arthritis Center, University of Arizona, Tucson, AZ, USA.

Debbie L Wilson, Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA.

Courtney C Kuza, Center for Pharmaceutical Policy and Prescribing, Health Policy Institute, University of Pittsburgh, Pittsburgh, PA, USA.

Walid F Gellad, Center for Pharmaceutical Policy and Prescribing, Health Policy Institute, University of Pittsburgh, Pittsburgh, PA, USA; Division of General Internal Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; Center for Health Equity Research Promotion, Veterans Affairs Pittsburgh Healthcare System, Pittsburgh, PA, USA.

References
