Abstract
Purpose:
Identifying predictors of opioid overdose following release from prison is critical for opioid overdose prevention.
Methods:
We leveraged an individually linked, state-wide database from 2015–2020 to predict the risk of opioid overdose within 90 days of release from Massachusetts state prisons. We developed two decision tree modeling schemes: a model fit on all individuals with a single weight for those that experienced an opioid overdose and models stratified by race/ethnicity. We compared the performance of each model using several performance measures and identified factors that were most predictive of opioid overdose within racial/ethnic groups and across models.
Results:
We found that out of 44,246 prison releases in Massachusetts between 2015–2020, 2,237 (5.1%) resulted in opioid overdose in the 90 days following release. The performance of the two predictive models varied. The single weight model had high sensitivity (79%) and low specificity (56%) for predicting opioid overdose and was more sensitive for White non-Hispanic individuals (sensitivity = 84%) than for racial/ethnic minority individuals.
Conclusions:
Stratified models had better balanced performance metrics for both White non-Hispanic and racial/ethnic minority groups and identified different predictors of overdose between racial/ethnic groups. Across racial/ethnic groups and models, involuntary commitment (involuntary treatment for alcohol/substance use disorder) was an important predictor of opioid overdose.
Keywords: Opioid overdose, incarceration, machine learning, algorithmic bias, substance use, decision trees
INTRODUCTION
In 2021, there were more than 100,000 overdose deaths in the US, many of which involved synthetic opioids such as illicitly manufactured fentanyl and fentanyl analogs (1). In the US, at least half of people incarcerated at any given time meet diagnostic criteria for a substance use disorder (2) and more than 20% have opioid use disorder (OUD) (3). Additionally, at least 20% of people with OUD have been involved in the criminal legal system (4). After release from incarceration, people are at particularly high risk for opioid overdose due to a complex combination of reduced tolerance, stress and anxiety, lack of social support, and difficulty meeting basic needs such as housing (5,6).
There are large racial disparities in incarceration. Black non-Hispanic individuals are twice as likely as White non-Hispanic individuals to be arrested for drug-related violations, despite similar patterns of drug use (7). However, in Massachusetts jails between 2015–2020, the estimated rate of overdose deaths per 100,000 people was higher among White individuals than among Black and Latinx individuals (8).
People who have been incarcerated have up to 40 times the risk of an opioid overdose death at two weeks post-release compared to the non-incarcerated population, with this risk remaining high for several months (9). Carceral systems rarely screen all individuals for OUD or provide them with medications for opioid use disorder (MOUD) (10), leaving people who return to substance use post-release more vulnerable to overdose during community re-entry. In addition, several structural and social factors increase an individual’s risk of opioid overdose post-release, such as living in a neighborhood of low socioeconomic status (11,12).
It is critical to identify individuals most at risk of overdose, but the lack of available administrative data or data linkage infrastructure makes identifying these individuals difficult (13). Some demographic and incarceration-specific predictors of overdose following release from incarceration have been proposed in prior literature, such as age, sex, race, length of incarceration, security level, and type of conviction (12). Machine learning methods can be useful for identifying predictors of opioid overdose in this population. While machine learning methods have been used for predicting the risk of opioid overdose with electronic health records (14–16), these analyses have not been extended to incarcerated individuals. Decision trees are a form of supervised machine learning model that are more flexible than traditional statistical methods, such as logistic regression, but still maintain interpretability (17). Our objective was to use decision trees to identify individual, social, and structural factors related to opioid overdose in the 90 days following release from Massachusetts prisons for the population overall and stratified by race/ethnicity, as criminal legal involvement may differentially affect the risk of opioid overdose by race/ethnicity. Additionally, we aimed to evaluate the predictive performance of these models.
METHODS
Overview
We used a unique, individually linkable data warehouse maintained by the Massachusetts Department of Public Health to predict fatal and non-fatal opioid overdoses in the 90 days following release from state prisons from 2015–2020. We trained decision tree models on the entire cohort and stratified by race/ethnicity and evaluated their predictive performance. We also identified individual, social, and structural factors that were most influential in predicting opioid overdose. We ran sensitivity analyses to explore the effects of different case weighting schemes and different methods of handling missingness.
Data Source
We used data from the Massachusetts Department of Public Health’s Public Health Data Warehouse (PHD), which combines datasets from multiple governmental agencies throughout Massachusetts (18) (Supplemental Figure S1). In the PHD, each individual has a unique identifier derived by matching identifiers in each dataset to those in the Massachusetts All Payer Claims Database (APCD), which allows individuals to be linked between any datasets. Descriptions of the datasets and variables in the PHD are provided on the Massachusetts Department of Public Health’s website (19).
For this analysis, we leveraged the following datasets: Department of Corrections (DOC) for prison data; Acute Care Hospital Case Mix (Case Mix) for acute care hospitalization release records related to opioid overdose; Massachusetts Ambulance Trip Record Information System (MATRIS) for opioid overdose-related ambulance trips; Death Certificate data from the Registry of Vital Records and Statistics for opioid overdose deaths; and the American Community Survey (ACS) for zip-code level variables.
For the DOC, data come from all 14 state prisons, the total population of which decreased during the study period from 10,335 individuals to 6,660 (20). While prisons typically hold individuals sentenced for a year or longer, in Massachusetts, women who are sentenced to jail are often placed in prisons due to the limited availability of women’s jails and may be incarcerated for shorter periods or for lesser convictions than their male counterparts. Importantly, MOUD was not consistently offered across all prisons during the study period.
Cohort inclusion
We included adults aged 18 years and older who were released from a state prison to the community from January 1, 2015, through September 30, 2020. We began our analysis in 2015, as fentanyl had largely saturated the illicit drug market in Massachusetts at that time (21). We excluded anyone released after September 30, 2020, to allow for the 90-day post-release follow-up period. Each incarceration event was characterized using release date. When individuals had multiple incarceration events, we treated each incarceration as a separate and independent observation.
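The inclusion rule and follow-up window described above can be sketched in a few lines. This is a minimal illustration, not the study's code: the function names are hypothetical, and treating both ends of the release window and of the 90-day follow-up as inclusive is an assumption.

```python
from datetime import date, timedelta

# Study window taken from the text; inclusive endpoints are an assumption.
STUDY_START = date(2015, 1, 1)
LAST_RELEASE = date(2020, 9, 30)
FOLLOW_UP = timedelta(days=90)


def is_eligible(age_at_release: int, release_date: date) -> bool:
    """Adult released to the community inside the study window."""
    return age_at_release >= 18 and STUDY_START <= release_date <= LAST_RELEASE


def overdose_within_follow_up(release_date: date, overdose_dates: list[date]) -> bool:
    """True if any overdose date falls in [release_date, release_date + 90 days]."""
    window_end = release_date + FOLLOW_UP
    return any(release_date <= d <= window_end for d in overdose_dates)


# The latest eligible release still has a full 90 days of follow-up within 2020.
print(LAST_RELEASE + FOLLOW_UP)
```

Because each incarceration is treated as a separate observation, a check like this would be applied once per release rather than once per person.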
Primary Outcome
Our primary outcome was a fatal or non-fatal opioid overdose (hereafter referred to as “opioid overdose”) within 90 days of release from prison. Fatal and non-fatal overdoses were combined into a single outcome variable because we do not believe that there is a clinical difference in predictors of fatal and non-fatal overdose, but rather fatality depends on the speed of response. Opioid overdoses were identified using Case Mix, MATRIS, and death records. In Case Mix, opioid overdoses were defined from emergency department, outpatient observation, and inpatient hospitalization releases using relevant International Classification of Diseases 9 and 10 (ICD-9/10) diagnosis codes. In MATRIS, opioid overdoses were defined using multiple criteria to distinguish acute overdoses from other opioid-related Emergency Medical Service events using an algorithm developed by the Massachusetts Department of Public Health (22). In death records, overdoses were defined using ICD-9/10 diagnosis codes for mortality selected from the underlying, contributing, and literal cause of death fields to identify poisonings/overdoses. Relevant ICD-9/10 codes are listed in the supplemental material, and the algorithm used by the Department of Public Health to define overdose in the PHD is available in the online documentation for the PHD (23).
Predictor Variables
Predictors were determined a priori and were based on preliminary work that identified factors associated with opioid overdose post-incarceration (12). Individual demographic information included age at release, race/ethnicity, and sex. Race/ethnicity in the PHD is reported as White non-Hispanic, Black non-Hispanic, Asian/Pacific Islander non-Hispanic, Hispanic, and another non-Hispanic race. Individuals who identify as Hispanic and another race are identified only as Hispanic in the PHD. For any individuals missing race/ethnicity in the DOC dataset, we used the race/ethnicity reported from Case Mix, MATRIS, or death records. For individuals further missing race/ethnicity, we used a composite race variable constructed from a tiered system of available data sources within the PHD (18). In our analysis, we use race/ethnicity as indicators of sociopolitical realities and histories, not as indicators of biological differences (24,25).
Information regarding one’s incarceration included recorded serious mental illness, participation in Massachusetts’ reentry initiative to provide MOUD in prison, involuntary commitment (incarcerated for treatment), custody type (held in DOC or in a non-DOC facility), governing offense (offense with the longest sentence), security level, and violent crime. While we could not calculate the total amount of time spent in prison for each incarceration, we measured the amount of time spent at the most recent facility prior to release. We also developed a variable to count the number of times an individual appeared in the DOC data.
Because structural factors, including one’s neighborhood at release, influence the risk of opioid overdose, we included the following ACS data at release zip code: the percent of the population below poverty, the percent that is non-White, the percent that is over age 25 with less than a ninth-grade education, the Index of Concentration of the Extremes for White high-income vs. Black low-income, and the Index of Concentration of the Extremes for White non-Hispanic high income vs. Black non-Hispanic, Asian/Pacific Islander non-Hispanic, Hispanic, and another non-Hispanic race low income. Further details of the variables are summarized in Supplemental Table S1.
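For readers unfamiliar with the Index of Concentration of the Extremes (ICE), a minimal sketch follows. It uses the commonly cited form of the index, (privileged count minus deprived count) divided by total population, which ranges from -1 to 1. The function name and the counts are hypothetical, and the income cut-offs that define "high" and "low" income in this study are not restated here.

```python
def index_of_concentration(privileged: int, deprived: int, total: int) -> float:
    """ICE = (A - P) / T for one zip code.

    A: residents in the privileged extreme (e.g., White high-income)
    P: residents in the deprived extreme (e.g., Black low-income)
    T: total population with known race/ethnicity and income
    """
    if total <= 0:
        raise ValueError("total population must be positive")
    if privileged < 0 or deprived < 0 or privileged + deprived > total:
        raise ValueError("extreme groups must be non-negative and fit within the total")
    return (privileged - deprived) / total


# Hypothetical zip code: 10,000 residents, 3,000 in the privileged extreme,
# 500 in the deprived extreme.
print(index_of_concentration(3000, 500, 10000))
```

Values near 1 indicate concentration of the privileged group, values near -1 indicate concentration of the deprived group, and values near 0 indicate either balance or few residents at both extremes.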
Statistical analysis
We developed decision tree models for predictive modeling. We supplied the models with predictor variables (Supplemental Table S1) to predict opioid overdose within 90 days of release from prison. Decision trees were trained with a top-down approach that identified a binary rule at each step that best split the dataset into “opioid overdose” and “no opioid overdose” groups. All variables were considered as candidates for each split, and the variable with the lowest entropy value (i.e., best separation of opioid overdose and no opioid overdose observations) was selected at each step. Each variable was used in at most one split per branch, and missing values were assigned the most common category (for categorical variables) or the median (for continuous variables) of that variable when it was split. Pruning was performed via cost-complexity analysis using 10-fold cross-validation. Cost-complexity pruning is described in detail in the PROC HPSPLIT documentation (26).
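The entropy criterion can be illustrated with a short sketch for a single continuous predictor. This is a generic, simplified illustration with hypothetical function names and toy data; it is not the HPSPLIT implementation, which was used for the actual analysis and differs in its details.

```python
import math


def weighted_entropy(labels, weights):
    """Shannon entropy (in bits) of binary labels, honoring case weights."""
    total = sum(weights)
    if total == 0:
        return 0.0
    p = sum(w for y, w in zip(labels, weights) if y == 1) / total
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a pure node has zero entropy
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def split_entropy(values, labels, weights, threshold):
    """Weight-averaged entropy of the two children of the rule value <= threshold."""
    rows = list(zip(values, labels, weights))
    left = [(y, w) for x, y, w in rows if x <= threshold]
    right = [(y, w) for x, y, w in rows if x > threshold]
    total = sum(weights)
    result = 0.0
    for child in (left, right):
        if child:
            ys, ws = zip(*child)
            result += sum(ws) / total * weighted_entropy(ys, ws)
    return result


def best_threshold(values, labels, weights):
    """Observed value whose split gives the lowest child entropy (None if no split exists)."""
    candidates = sorted(set(values))[:-1]  # splitting at the maximum leaves one child empty
    if not candidates:
        return None
    return min(candidates, key=lambda t: split_entropy(values, labels, weights, t))


# Toy data: days at the most recent facility and a 0/1 overdose flag.
days = [10, 20, 30, 400, 500, 600]
overdose = [1, 1, 0, 0, 0, 0]
print(best_threshold(days, overdose, [1.0] * 6))
```

A full tree repeats this search over every candidate variable at each node and keeps the variable and threshold with the lowest child entropy.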
To account for the heavy imbalance in the outcome, we used case weighting to increase the penalty for misclassifying an opioid overdose relative to misclassifying no opioid overdose. We compared two modeling approaches:
Model I: A model fit on all individuals, with opioid overdoses weighted inversely proportionate to the prevalence of opioid overdose within the entire cohort, and
Model II: Stratified models trained on the data stratified by race/ethnicity, with opioid overdoses weighted inversely proportionate to the prevalence of opioid overdose by race/ethnicity.
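To make the two weighting schemes concrete, the sketch below computes an overdose weight as the reciprocal of prevalence from the counts in Table 1. Using exactly 1/prevalence for overdose observations and a weight of 1 for all other observations is an assumption; "inversely proportionate" fixes the weights only up to a constant, and the exact values used in the analysis are not restated here.

```python
# (opioid overdoses, total releases) taken from Table 1; suppressed cells are omitted.
COUNTS = {
    "All (Model I)": (2237, 44246),
    "White non-Hispanic": (1838, 28152),
    "Black non-Hispanic": (140, 7889),
    "Hispanic": (198, 6244),
}


def overdose_weight(events: int, total: int) -> float:
    """Case weight for overdose observations: 1 / prevalence (assumed form)."""
    if events <= 0 or total < events:
        raise ValueError("need 0 < events <= total")
    return total / events


for group, (events, total) in COUNTS.items():
    prevalence = events / total
    weight = overdose_weight(events, total)
    print(f"{group}: prevalence={prevalence:.3f}, overdose weight={weight:.1f}")
```

Under this assumed form, the stratum with the lowest prevalence receives the largest overdose weight, which is what lets a stratified model attend to overdoses that a single pooled weight would under-emphasize.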
We compared model performance using sensitivity, specificity, misclassification rate, Brier score, and area under the receiver operating characteristic curve (AUC). We identified variables that were important for predicting opioid overdose in each model using the residual sum of squares (RSS) (26). The relative importance of a variable was calculated by dividing the RSS for each variable by the maximum RSS. All analyses were performed using SAS Studio version 3.81 (Enterprise Edition) and the HPSPLIT procedure was used for model fitting.
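For reference, the five performance measures can be computed from observed outcomes and predicted probabilities as sketched below. The function names and toy data are hypothetical, the measures are computed unweighted, and the 0.5 classification threshold is an assumption for this illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, true negatives, false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def auc(y_true, probs):
    """Probability that a random event is ranked above a random non-event (ties count 1/2)."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 2 overdoses among 5 releases.
y_true = [1, 1, 0, 0, 0]
probs = [0.9, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in probs]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
misclassification = (fp + fn) / len(y_true)
print(sensitivity, specificity, misclassification, brier_score(y_true, probs), auc(y_true, probs))
```

Sensitivity, specificity, and misclassification rate depend on the chosen threshold, whereas the Brier score and AUC are computed from the predicted probabilities directly.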
Sensitivity analyses
We evaluated various weighting assignments for opioid overdose case weightings. A number of variables exhibited missing data, particularly governing offense (64.8%) and violent crime (64.8%). We explored the impact of different strategies for handling missingness, including complete case analysis and alternative methods for filling in missing values. Additionally, we fit sex-stratified models to accommodate differences between the male and female prison populations, as women are often placed in prison rather than jail, due to a lack of women’s jails in Massachusetts. Finally, we estimated the generalizability of model performance using training-testing splits of complete and stratified data. See Supplementary Tables S2–S4 for complete descriptions and results of sensitivity analyses.
We also ran logistic regression analyses to further quantify the association of predictors with opioid overdose, to improve interpretability, and to compare the performance of the logistic regression model against the decision tree models. We used backwards selection to eliminate variables with the highest p-values until all variables in the model had a p-value of less than 0.1. Observations were weighted as in Model I to account for imbalance in the data and for consistency with Model I. We report odds ratios, 95% confidence intervals, and p-values for these variables. We further used the logistic regression model to predict opioid overdose in the same cohort; observations with predicted probabilities greater than or equal to 0.5 were predicted to experience opioid overdose. These predictions were used to calculate the sensitivity, specificity, misclassification rate, and Brier score for comparison with the metrics of the decision tree models.
RESULTS
From 2015–2020, there were 44,246 releases from prison in Massachusetts. Of those releases, 2,237 (5.1%) were followed by an opioid overdose within 90 days. White non-Hispanic individuals made up the largest proportion of the sample (63.6%) and the majority of opioid overdoses (Table 1).
Table 1.
Characteristics of releases from prisons in Massachusetts, 2015–2020. Note: Summaries are provided for all records (“Total”), those with an opioid overdose within 90 days of release (“OD”), and those with no observed opioid overdose within 90 days of release (“no OD”). Frequencies below ten and complementary cells are suppressed per the PHD External User Manual. DOC = Department of Corrections
| Variable | Category | No OD, n | No OD, % | OD, n | OD, % | Total, n | Total, % |
|---|---|---|---|---|---|---|---|
| Total | | 42009 | 94.9% | 2237 | 5.1% | 44246 | 100.0% |
| Sex | Male | 27813 | 66.2% | 1326 | 59.3% | 29139 | 65.9% |
| | Female | 14196 | 33.8% | 911 | 40.7% | 15107 | 34.1% |
| Race | White non-Hispanic | 26314 | 62.6% | 1838 | 82.1% | 28152 | 63.6% |
| | Black non-Hispanic | 7749 | 18.4% | 140 | 6.3% | 7889 | 17.8% |
| | Asian/Pacific Islander non-Hispanic | 304 | 0.7% | * | * | * | * |
| | Hispanic | 6046 | 14.4% | 198 | 8.9% | 6244 | 14.1% |
| | American Indian/other non-Hispanic | 1596 | 3.8% | * | * | * | * |
| Serious mental illness | No | 29058 | 69.2% | 1702 | 76.1% | 30760 | 69.5% |
| | Yes | 4976 | 11.8% | 223 | 10.0% | 5199 | 11.8% |
| | Missing | 7975 | 19.0% | 312 | 13.9% | 8287 | 18.7% |
| Participation in medication assisted treatment reentry initiative | No | 41325 | 98.4% | 2214 | 98.9% | 43538 | 98.4% |
| | Yes | 684 | 1.6% | 24 | 1.1% | 708 | 1.6% |
| Involuntary commitment | No | 32548 | 77.5% | 1258 | 56.2% | 33806 | 76.4% |
| | Yes | 9461 | 22.5% | 979 | 43.8% | 10440 | 23.6% |
| Custody type | Inmate housed in MA DOC facility | 39057 | 93.0% | 2036 | 91.0% | 41093 | 92.9% |
| | Inmate housed in county/federal/interstate facility outside of MA DOC | 2952 | 7.0% | 201 | 9.0% | 3153 | 7.1% |
| Inmate type | Pretrial | 12760 | 30.4% | 677 | 30.3% | 13437 | 30.4% |
| | Criminal | 15093 | 35.9% | 476 | 21.3% | 15569 | 35.2% |
| | Civil | 14156 | 33.7% | 1084 | 48.5% | 15240 | 34.4% |
| Governing offense | Property | 2408 | 5.7% | 126 | 5.6% | 2534 | 5.7% |
| | Drug | 3481 | 8.3% | 104 | 4.6% | 3585 | 8.1% |
| | Person | 5792 | 13.8% | 162 | 7.2% | 5954 | 13.5% |
| | Sex | 1024 | 2.4% | 11 | 0.5% | 1035 | 2.3% |
| | Other** | 2388 | 5.7% | 73 | 3.3% | 2461 | 5.6% |
| | Missing | 26916 | 64.1% | 1761 | 78.7% | 28677 | 64.8% |
| Security level | Pre-release | 1698 | 4.0% | 23 | 1.0% | 1721 | 3.9% |
| | Minimum | 11438 | 27.2% | 937 | 41.9% | 12375 | 28.0% |
| | Medium | 25531 | 60.8% | 1132 | 50.6% | 26663 | 60.3% |
| | Maximum | 2493 | 5.9% | 71 | 3.2% | 2564 | 5.8% |
| | Massachusetts Probation Service’s Electronic Monitoring Program | 70 | 0.2% | * | * | * | * |
| | Stony Brook Stabilization and Treatment Center | 779 | 1.9% | * | * | * | * |
| Violent crime | No | 8277 | 19.7% | 303 | 13.5% | 8580 | 19.4% |
| | Yes | 6816 | 16.2% | 173 | 7.7% | 6989 | 15.8% |
| | Missing | 26916 | 64.1% | 1761 | 78.7% | 28677 | 64.8% |
| | | Mean | SD | Mean | SD | Mean | SD |
| Age at release (years) | | 37.18 | 11.63 | 33.48 | 9.41 | 37 | 11.55 |
| | Missing (n) | 74 | | 0 | | 74 | |
| Time spent at the most recent facility prior to release (days) | | 447.32 | 1139 | 121.49 | 358.56 | 430.83 | 1115 |
| | Missing (n) | 23 | | 0 | | 23 | |
| Number of prison stays prior to and including current episode | | 2.32 | 0.29 | 2.97 | 2.71 | 2.35 | 2.32 |
| | Missing (n) | 0 | | 0 | | 0 | |
| Percent of population below poverty in census zip code of release | | 10.6 | 6.03 | 10.01 | 6.01 | 10.6 | 6.03 |
| | Missing (n) | 5751 | | 223 | | 6974 | |
| Percent of population that is non-white in census zip code of release | | 37.29 | 23.9 | 35.4 | 24.81 | 37.19 | 23.95 |
| | Missing (n) | 5751 | | 223 | | 6974 | |
| Percent of population over 25 years old with less than a 9th grade education in census zip code of release | | 6.74 | 4.93 | 6.59 | 5.05 | 6.73 | 4.94 |
| | Missing (n) | 5751 | | 223 | | 6974 | |
| Index of Concentration of the Extremes for White high income vs. Black low income at release zip code | | 0.25 | 0.2 | 0.27 | 0.19 | 0.25 | 0.2 |
| | Missing (n) | 5772 | | 223 | | 5995 | |
| Index of Concentration of the Extremes for White non-Hispanic vs. people of color low income at release zip code | | 0.16 | 0.27 | 0.19 | 0.25 | 0.16 | 0.27 |
| | Missing (n) | 5772 | | 223 | | 5995 | |
** Other governing offenses include obstruction of justice, habitual criminal, prostitution, and some weapon possession charges.
Model Performance
We found that the overall sensitivity of Model I for predicting opioid overdose was relatively high (77.4%), while specificity was poor (56.3%), and the AUC (71.5%) and misclassification rate (32.8%) were moderate (Figure 1). The performance of this model varied widely by race/ethnicity (Table 2). For example, the White non-Hispanic population had a high sensitivity (81.9%), low specificity (41%), and high misclassification rate (56.3%). Conversely, the Black non-Hispanic population had a low sensitivity (42.9%), high specificity (88.6%), and low misclassification rate (12.2%). These metrics indicate that the model overpredicts opioid overdoses in the White non-Hispanic population and underpredicts them in the Black non-Hispanic population. See Table 2 for metrics for other races/ethnicities.
Figure 1.

Representation of the decision tree model produced by Model I. Variables in the model at the top of the tree are considered to be the most important or most influential. Boxes that are orange indicate predicted overdose and those that are blue indicate no predicted overdose.
Table 2.
Model-based error for Model I. Note: For Model I, we report the sensitivity, specificity, misclassification rate, and Brier score across the entire dataset and all subgroups. For the model evaluated on the entire dataset, we also include the AUC.
| Race | N | Sensitivity | Specificity | Misclassification rate | Brier score | AUC |
|---|---|---|---|---|---|---|
| All | 44,246 | 0.77 | 0.56 | 0.33 | 0.23 | 0.71 |
| White non-Hispanic | 28,152 | 0.82 | 0.41 | 0.56 | 0.29 | - |
| Black non-Hispanic | 7,889 | 0.43 | 0.89 | 0.12 | 0.11 | - |
| Asian/Pacific Islander non-Hispanic | * | 1.00 | 0.81 | 0.19 | 0.13 | - |
| Hispanic | 6,244 | 0.63 | 0.77 | 0.24 | 0.14 | - |
| American Indian/other non-Hispanic | * | 0.66 | 0.69 | 0.31 | 0.19 | - |
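The subgroup misclassification rates in Table 2 follow arithmetically from sensitivity, specificity, and the group sizes in Table 1, which the sketch below works through. The function name is hypothetical; the counts come from Table 1, the Black non-Hispanic rates from the text, and the White non-Hispanic specificity (0.41) from Table 2. Small differences from the published values are expected because the inputs are rounded.

```python
def misclassification_from_rates(events, non_events, sensitivity, specificity):
    """Share of all observations classified incorrectly.

    False negatives are missed overdoses; false positives are releases
    wrongly predicted to be followed by an overdose.
    """
    false_negatives = events * (1 - sensitivity)
    false_positives = non_events * (1 - specificity)
    return (false_negatives + false_positives) / (events + non_events)


# White non-Hispanic: 1,838 overdoses and 26,314 releases without an overdose.
white = misclassification_from_rates(1838, 26314, 0.819, 0.41)
# Black non-Hispanic: 140 overdoses and 7,749 releases without an overdose.
black = misclassification_from_rates(140, 7749, 0.429, 0.886)

print(f"White non-Hispanic: {white:.3f}")
print(f"Black non-Hispanic: {black:.3f}")
```

Because releases without an overdose vastly outnumber those with one, the misclassification rate is driven almost entirely by specificity: low specificity in the White non-Hispanic group yields a high misclassification rate despite high sensitivity, while high specificity in the Black non-Hispanic group yields a low rate despite most overdoses being missed.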
The stratified Model II, however, demonstrated improved performance across racial/ethnic groups (Table 3; Figures 2a–2e). Compared to Model I, for the White non-Hispanic population, the sensitivity decreased to 70.6%, specificity increased to 55.4%, and misclassification rate decreased to 36.8%. For the Black non-Hispanic population, sensitivity increased to 62.4%, specificity decreased to 76.0%, and misclassification rate increased to 31.0%. Overall, the stratified models exhibited a better sensitivity-specificity balance. Each of the race/ethnicity stratified models also exhibited adequate AUC (0.66–0.92) (Table 3).
Table 3.
Model-based error for Model II. Note: For Model II, we report the sensitivity, specificity, misclassification rate, Brier score, and AUC across each racial/ethnic population.
| Race | N | Sensitivity | Specificity | Misclassification rate | Brier score | AUC |
|---|---|---|---|---|---|---|
| White non-Hispanic | 28,152 | 0.71 | 0.55 | 0.37 | 0.24 | 0.66 |
| Black non-Hispanic | 7,889 | 0.62 | 0.76 | 0.31 | 0.21 | 0.71 |
| Asian/Pacific Islander non-Hispanic | * | 1.00 | 0.85 | 0.08 | 0.11 | 0.92 |
| Hispanic | 6,244 | 0.82 | 0.59 | 0.29 | 0.22 | 0.71 |
| American Indian/other non-Hispanic | * | 0.74 | 0.69 | 0.29 | 0.21 | 0.72 |
AUC = Area under the receiver operating characteristic curve
Figure 2.


Representation of the decision tree models produced from Model II, which are stratified by race: (a) White non-Hispanic, (b) Black non-Hispanic, (c) Asian/Pacific Islander non-Hispanic, (d) Hispanic, and (e) other non-Hispanic. Variables in the model at the top of the tree are considered to be the most important or most influential. Boxes that are orange indicate predicted overdose and those that are blue indicate no predicted overdose.
Variables Influential in Predicting Opioid Overdose
For Model I, the variables that contributed the most to predicting opioid overdose included time spent at the most recent facility prior to release, involuntary commitment, age at release, sex, number of prison incarcerations prior to and including the current episode, and race/ethnicity (Table 4).
Table 4.
Relative importance of variables in models. Note: For each variable, the change in RSS is divided by the maximum change in RSS, resulting in values between 0 and 1 (1 being most important; importance rank within model provided in parentheses). The model fit on all individuals with a single weight (Model I) is used for the values in the “All” column, and the race-stratified models (Model II) are used for the race/ethnicity-specific columns. Importance is scaled within column, such that the variable deemed most important for a given model has a value of 1. Values cannot be compared directly across columns, but variable importance rankings may be compared.
| Category | Variable | All | White non-Hispanic | Black non-Hispanic | Asian/Pacific Islander non-Hispanic | Hispanic | American Indian/other non-Hispanic |
|---|---|---|---|---|---|---|---|
| Individual factors | Race/ethnicity | 0.34 (6) | - | - | - | - | - |
| | Sex | 0.57 (4) | 0.61 (3) | - | - | - | - |
| | Age at release from prison | 0.58 (3) | 1.00 (1) | - | - | - | - |
| Incarceration-related factors | Involuntary commitment | 0.59 (2) | 0.70 (2) | - | 1.00 (1) | - | - |
| | Time spent at the most recent facility prior to release (days) | 1.00 (1) | 0.49 (4) | 1.00 (1) | - | 1.00 (1) | - |
| | Number of prison incarcerations prior to and including current episode | 0.40 (5) | - | - | - | - | 1.00 (1) |
| | Security level | - | - | 0.63 (2) | - | 0.47 (2) | 0.90 (2) |
| Structural factors | Percent of population below poverty in census zip code of release | - | 0.35 (5) | - | - | - | - |
| | Percent of population that is non-white in census zip code of release | - | - | 0.48 (4) | - | - | - |
| | Percent of population over 25 years old with less than 9th grade education in census zip code of release | - | - | 0.59 (3) | - | - | - |
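The within-column scaling described in the table note can be sketched as follows. The RSS values are invented for illustration (the raw values are not reported here), and the function names are hypothetical.

```python
def relative_importance(rss_by_variable: dict[str, float]) -> dict[str, float]:
    """Divide each variable's change in RSS by the largest change in the same model."""
    top = max(rss_by_variable.values())
    if top <= 0:
        raise ValueError("at least one variable must have a positive RSS change")
    return {name: rss / top for name, rss in rss_by_variable.items()}


def importance_ranks(rss_by_variable: dict[str, float]) -> dict[str, int]:
    """Rank 1 is the variable with the largest change in RSS."""
    ordered = sorted(rss_by_variable, key=rss_by_variable.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Hypothetical raw RSS changes for one fitted tree.
rss = {"time_at_facility": 50.0, "involuntary_commitment": 29.5, "age_at_release": 29.0}
scaled = relative_importance(rss)
ranks = importance_ranks(rss)
for name in rss:
    print(f"{name}: {scaled[name]:.2f} ({ranks[name]})")
```

Because each model is divided by its own maximum, a value of 0.59 in one column and 0.59 in another need not reflect the same absolute contribution, which is why only the ranks are comparable across columns.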
When we stratified the analysis by race/ethnicity for Model II, we found additional variables were selected into the models (Table 4; Figures 2a–2e). We observed substantial overlap between variables in Model I and the model trained in the White non-Hispanic population. For White non-Hispanic individuals, the percent of the population living below the federal poverty level at the release zip code was influential in predicting opioid overdoses. For Black non-Hispanic individuals, the percentage of the population over the age of 25 with less than a ninth-grade education and the percentage that is non-White at the release zip code were influential. The time spent at the most recent facility prior to release and involuntary commitment remained important across multiple racial/ethnic strata.
Sensitivity analyses
Some variables included in the models (e.g., violent crime, governing offense) exhibited substantial missingness. We found that using only complete cases greatly decreased the size of the data (from 44,246 observations to 12,668 observations) and led to poor-fitting models. We tested different strategies to impute missing values and found that filling in missing values with the most common value of the variable at each split yielded stability in training (Supplementary Table S3). Sex-stratified models did not exhibit greatly increased performance when compared to Model I.
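The contrast between complete case analysis and simple fill-in strategies can be illustrated with a small sketch. The function names and records are hypothetical, missing values are represented as `None`, and the fill is applied to whatever subset of records is passed in, which stands in for the observations reaching a given split.

```python
from collections import Counter
from statistics import median


def fill_categorical(values):
    """Replace missing entries with the most common observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]  # ties resolve to the first value seen
    return [mode if v is None else v for v in values]


def fill_continuous(values):
    """Replace missing entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    mid = median(observed)
    return [mid if v is None else v for v in values]


def complete_cases(rows):
    """Keep only records with no missing field."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical records with missing violent-crime and age fields.
rows = [
    {"violent_crime": "No", "age": 30.0},
    {"violent_crime": None, "age": 41.0},
    {"violent_crime": "Yes", "age": None},
    {"violent_crime": "No", "age": 52.0},
]
print(len(complete_cases(rows)))
print(fill_categorical([r["violent_crime"] for r in rows]))
print(fill_continuous([r["age"] for r in rows]))
```

Complete case analysis discards every record with any missing field, which is why it shrank the cohort so sharply here; filling in values keeps all records at the cost of treating imputed values as observed.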
Significant predictor variables from the logistic regression analysis are summarized in Supplemental Table S6. For example, compared to people without involuntary commitment, those who were committed had 3.2 times the odds of an opioid overdose in the 90 days following release (95% CI: 3.0, 3.4). Race/ethnicity was also significant, with White non-Hispanic individuals having the highest odds of opioid overdose. Compared to Model I, the sensitivity of the logistic regression model was lower at 69.0% (compared to 77.4%), the specificity was higher at 62.6% (compared to 56.3%), and the misclassification rate was higher at 37.1% (compared to 32.8%). Based on these metrics, the logistic regression model appears to be comparable to Model I, but further underestimates opioid overdoses.
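As an aid to interpreting the reported odds ratio, the sketch below shows how an odds ratio and a Wald-type 95% confidence interval relate to a logistic regression coefficient and its standard error, and back-calculates the log-scale quantities implied by the reported estimate for involuntary commitment. The Wald form is an assumption; the weighted model fit in the analysis may have used a different variance estimator, and the function names are hypothetical.

```python
import math


def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Odds ratio and Wald interval from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def implied_log_scale(odds_ratio: float, lower: float, upper: float, z: float = 1.96):
    """Coefficient and standard error implied by a reported odds ratio and interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported estimate for involuntary commitment: OR 3.2 (95% CI: 3.0, 3.4).
beta, se = implied_log_scale(3.2, 3.0, 3.4)
estimate, low, high = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f}, se={se:.3f}")
print(f"OR={estimate:.2f}, CI=({low:.2f}, {high:.2f})")
```

The interval is symmetric on the log-odds scale, not on the odds ratio scale, so the reported point estimate sits near the geometric mean of the interval bounds.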
DISCUSSION
We used decision tree models to predict opioid overdose within 90 days of release from prison in Massachusetts. We found that from 2015–2020, more than 5% of prison release events (about 1 in 20) were followed by an opioid overdose within 90 days. The performance of the overall model varied substantially by race/ethnicity. We also identified a variety of important factors for predicting opioid overdose. For example, the length of time spent at the most recent facility prior to release had not previously been identified as a predictor.
Using machine learning methods with administrative data, such as criminal legal data, can result in biased outcomes. Our dataset was predominantly White non-Hispanic and included very few opioid overdose events. From the modeling perspective, this led Model I to naively predict that when opioid overdoses occur, they primarily occur among White non-Hispanic individuals. This is, of course, untrue as rates of incarceration are substantially higher among Black non-Hispanic individuals compared to White non-Hispanic individuals, and opioid overdose rates in Black non-Hispanic populations have also surpassed those of White non-Hispanic people in recent years in both Massachusetts and across the US (27).
We accounted for imbalances in opioid overdose prevalence by upweighting opioid overdose observations based on overall and race/ethnicity stratified prevalence. We found that an overall model (Model I) was successful at identifying opioid overdose among the White non-Hispanic population but exhibited poor model sensitivity for other racial/ethnic groups. Stratified models (Model II) improved sensitivity across racial/ethnic groups.
The sensitivity of Model I was driven by strong sensitivity among the White non-Hispanic population, and notably, sensitivity for the Black non-Hispanic population for this model is poor. Therefore, any potential policy conclusions drawn from Model I are likely to be racially/ethnically biased. These results mirror prior findings in machine learning and predictive modeling for criminal legal and other fields, where datasets and the models trained on them reflect embedded structural racism (28,29).
Involuntary commitment was associated with a higher risk of opioid overdose following incarceration across the cohort. Involuntary commitment is a legal mechanism, often initiated by family members or healthcare professionals, that forces individuals with substance use disorders into treatment (30). While other research on this topic has focused on the ethical principles or qualitative aspects, our analysis is among the first to quantify the harms that are associated with involuntary commitment regarding opioid overdose. Future work should explore this association further and should explore how predictive variables, such as involuntary commitment, impact both opioid use post-release and factors known to reduce opioid overdose, such as initiation and retention on MOUD.
We also found that time spent at the most recent facility prior to release was related to risk of opioid overdose, with shorter stays being associated with a higher risk of opioid overdose. This variable may act as a proxy for prison transfer or movement of facilities during one incarceration stay. We hypothesize that movement between institutions may break the routine of people with OUD, including medication routines, which increases their opioid overdose risk. Being released from a minimum-security facility was also associated with a higher risk of opioid overdose. Minimum security institutions may be less well-equipped to identify individuals with OUD, provide medications, and provide comprehensive medical discharge that links individuals to services. It is critical that such facilities have appropriate services so that individuals with OUD do not need to be housed in higher security facilities. This aligns with previous work in which individuals reported that being in a stable environment during and following incarceration and being in a facility that is responsive to the needs of people with OUD may be protective against opioid overdose (12).
Our study had several limitations. We are unable to account for individuals who overdosed but did not encounter the medical system, so we underestimate the total number of opioid overdoses suffered within the cohort. While opioid overdose deaths are recorded accurately, the proportion of non-fatal overdoses that are undetected is an area of active work by the Massachusetts Department of Public Health, which estimates that up to 50% of patients refuse emergency medical service transport after overdose reversals (31). Additionally, we only had access to Massachusetts prison data between 2015–2020 and were unable to determine if individuals were incarcerated either prior to this time period or outside of Massachusetts. Future studies could model longitudinal records of incarceration to assess the impact of full incarceration history. We did not have access to more granular data on types of crimes and could not draw further associations between type of crime and opioid overdose. History of opioid use was not available in the datasets that we accessed, and further work is being done to identify prior opioid use using the PHD. We observed notable missingness in certain variables. These variables did not affect the performance and interpretation of the models, but this may be due to the missingness itself, and better data collection or imputation/hotfix methods may be required. While decision trees are prone to overfitting and not able to learn highly complex associations, we were limited by data access requirements that constrained the availability of more sophisticated machine learning methods. Future work should use more sophisticated machine learning methods, such as random forests, to account for the imbalanced data.
We were unable to obtain records from persons incarcerated in jails, which likely represent a different population as they house individuals pre-trial and with short sentence lengths, and future work should assess risk in this population.
We have not tested our model in other states, largely because Massachusetts is one of a few states with such a unique and comprehensive dataset. Future work should apply these models in other states as they develop linked data warehouses. Our findings add to the growing evidence base that involuntary commitment (detaining and/or incarcerating someone for treatment of alcohol or substance use disorder) is more harmful than helpful. As states grapple with ways to abate opioid overdoses, such punitive or forced methods are not an effective way to do so.
In conclusion, our study developed race/ethnicity-stratified prediction models for opioid overdose following release from prison and identified system-level predictors of overdose. We also found substantial bias in the prediction results when models were not stratified by race/ethnicity. As machine learning methods are increasingly applied to imperfect and incomplete real-world epidemiologic data, our analysis provides a path toward reducing bias. Our findings also provide insight into where and how public health and criminal legal data systems can be improved when applying machine learning methods.
Supplementary Material
ACKNOWLEDGEMENTS
We acknowledge the Massachusetts Department of Public Health for creating the unique, cross-sector database used for this project and for providing technical support for the analysis.
PRIMARY FUNDING
This work was supported by the National Institutes of Health: the National Institute on Drug Abuse [DP2DA051864 to J.A.B., P.P., K.Y., S.N., P.A.S., L.B.R.; K01DA051684 to J.A.B.] and the National Institute of General Medical Sciences [R35GM141821 to L.F.W., P.P., Y.Z.]. The National Institute on Drug Abuse and the National Institute of General Medical Sciences had no direct role in the design, conduct, and analysis of the study or in the decision to submit the manuscript for publication.
Joshua A. Barocas reports that financial support was provided by the National Institutes of Health. Prasad Patil, Kristina Yamkovoy, Samantha K. Nall, Pallavi Aytha Swathi, and Lauren Brinkley-Rubinstein report that financial support was provided by the National Institutes of Health. Laura F. White, Prasad Patil, and Yanjia Zhang report that financial support was provided by the National Institute of General Medical Sciences. Any other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Abbreviations
- ACS: American Community Survey
- APCD: All Payer Claims Database
- AUC: area under the receiver operating characteristic curve
- DOC: Department of Corrections
- ICD: International Classification of Diseases
- MATRIS: Massachusetts Ambulance Trip Record Information System
- MOUD: medications for opioid use disorder
- OD: overdose
- OUD: opioid use disorder
- PHD: Public Health Data Warehouse
- RSS: relative sum of squares
- US: United States
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Declarations of competing interest:
None to declare.
REFERENCES
- 1. Spencer M, Miniño A, Warner M. Drug Overdose Deaths in the United States, 2001–2021. 2023.
- 2. Bronson J, Stroop J, Statisticians B, et al. Drug Use, Dependence, and Abuse Among State Prisoners and Jail. 2007.
- 3. Fazel S, Bains P, Doll H. Substance abuse and dependence in prisoners: a systematic review. Addiction (Abingdon, England). 2006;101(2):181–191.
- 4. Winkelman TNA, Chang VW, Binswanger IA. Health, Polysubstance Use, and Criminal Justice Involvement Among Adults With Varying Levels of Opioid Use. JAMA Netw Open [electronic article]. 2018;1(3):e180558–e180558. (https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2687053). (Accessed January 18, 2023)
- 5. Waddell EN, Baker R, Hartung DM, et al. Reducing overdose after release from incarceration (ROAR): study protocol for an intervention to reduce risk of fatal and nonfatal opioid overdose among women after release from prison. Health Justice [electronic article]. 2020;8(1). (https://pubmed.ncbi.nlm.nih.gov/32651887/). (Accessed January 18, 2023)
- 6. Joudrey PJ, Khan MR, Wang EA, et al. A conceptual model for understanding post-release opioid-related overdose risk. Addiction science & clinical practice [electronic article]. 2019;14(1):17. (10.1186/s13722-019-0145-5)
- 7. Horowitz J, Wertheimer J. Drug Arrests Stayed High Even as Imprisonment Fell From 2009 to 2019 | The Pew Charitable Trusts. 2022;(https://www.pewtrusts.org/en/research-andanalysis/issue-briefs/2022/02/drug-arrests-stayed-high-even-as-imprisonment-fell-from-2009-to-2019). (Accessed January 18, 2023)
- 8. Vera Institute. (https://www.vera.org/publications/overdose-deaths-and-jailincarceration/ma). (Accessed March 24, 2024)
- 9. Ranapurwala SI, Shanahan ME, Alexandridis AA, et al. Opioid overdose mortality among former North Carolina inmates: 2000–2015. Am J Public Health. 2018;108(9):1207–1213.
- 10. Scott CK, Dennis ML, Grella CE, et al. The impact of the opioid crisis on U.S. state prison systems. Health Justice. 2021;9(1):1–16.
- 11. Joudrey PJ, Khan MR, Wang EA, et al. A conceptual model for understanding post-release opioid-related overdose risk. Addiction science & clinical practice. 2019;14(1):17.
- 12. Flam-Ross JM, Lown J, Patil P, et al. Factors associated with opioid-involved overdose among previously incarcerated people in the U.S.: A community engaged narrative review. Int J Drug Policy [electronic article]. 2022;100. (https://pubmed.ncbi.nlm.nih.gov/34896932/). (Accessed January 18, 2023)
- 13. Schell RC, Allen B, Goedel WC, et al. Identifying Predictors of Opioid Overdose Death at a Neighborhood Level With Machine Learning. Am J Epidemiol. 2022;191(3):526–533.
- 14. Lo-Ciganic WH, Huang JL, Zhang HH, et al. Evaluation of Machine-Learning Algorithms for Predicting Opioid Overdose Risk Among Medicare Beneficiaries With Opioid Prescriptions. JAMA Netw Open. 2019;2(3):e190968.
- 15. Dong X, Rashidian S, Wang Y, et al. Machine Learning Based Opioid Overdose Prediction Using Electronic Health Records. AMIA Annual Symposium Proceedings. 2019;2019:389.
- 16. Neill DB, Herlands W. Machine Learning for Drug Overdose Surveillance. 10.1080/15228835.2017.1416511. 2018;36(1):8–14.
- 17. de Ville B. Decision trees. Wiley Interdiscip Rev Comput Stat [electronic article]. 2013;5(6):448–455. (https://onlinelibrary.wiley.com/doi/full/10.1002/wics.1278). (Accessed March 14, 2024)
- 18. Public Health Data Warehouse (PHD) Technical Documentation | Mass.gov. (https://www.mass.gov/info-details/public-health-data-warehouse-phd-technicaldocumentation#technical-documentation-). (Accessed March 14, 2024)
- 19. PHD Datasets Brief Descriptions. (https://www.mass.gov/doc/phd-datasets-briefdescriptions-pdf/download). (Accessed April 2, 2024)
- 20. Admissions and Releases | Mass.gov. (https://www.mass.gov/lists/admissions-andreleases). (Accessed November 8, 2023)
- 21. Somerville NJ, O’Donnell J, Gladden RM, et al. Characteristics of Fentanyl Overdose — Massachusetts, 2014–2016. MMWR Morb Mortal Wkly Rep [electronic article]. 2019;66(14):382–386. (https://www.facebook.com/CDCMMWR). (Accessed August 28, 2023)
- 22. Bettano A, Jones K, Fillo KT, et al. Opioid-related incident severity and emergency medical service naloxone administration by sex in Massachusetts, 2013–2019. Subst Abus [electronic article]. 2022;43(1):479–485. (https://pubmed.ncbi.nlm.nih.gov/34283708/). (Accessed August 28, 2023)
- 23. PHD 2.0 Analytic Data Dictionaries Part 1. :360–368.
- 24. Williams DR, Collins C. Racial residential segregation: a fundamental cause of racial disparities in health. Public Health Reports [electronic article]. 2001;116(5):404. (/pmc/articles/PMC1497358/?report=abstract). (Accessed March 24, 2024)
- 25. Gartner DR, Delamater PL, Hummer RA, et al. Integrating Surveillance Data to Estimate Race/Ethnicity-specific Hysterectomy Inequalities Among Reproductive-aged Women: Who’s at Risk? Epidemiology [electronic article]. 2020;31(3):385–392. (https://pubmed.ncbi.nlm.nih.gov/32251065/). (Accessed March 24, 2024)
- 26. SAS. The HPSPLIT Procedure. SAS/STAT(R) 12.3 User’s Guide: High-Performance Procedures. 2018;(http://support.sas.com/documentation/cdl/en/stathpug/66410/HTML/default/viewer.htm#stathpug_hpsplit_overview.htm)
- 27. Current Overdose Data | Mass.gov. (https://www.mass.gov/lists/current-overdosedata#updated-data-%E2%80%93-as-of-june-2023-). (Accessed November 8, 2023)
- 28. Žliobaitė I. Measuring discrimination in algorithmic decision making. Data Min Knowl Discov [electronic article]. 2017;31(4):1060–1089. (https://www.researchgate.net/publication/315913147_Measuring_discrimination_in_algorithmic_decision_making). (Accessed August 28, 2023)
- 29. Sveen W, Dewan M, Dexheimer JW. The Risk of Coding Racism into Pediatric Sepsis Care: The Necessity of Antiracism in Machine Learning. J Pediatr [electronic article]. 2022;247:129–132. (https://pubmed.ncbi.nlm.nih.gov/35469891/). (Accessed August 28, 2023)
- 30. Section 35: The Process | Mass.gov. (https://www.mass.gov/info-details/section-35-theprocess). (Accessed November 8, 2023)
- 31. Lim JK, Forman LS, Ruiz S, et al. Factors associated with help seeking by community responders trained in overdose prevention and naloxone administration in Massachusetts. Drug Alcohol Depend [electronic article]. 2019;204. (https://pubmed.ncbi.nlm.nih.gov/31526959/). (Accessed April 1, 2024)
