Using Big Data to Predict Outcomes of Opioid Treatment Programs

Wanting CUI; Keren BACHI; Yasmin HURD; Joseph FINKELSTEIN

doi:10.3233/SHTI200571

Author manuscript; available in PMC: 2021 Jan 30.

Published in final edited form as: Stud Health Technol Inform. 2020 Jun 26;272:366–369. doi: 10.3233/SHTI200571


PMCID: PMC7847309 NIHMSID: NIHMS1663569 PMID: 32604678

Abstract

The potential of big data analytics for analyzing outcomes of opioid treatment programs (OTPs) has not been fully explored. The goal of this study was to assess the potential of big data for predicting OTP outcomes from initial intake forms, which include demographics and social and health history. The analytical sample comprised over 30,000 people admitted to OTPs. Around 66% of patients reported improvement after completing an OTP. We compared logistic regression, random forest, and XGBoost for predictive modeling. XGBoost with sampling and threshold tuning performed best (44% F1 score), with over 60% accuracy. Further big data exploration of OTPs is warranted.

Keywords: Opioid Treatment Program, Big Data Analytics, Machine Learning

1. Introduction

Due to the persistent use of heroin and the over-prescription of opioid drugs, opioid addiction has become a serious and pressing issue in recent years. According to the CDC, deaths from both prescription opioid and heroin overdoses have been rising since 1999 [1]. In 2017, there were over 17,000 overdose deaths involving prescription opioids and over 15,000 involving heroin [1]. Most opioid abusers are not only exposed to a high risk of contracting infectious diseases but are also at risk of developing other mental illnesses [2–3]. Apart from health concerns, the opioid crisis also exacerbates financial hardship for users. According to the book ‘Medication-Assisted Treatment for Opioid Addiction in Opioid Treatment Programs’, heroin users spend over $5 billion on drug-related legal fees [4].

While the extent and prevalence of opioid addiction have been widely studied in the medical literature, research on the effectiveness of opioid treatment programs (OTPs) is sparse. A fair number of studies address the various drugs used to treat opioid addiction. Methadone, in use as early as 1965, was one of the first medications for opioid addiction treatment and remains by far the most frequently used [4]. In 2002, the FDA approved buprenorphine for use in medical maintenance treatment and medically supervised withdrawal [4]. It can be used in OTPs and prescribed by qualified physicians.

AIMS is a unique database that contains patients’ admission, transfer, update and discharge records from the Mount Sinai Health System (MSHS) opioid treatment programs in the New York State area since the 1960s. The database also contains patients’ medications, daily intake logs and drug screen information. By studying the AIMS database, we hope to provide descriptive statistics on patients enrolled in the program, to construct predictive models that identify frequent offenders, and to evaluate the effectiveness of the program for different types of patients.

2. Method

2.1. Dataset

Data were collected from the New York State Office of Addiction Services and Supports (OASAS) Opioid Treatment Program. Only information on patients who received treatment at MSHS was collected. We accessed the data in 2019, and it included admission records from May 25th, 1965 to Oct 31st, 2018. There were 5 major updates of the admission form since 2000. Because the form changed over time, for predictive modeling we included only admission records after Oct 1st, 2014. Some patients were readmitted to the program multiple times. Since their conditions differed at each admission, we assumed independence and treated each admission individually.

2.2. Variables

In predictive modeling, we used the difference in primary substance use frequency between admission and discharge to determine treatment effectiveness. We divided this difference into two levels: patients with a zero or negative difference were considered ineffectively treated (Label = 1); patients with a positive difference, that is, less frequent use at discharge than at admission, were labeled 0.

We kept only records in which the primary substance of abuse was an opioid-related drug. Furthermore, records with a missing primary frequency at admission or discharge were discarded.
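The labeling rule described above can be sketched in a few lines of Python. This is only an illustration: the ordinal frequency codes, the category names, and the function name `treatment_label` are hypothetical, and the direction of the difference (admission minus discharge) is an assumption consistent with a positive difference meaning improvement.

```python
from typing import Optional

# Hypothetical ordinal coding of use frequency (higher = more frequent use).
# The admission form's real categories are not given in the text.
FREQUENCY_CODES = {
    "no use": 0,
    "1-3 times per month": 1,
    "1-2 times per week": 2,
    "3-6 times per week": 3,
    "daily": 4,
}


def treatment_label(freq_admission: Optional[str],
                    freq_discharge: Optional[str]) -> Optional[int]:
    """Return 1 (treatment ineffective), 0 (improved), or None (record discarded)."""
    # Records with a missing frequency at admission or discharge are dropped.
    if freq_admission is None or freq_discharge is None:
        return None
    # Assumed direction: admission minus discharge, so positive = less use at discharge.
    difference = FREQUENCY_CODES[freq_admission] - FREQUENCY_CODES[freq_discharge]
    return 0 if difference > 0 else 1


records = [
    ("daily", "no use"),              # improved
    ("daily", "daily"),               # no change
    ("1-2 times per week", "daily"),  # more frequent use at discharge
    ("daily", None),                  # missing discharge frequency
]
labels = [treatment_label(a, d) for a, d in records]
print(labels)  # [0, 1, 1, None]
```

Note that "no change" and "worse" both map to Label = 1, so the positive class is the group the model is meant to flag for additional care.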

Most information obtained through the admission form was used as predictors. We excluded identifiers, variables that were added in 2017, and variables with 50% or more missing values. We used patients’ socio-economic status, living situation, education level and health conditions. We defined age at admission (AGE) using the admission date and date of birth. Speech, hearing, sight and mobility impairments were combined into one Boolean variable (IMPAIRMENT), which was true if a patient had one or more types of impairment and false otherwise. Hepatitis B status and Hepatitis C status were combined into infectious disease (INFECTIOUS) by the same rule.
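The derived predictors and the missing-value filter can be illustrated as follows. This is a sketch under stated assumptions: the record field names are invented for the example, and computing AGE in completed years is one plausible reading of "using admission date and date of birth", not a detail confirmed by the text.

```python
from datetime import date


def age_at_admission(date_of_birth: date, admission_date: date) -> int:
    """Age in completed years on the admission date (assumed definition)."""
    years = admission_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the admission year.
    if (admission_date.month, admission_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def any_true(*flags: bool) -> bool:
    """Combine several Boolean indicators: True if at least one is True."""
    return any(flags)


def keep_variable(values, max_missing_fraction=0.5):
    """Keep a variable only if fewer than 50% of its values are missing (None)."""
    missing = sum(v is None for v in values)
    return missing / len(values) < max_missing_fraction


# Hypothetical admission record; field names are illustrative only.
record = {
    "date_of_birth": date(1975, 11, 2),
    "admission_date": date(2016, 3, 15),
    "speech_impairment": False,
    "hearing_impairment": True,
    "sight_impairment": False,
    "mobility_impairment": False,
    "hepatitis_b": False,
    "hepatitis_c": False,
}

features = {
    "AGE": age_at_admission(record["date_of_birth"], record["admission_date"]),
    "IMPAIRMENT": any_true(
        record["speech_impairment"],
        record["hearing_impairment"],
        record["sight_impairment"],
        record["mobility_impairment"],
    ),
    "INFECTIOUS": any_true(record["hepatitis_b"], record["hepatitis_c"]),
}
print(features)  # {'AGE': 40, 'IMPAIRMENT': True, 'INFECTIOUS': False}
```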

The project was approved by the institutional ethics board. In exploratory data analysis, we calculated summary statistics. For prediction, we randomly split the data into training (75%) and test (25%) datasets and used 5-fold cross-validation to tune parameters. We fitted logistic regression, random forest and XGBoost models and compared their results. All analyses were performed in Anaconda Jupyter Notebook, using Python 3.7.3.
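The splitting scheme described above (a 75%/25% random split, then 5-fold cross-validation within the training portion) can be sketched with the standard library alone. The function names and the fixed seed are assumptions made for reproducibility of the example; the text does not say whether the split was stratified, so this sketch is not.

```python
import random


def train_test_split_indices(n_records, test_fraction=0.25, seed=0):
    """Shuffle record indices and split them into (train, test) lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_test = round(n_records * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (fit, validation) index lists; each record is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation


train, test = train_test_split_indices(1000)
print(len(train), len(test))  # 750 250

for fit, validation in k_fold_indices(train):
    # A candidate parameter setting would be fitted on `fit` and scored on `validation`.
    print(len(fit), len(validation))  # 600 150 for each of the 5 folds
```

The test indices are never touched during tuning; only the averaged validation scores across the five folds would guide the choice of parameters.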

3. Result

Over 30,000 patients and 46,000 admission records were included in the sample. Around 11,000 patients were admitted to the program more than once. The earliest record was from 1965 and the latest from 2018. As Figure 1 shows, while the number of clinics has remained constant since 1970, the number of admissions increased drastically from 2001 and stabilized after 2008.

Figure 1. Number of admission records and number of clinics by year from 1965 to 2018.

The average age of patients at admission was 42 years. Around 22,000 of the patients enrolled in the program were male, while only 7,700 were female. Around 75% of patients listed heroin as their primary substance of abuse, and over 80% of these patients consumed heroin daily. The two most frequent routes of intake were inhalation and injection. Furthermore, over 60% of patients were addicted to more than one substance.

In toxicology screening, around 44% of drug tests detected illegal substances in patients’ systems, which required further action. Opiates, benzodiazepines and cocaine were the 3 most frequently detected drugs. Some patients used several drugs concurrently, and combinations of these drugs were also commonly detected. When constructing machine learning models, we used the difference in primary substance use frequency between admission and discharge. At admission, over 80% of patients reported using opioid drugs daily. This number decreased to 24% at discharge. 66% of patients showed improvement after treatment, while 30% showed no change and 3% used more frequently after treatment.

We compared the results of logistic regression, random forest, and XGBoost. To address class imbalance in the dataset, we adjusted the decision threshold, assigned a class weight to the minority class, and employed sampling techniques.
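One common sampling technique for imbalanced data is random up-sampling, in which minority-class records are duplicated at random until the classes are the same size. The sketch below assumes that variant; the text does not specify which up-sampling method was used, and the function name and toy class sizes are illustrative.

```python
import random
from collections import Counter


def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_extra = counts[majority] - counts[minority]
    # Sample with replacement from the minority class.
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return list(rows) + extra, list(labels) + [minority] * n_extra


# Toy training set: 66 improved (label 0) and 34 not improved (label 1).
rows = list(range(100))
labels = [0] * 66 + [1] * 34

balanced_rows, balanced_labels = upsample_minority(rows, labels)
print(Counter(balanced_labels))  # Counter({0: 66, 1: 66})
```

Up-sampling is normally applied to the training portion only, so that the test set keeps the original class proportions.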

Both logistic regression and XGBoost performed well (Table 1). While sampling methods generally yielded a high F1 score, the accuracy of these algorithms was relatively low. The threshold method, by contrast, retained accuracy while also improving the F1 score. We therefore combined the two methods. Our final model used XGBoost with both up-sampling and threshold tuning. The model’s F1 score was 44%, with an accuracy of 60%.
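The trade-off between F1 score and accuracy under threshold tuning can be shown on a small made-up example. The labels, predicted probabilities, candidate thresholds and function names below are all hypothetical; the sketch only illustrates the general technique of choosing the decision threshold that maximizes F1 for the positive class (Label = 1).

```python
def f1_and_accuracy(y_true, y_pred):
    """Return (F1 for the positive class, accuracy) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return f1, correct / len(y_true)


def best_threshold(y_true, scores, candidates):
    """Return (threshold, f1, accuracy) for the candidate with the highest F1."""
    best = None
    for threshold in candidates:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        f1, accuracy = f1_and_accuracy(y_true, y_pred)
        if best is None or f1 > best[1]:
            best = (threshold, f1, accuracy)
    return best


# Made-up validation labels and predicted probabilities of Label = 1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.40, 0.20, 0.35, 0.30, 0.10, 0.10, 0.05, 0.05, 0.05]

# At the default threshold of 0.5 nothing is flagged: decent accuracy, zero F1.
default = f1_and_accuracy(y_true, [1 if s >= 0.5 else 0 for s in scores])
print(default)  # (0.0, 0.7)

best = best_threshold(y_true, scores, candidates=[0.5, 0.4, 0.3, 0.2])
print(best[0])  # 0.4
```

In this toy case a model that never flags the minority class still reaches 70% accuracy, which mirrors the pattern of high accuracy and low F1 in the baseline columns of Table 1. The threshold would normally be chosen on validation data rather than on the test set.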

Table 1. Results of the predictive models.

Model	Baseline F1 Score	Baseline Accuracy	Up-Sampling F1 Score	Up-Sampling Accuracy	Threshold F1 Score	Threshold Accuracy
Logistic regression	0.08	0.64	0.44	0.54	0.36	0.61
Random forest	0.21	0.62	0.24	0.60	0.43	0.52
XGBoost	0.07	0.65	0.45	0.57	0.22	0.63
Final model (XGBoost + up-sampling + threshold)	F1 Score: 0.44	Accuracy: 0.60


4. Discussion

Both the logistic regression and XGBoost models indicated that standard opioid treatment programs are less effective for patients with brain injury or other health conditions. Since patients are required to go to the clinic 6 days a week to obtain treatment, patients with health conditions might find it hard to adhere to such a rigorous schedule. These patients would also require special medical attention throughout treatment. In addition, patients who used cocaine concurrently with opioids showed less improvement than those who had not used cocaine. In contrast, patients who were referred by chemical dependence treatment showed better results. These patients may have received previous treatment in other programs or may be more prepared for the outpatient chemical dependence treatment format.

5. Conclusion

We analyzed trends in admission to the OTPs over the past 50 years. There was a significant increase in admissions in the early 2000s, and the number of admissions has been constant for the past 10 years. Around 66% of patients reported improvement after treatment. In predictive modeling, we constructed a machine learning model to identify patients who might not be treated effectively by the OTPs, so that additional care could be provided. XGBoost with both up-sampling and threshold tuning was our best model, with an F1 score of 44%. Through this model, we identified that patients with brain injury or other health conditions and patients who use cocaine concurrently required additional help for continuous treatment.

AIMS is an important and complex database. In future studies, we plan to incorporate patients’ medical histories into the analysis and to evaluate the effectiveness of the treatment program from multiple aspects.

References

[1] Centers for Disease Control and Prevention, National Center for Health Statistics, Multiple Cause of Death 1999–2017 on CDC WONDER Online Database, 2018. https://www.drugabuse.gov/relatedtopics/trends-statistics/overdose-death-rates.
[2] Ehrich E, Turncliff R, Du Y, et al. Evaluation of opioid modulation in major depressive disorder. Neuropsychopharmacology 2015;40(6):1448–1455.
[3] Volkow ND, Jones EB, Einstein EB, Wargo EM. Prevention and Treatment of Opioid Misuse and Addiction: A Review. JAMA Psychiatry 2019;76(2):208–216.
[4] Center for Substance Abuse Treatment, Medication-Assisted Treatment for Opioid Addiction in Opioid Treatment Programs, Substance Abuse and Mental Health Services Administration, Rockville, MD, US, 2005.

