Abstract
Objective.
Our objective is to describe how we combine, at an individual level, multiple administrative datasets to create a Comprehensive Opioid Risk Registry (CORR). The CORR will characterize the role that individual characteristics, household characteristics, and community characteristics have on an individual’s risk of opioid use disorder or opioid overdose.
Data Sources.
Study data sources include the voluntary Oregon All Payer Claims Database (APCD), American Community Survey Census Data, Oregon Death Certificate data, Oregon Hospital Discharge Data (HDD), and Oregon Prescription Drug Monitoring (PDMP) Data in 2013 through 2018.
Study Design.
To create the CORR, we first prepared the APCD data set by cleaning and geocoding addresses, creating a community grouper and adding census indices, creating a household grouper, and imputing patient race. We then deployed a probabilistic linkage methodology to incorporate the other data sources while maintaining compliance with strict data governance regulations.
Data Collection / Extraction methods.
Administrative datasets were obtained through an executed data use agreement with each data owner. The APCD served as the population universe to which all other data sources were linked.
Principal Findings.
There were 3,628,992 unique people in the APCD over the entire study period. We identified 968,767 unique households in 2013 and 1,209,236 in 2018, and geocoded patient addresses representing all census tracts in Oregon. Census, death certificate, HDD, and PDMP datasets were successfully linked to this population universe.
Conclusions.
This methodology can be replicated in other states and may also apply to a broad array of health services research topics.
Keywords: Linkage, opioids, health services research, administrative data
Introduction
The opioid overdose epidemic has claimed hundreds of thousands of lives in the United States, with 46,000 lives lost in 2018 alone1. Although fentanyl and other synthetic analogues of fentanyl are the primary drivers of deaths in recent years, the question remains how an individual becomes addicted to opioids in the first place and how to mitigate that risk2. Previous studies have focused on characteristics of prescriptions given to opioid naïve patients. For example, one study demonstrated that 5% of opioid naïve patients became long-term users, with higher odds ratios of long-term use associated with multiple prescriptions or higher dose prescriptions3. Another determined that the likelihood of chronic use increased with each additional day of medication supplied starting with the third day4.
Although important, these studies merely scratch the surface of the complexities involved in transition from the opioid naïve state to chronic use to misuse and then overdose death. This transition is a multifactorial progression that is not simply the result of a single characteristic, such as the opioid dosage received. Rather, it is an interaction between the home and community where an individual lives, their socioeconomic status, their medical comorbidities and perhaps other undiscovered interactions.
Many previous studies rely on single datasets, such as prescription drug monitoring program (PDMP) or commercial insurance datasets. Administrative datasets used for opioid research are often limited, restricted to a subset of a population (e.g., a single payer type) or a subset of records (e.g., paid pharmacy claims). However, there is great power in leveraging and merging other relevant data, such as community health indicators, medical diagnoses, and vital records7. One example of this process is the Massachusetts Chapter 55 legislation project, which combined death certificate data, PDMP data, all payer claims data, and other data sources, then created partnerships between the state and academics to answer questions related to the opioid epidemic8. This work has been leveraged to answer many questions, including the one-year mortality risk of overdose victims who initially survive, what adverse effects are experienced by seniors inappropriately prescribed opioids, and how often medication for opioid use disorder is prescribed after a nonfatal overdose9–11. Another example is work done in the Kaiser system, which has brought together clinical, dispensing, and socioeconomic data sources to examine opioid-related outcomes across patients in multiple market segments, though the data are limited to Kaiser membership12.
In this paper, we describe how we linked, at an individual level, numerous public health datasets with all-payer claims data and census data to create a rich administrative dataset that will assess multiple factors affecting opioid-related risk. The project, funded by the National Institute on Drug Abuse, examines the role that individual characteristics, household characteristics, and community characteristics have on an individual’s risk of opioid overdose. This methodology can be replicated in other states and may also apply to a broad array of health services research topics.
Methods
Study data sources covered years 2013–2018 and included the voluntary Oregon All Payer Claims Database (APCD), American Community Survey Census Data, Oregon Death Certificate data, Oregon Hospital Discharge Data (HDD), and Oregon PDMP Data. Table 1 provides an overview of the data sources. A data use agreement was instituted with all four data owners, outlining access and permissible uses for both the unique patient linkage and the analytic uses after removal of unique person identifiers. The study was reviewed by the Oregon Public Health Division Institutional Review Board, which ceded review to the Partners Healthcare Human Research Committee, which approved the protocol.
Table 1:
Overview of the Data Sources
Data source | Description | Owner | Dates |
---|---|---|---|
Voluntary All Payer Claims Data (APCD) | Medical and pharmacy claims and enrollment data for most Commercial, Medicaid, and Medicare Advantage plans in Oregon. | Comagine Health Oregon Data Collaborative | 2013 – 2018 |
Census – American Community Survey | Publicly available American Community Survey census data | US Census Bureau | 5-year average, 2014 – 2018 |
Death certificate data (vital statistics) | Cause of death and contributing cause of death for all decedents in Oregon | Oregon Health Authority – Vital Statistics | 2013 – 2018 |
Hospital discharge data (HDD) | Diagnostic and procedure codes recorded during each inpatient hospitalization in Oregon, regardless of payer | Oregon Health Authority – Office of Health Analytics | 2013 – 2018 |
Prescription Drug Monitoring Program (PDMP) | Schedule II-V controlled substances dispensed from outpatient pharmacies in Oregon, regardless of payer | Oregon Health Authority – Violence and Injury Prevention Program | 2013 – 2018 |
Dataset preparation
The APCD served as the base data source for this project, encompassing the universe of patients to which we linked supplemental data sources. The Oregon APCD includes commercial, Medicaid, and Medicare Advantage claims and provided the patient addresses that were critical in establishing household and community groupers in each study year.
We first applied initial inclusion criteria using the APCD enrollment tables. Unique individuals must have met the following requirements: at least one valid zip code in Oregon any time between 2013 and 2018, calculated age between 2 and 100 years as of 1/1/2014, and commercial, Medicare Advantage, or Medicaid coverage anytime between 2013 and 2018. Oregon’s APCD covers roughly 80% of commercially-insured individuals (including Medicare Advantage); 100% of individuals with Medicaid coverage are included. Patients with Medicare Fee For Service (FFS) coverage will be added in a future refresh of this database.
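The inclusion criteria above can be expressed as a simple filter. The following is a minimal sketch in Python rather than the authors' program; the record layout (`Enrollee` and its field names) is hypothetical, and treating the age bounds of 2 and 100 as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date

AGE_REFERENCE_DATE = date(2014, 1, 1)
ELIGIBLE_COVERAGE = {"commercial", "medicare_advantage", "medicaid"}


@dataclass
class Enrollee:
    """Hypothetical summary of one person's APCD enrollment history."""
    birth_date: date
    has_oregon_zip: bool   # at least one valid Oregon zip code in 2013-2018
    coverage_types: set    # coverage types held at any time in 2013-2018


def age_on(birth_date, reference):
    """Completed years of age on the reference date."""
    had_birthday = (reference.month, reference.day) >= (birth_date.month, birth_date.day)
    return reference.year - birth_date.year - (0 if had_birthday else 1)


def meets_inclusion(enrollee):
    """Apply the three criteria: Oregon zip code, age 2-100, eligible coverage."""
    age = age_on(enrollee.birth_date, AGE_REFERENCE_DATE)
    has_coverage = bool(enrollee.coverage_types & ELIGIBLE_COVERAGE)
    # Assumption: the age bounds are inclusive.
    return enrollee.has_oregon_zip and 2 <= age <= 100 and has_coverage
```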
Address cleaning and Geocoding
Cleaning and standardizing patient addresses were critical for two main components of this work: 1) creation of a community grouper (Federal Information Processing Standard, or FIPS code) to link census tract characteristics to individual patients, and 2) creation of a household grouper to identify patients living together within each study year. To include as many patients as possible in these groupers, we developed a program using R software to apply address cleaning processes to all patient addresses on file in the APCD between 2013 and 2018. We standardized the address format by standardizing street elements, identifying and correcting misspelled words and format patterns, removing extra characters and spaces, and applying a consistent ordering of the address components. We segregated the clean and standardized address components into five distinct variables: Street address, Unit Number, City, State, and Zip.
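To illustrate the kind of cleaning described above, the following is a minimal sketch in Python (the authors' program was written in R and is not reproduced here). The suffix map, the unit-designator pattern, and the function name `clean_address` are hypothetical, and the rules are far less complete than a production cleaner would need.

```python
import re

# Hypothetical, deliberately small map of street-element spellings.
STREET_SUFFIXES = {
    "STREET": "ST", "AVENUE": "AVE", "ROAD": "RD",
    "BOULEVARD": "BLVD", "DRIVE": "DR",
}

# A trailing unit designator such as "APT 4B" or "#12" (assumed patterns).
UNIT_PATTERN = re.compile(
    r"(?:\b(?:APT|APARTMENT|UNIT|STE|SUITE)\s+|#\s*)([A-Z0-9-]+)$"
)


def clean_address(street, city, state, zip_code):
    """Return the five standardized address components as a dict."""
    text = street.upper()
    text = re.sub(r"[^A-Z0-9#\s-]", " ", text)   # drop stray punctuation
    text = re.sub(r"\s+", " ", text).strip()     # collapse extra spaces

    unit = ""
    match = UNIT_PATTERN.search(text)
    if match:
        unit = match.group(1)
        text = text[:match.start()].strip()

    words = [STREET_SUFFIXES.get(word, word) for word in text.split()]
    return {
        "street_address": " ".join(words),
        "unit_number": unit,
        "city": re.sub(r"\s+", " ", city.upper()).strip(),
        "state": state.upper().strip(),
        "zip": re.sub(r"\D", "", zip_code)[:5],   # keep the 5-digit zip only
    }
```

For example, `clean_address("123  Main Street, Apt. 4B", "portland ", "or", "97201-1234")` yields the street address `123 MAIN ST`, unit `4B`, city `PORTLAND`, state `OR`, and zip `97201`.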
We defined communities as census tracts using the US Census Geocoding Services Web Application Programming Interface, which converts addresses to FIPS codes at the census tract level13. We developed a program using R software to retrieve the 12-digit FIPS code for every standardized address in the APCD. In the end, we created a table with the FIPS code for every de-duplicated address in the APCD, representing all census block groups in Oregon.
Community grouper and census indices
We defined communities at the census tract level, identified by FIPS code as described above. Census tracts are identified by an 11-digit FIPS code, which we obtained by simply dropping the last digit of the 12-digit FIPS code.
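The relationship between the 12-digit and 11-digit codes follows from the standard layout of a block group FIPS code: 2 digits for the state, 3 for the county, 6 for the tract, and 1 for the block group. A minimal Python sketch, with a hypothetical function name and an illustrative code value:

```python
def split_block_group_fips(fips12):
    """Split a 12-digit block group FIPS code into its components.

    Codes are kept as strings so that leading zeros are preserved.
    """
    if len(fips12) != 12 or not fips12.isdigit():
        raise ValueError(f"expected a 12-digit FIPS code, got {fips12!r}")
    return {
        "state": fips12[:2],         # 41 is Oregon
        "county": fips12[2:5],
        "tract": fips12[5:11],
        "block_group": fips12[11],
        "tract_fips": fips12[:11],   # the 11-digit community identifier
    }
```

For the illustrative value `410510001001`, the tract-level code is `41051000100`.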
Each person within a given community was then assigned index scores: variables that characterize the socio-economic characteristics of that community. We used established indices including the Social Vulnerability Index (SVI)14 and Area Deprivation Index (ADI)15. These indices allow us to characterize individual patients’ risk of excess opioid harms while accounting for their neighborhood socioeconomic status, minority composition, household composition, housing and transportation, and neighborhood disadvantage. The SVI and ADI have been validated to describe a community’s vulnerability to deal with various environmental and health related events16–18.
Household grouper
Using dates at each address from insurance enrollment data, we identified the address where the individual had lived the most days in each calendar year by calculating the number of days at each address. We identified other individuals with the same address in the same year and grouped members as a ‘household,’ then assigned a unique household identifier to each household in each year. We also added a variable to tabulate the number of people in each household. This methodology will allow us to examine patient characteristics and prescription profiles of household members over time, following unique patients as they change insurance, move, or experience changes in the mix of individuals in their household over the study period.
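The household grouper logic can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the authors' code: the input tuples, the tie-breaking rule, and the identifier format are hypothetical, and overlapping enrollment spans at the same address are summed without de-duplicating days.

```python
from collections import defaultdict
from datetime import date


def days_in_year(start, end, year):
    """Days of the inclusive span [start, end] that fall in the calendar year."""
    lo = max(start, date(year, 1, 1))
    hi = min(end, date(year, 12, 31))
    return max((hi - lo).days + 1, 0)


def assign_households(spans, year):
    """spans: iterable of (person_id, address, start_date, end_date)."""
    days = defaultdict(int)
    for person, address, start, end in spans:
        days[(person, address)] += days_in_year(start, end, year)

    # Primary address: the one with the most days in the year.
    # Assumption: ties go to the alphabetically first address.
    primary = {}
    for (person, address), n in sorted(days.items()):
        if n > 0 and (person not in primary or n > days[(person, primary[person])]):
            primary[person] = address

    members = defaultdict(list)
    for person, address in primary.items():
        members[address].append(person)

    # Hypothetical identifier format: year plus a running number.
    household_ids = {
        address: f"{year}-{i:07d}"
        for i, address in enumerate(sorted(members), start=1)
    }
    return {
        person: {
            "household_id": household_ids[address],
            "household_size": len(members[address]),
        }
        for person, address in primary.items()
    }
```

Because the identifier is assigned per year, a person who moves or whose housemates change receives a new household assignment in the following year.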
Patient Name
To facilitate the best possible probabilistic linkage, we standardized patient first name, last name, and date of birth, the variables used for linkage with the other administrative datasets. To do this, we standardized word case in first and last names, removed hyphens and spaces, and formatted date of birth in a standard MMDDYYYY format.
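A minimal Python sketch of this standardization is shown below. The function names are hypothetical, and the choice of upper case is an assumption; the text specifies only that word case was standardized.

```python
import re
from datetime import date


def standardize_name(name):
    """Remove hyphens and whitespace, then apply a single word case."""
    # Assumption: upper case is used as the standard case.
    return re.sub(r"[-\s]", "", name).upper()


def format_dob(dob):
    """Format a date of birth as MMDDYYYY."""
    return dob.strftime("%m%d%Y")
```

For example, `standardize_name("Smith-Jones ")` gives `SMITHJONES`, and `format_dob(date(1980, 3, 7))` gives `03071980`.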
Linkage procedure
Each public health dataset was probabilistically linked, one at a time, to the APCD using patient first name, last name, and birth date. The linkage analyst used linkage and de-duplication software (the fastLink package in R) to probabilistically link individuals19. We used fastLink version 0.5.0, published on November 12, 2018.
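For readers unfamiliar with probabilistic linkage, the following Python sketch shows the general idea behind a Fellegi-Sunter style match score on the three linkage variables. It is not the fastLink implementation: it uses exact agreement only, and the m-probabilities, u-probabilities, and prior match rate are invented for illustration, whereas linkage software typically estimates such parameters from the data.

```python
import math

# Hypothetical parameters, per field:
#   M = P(field agrees | records are the same person)
#   U = P(field agrees | records are different people)
M = {"first": 0.95, "last": 0.97, "dob": 0.99}
U = {"first": 0.01, "last": 0.005, "dob": 0.0001}


def match_weight(a, b):
    """Sum of log2 likelihood ratios over the linkage fields."""
    weight = 0.0
    for field in M:
        if a[field] == b[field]:
            weight += math.log2(M[field] / U[field])
        else:
            weight += math.log2((1 - M[field]) / (1 - U[field]))
    return weight


def match_probability(a, b, prior=0.001):
    """Posterior probability of a match via Bayes' rule.

    `prior` is an assumed share of record pairs that are true matches.
    """
    likelihood_ratio = 2 ** match_weight(a, b)
    odds = likelihood_ratio * prior / (1 - prior)
    return odds / (1 + odds)
```

Pairs whose posterior probability exceeds a chosen threshold are accepted as links. Agreement on a rarely coinciding field such as date of birth contributes far more weight than agreement on a common first name.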
Because each dataset was housed in a separate division within the Oregon Health Authority, governed by a different set of legislative statutes, our linkage process was necessarily complex and carefully ordered (Figure 1). A critical part of this process was a firewall between the analysts performing the linkage and the research study team. The linkage analyst is one of the authors (ND) and had access to the patient identifiers required to perform a probabilistic linkage. She worked in a secure and strictly controlled environment. The study team never had access to this secure environment, and no study team member was able to access the linked datasets until they were de-identified to the satisfaction of all data owners.
Figure 1: Comprehensive Opioid Risk Registry Linkage Procedure.
†APCD = All Payer Claims Data, PDMP = Prescription Drug Monitoring Program, CORR = Comprehensive Opioid Risk Registry
Step 1: Creating the enhanced APCD data set
The first step was to create an “Enhanced All Payer All Claims” dataset containing patient level data from the APCD, Vital Statistics, and the Census. The vital statistics dataset was generated by the data owners and sent to the linkage analyst with patient first name, last name, and date of birth in place. The Census data (SVI and ADI) was deterministically linked to all unique individuals in the population universe using FIPS codes14,15. After linkage, the FIPS codes were masked and replaced with a randomly generated unique community number.
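The deterministic census linkage and subsequent FIPS masking can be sketched as follows. This is a minimal Python illustration; the function name, the data layout, the index values used in testing, and the 8-digit range of the random community number are all assumptions.

```python
import secrets


def link_and_mask(people, indices_by_fips):
    """Attach census indices by exact FIPS match, then mask the FIPS code.

    people:          {person_id: tract_fips}
    indices_by_fips: {tract_fips: {"svi": ..., "adi": ...}}
    """
    community_numbers = {}
    used = set()
    linked = {}
    for person_id, fips in people.items():
        if fips not in community_numbers:
            # Draw a random community number that has not been used yet.
            number = secrets.randbelow(10**8)
            while number in used:
                number = secrets.randbelow(10**8)
            used.add(number)
            community_numbers[fips] = number
        # The output record carries the masked number, never the FIPS code.
        linked[person_id] = {
            "community": community_numbers[fips],
            **indices_by_fips.get(fips, {}),
        }
    return linked
```

People in the same tract share a community number, so community-level analyses remain possible after the FIPS code itself is removed.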
Step 2: Linkage to the Hospital Discharge Data
The second step was to link the HDD to the Enhanced APCD. Because researchers cannot retain patient-level identifiers in the HDD data, an HDD analyst and the linkage analyst conducted the linkage together. Patients successfully linked from the HDD were assigned a random ID number, consistent between both datasets, before patient names and dates of birth provided by the HDD were destroyed. The HDD analyst retained custody of the HDD patient identifiers at all times.
Step 3: Creating a minimally necessary dataset
Prior to linkage with the PDMP, a minimally necessary dataset was created to ensure that no individual could be traced back to the original datasets once the PDMP data were linked in. The linkage analyst generated binary and categorical variables from the source data that contain only the information necessary to conduct the study aims. Once the minimally necessary dataset was created, the linkage analyst removed the source variables. The original vital statistics and HDD datasets were then destroyed to further reduce the chance of patient re-identification.
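The idea of a minimally necessary dataset can be sketched in Python as below. The source and derived variables shown here (`death_date`, `age`, `died_during_study`, `age_group`) and the age bands are hypothetical examples, since the text does not list the actual variables.

```python
from datetime import date

STUDY_START, STUDY_END = date(2013, 1, 1), date(2018, 12, 31)


def age_group(age):
    """Collapse exact age into a coarse category (hypothetical bands)."""
    if age < 18:
        return "0-17"
    if age < 45:
        return "18-44"
    if age < 65:
        return "45-64"
    return "65+"


def minimize(record):
    """Keep only derived binary and categorical variables.

    The source variables (exact death date, exact age) are not copied
    into the returned record.
    """
    death = record.get("death_date")
    died = death is not None and STUDY_START <= death <= STUDY_END
    return {
        "study_id": record["study_id"],
        "died_during_study": int(died),
        "age_group": age_group(record["age"]),
    }
```

Building a new record from derived values, rather than deleting fields from the original, makes it harder to leave a source variable behind by accident.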
Step 4: Linkage to the PDMP
In the fourth step, we linked the minimally necessary dataset to the PDMP. The PDMP dataset has strict regulations governing its use, so it was linked last to maintain compliance with the governing statute and ensure that all identifying information was removed prior to this linkage. As with the HDD, researchers cannot retain patient-level identifiers in the PDMP data, so a PDMP analyst and the linkage analyst worked together to link individuals from the PDMP to the APCD. This process was modified to include COVID-19 precautions, enabling a remote connection and adequate physical distancing between the PDMP analyst, the linkage analyst, and other team members. A unique study ID was assigned to each unique patient, provider, and pharmacy, and then all identifiers were destroyed. The PDMP analyst retained custody of the PDMP identifiers at all times.
Step 5: Final CORR database de-identification
The fifth and final step was the de-identification of the newly created Comprehensive Opioid Risk Registry (CORR) database. In compliance with Oregon statute, the linkage analyst created randomly generated unique study identifiers before destroying all patient, provider, and pharmacy identifiers as well as the source APCD and census data sets to prevent re-identification. The result is a robust dataset that can be used for extensive analyses without risk to individuals’ health information security and privacy.
Results
There were 3,628,992 unique people who met our initial population inclusion criteria in the APCD. This was the population to which the public health datasets (HDD, Vital Statistics, PDMP) were linked. All unique individuals with a valid address in any single year, 2013 – 2018, were included in the population universe, and placed in households and communities according to their address and FIPS code (Figure 2).
Figure 2: APCD Preparation and Resulting Comprehensive Opioid Risk Registry Creation.
†APCD = All Payer Claims Data, PDMP = Prescription Drug Monitoring Program
The number of unique individuals with a valid address ranged from 1,970,320 in 2013 to 2,528,446 in 2018. The increase in unique individuals is explained by several factors: first, Oregon’s Medicaid expansion program (in alignment with the Affordable Care Act) was implemented in 2014, significantly reducing the number of uninsured residents and increasing the number of individuals captured in the APCD beginning in 2014; second, the population of Oregon grew during this study period as more people moved to the state; third, low unemployment and sustained Medicaid funding resulted in more insured individuals captured in the APCD; and lastly, the APCD transitioned vendors during this study period and some data from 2013 were lost. The increase in unique individuals captured over time will not be a substantial limitation of the CORR database because 2013 will be used as a look-back year while the study period will run 2015–2018, years in which >2 million individuals met inclusion criteria.
The number of unique households ranged from 968,767 in 2013 to 1,209,236 in 2018, an average of 2.0 people per household in 2013 and 2.1 people per household in 2018. The number of unique communities (census tracts) ranged from 827 in 2013 to 828 in 2018. The number of communities increased because one census tract crosses the border between Oregon and Washington and included additional people in 2018. As such, the CORR database contains residents of all census tracts in Oregon (Figure 2).
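The average household size follows directly from dividing the number of individuals with a valid address by the number of households in the same year. A short Python check using the counts reported above (the variable and function names are ours):

```python
# (individuals with a valid address, unique households) by year
COUNTS = {
    2013: (1_970_320, 968_767),
    2018: (2_528_446, 1_209_236),
}


def people_per_household(year):
    """Average number of people per household in the given year."""
    individuals, households = COUNTS[year]
    return individuals / households
```

Rounded to one decimal place, the averages are 2.0 for 2013 and 2.1 for 2018.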
Discussion
The transition from opioid naïve state to chronic use, misuse, and overdose is complex and multifactorial, and our methodology for linking All Payer Claims Data, Vital Statistics, Hospital Discharge, Prescription Drug Monitoring Program, and census data is robust and enables researchers to examine these complexities and interactions. Our methodology includes a linkage procedure to comply with multiple state governance rules, cleaning and preparation of APCD for address geocoding and patient-level probabilistic linkage, and linkage of disparate data sources. This robust database will enable us to examine community social determinants of health in combination with individual factors, prescription factors, and household factors.
Existing literature on linkage of health services data is limited, but previous work has identified several important challenges to investigator-initiated linked databases. Obtaining the appropriate regulatory approvals and getting data use agreements in place can take months or years, and the process is often subject to delays that are hard to reconcile with project timelines20,21. Linkage of data sets requires expertise and knowledge regarding the datasets to be linked and skills in the use of software for conducting the linkage, thus requiring a technically skilled multidisciplinary team of analysts and researchers20. These challenges mean that linked data sets of this nature are often expensive and time-intensive to create, and statutory limitations often mean that the datasets can be used only once, for the research they were created for20. Some states, such as Massachusetts with its Chapter 55 legislation, have begun to streamline data linkages to facilitate research in opioid use outcomes22. Despite these challenges, it is important to leverage administrative datasets to conduct research that increases our understanding of multi-dimensional phenomena such as the opioid crisis7.
This study raises several important issues. First, data governance and ownership of public health datasets vary between and within states, and this linkage procedure provides a guideline for navigating these complex requirements. Governance restrictions could limit the utility of this exact methodology, but with modifications this procedure could be replicated in the 26 other states that have both a PDMP and an APCD (with 5 additional PDMP states currently implementing an APCD). This procedure was the result of significant collaboration between the data owners and the authors, beyond standard data request processes, fueled by a belief in the significant research potential for these linked datasets. This database model provides significant advantages over other models that are restricted to a subset of the population (e.g. single payer type) or a subset of records (e.g. paid pharmacy claims). For population health research, this ability to consider other inputs into a patient’s health status beyond medical services (e.g. comorbidities of household members, neighborhood stressors, household prescription availability, etc.) is powerful, and is a benefit that could be expanded beyond opioid research. The interactions between an individual’s home and community and their socioeconomic status are important considerations for any population-level health services research and could provide a ripe opportunity to identify interventions to address social determinants of health.
This approach also has some important limitations. As with any probabilistic linkage, we may incorrectly link individuals between data sources and may fail to link individuals who should have been linked. We minimized this risk with robust cleaning and standardization of linkage variables. When capturing opioid-related harms (e.g. overdose), we may miss some events that were not recorded in a medical setting and did not result in death. Administrative data sources are limited in their capture of other opioid-related outcomes, such as opioid misuse or use disorder. We are planning a future refresh of this database, including the addition of emergency medical services (EMS) data to identify overdoses that prompted an emergency response where the patient may not have been transported to a hospital. Household and community estimates of opioid prevalence may be underestimated, as patients with Medicare fee for service insurance are not currently included. The future refresh of this database will include this population.
In conclusion, this preparation and linkage of previously disparate data sources is novel and could be replicated for health services research where examination of community social determinants of health in combination with individual factors, prescription factors, and household factors, is relevant. Future studies could focus on the addition of other data sources, application of this model in other states, or use of this model to study the effects of other drugs (e.g. stimulants) or the COVID-19 pandemic on individuals and their communities.
Key Points.
The transition from the opioid naïve state to chronic use, misuse and overdose is a multifactorial progression that involves the interaction of household, community, and individual characteristics.
This article outlines a procedure to comply with various state data governance rules to create a robust database to further opioid research, considering patient household and community factors when assessing overdose risk.
A methodology to clean and prepare All Payer Claims Data for address geocoding and patient-level probabilistic linkage.
A methodology to link disparate data sources to further population-level health services research, including All Payer Claims Data, Vital Statistics, Hospital Discharge, Prescription Drug Monitoring Program, and census data.
Acknowledgements
The authors wish to thank Benjamin Chan, Dancia Hall, Craig New, Steven Ranzoni, Josh Van Otterloo, Peter Geissert, Dagan Wright, and Laura Chisolm at the Oregon Health Authority for their ongoing partnership, support, and insight. This work would not be possible without funding from the National Institutes of Health, NIH 1-R01-DA044167.
Research contained in this paper was funded by NIH grant 1-R01-DA044167. The funder was not involved in the study design, collection, analysis and interpretation of data, nor in the writing of the report or the decision to submit the report for publication. The conclusions in this article are those of the authors and do not necessarily represent the official position of the funders. The authors have no conflicts of interest to declare.
Footnotes
This work was previously presented at the National Prescription Drug and Heroin Summit, the Annual Meeting of the National Association of Health Data Organizations, and at a meeting of the Oregon Public Health Association. A poster was presented at the Academy Health Annual Research Meeting. This manuscript, in part or in full, has not been submitted or published anywhere.
References
- 1. Wilson N, Kariisa M, Seth P, Smith IV H, Davis NL. Drug and opioid-involved overdose deaths—United States, 2017–2018. MMWR Morb Mortal Wkly Rep. 2020;69(11):290–297.
- 2. Gladden RM, O’Donnell J, Mattson CL, Seth P. Changes in Opioid-Involved Overdose Deaths by Opioid Type and Presence of Benzodiazepines, Cocaine, and Methamphetamine — 25 States, July–December 2017 to January–June 2018. Morbidity and Mortality Weekly Report. 2019;68(34):737–744.
- 3. Deyo RA, Hallvik SE, Hildebran C, et al. Association between initial opioid prescribing patterns and subsequent long-term use among opioid-naive patients: a statewide retrospective cohort study. J Gen Intern Med. 2017;32(1):21–27.
- 4. Shah A, Hayes CJ, Martin BC. Characteristics of Initial Prescription Episodes and Likelihood of Long-Term Opioid Use — United States, 2006–2015. Morbidity and Mortality Weekly Report. 2017;66(10):265–269.
- 5. Butler MM, Ancona RM, Beauchamp GA, et al. Emergency Department Prescription Opioids as an Initial Exposure Preceding Addiction. Annals of Emergency Medicine. 2016;68(2):202–208.
- 6. Lipari RN, Hughes A. How people obtain the prescription pain relievers they misuse. 2017. The CBHSQ Report: January 12, 2017.
- 7. Smart R, Kase CA, Taylor EA, Lumsden S, Smith SR, Stein BD. Strengths and weaknesses of existing data sources to support research to address the opioids crisis. Preventive Medicine Reports. 2020;17:101015.
- 8. The Commonwealth of Massachusetts. An Assessment of Fatal and Nonfatal Opioid Overdoses in Massachusetts (2011–2015). Massachusetts Department of Public Health; 2017.
- 9. Weiner SG, Baker O, Bernson D, Schuur JD. One year mortality of patients treated with naloxone for opioid overdose by emergency medical services. Substance Abuse. 2020:1–5.
- 10. Rose AJ, McBain R, Schuler MS, et al. Effect of Age on Opioid Prescribing, Overdose, and Mortality in Massachusetts, 2011 to 2015. Journal of the American Geriatrics Society. 2019;67(1):128–132.
- 11. Larochelle MR, Bernson D, Land T, et al. Medication for Opioid Use Disorder After Nonfatal Opioid Overdose and Association With Mortality: A Cohort Study. Ann Intern Med. 2018;169(3):137–145.
- 12. Mosen DM, Rosales AG, Mummadi R, Hu W, Brooks N. Demographic, Clinical, and Prescribing Characteristics Associated with Future Opioid Use in an Opioid-Naive Population in an Integrated Health System. The Permanente Journal. 2020;24:1–4.
- 13. U.S. Census Bureau. Geocoding Services Web Application Programming Interface (API). https://geocoding.geo.census.gov/geocoder/Geocoding_Services_API.pdf. Published 2019. Accessed March 23, 2020.
- 14. Flanagan BE, Hallisey EJ, Adams E, Lavery A. Measuring community vulnerability to natural and anthropogenic hazards: the centers for disease control and prevention’s social vulnerability index. J Environ Health. 2018;80(10):34–36.
- 15. Singh GK. Area deprivation and widening inequalities in US mortality, 1969–1998. Am J Public Health. 2003;93(7):1137–1143.
- 16. Wolkin A, Patterson JR, Harris S, et al. Reducing public health risk during disasters: identifying social vulnerabilities. J Homel Secur Emerg Manag. 2015;12(4):809–822.
- 17. Kind AJ, Jencks S, Brock J, et al. Neighborhood socioeconomic disadvantage and 30-day rehospitalization: a retrospective cohort study. Ann Intern Med. 2014;161(11):765–774.
- 18. Hu J, Kind AJH, Nerenz D. Area deprivation index predicts readmission risk at an urban teaching hospital. Am J Med Qual. 2018;33(5):493–501.
- 19. Enamorado T, Fifield B, Imai K. Using a probabilistic model to assist merging of large-scale administrative records. Available at SSRN 3214172. 2018.
- 20. Bradley CJ, Penberthy L, Devers KJ, Holden DJ. Health services research and data linkages: issues, methods, and directions for the future. Health Serv Res. 2010;45(5 Pt 2):1468–1488.
- 21. Harron K, Dibben C, Boyd J, et al. Challenges in administrative data linkage for research. Big Data Soc. 2017;4(2):2053951717745678.
- 22. Weiner SG, Baker O, Bernson D, Schuur JD. One-year mortality of patients after emergency department treatment for nonfatal opioid overdose. Ann Emerg Med. 2020;75(1):13–17.