geoPIPE: Geospatial Pipeline for Enhancing Open Data for Substance Use Disorders Research

Daniel R Harris; Nick Anthony; Mojde Mir; Chris Delcher

. 2023 Apr 29;2022:522–531.

PMCID: PMC10148314 PMID: 37128463

Abstract

We present our open-source pipeline for quickly enhancing open data sets with research-focused expansions and show its effectiveness on a cornerstone open data set released by the Cook County government in Illinois. The City of Chicago and Cook County were both early adopters of open data portals and have made a wide variety of data available to the public; we focus on the medical examiner case archive which provides information about deaths recorded by Cook County’s Office of the Medical Examiner, including overdoses invaluable to substance use disorder research. Our pipeline derives key variables from open data and links to other publicly available data sets in support of accelerating translational research on substance use disorders. Our methods apply to location-based analyses of overdoses in general and, as an example, we highlight their impact on opioid research. We provide our pipeline as open-source software to act as open infrastructure for open data to help fill the gap between data release and data use.

Introduction

Open data enhances scientific research by improving transparency and reproducibility; it is widely regarded as an essential component of the future of data-driven research^1–3. State and local governments are beginning to adopt open data strategies for non-research purposes, which can also benefit research. For example, a case study on open data in Chicago highlighted the importance of open data for civic engagement and as a transformative actor in changing traditional political communication⁴. The utility of open data has been discussed and established in a variety of domains, including specialized science domains and data science^5–7. Although open data can be considered the future of data-driven research, reservations exist, including concerns that open data will open the “floodgates” to endless general queries, misinterpretation of analytical results, and breaches of record privacy³. Open data portals and policies develop in varied environments and contexts, and there is no publicly accepted standard for them⁸. Policies allowing the open release of data often weigh the trade-off between risk and openness, yet evidence is growing to suggest that transparency mitigates future risk and increases the value of the data⁸. Translational research, in particular translation to population health or “T4” research, can leverage large data sets to monitor morbidity, mortality, benefits, and risks of clinical interventions, public health policies, and environmental changes^9–11. Open data can serve as the bridge between small-scale systems and large-scale systems for population health analytics.

Open data is a relatively new concept and is still in an early stage of development as a field¹². An open data ecosystem needs four key components: online availability of published data, methods for search and viewing data and licenses, data cleaning and enrichment (linkage, analytics, visualization), and a feedback loop for data providers and other important stakeholders¹³. Publicly accessible open data portals are web applications leveraging popular software frameworks. The Socrata Open Data Platform is used by large U.S. cities, including Chicago¹⁴. The Comprehensive Knowledge Archive Network (CKAN) is used by the official data portals for Australia and Canada and by data.gov, the official open data portal for the United States¹⁵. These software frameworks allow users to browse and download data; they have extensive application programming interfaces (APIs) available for developers to enhance all parts of the data life cycle. These frameworks typically offer prepackaged analytical and visualization features to identify trends, patterns, and relationships in data; however, their application for a specific research question can be limited. The research community expands upon open data as needed; for example, enhanced visualization of open data from Chicago demonstrated improved interpretability of the data¹⁶.

Open data has practical limitations; in particular, a gap exists between publishing data and enabling the analytical use and reuse of data^17,18. A common issue is that open data remains unlinked to other existing data sets such as those important for public health research¹². The opioid overdose crisis remains a critical concern for many states and data is a key element in understanding opioid-related issues^19,20. For example, naloxone is known to be effective in combating opioid overdoses and is distributable by any pharmacy, yet many research questions centered around naloxone remain open and require data to answer effectively¹⁹. Using deaths recorded by the Cook County Medical Examiner’s Office, we highlight our open data enrichment methodology to enhance utility of death locations, extract important drug names from cause of death fields, and calculate distances to important landmarks or points of interest. To specifically help analyze the role of location in opioid-related overdoses, we calculate distance between opioid-related fatal overdose incidents and local pharmacies. We package our methods into a reusable and open-source pipeline which further adds to the reproducibility of analyses with open data and enables its application in future research.

Methods

We demonstrate our methods on open data published by the Cook County Medical Examiner’s Office (Illinois)²¹. This open data set is one of the first of its kind nationally and offers details on deaths recorded by the medical examiner (ME). Death data is incredibly important to a wide range of health outcomes research; in its raw form, ME data aligns with the “person”, “place”, and “time” model for epidemiologic investigations of death²². Figure 1 highlights deaths reported to the Cook County Medical Examiner’s Office and made available through their open data portal which began mid-year 2014. Opioid-related deaths steadily increased between 2014 and 2020, aligning with national trends. The year 2020 saw sharp increases in the number of reported deaths, including those where the primary cause was COVID-19 (51.97%).

Figure 1. — Deaths reported to the Cook County Medical Examiner’s Office (any versus opioid-related causes)

We enrich this data set with features broadly applicable to mortality research and with features specifically related to substance use disorder research and in particular research with place-based analyses. Figure 2 summarizes the key steps of our pipeline. Our pipeline begins by downloading the ME case data and preprocessing it to improve a spatial join to reference data. The ME data is hosted in a web application utilizing the Socrata Data Platform²¹; our pipeline connects to Socrata using the Socrata application programming interface (API) and downloads data matching the unique identifier of the ME case data set.
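
The download step can be sketched as follows. This is a minimal illustration, assuming the portal exposes Socrata's usual JSON resource endpoint with `$limit`/`$offset` paging; the domain and data set identifier are taken from the portal URL cited for the ME case archive, and the function names `page_url` and `download_all` are hypothetical rather than part of the published pipeline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Domain and identifier come from the cited portal URL for the ME case archive.
DOMAIN = "datacatalog.cookcountyil.gov"
DATASET_ID = "cjeq-bs86"


def page_url(offset, limit=1000, domain=DOMAIN, dataset_id=DATASET_ID):
    """Build the URL for one page of records (hypothetical helper)."""
    # Ordering by the internal row id keeps paging stable between requests.
    query = urlencode({"$limit": limit, "$offset": offset, "$order": ":id"})
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


def download_all(limit=1000):
    """Fetch pages until a short page signals the end of the data set."""
    records = []
    offset = 0
    while True:
        with urlopen(page_url(offset, limit)) as response:
            page = json.load(response)
        records.extend(page)
        if len(page) < limit:
            return records
        offset += limit
```

Paging keeps each request small, which matters for a data set with tens of thousands of rows; an application token may be needed for heavier use.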

Figure 2. — Our pipeline for enriching open data

The ME data contains the incident address and coordinates (latitude and longitude) of the death if the address was geocoded by the county before posting; this data is missing for some records. Our Cook County data had 65,227 records, 8,528 of which were missing latitude and longitude. For those with missing coordinates, the address is cleaned and prepared by removing spurious information (such as apartment numbers and non-alphanumeric characters), standardizing capitalization, and backpatching missing fields when possible (such as determining missing cities based on other fields). Once prepared, the cleaned addresses are geocoded to retrieve the missing latitude and longitude. For geocoding, our pipeline uses the ArcGIS geocoding Python library; this library connects to ArcGIS servers via an API key and returns coordinates and a score indicating the quality of the address match. Scores range from 0 (worst) to 100 (best), where candidates are scored according to how well their normalized address matches reference addresses; a score of 100 indicates a perfect match²³. These scores are not in the original data but are key in understanding potential data quality issues. If needed for analysis, we remove records with geocoded quality scores indicating erroneous or imprecise matching. We managed to recapture 7,311 of the initial 8,528 records missing latitude and longitude (~86% recovery rate); records not recovered likely did not have an incident address. The newly geocoded data replaces the missing data in the open ME data set.
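
The address preparation described above might look like the following sketch. The unit-designator pattern, the use of `None` as the missing-value marker, and the function name `clean_address` are assumptions made for illustration; the rules in the published pipeline may differ.

```python
import re

# Trailing unit designators such as "APT 3B", "UNIT 12", or "#4" (assumed list).
UNIT_PATTERN = re.compile(
    r"\s+(?:(?:APT|APARTMENT|UNIT|STE|SUITE)\b\.?|#)\s*[\w-]*$"
)
MISSING_LABELS = {"", "N/A", "UNK", "UNKNOWN"}


def clean_address(street, city=None, residence_city=None):
    """Return a cleaned street/city pair, or None when no address is usable."""
    street = (street or "").strip().upper()  # standardize capitalization
    if street in MISSING_LABELS:
        return None
    street = UNIT_PATTERN.sub("", street)        # drop apartment numbers
    street = re.sub(r"[^A-Z0-9 ]", " ", street)  # drop non-alphanumerics
    street = re.sub(r"\s+", " ", street).strip()
    # Backpatch a missing city from the decedent's residence when available.
    city = (city or residence_city or "").strip().upper()
    return {"street": street, "city": city or None}
```

For example, `clean_address("123 n. Main St., Apt 3B", None, "Chicago")` yields the street `123 N MAIN ST` with the city backpatched to `CHICAGO`.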

Our pipeline joins land-use inventory, public parks shapefiles, and a data dictionary to create annotated spatial and reference data^24,25. The cleaned ME data and the reference shapefiles are processed using ArcGIS to map ME points into land-use polygons. Unwanted columns are dropped after the spatial join is complete; the important fields for analysis are retained. The spatially joined data is then joined and appended to the open data set as additional fields. Our pipeline supports calculating distances from the incident location to important landmarks for research purposes; it can then assign the closest distance to such landmarks for each death record. Our distance function uses the haversine formula for calculating distance between two points on a sphere²⁶. We chose to calculate distance between the overdose incident and pharmacies due to the important role that pharmacies play as sources of naloxone for reversing an opioid overdose; in our results and discussion, we show how these distances can be used as a measure of naloxone access.
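
A minimal sketch of the distance step is shown below. It implements the standard haversine formula; the Earth radius constant, the brute-force nearest-landmark search, and the function names are assumptions for illustration and are not taken from the published pipeline.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed constant)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def nearest_distance(incident, landmarks):
    """Smallest distance from one (lat, lon) incident to any landmark."""
    lat, lon = incident
    return min(haversine_miles(lat, lon, p_lat, p_lon)
               for p_lat, p_lon in landmarks)
```

Comparing every incident against every landmark is adequate for a county-sized data set; a spatial index would be the natural next step for larger inputs.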

We enrich the data by extracting drug names from the primary and secondary cause of death fields. The underlying cause-of-death is defined by the World Health Organization (WHO) as “the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury.”²⁷ This field is a string describing the cause of death, such as “NOVEL CORONA (COVID-19) VIRAL INFECTION” or “COMBINED DRUG (COCAINE, FENTANYL, HEROIN, CLONAZEPAM)”. We utilize a drug extraction tool developed in-house that operates on a custom lexicon of drugs; this lexicon includes both therapeutic medicines and illicit substances. Given a list of drugs and a category grouping alike drugs, this tool extracts mentions using popular string similarity measures. We use Jaro-Winkler by default for measuring string similarity and wish to offer algorithm choice as a future configuration option due to known differences in sensitivity and performance across techniques²⁸. A heuristically determined threshold is used to generate a list of successful drug mentions; individual drugs are rolled up into their common groupings. The drugs extracted from these columns are merged into the same file and indicator flags for the presence or lack of each drug are generated per record.
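
To make the matching step concrete, the sketch below implements the textbook Jaro-Winkler similarity (prefix scale 0.1, prefix capped at four characters) and applies it word by word to a cause-of-death string. It is written from scratch for illustration and is not the in-house extraction tool; the tokenizer, the tiny lexicon in the usage note, and the function names are assumptions.

```python
import re


def jaro(s1, s2):
    """Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unused matching character within the window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted by the length of the shared prefix (max 4)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def extract_mentions(text, lexicon, threshold=0.90):
    """Return (group, word, score) for words scoring strictly above threshold."""
    hits = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        for group, terms in lexicon.items():
            best = max(jaro_winkler(term, word) for term in terms)
            if best > threshold:
                hits.append((group, word, round(best, 3)))
    return hits
```

With a hypothetical lexicon `{"Cocaine": ["cocaine"], "Fentanyl": ["fentanyl"]}`, the misspelling “cociane” scores about 0.967 against “cocaine” and is kept, while “fetal” scores 0.90 against “fentanyl” and is rejected by the strict threshold.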

In total for the Cook County data, 134 flags for drugs and drug groupings are generated for both primary and secondary causes which are needed for mortality research. For example, if cocaine_primary is true, then cocaine contributed as the primary cause of death and if cocaine_secondary is true then cocaine was present as a secondary cause. These indicator flags are joined and appended to the open data set as additional fields.
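
The flag generation can be sketched as below. The column naming follows the cocaine_primary and cocaine_secondary example above; the function name and the representation of matched groups as Python sets are assumptions for illustration.

```python
def indicator_flags(primary_groups, secondary_groups, all_groups):
    """Build one primary and one secondary boolean flag per drug group."""
    flags = {}
    for group in all_groups:
        name = group.lower()
        flags[f"{name}_primary"] = group in primary_groups
        flags[f"{name}_secondary"] = group in secondary_groups
    return flags


# Hypothetical record: cocaine in the primary cause, fentanyl in the secondary.
example = indicator_flags({"Cocaine"}, {"Fentanyl"},
                          ["Cocaine", "Fentanyl", "Heroin"])
```

Emitting a flag for every group, including those that are absent, gives each record the same set of columns, which keeps the later join back to the open data set simple.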

The pipeline also searches the secondary cause of death field for mentions of “hot” or “cold” as evidence that the death may have been temperature related. These additions complement existing fields in the data where the ME has determined temperature to play a role in the death. Together, these variables enable future analyses measuring the impact of cold and heat waves which is important given the association between homelessness and outdoor exposure in those with substance use disorders²⁹. Additionally, important labels such as classifying locations as motels or hotels are created, which can be locational “hot spots” for drug use³⁰. After all enhancements are calculated, a post-processing step combines the results from previous steps into a single data frame and performs data quality checks. We remove unnecessary columns, such as columns temporarily needed by our GIS methods, and eliminate duplicates in preparation of running a final suite of data quality checks.
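
A minimal sketch of the keyword labels follows. Whole-word, case-insensitive matching is an assumption made here so that “hot” does not fire on “hotel”; the actual matching rules and the field holding the location description are not specified above, and the function names are hypothetical.

```python
import re

TEMPERATURE_PATTERN = re.compile(r"\b(hot|cold)\b", re.IGNORECASE)
LODGING_PATTERN = re.compile(r"\b(hotel|motel)\b", re.IGNORECASE)


def temperature_mention(secondary_cause):
    """Return 'hot' or 'cold' if the secondary cause mentions it, else None."""
    match = TEMPERATURE_PATTERN.search(secondary_cause or "")
    return match.group(1).lower() if match else None


def is_lodging(location_description):
    """Label locations whose description mentions a hotel or motel."""
    return bool(LODGING_PATTERN.search(location_description or ""))
```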

As a final step, a suite of tests is performed against each column to validate its data by requiring certain assertions to be true. For example, the final data must contain the ME “casenumber” column and it must be unique. We ensure the age column is appropriately non-negative. We check if “male”, “female”, and “unknown” labels exist for sex. We verify there are no unexpected values for race and manner of death. The ranges for most columns are checked to ensure they align with expectations. For example, the latitude and longitude values are checked to see if they are within an expected bounding box.
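
The checks above can be expressed as simple assertions over the records. The sketch below uses plain dictionaries rather than a data frame; the column names other than “casenumber”, the bounding box values, and the function name are illustrative assumptions, not the limits used in the published pipeline.

```python
ALLOWED_SEX = {"male", "female", "unknown"}
# Illustrative box loosely surrounding the study area (assumed values).
BOUNDS = {"south": 41.0, "north": 42.5, "west": -88.5, "east": -87.0}


def check_records(records, bounds=BOUNDS):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    case_numbers = [row.get("casenumber") for row in records]
    if any(number is None for number in case_numbers):
        problems.append("casenumber missing")
    if len(set(case_numbers)) != len(case_numbers):
        problems.append("casenumber not unique")
    for row in records:
        case = row.get("casenumber")
        age = row.get("age")
        if age is not None and age < 0:
            problems.append(f"{case}: negative age")
        sex = row.get("sex")
        if sex is not None and sex.lower() not in ALLOWED_SEX:
            problems.append(f"{case}: unexpected sex label")
        lat, lon = row.get("latitude"), row.get("longitude")
        if lat is not None and lon is not None:
            inside = (bounds["south"] <= lat <= bounds["north"]
                      and bounds["west"] <= lon <= bounds["east"])
            if not inside:
                problems.append(f"{case}: coordinates outside bounding box")
    return problems
```

Collecting problems instead of stopping at the first failure makes it easier to review all data quality issues from a single run.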

Results

We present the results of our pipeline on the Cook County, IL ME data; the pipeline took roughly two hours to generate the enriched data set. Most of this time was spent geocoding the missing coordinates. There were 65,227 death records total and 7,311 of those required geocoding to fill in missing latitude and longitude. For the missing data, the mean geocoding quality score was 93.9 (where 100 is a perfect deterministic match to a street in the reference data); the minimum score was 70 and the maximum was 100. We normalized missing data (1,172) by forcing addresses with “N/A”, “UNK”, or “UNKNOWN” to a standardized sentinel value for missing. Additionally, we patched 1,307 records with missing “city” fields in the death location data by swapping in the “city” field from the decedent’s residence. This resulted in 63,795 records with viable addresses after cleaning and merging the newly geocoded results. Our results contained 2,083 non-missing street addresses occurring more than once which we identified to support analysis of addresses with multiple overdose incidents. Geocoding missing addresses did not significantly impact mean latitude and longitude, having differences of 0.005 and 0.012 degrees respectively, which represented a shift of approximately a third of a mile north and two-thirds of a mile west. The geocoded results from the missing addresses did expand the bounding box of all possible locations by approximately 28 miles north/south and 8 miles east/west.
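
The conversion from a shift in degrees to a shift in miles can be approximated as sketched below, assuming about 69 miles per degree of latitude and scaling longitude by the cosine of the latitude. The reference latitude of 41.8 degrees (roughly Chicago) and the function name are assumptions for illustration.

```python
import math

MILES_PER_DEGREE = 69.0  # approximate length of one degree of latitude


def degrees_to_miles(delta_lat, delta_lon, latitude):
    """Approximate north-south and east-west shifts in miles."""
    north_south = delta_lat * MILES_PER_DEGREE
    # Meridians converge toward the poles, so a degree of longitude shrinks.
    east_west = delta_lon * MILES_PER_DEGREE * math.cos(math.radians(latitude))
    return north_south, east_west


ns, ew = degrees_to_miles(0.005, 0.012, latitude=41.8)
```

Under these assumptions the shifts come to about a third of a mile north-south and roughly six-tenths of a mile east-west, in line with the magnitudes reported above.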

Drug names were extracted from 5,787 of the 65,227 records using our string similarity metrics on primary and secondary cause of death fields. Table 1 shows example output for a single record where two drugs were mentioned. The word “cocaine” matched exactly on our Cocaine search group which is tagged as a drug and as a stimulant. Alternatively, we also had a match for “cociane” in a different record with a similarity score of 0.966 due to the modest error of transposing the “i” and “a” in the word.

Table 1: Sample output for drug extraction for a single example record.

Record	Search Group	Word Found	Similarity	Tags
1	Cocaine	cocaine	1.0	drug;stimulant
1	Alcohol	ethanolism	0.94	drug;eth_alc

The tags referenced in Table 1 are high-level groupings assigned to each drug name in the configuration file of the drug extraction tool³¹. Tags on successful matches are converted into flags and joined back to the original data by our pipeline. For example, our record #1 from Table 1 would have positive entries for “drug” and “stimulant” variables due to the similarity matching. We consider a similarity score strictly higher than 0.90 as a successfully matched record; the optimal threshold may vary from category to category. These thresholds were selected after manual review of the extraction output from preliminary runs. Tags are not hierarchical, but we are generating links to RxNorm, which is itself hierarchical³².

Additionally, we added a COVID search group and tag containing words relevant to COVID-19 (corona, coronavirus, covid19, etc) which generated 18,117 hits; a single cause of death description can contain hits from multiple groups if more than one cause is listed. Figure 3 shows the occurrence of the top 10 most frequent hits outside of COVID, which is led by fentanyl, alcohol, and heroin. ANPP is a metabolite of fentanyl. Table 2 shows matches for alcohol, cocaine, and fentanyl where the alcohol search group has two search terms: alcohol and ethanol. Any number of terms may be used when defining the input to the string-matching algorithm. This is controlled by a configuration file in the drug extraction component and is completely customizable to the needs of the user³¹.

Figure 3. — Frequency of drugs matched using our string similarity metrics.

Table 2: Example words matched to various categories with their mean similarity score and frequency.

Search Group	Word Found	Mean Similarity	Frequency
Alcohol	alcohilsm	0.90	1
Alcohol	alcohol	1.0	365
Alcohol	alcoholic	0.95	36
Alcohol	alcoholism	0.94	504
Alcohol	ethanol	1.0	2,979
Alcohol	ethanoliam	0.94	1
Alcohol	ethanolism	0.94	2,329
Alcohol	ethaolism	0.90	1
Cocaine	cocaine	1.0	4,152
Cocaine	cociane	0.96	1
Fentanyl	fentanyl	1.0	6,273
Fentanyl	fetal	0.90	57
Heroin	heroin	1.0	4,614

This table highlights the occurrence of simple typos that happen in the real world despite best efforts in adhering to disciplined manual data entry. Note that some candidate words may match erroneously, as demonstrated by “fetal” matching “fentanyl” after simply removing three letters. In this case, the similarity score is a generous 0.90. Our initial review of drug extraction output guided our choice of a safe but effective threshold strictly above 0.90 for determining a successfully matched record. Our similarity algorithm performs well when suffixes change on root words, such as “alcohol” matching both “alcoholic” (0.95) and “alcoholism” (0.94), or when simple keystroke errors introduce missing letters or transposed letters (“cocaine” vs “cociane” at 0.96).

We calculated the distance of every incident address to the nearest pharmacy. The average distance between each death and the nearest pharmacy was 0.84 miles (mi) with a standard deviation of 7.24 mi. The closest pharmacy was within 0.00 mi of the death, due to deaths occurring in hospital systems with integrated pharmacies, while the furthest distance was 355.79 mi, which represented an address outside of our Cook County, IL study area. Approximately 25% of deaths were within 0.17 mi of the nearest pharmacy, 50% were within 0.31 mi, and 75% were within 0.50 mi. For opioid overdoses, 62% of deaths were within one-third of a mile of the nearest pharmacy. Figure 4 shows overdose locations (in orange), the overdose locations within one-third of a mile from a pharmacy (in white), and pharmacy buffer areas sized one-third of a mile (in blue); this figure covers data in 2020.
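
Summary figures of this kind can be derived from the per-record nearest-pharmacy distances as sketched below. The function name is hypothetical and the distances in the example are made-up values for illustration only, not data from the ME archive.

```python
import statistics


def summarize_distances(distances, buffer_miles=1 / 3):
    """Quartiles, mean, and the share of records within a buffer distance."""
    q1, median, q3 = statistics.quantiles(distances, n=4)
    within = sum(d <= buffer_miles for d in distances) / len(distances)
    return {
        "mean": statistics.mean(distances),
        "q1": q1,
        "median": median,
        "q3": q3,
        "share_within_buffer": within,
    }


# Made-up distances in miles, used only to exercise the function.
summary = summarize_distances([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
```

Reporting quartiles alongside the mean is useful here because a few far-away addresses, such as those outside the study area, can inflate the mean and standard deviation.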

Figure 4. — Overdose locations and proximity to pharmacies.

Discussion

Our pipeline enriches open data sets by adding key components needed to understand how location and specific drugs contribute to overdose. The biggest benefit is that the pipeline is reusable and deterministic; anyone can run our software to generate our enhancements of the Cook County, IL data set. Additionally, little adjustment is needed for running this on other open or public data sets which could include other medical examiner’s offices, coroner offices, emergency medical services, syndromic surveillance, and social media (e.g., discussions that include these drug names). A key adjustment in future adoption is ensuring that important variables, such as address or cause of death, have the correct expected names.

Our results show that our pipeline can be applied to any number of research questions regarding substance use disorders where location plays an important role. The search groups and tags were originally proposed and generated by a domain expert in substance use disorders wishing to subdivide the Cook County data into meaningful subgroups based on cause of death; the same domain expert helped validate our extractions by manually reviewing records. A future enhancement would focus on utilizing our RxNorm connection to provide hierarchical labels for grouping drugs together within meaningful categories.

We released our pipeline as open-source software available for anyone to use³³. The pipeline acts as a template for enriching open data sets and can be easily modified for use with other data sets. In the case of modification for other data sets, changes can be recommended for inclusion as a function of being in an open-source community, ensuring that the pipeline improves over time and can develop into a suite of flexible open data pipelines. Prospective work includes improving the pipeline download process and developing the user-interface to remove the requirement that the user have working knowledge of command-line tools and environments.

The open-data enhancements are rejoined to the original data and support any number of geospatial analytic tasks. Figure 5 shows a “hot spot” analysis of opioid deaths using our geospatial components derived from the original data, which highlights which locations are the most meaningful in the data. Naloxone availability has been demonstrated to be an essential component of addressing the opioid overdose crisis¹⁹. The utility of calculating overdose distance to the nearest pharmacies lies in the ability both to analyze geospatial accessibility of naloxone (with neighborhood socioeconomic demographics) and to measure the impact of public policies on naloxone accessibility over time³⁴. We saw that 62% of opioid overdoses occurred within a third of a mile of a pharmacy, highlighting the potential role that pharmacies play in the opioid overdose crisis as distributors of naloxone¹⁹.

We wish to expand our pipeline to process additional location-based contextual columns that define details about the location of death; descriptions vary from “hospital” to “bathroom”. Cross-referencing locations such as “bathrooms” with land-use data will verify locations as residential or commercial. In the example of bathrooms in commercial zones, reference data for commercial properties may allow further identification of problematic hot spots and could aid in the effective placement of naloxone boxes designed for public use in case of emergencies³⁵.

Conclusion

We discussed and highlighted key features of our pipeline for enriching open data sets for location-based analyses useful in substance use disorders research and gave examples of using our pipeline to calculate distances relevant for opioid-specific contexts. We focused on death records released in an open data set by the Office of the Medical Examiner in Cook County, IL. This cornerstone data set is one of the first of its kind in that it offers a variety of location-based contextual details for deaths occurring mid-year 2014 to present. As future work, we wish to explore adapting our work to create reusable pipelines on other open data sets to demonstrate reuse and interoperability. We also wish to incorporate a final analytic step which pipes the enhanced data into machine learning algorithms we are currently developing; these algorithms are designed to make neighborhood-level predictions that are important for understanding the local context of overdose deaths.

Acknowledgement

This project is fully supported by the Centers for Disease Control and Prevention of the U.S. Department of Health and Human Services (HHS) as part of grant 1R01CE003360-01-00. The contents are those of the author(s) and do not necessarily represent the official views of, nor an endorsement, by CDC/HHS, or the U.S. Government. We also wish to thank the Cook County Government of Illinois and in particular the Office of the Medical Examiner for opening their data to the public.

References

1. Murray-Rust P. Open Data in Science. Nat Prec. Published online January 18, 2008. pp. 1–1. doi:10.1038/npre.2008.1526.1.
2. Molloy JC. The Open Knowledge Foundation: Open Data Means Better Science. PLOS Biology. 2011;9(12):e1001195. doi:10.1371/journal.pbio.1001195.
3. Gewin V. Data sharing: An open mind on open data. Nature. 2016;529(7584):117–119. doi:10.1038/nj7584-117a.
4. Kassen M. A promising phenomenon of open data: A case study of the Chicago open data project. Government Information Quarterly. 2013;30(4):508–513. doi:10.1016/j.giq.2013.05.012.
5. Reichman OJ, Jones MB, Schildhauer MP. Challenges and Opportunities of Open Data in Ecology. Science. 2011;331(6018):703–705. doi:10.1126/science.1197962.
6. Tatem AJ. WorldPop, open data for spatial demography. Sci Data. 2017;4(1):170004. doi:10.1038/sdata.2017.4.
7. Uhlir PF, Schröder P. Open Data for Global Science. Data Science Journal. 2007;6:OD36–OD53. doi:10.2481/dsj.6.OD36.
8. Zuiderwijk A, Janssen M. Open data policies, their implementation and impact: A framework for comparison. Government Information Quarterly. 2014;31(1):17–29. doi:10.1016/j.giq.2013.04.003.
9. Surkis A, Hogle JA, DiazGranados D, et al. Classifying publications from the clinical and translational science award program along the translational research spectrum: a machine learning approach. Journal of Translational Medicine. 2016;14(1):235. doi:10.1186/s12967-016-0992-8.
10. Vukotich CJ. Challenges of T3 and T4 Translational Research. Journal of Research Practice. 2016;12(2). Accessed February 16, 2022. https://eric.ed.gov/?id=EJ1121185
11. Pathfinder. Accessed February 16, 2022. https://catalyst.harvard.edu/pathfinder/
12. Weerakkody V, Irani Z, Kapoor K, Sivarajah U, Dwivedi YK. Open data and its usability: an empirical view from the Citizen’s perspective. Inf Syst Front. 2017;19(2):285–300. doi:10.1007/s10796-016-9679-1.
13. Zuiderwijk A, Janssen M, Davis C. Innovation with open data: Essential elements of open data ecosystems. Information Polity. 2014;19(1-2):17–33. doi:10.3233/IP-140329.
14. Open Data Platform | Tyler Technologies. Accessed February 21, 2022. https://www.tylertech.com/products/data-insights/open-data-platform
15. CKAN - The open source data management system. ckan.org. Accessed February 21, 2022. http://ckan.org/
16. Barcellos R, Viterbo J, Miranda L, Bernardini F, Maciel C, Trevisan D. Transparency in practice: using visualization to enhance the interpretability of open data. In: Proceedings of the 18th Annual International Conference on Digital Government Research (dg.o ’17). Association for Computing Machinery; 2017. pp. 139–148. doi:10.1145/3085228.3085294.
17. Braunschweig K, Eberius J, Thiele M, Lehner W. The state of open data. Limits of current open data platforms. Published online 2012.
18. Janssen M, Charalabidis Y, Zuiderwijk A. Benefits, Adoption Barriers and Myths of Open Data and Open Government. Information Systems Management. 2012;29(4):258–268. doi:10.1080/10580530.2012.716740.
19. Kerensky T, Walley AY. Opioid overdose prevention and naloxone rescue kits: what we know and what we don’t know. Addict Sci Clin Pract. 2017;12(1):4. doi:10.1186/s13722-016-0068-3.
20. Wu E, Villani J, Davis A, et al. Community dashboards to support data-informed decision-making in the HEALing communities study. Drug and Alcohol Dependence. 2020;217:108331. doi:10.1016/j.drugalcdep.2020.108331.
21. Medical Examiner Case Archive | Cook County Open Data. Accessed February 16, 2022. https://datacatalog.cookcountyil.gov/Public-Safety/Medical-Examiner-Case-Archive/cjeq-bs86
22. Compton WM, Thomas YF, Conway KP, Colliver JD. Developments in the epidemiology of drug use and drug use disorders. Am J Psychiatry. 2005;162(8):1494–1502. doi:10.1176/appi.ajp.162.8.1494.
23. Tips for improving geocoding quality—ArcGIS Pro | Documentation. Accessed March 2, 2022. https://pro.arcgis.com/en/pro-app/2.8/help/data/geocoding/tips-for-improving-geocoding-quality.htm
24. Land Use Inventory - CMAP. Accessed March 2, 2022. https://www.cmap.illinois.gov/data/land-use/inventory
25. Parks - Chicago Park District Park Boundaries (current) | City of Chicago | Data Portal. Chicago. Accessed March 2, 2022. https://data.cityofchicago.org/Parks-Recreation/Parks-Chicago-Park-District-Park-Boundaries-curren/ej32-qgdr
26. Geospatial Analysis 6th Edition, 2021 update - de Smith, Goodchild, Longley and Colleagues. Accessed March 2, 2022. https://www.spatialanalysisonline.com/HTML/index.html
27. Underlying Cause of Death 1999-2020. Accessed March 2, 2022. https://wonder.cdc.gov/wonder/help/ucd.html
28. Winkler WE. String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. 1990. Accessed February 21, 2022. https://eric.ed.gov/?id=ED325505
29. CDC - Homelessness as a Public Health Law Issue - Publications by Topic - Public Health Law. Published April 23, 2020. Accessed March 2, 2022. https://www.cdc.gov/phlp/publications/topic/resources/resources-homelessness.html
30. Sadler RC, Furr-Holden D. The epidemiology of opioid overdose in Flint and Genesee County, Michigan: Implications for public health practice and intervention. Drug Alcohol Depend. 2019;204:107560. doi:10.1016/j.drugalcdep.2019.107560.
31. Drug Extraction Tool. UK IPOP; 2022. Accessed March 9, 2022. https://github.com/UK-IPOP/drug-extraction/blob/9f72a2cf31b2e6b89f29925a1e1ab69fc08a169c/pkg/models/drug_info.yaml
32. Nelson SJ, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: RxNorm at 6 years. Journal of the American Medical Informatics Association. 2011;18(4):441–448. doi:10.1136/amiajnl-2011-000116.
33. UK-IPOP/Geocoding. UK IPOP; 2022. Accessed March 7, 2022. https://github.com/UK-IPOP/geocoding
34. Egan KL, Foster SE, Knudsen AN, Lee JGL. Naloxone Availability in Retail Pharmacies and Neighborhood Inequities in Access. American Journal of Preventive Medicine. 2020;58(5):699–702. doi:10.1016/j.amepre.2019.11.009.
35. NALOXBOX - Making Public Access Naloxone Easy. NaloxBox. Accessed March 9, 2022. https://naloxbox.org/

