Abstract
Background:
Electronic health records (EHR) commonly contain patient addresses that provide valuable data for geocoding and spatial analysis, enabling more comprehensive descriptions of individual patients for clinical purposes. Despite the widespread use of EHR in clinical decision support and interventions, no systematic review has examined the extent to which spatial analysis is used to characterize patient phenotypes.
Objective:
This study reviews advanced spatial analyses that employed individual-level health data from EHR within the US to characterize patient phenotypes.
Methods:
We systematically evaluated English-language peer-reviewed articles from PubMed/MEDLINE, Scopus, Web of Science, and Google Scholar databases from inception to August 20, 2023, without imposing constraints on time, study design, or specific health domains.
Results:
Only 49 articles met the eligibility criteria. These articles utilized diverse spatial methods, with a predominant focus on clustering techniques, while spatiotemporal analysis (frequentist and Bayesian) and modeling were relatively underexplored. A noteworthy surge (n = 42, 85.7%) in publications was observed post-2017. The publications investigated a variety of adult and pediatric clinical areas, including infectious disease, endocrinology, and cardiology, using phenotypes defined over a range of data domains, such as demographics, diagnoses, and visits. The primary health outcomes investigated were asthma, hypertension, and diabetes. Notably, patient phenotypes involving genomics, imaging, and notes were rarely utilized.
Conclusions:
This review underscores the growing interest in spatial analysis of EHR-derived data and highlights knowledge gaps in clinical health, phenotype domains, and spatial methodologies. Additionally, this review proposes guidelines for harnessing the potential of spatial analysis to enhance the context of individual patients for future clinical decision support.
Keywords: clinical phenotypes, electronic health records, geocoding, geographic information systems, patient phenotypes, spatial analysis
Introduction
Electronic health records (EHR) have significantly enriched clinical decision support by providing relatively cost-effective, time-efficient, and convenient access to records for large patient populations [1, 2]. Because EHR often contain patient addresses, high-resolution geocoding can add value by enabling spatial analysis. The simplest such analysis may be mapping, which can promote a better understanding of health disparities. Further, patient geocoding can link external data such as environmental, demographic, and socio-economic factors for more refined patient phenotyping and a more profound understanding of patient exposures for targeted interventions [3].
The possibilities for applying spatial analysis to individual-level EHR-derived data extend beyond geocoding, basic mapping, or external data linkage. For instance, spatial network analysis examines proximity to the sources of pollution [4], measures accessibility to healthcare facilities [5], and optimizes resource allocations to mitigate health disparities [6]. Spatial clustering pinpoints statistically significant spatial and spatiotemporal hotspots and cold spots [7], especially when considering longitudinal EHR data. Moreover, spatial and spatiotemporal modeling can identify localized patterns, trends, and relationships within a specific region [8, 9]; however, ignoring spatial dependence in modeling can bias parameter estimates [9–11]. Identifying underserved communities through spatial analysis can enhance clinical decision support to implement targeted interventions such as screening, vaccination, or health education campaigns.
While spatial methodologies have the potential to better describe the context of individual patients in biomedical studies, there is a need for improvement in their utilization to derive meaningful insights. To accurately address medical conditions, identify a disease in a patient, and scale that to cohorts of patients, phenotyping is required [12]. Phenotypes are a combination of observable traits, symptoms, and characteristics. They can contain inclusion and exclusion criteria (e.g., diagnoses, procedures, laboratory reports, and medications) and can be used to recruit patients who fit the necessary criteria for clinical trials.
A prior systematic review employed spatially linked EHR data to investigate the effects of social, physical, and built environments on health outcomes [13]. Another study highlighted the need to integrate spatial data related to individual patients into healthcare decision-making and practice [14]. Nonetheless, this is the first comprehensive study that systematically reviews the US-based studies that used spatial analysis for analyzing EHR-derived data in characterizing patient phenotypes for clinical decision support and interventions. This review collates and synthesizes existing literature that employed individual-level health data from EHR in conjunction with advanced spatial analyses and patient phenotyping. Thus, the main objectives of this review are:
To evaluate the degree to which advanced spatial methods are currently being utilized with individual-level data sourced from EHR in the US;
To identify areas of spatial analyses most applicable to biomedical studies;
To categorize publications by their biomedical and clinical areas and the specific patient phenotypes they target;
To highlight knowledge gaps and propose guidelines for harnessing the potential of spatial analysis to enhance the context of individual-level data sourced from EHR for future biomedical studies.
Methods
This systematic review was performed using the protocols outlined by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to identify the articles that satisfy the eligibility criteria for subsequent data extraction and synthesis.
Data Source
A comprehensive title and abstract search for peer-reviewed articles was carried out in the PubMed/MEDLINE, Scopus, and Web of Science databases using the search terms in Table 1. The search was conducted on August 20, 2023, without limitations on time, study design, or specific health domains.
Table 1.
Articles search strategy key terms.
| Key Terms |
|---|
| (“Geospatial*” OR “Geospatial*” OR “Spatio-Temporal” OR “Spatial Temporal” OR “Space-Time” OR “Space Time” OR “Spatiotemporal” OR “Geocod*” OR “Spatial Autocorrelation” OR “Spatial Interpolation” OR “Spatial Epidemiology” OR “Spatial Data” OR “Spatial Modeling” OR “Spatial Modelling” OR “Spatial Mapping” OR “Geographic Mapping” OR “Georeferenc*” OR “Spatial Analys*” OR “Spatial Inequalit*” OR “Spatial Disparit*” OR “Spatial Dependenc*” OR “Spatial Access*” OR “Geographical Mapping” OR “Geographical Visualization” OR “Geographic Visualization” OR “Geovisualization” OR “Geographical Information System*” OR “Geographic Information System*” OR “Geofencing” OR “Geographical Distribution*” OR “Geographic Distribution*” OR “Spatial Statistic*” OR “Spatial Bayesian” OR “Spatial Hotspot*” OR “Spatial Cluster*” OR “Geographic Cluster*” OR “Geographic Hotspot*” OR “Remote Sensing” OR “Global Positioning System” OR “Spatial Pattern*” OR “Spatial Data Mining” OR “Spatial Variabilit*” OR “Spatial Heterogeneit*” OR “Geostatistic*” OR “Spatial Covariance” OR “Spatial Regression” OR “Spatial Uncertaint*” OR “Spatial Point Pattern*” OR “Kriging” OR “Cartography” OR “Spatial Decision Support System*” OR “OpenStreetMap” OR “Location-Based Services” OR “Spatial Quer*” OR “GIS” OR “Web GIS” OR “Satellite Imager*” OR “ArcGIS” OR “QGIS” OR “Risk Mapping”) |
| (“EHR” OR “EMR” OR “EPR” OR “Electronic Health Record*” OR “Electronic Medical Record*” OR “Electronic Patient Record*” OR “EDW” OR “Enterprise Data Warehouse” OR “RDW” OR “Research Data Warehouse”) |
Note: Articles retrieved by the search that applied spatial analysis to EHR data were manually excluded if they lacked patient phenotype characteristics or were not based on US data.
Search Strategy
The initial search comprised two main categories. The first category included a broad set of key terms related to spatial analysis. The second category employed the key terms associated with EHR. Henceforth, our references to EHR also encompass electronic medical records (EMR), electronic patient records (EPR), enterprise data warehouses (EDW), and research data warehouses (RDW). The Boolean operator AND was applied to combine the two categories.
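As a rough illustration of how the two categories combine, the sketch below assembles a query string from two abbreviated term lists. The helper name `build_query` and the shortened lists are hypothetical, and real database syntax (field tags, truncation rules) differs between PubMed, Scopus, and Web of Science.

```python
def build_query(spatial_terms, ehr_terms):
    """Join each category with OR, then combine the two categories with AND."""
    def or_group(terms):
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    return or_group(spatial_terms) + " AND " + or_group(ehr_terms)

# Abbreviated, illustrative subsets of the Table 1 key terms.
spatial = ["Geocod*", "Spatial Analys*", "Kriging"]
ehr = ["EHR", "Electronic Health Record*"]
print(build_query(spatial, ehr))
```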
Study Selection
The retrieved abstracts and titles were imported into Covidence systematic review software, where duplicate records across the source databases were automatically removed. Two reviewers (AM and BH) independently assessed the eligibility of the articles based on the following inclusion and exclusion criteria:
Articles were eligible for primary inclusion if they (1) were written in English, (2) were original peer-reviewed articles, (3) used individual-level patient data derived from EHR/EMR/EPR/EDW/RDW, and (4) incorporated at least one spatial method. Conversely, articles were excluded if they (1) were not peer-reviewed original articles (e.g., letters, editorials, reviews, case reports, abstracts, and grey literature), (2) solely geocoded addresses or generated basic visualizations (e.g., dot maps and choropleth maps) without any spatial analysis, or (3) were not based on US EHR data.
The reviewers (AM and BH) independently reviewed the full texts of all remaining articles. Articles were also excluded at this stage if they lacked phenotype characteristics. Further, we manually checked the references of all selected articles for possible inclusion and searched the first 20 pages of Google Scholar for additional candidates. A third reviewer (AVA) was consulted to break ties.
Data Extraction
Upon identifying articles that satisfied all inclusion criteria, two reviewers (AM and BH) extracted the following items for each article: title, publication year, country and region, sample size, study period, spatial methodologies, and key findings from the spatial methods. Moreover, articles were assessed to identify clinical domains (including primary and secondary when applicable), health conditions or problems, and themes (including social determinants of health (SDOH), environmental factors, ecological aspects, climate, microbiome, genomics, and clinical phenotypic characteristics). Previous publications have emphasized the importance of data domain sources in phenotyping, underscoring the need for validating the created phenotype [15] and using multiple data sources. Thus, in cases where the included publications did not provide details of data sources but instead referenced previously published works, referenced publications were reviewed. We also documented the number of organizations contributing data. Additionally, we cataloged the types of EHR that served as the sources.
Narrative Synthesis
Following data extraction, the articles were categorized into the following spatial methodology classifications: descriptive, clustering, modeling (frequentist), spatiotemporal (frequentist), and Bayesian. The phenotype characteristics were extracted and recorded as free text. It should be noted that the categories were not mutually exclusive.
The quality appraisal of the studies was not feasible due to the substantial heterogeneity in spatial methodologies and health domains. The geospatial distribution of the included studies was visualized using ArcGIS Pro software 3.0 (ESRI, Redlands, CA, US).
Results
Study Selection
The initial search yielded 1,758 references. After removing duplicate records, we identified 952 articles for abstract and title screening, from which 375 were selected for full-text review. Of these, 322 articles were excluded because they only contained geocoding or basic mapping without any spatial analysis. An additional 15 articles were excluded because they lacked patient phenotype characteristics (n = 2) or were not based on US data (n = 13). A manual search of references and Google Scholar identified 11 further articles that met the eligibility criteria. Therefore, 49 articles that fulfilled the inclusion criteria were retained for data extraction and synthesis. Figure 1 depicts the PRISMA flowchart for the study selection process.
Figure 1.
PRISMA study selection flowchart.
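The selection counts can be checked with simple arithmetic. The short sketch below uses only the numbers reported in the text; the variable names are illustrative.

```python
full_text_reviewed = 375
excluded_no_spatial_analysis = 322
excluded_no_phenotype = 2
excluded_non_us = 13
added_from_references_and_scholar = 11

# Articles surviving full-text review of the database search results.
retained_from_search = (
    full_text_reviewed
    - excluded_no_spatial_analysis
    - excluded_no_phenotype
    - excluded_non_us
)
# Final set after adding articles found through manual searching.
included = retained_from_search + added_from_references_and_scholar
print(retained_from_search, included)
```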
Temporal and geographic distribution of studies
While no time restrictions were imposed, a limited number of articles (n = 7, 14.3%) were published prior to 2017. The earliest article included in this study was published in 2011, and the publication frequency has experienced a significant upsurge since 2017 (n = 42, 85.7%). There was only one article [16] at the national level. General characteristics of the included articles are presented in Table 2. Most studies were concentrated in North Carolina (n = 8, 16.3%), Pennsylvania (n = 6, 12.2%), California (n = 6, 12.2%), and Illinois (n = 4, 8.2%). Figure 2 illustrates the geospatial distribution of articles at the state level in the US.
Table 2.
General characteristics of the included studies
| No. | Author | Year | Region | Sample Size | Study Period |
|---|---|---|---|---|---|
| 1 | Ali et al. [7] | 2019 | Atlanta | 4,613 | 2002 – 2010 |
| 2 | Beck et al. [17] | 2019 | Cincinnati | 24,428 | 2011 – 2016 |
| 3 | Bravo et al. [18] | 2018 | Durham | 147,000 | 2007 – 2011 |
| 4 | Bravo et al. [19] | 2019 | Durham | 147,351 | 2007 – 2011 |
| 5 | Bravo et al. [20] | 2019 | Durham | 41,203 | 2007 – 2011 |
| 6 | Brooks et al. [21] | 2020 | Delaware | 5,421 | 2020 |
| 7 | Carey et al. [22] | 2021 | Utah | 366 | 2006 – 2015 |
| 8 | Casey et al. [23] | 2016 | Pennsylvania | 20,569 | 2006 – 2013 |
| 9 | Chang et al. [8] | 2015 | Wisconsin | 103,690 | 2007 – 2009 |
| 10 | Cobert et al. [24] | 2020 | Durham | 10,352 | 2013 – 2018 |
| 11 | Davidson et al. [25] | 2018 | Denver | 21,578 | 2011 – 2012 |
| 12 | DeMass et al. [26] | 2023 | South Carolina | 2,195 | 2019 – 2020 |
| 13 | Epstein et al. [27] | 2014 | Los Angeles | 5,390 | 2007 – 2011 |
| 14 | Gaudio et al. [28] | 2022 | Tennessee | 2,240 | 2015 – 2021 |
| 15 | Georgantopoulos et al. [29] | 2020 | South Carolina | 3,736 | 1999 – 2015 |
| 16 | Ghazi et al. [30] | 2021 | Twin Cities, Minnesota | 20,289 | 2012 – 2019 |
| 17 | Grag et al. [31] | 2023 | Chicago | 777,994 | 2007 – 2012 |
| 18 | Grunwell et al. [32] | 2022 | Georgia | 1,403 | 2015 – 2020 |
| 19 | Hanna-Attisha et al. [33] | 2016 | Flint, Michigan | 1,473 | 2013 – 2015 |
| 20 | Immergluck et al. [34] | 2019 | Atlanta | 13,938 | 2002 – 2010 |
| 21 | Jilcott et al. [35] | 2011 | Eastern North Carolina | 744 | 2007 – 2008 |
| 22 | Kane et al. [36] | 2023 | Kansas and Missouri | 2,427 | 2011 – 2020 |
| 23 | Kersten et al. [37] | 2018 | San Francisco | 47,175 | 2007 – 2011 |
| 24 | Lantos et al. [38] | 2018 | North Carolina | 3,527 | NA |
| 25 | Lantos et al. [39] | 2017 | Durham | 3,527 | <=2015 |
| 26 | Le-Scherban et al. [40] | 2019 | Philadelphia | 3,778 | 2016 |
| 27 | Lieu et al. [41] | 2014 | Northern California | 154,424 | 2000 – 2011 |
| 28 | Lipner et al. [42] | 2017 | Colorado | 479 | 2008 – 2015 |
| 29 | Liu et al. [43] | 2021 | Cincinnati and Houston | 88,013 | 2011 – 2016 |
| 30 | Mayne et al. [44] | 2019 | Chicago | 14,309 | 2015 – 2017 |
| 31 | Mayne et al. [45] | 2018 | Chicago | 4,748 | 2009 – 2013 |
| 32 | Oyana et al. [46] | 2017 | Memphis | 28,793 | 2005 – 2015 |
| 33 | Patterson et al. [16] | 2017 | Nationwide | ~100 million | 2003 – 2010 |
| 34 | Pearson et al. [47] | 2019 | Philadelphia | 642 | 2000 – 2017 |
| 35 | Samuels et al. [48] | 2022 | New Haven | 6,366 | 2013 – 2017 |
| 36 | Schwartz et al. [49] | 2011 | Pennsylvania | 47,769 | 2009 – 2010 |
| 37 | Sharif-Askary et al. [50] | 2018 | North Carolina | 558 | 1998 – 2013 |
| 38 | Sidell et al. [51] | 2022 | Southern California | 446,440 | 2020 – 2021 |
| 39 | Siegel et al. [52] | 2022 | Delaware | 3,449 | 2012 – 2020 |
| 40 | Soares et al. [53] | 2017 | Pennsylvania | 2,049 | 2011 – 2012 |
| 41 | Sun et al. [54] | 2022 | Southern California | 395,927 | 2008 – 2018 |
| 42 | Tabano et al. [55] | 2017 | Denver | 31,275 | 2009 – 2011 |
| 43 | Wakefield et al. [56] | 2020 | Memphis | 3,754 | 2015 – 2017 |
| 44 | Wilson et al. [57] | 2022 | Chicago | 39,211 | 2014 – 2016 |
| 45 | Winckler et al. [58] | 2023 | Southern California | 7,896 | 2017 – 2019 |
| 46 | Xie et al. [3] | 2017 | Philadelphia | 27,604 | 2011 – 2014 |
| 47 | Xie et al. [59] | 2023 | Washington | 242,637 | 2015 – 2019 |
| 48 | Zhan et al. [60] | 2021 | Central Texas | 21,923 | 2019 |
| 49 | Zhao et al. [61] | 2021 | Wisconsin | 43,752 | 2007 – 2012 |
Figure 2.
Geospatial distribution of the included studies at the state level in the US
Spatial Methodologies
Most studies used frequentist rather than Bayesian methods. Among frequentist methods, the most prevalent category was clustering (n = 29), followed by descriptive (n = 12), modeling (n = 6), and spatiotemporal analyses (n = 2).
Descriptive Analyses
Descriptive analyses were categorized into four groups: spatial sampling (n = 2), spatial overlay (n = 2), proximity analysis (n = 4), and spatial interpolation (n = 4).
Spatial Sampling
A two-standard deviation ellipse method is employed to optimize spatial sampling density. This ellipse contains almost 95% of the locations of patients and is used to ensure that the collected samples reflect the underlying spatial pattern in data, particularly when resources are limited [62]. [38, 39] adopted this approach when sampling women who underwent cytomegalovirus antibody testing during pregnancy, especially in peripheral areas with limited subject representation.
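A minimal sketch of the idea follows, assuming a covariance-based ellipse in projected (planar) coordinates. The function names are hypothetical, and GIS packages may apply different scaling conventions, so this is not necessarily the computation used in the cited studies.

```python
import math

def standard_deviational_ellipse(points, n_sd=2.0):
    """Return (center, semi_major, semi_minor, angle_rad) of a covariance-based ellipse."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Eigenvalues of the 2x2 covariance matrix give the variance along each axis.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # orientation of the major axis
    return (mx, my), n_sd * math.sqrt(lam_major), n_sd * math.sqrt(lam_minor), angle

def inside(point, center, a, b, angle):
    """True if `point` lies within the ellipse (requires a > 0 and b > 0)."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    u = dx * math.cos(angle) + dy * math.sin(angle)   # coordinate along the major axis
    v = -dx * math.sin(angle) + dy * math.cos(angle)  # coordinate along the minor axis
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

# Toy patient locations (illustrative only).
pts = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
center, a, b, angle = standard_deviational_ellipse(pts)
```

Locations falling outside the ellipse could then be flagged as peripheral areas that may need additional sampling.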
Spatial Overlay
Spatial overlay integrates various spatial data sources, often maps, to represent their shared features. [56] overlaid the map of major radiation treatment interruptions based on race onto the map of median household income. Their analysis implied that regions with higher income levels experienced lower rates of radiation treatment interruption. [48] spatially joined patient addresses to the nearest city parcels and computed an estimate of the incidence of emergency department visits for asthma for each parcel.
Proximity Analysis
Proximity analysis includes measuring distances between geographic features to identify nearby features within a defined distance or buffer zone to uncover proximity patterns [63]. [57] created temporal and spatial buffers to assess the correlation between individual exposure to violent crime and blood pressure. [49] evaluated the associations between environmental factors and body mass index (BMI) within a 0.5-mile network buffer from the place of residence. [23] investigated the associations between prenatal residential greenness and birth outcomes within 250m and 1,250m buffers. Utilizing a GIS service area network analysis, [35] examined BMI percentile and proximity to fast-food and pizza establishments among adolescents within 0.25 mile Euclidean and network buffer zones.
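The straight-line variant of a buffer analysis can be sketched as below, assuming latitude/longitude inputs and great-circle distances. Network buffers, as used in some of the cited studies, require a road network and are not covered here; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def count_within_buffer(home, features, radius_km):
    """Count features (lat, lon) lying within `radius_km` of the home location."""
    return sum(
        1 for lat, lon in features
        if haversine_km(home[0], home[1], lat, lon) <= radius_km
    )

# Example: features within a 0.25-mile (about 0.402 km) buffer of a residence.
n_nearby = count_within_buffer((0.0, 0.0), [(0.0, 0.001), (1.0, 0.0)], 0.402)
```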
Spatial Interpolation
Ordinary Kriging is one of the most widely used spatial interpolation techniques that leverages the spatial autocorrelation structure of observed locations to estimate values at unmeasured locations [64]. [33] applied ordinary Kriging with a spherical semi-variogram model based on observations of the children’s elevated blood lead level (BLL) geocoded to the home address to visualize BLL variations before and after water source changes. [44] interpolated the levels of neighborhood physical disorder based on an exponential variogram. [16] demonstrated spatial variations for the incidence rates of each ICD-9 diagnostic code based on an exponential variogram. [54] estimated monthly average concentrations of fine particulate matter to investigate the associations between air pollution exposure during pregnancy and gestational diabetes mellitus (GDM).
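To make the mechanics concrete, the following is a minimal ordinary Kriging sketch with a spherical semivariogram whose parameters (nugget, sill, range) are assumed to be known rather than fitted. The function names and toy data are illustrative and do not reproduce any cited study.

```python
import math

def spherical_variogram(h, nugget, sill, rng):
    """Spherical semivariogram; `sill` is the total sill (nugget + partial sill)."""
    if h == 0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(obs, target, nugget=0.0, sill=1.0, rng=10.0):
    """obs: list of (x, y, value) at distinct locations. Returns (prediction, weights)."""
    n = len(obs)

    def gamma(p, q):
        return spherical_variogram(math.dist(p, q), nugget, sill, rng)

    # Kriging system: semivariances between observations, plus the
    # unbiasedness constraint (weights sum to 1) via a Lagrange multiplier.
    a = [[gamma(obs[i][:2], obs[j][:2]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(o[:2], target) for o in obs] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * o[2] for w, o in zip(weights, obs)), weights

# Toy example: two observations equidistant from the target location.
pred, w = ordinary_kriging([(0.0, 0.0, 2.0), (2.0, 0.0, 4.0)], (1.0, 0.0))
```

In practice the variogram parameters would be estimated from the empirical semivariogram before interpolation.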
Spatial Clustering
Spatial clustering techniques assess whether health outcomes are random, uniform, or clustered and pinpoint the locations of clusters [65]. Spatial clustering was the most widely used category (n = 29) among all studied categories. Moran’s I clustering and cluster detection were the most frequent techniques (n = 10), followed by kernel/point density estimation (n = 5), spatial scan statistics (n = 4), and Getis-Ord Gi* statistics (n = 4).
Kernel/point Density Estimation
Kernel density estimation (KDE) generates a smooth surface to visualize areas of the most significant spatial intensity by calculating a distance-weighted count of events within a specified radius per unit area [66]. Several studies adopted KDE to analyze patterns, including cholera hospitalization [59], comparison of the spatial intensity of chronic kidney disease (CKD) with non-CKD patients [30], and comparison of the spatial intensity of breast cancer and non-breast cancer [52]. Using the point density function, [17] pinpointed hotspots of inpatient bed-day rates within a 2-mile radius of a medical center and [36] estimated the number of participants per square mile.
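A minimal sketch of the distance-weighted count described above is shown below, assuming projected coordinates and a quartic (biweight) kernel, which is one common choice. The kernel, function name, and search radius are assumptions for illustration, not details taken from the cited studies.

```python
import math

def quartic_kde(location, events, radius):
    """Kernel density (events per unit area) at `location` using a quartic kernel."""
    total = 0.0
    for ex, ey in events:
        # Squared distance scaled by the search radius.
        u2 = ((location[0] - ex) ** 2 + (location[1] - ey) ** 2) / radius ** 2
        if u2 < 1.0:  # only events inside the search radius contribute
            total += (3.0 / math.pi) * (1.0 - u2) ** 2
    return total / radius ** 2

# One event at the evaluation point and one far outside the radius.
density = quartic_kde((0.0, 0.0), [(0.0, 0.0), (5.0, 0.0)], radius=2.0)
```

Evaluating this function over a regular grid yields the smooth intensity surface that is typically mapped.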
Global and Local Moran’s I
Global Moran’s I (GMI) evaluates the overall pattern for spatial autocorrelation [67] by inferring if a variable is spatially clustered or over-dispersed vs. being randomly distributed under the null hypothesis [67]. Local Moran’s I, often called LISA, is used to locate statistically significant clusters, including hotspots, cold spots, and outliers [68]. GMI has been adopted to analyze spatial clustering of health outcomes, including GDM [54], day-of-surgery cancellation [43], obesity [55], and COVID-19 [51]. All exhibited clustered patterns. [59] analyzed three groups: depression, obesity, and comorbid cases, confirmed clustering for all outcomes, and identified spatial clusters and outliers. [47] found random distributions for dermatomyositis (DM) and subtypes, classic DM (CDM), and clinically amyopathic DM (CADM). Meanwhile, [25] pinpointed clusters with higher or lower depression prevalence, and [58] identified a cluster of low utilization of acute pediatric mental health interventions in less-densely populated rural border areas.
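For reference, the global statistic can be computed as in the following sketch, assuming a dense spatial weights matrix. The chain-shaped contiguity used here is a toy layout, and significance testing (e.g., by permutation) is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi ** 2 for zi in z)

def chain_weights(n):
    """Binary contiguity weights for n areas arranged in a line (illustrative)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

w = chain_weights(4)
clustered = morans_i([1.0, 2.0, 3.0, 4.0], w)     # similar values are adjacent
dispersed = morans_i([1.0, -1.0, 1.0, -1.0], w)   # alternating values
```

Positive values indicate that similar values tend to be neighbors, while negative values indicate over-dispersion.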
GMI and (semi)variograms can also identify spatial autocorrelation in model residuals. If detected, the models are adjusted accordingly to avoid biased estimates. For example, [42] modeled nontuberculous mycobacteria (NTM) disease, shifting the use from a non-spatial Bayesian model to a spatial model when spatial autocorrelation was found in residuals. Similarly, [29] incorporated spatial random effects into a prostate cancer model due to significant autocorrelation in the residuals. [50] used variograms to assess spatial dependency in cleft lip and/or palate, leading to a geostatistical model over standard logistic regression. Conversely, [23] found no spatial autocorrelation in non-spatial model residuals.
The bivariate GMI quantifies the overall spatial dependence between two distinct variables (positive value indicates high values of one variable are surrounded by high values of the other or low values are surrounded by low values, while negative value implies high values of one variable are surrounded by low values of the other) [69]. Bivariate LISA assesses the relationship at the local level. [47] employed bivariate GMI for the prevalence of DM, CDM, and CADM with airborne toxics but found no overall spatial dependencies. However, bivariate LISA identified local dependencies at the zip code level. [31] applied bivariate GMI and found significant overall associations between longer (average) distances to the nearest supermarket and higher incidence of diabetes, and bivariate LISA identified significant “high-high” relationships at the zip code level. [28] utilized bivariate LISA and found no local association between radiation therapy interruption and social vulnerability index at the zip code level.
Getis-Ord Gi*
The Getis-Ord Gi* statistic identifies high- or low-value clusters (hotspots and cold spots) by assessing deviations of health outcomes at locations from the average within a defined neighborhood [70]. [40] measured racial residential segregation by examining the deviations in the proportion of African American residents in each census tract from the mean of neighboring tracts. Similarly, [45] measured racial residential segregation for the percentage of non-Hispanic Black residents. [7] identified significant community-onset methicillin-resistant Staphylococcus aureus (CO-MRSA) hotspots with distinct patterns between cases and controls. [37] detected high- and low-value clusters for the child opportunity index and median household income.
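The statistic can be sketched as follows, assuming a dense weights matrix in which each area counts itself as a neighbor. The toy values and the line-shaped neighborhood are illustrative only.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return Gi* z-scores; weights[i][i] should be non-zero so each area counts itself."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v ** 2 for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        sw = sum(w)
        sw2 = sum(wij ** 2 for wij in w)
        # Local weighted sum compared with its expectation under no clustering.
        num = sum(wij * v for wij, v in zip(w, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        scores.append(num / den)
    return scores

# Six areas in a line; each area's neighborhood is itself plus adjacent areas.
vals = [10.0, 10.0, 1.0, 1.0, 1.0, 1.0]
w = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(6)] for i in range(6)]
z = getis_ord_gi_star(vals, w)
```

Large positive scores suggest hotspots and large negative scores suggest cold spots; in practice the scores are compared against a normal reference distribution, often with a multiple-testing correction.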
Spatial Scan Statistics
The spatial scan statistics technique identifies high and low-risk clusters and estimates their relative risks [71]. It also can incorporate covariates to characterize underlying patterns [72]. [42] found that people living in zip codes within the primary cluster had an almost 2.5 times greater risk of NTM disease. [41] identified clusters of under-immunization and vaccine refusal among children, with rates ranging from 18% to 23% inside the clusters compared to 11% outside.
The technique can also pinpoint cold spots. [21] identified areas with significantly lower COVID-19 testing than expected, indicating a need for interventions. [60] observed significantly low rates of up-to-date colorectal cancer screening.
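The core quantity behind a Poisson-based scan statistic can be sketched as below for a single candidate zone, assuming expected counts proportional to population. A full analysis would evaluate many candidate windows and assess significance by Monte Carlo simulation, which is omitted here; the function name and counts are hypothetical.

```python
import math

def poisson_scan_llr(cases_in, pop_in, cases_total, pop_total):
    """Log-likelihood ratio for a high-rate candidate zone and its relative risk."""
    expected_in = cases_total * pop_in / pop_total
    cases_out = cases_total - cases_in
    expected_out = cases_total - expected_in
    if cases_out == 0:
        relative_risk = float("inf")
    else:
        relative_risk = (cases_in / expected_in) / (cases_out / expected_out)
    # Only zones with more cases than expected count as high-rate clusters.
    if cases_in <= expected_in:
        return 0.0, relative_risk
    llr = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr, relative_risk

# Hypothetical zone holding 10% of the population but 30% of the cases.
llr, rr = poisson_scan_llr(cases_in=30, pop_in=100, cases_total=100, pop_total=1000)
```

Scanning for cold spots follows the same logic with the inequality reversed.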
Spatial Modeling (Frequentist)
Among the included articles, generalized additive models (GAMs) emerged as the most frequently employed spatial models. GAMs can account for spatial autocorrelation by incorporating smooth functions (such as thin-plate regression splines) of spatial coordinates [73], allowing geographic variation to be estimated with or without covariate adjustment. GAMs identified spatial variabilities in asthma prevalence [3, 8] and cytomegalovirus [38, 39], although such variations often diminished when adjusted for demographic factors such as race and age. Less commonly used spatial models included generalized linear mixed-effects [51] and spatial error [43] models.
Spatiotemporal Analysis
Only two studies explored spatiotemporal patterns, and no spatiotemporal modeling was conducted. [46] employed space-time scan statistics to study the spatiotemporal patterns of childhood asthma and found a significant frequency increase (2009–2013) and a rising trend from 4 to 16 per 1,000 children (2005–2015). [7] employed the space-time cube tool and emerging hotspot analysis to analyze the spatial-temporal trends and evolving patterns of CO-MRSA from 2002 to 2010. They identified several types of space-time hotspots of CO-MRSA, including new, consecutive, intensifying, sporadic, and oscillating hotspots.
Bayesian Analysis
The articles employing Bayesian methods were categorized into Empirical Bayes smoothing (n = 5) and Bayesian modeling (n = 6).
The Empirical Bayes smoothing was employed in [40, 43, 55, 59] to stabilize estimated rates in areas with limited data points by borrowing information from the overall population [74]. [61] employed non-parametric kernel smoothing to estimate the prevalence of childhood obesity in areas with sparse observations (n<20 individuals).
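One common way to implement this borrowing of strength is a global, moment-based Empirical Bayes estimator, sketched below. The specific estimator is an assumption for illustration; the reviewed articles may have used other variants (e.g., local or spatially structured smoothing).

```python
def empirical_bayes_smooth(cases, populations):
    """Global EB smoothing of rates; returns (smoothed_rates, shrinkage_weights)."""
    n = len(cases)
    total_pop = sum(populations)
    m = sum(cases) / total_pop  # overall rate, used as the prior mean
    raw = [c / p for c, p in zip(cases, populations)]
    mean_pop = total_pop / n
    # Method-of-moments estimate of the between-area variance, floored at zero.
    a = sum(p * (r - m) ** 2 for p, r in zip(populations, raw)) / total_pop - m / mean_pop
    a = max(a, 0.0)
    # Areas with small populations get small weights and shrink toward m.
    weights = [a / (a + m / p) if a > 0 else 0.0 for p in populations]
    smoothed = [m + w * (r - m) for w, r in zip(weights, raw)]
    return smoothed, weights

# Hypothetical counts: the first area has very few residents.
cases = [2, 30, 100]
pops = [10, 1000, 5000]
smoothed, weights = empirical_bayes_smooth(cases, pops)
```

The sparsely populated area is pulled strongly toward the overall rate, while the estimate for the largest area stays close to its raw rate.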
Bayesian modeling can account for spatial and temporal dependencies and quantify uncertainty by specifying prior distributions [75]. Among the articles, the conditional autoregressive (CAR) prior emerged as the most used, with two variants: intrinsic and multivariate CAR. Intrinsic CAR was used to assess the spatial variations in diabetes in relation to racial isolation [18], hypertension in relation to racial isolation [19], and type 2 diabetes mellitus in relation to the built environment [20]. Multivariate CAR was employed to identify areas with higher or lower-than-expected prostate cancer while controlling for risk factors [29]. Moreover, hierarchical Bayesian models, which can incorporate hierarchical structures for modeling interactions in data with multiple levels [76], were used to investigate spatial distributions of patients admitted for drug-related reasons in relation to the area deprivation index [24]. Bayesian negative binomial hurdle models, which can account for excessive zeros and overdispersion, were used by [26] to examine spatial variation between patient responses to the questions concerning unhealthy home environments and the mean number of emergency department visits after screening.
Phenotyping
Clinical Domain Characteristics and Themes
The largest categories of articles were infectious disease (n = 7), endocrinology (n = 7), and oncology (n = 6). Additionally, 19 articles had a pediatric domain or focus, as noted in an additional column in Table 3. Maternal and newborn care was classified as its own domain (n = 8), but it overlapped with other domains such as nephrology, endocrinology, and infectious disease.
Table 3.
Clinical domains and condition/problem of focus for each publication (row).
| Clinical Domain(s)* | Condition(s)/Problem of focus § | Reference | ||
|---|---|---|---|---|
| Primary | Secondary | Pediatric | ||
| Pediatric | ✓ | Day-of-surgery cancellation (DoSC) | [43] | |
| ✓ | Elevated blood lead levels (EBLL) | [33] | ||
| ✓ | Disparities in inpatient bed-day rates | [17] | ||
| Maternal & Newborn Care | ✓ | Under immunization; vaccine refusal | [41] | |
| Preterm birth; small for gestational age; hypertensive disorder of pregnancy | [44] | |||
| Preterm birth; small for gestational age; low birth weight; low Apgar score | [23] | |||
| Hypertension | [57] | |||
| [19] | ||||
| Endocrinology | Hypertension; diabetes | [40] | ||
| Endocrine; Nephrology | Hypertension; diabetes; chronic kidney disease (CKD) | [31] | ||
| Maternal & Newborn Care | Hypertension, disorder of pregnancy | [45] | ||
| Endocrinology | Gestational diabetes mellitus (GDM) | [54] | ||
| Diabetes mellitus, type 2 (T2DM) | [18] | |||
| [20] | ||||
| Obesity | [55] | |||
| ✓ | [49] | |||
| ✓ | [35] | |||
| ✓ | [61] | |||
| Psychiatry | Obesity; depression | [59] | ||
| Psychiatry | ✓ | Acute pediatric mental health interventions or services | [58] | |
| Depression | [25] | |||
| ✓ | Telemedicine use in developmental-behavioral pediatrics | [53] | ||
| Emergency Medicine | Drug overdoses | [24] | ||
| Emergency Medicine | ✓ | Disparities in pediatric acute care visit frequency and diagnoses | [37] | |
| ✓ | Disparities in use of pediatric intensive care units (PICU) | [27] | ||
| Emergency department use | [26] | |||
| Pulmonary | Emergency Medicine | Asthma, emergency department asthma visits | [48] | |
| ✓ | Asthma | [32] | ||
| ✓ | [46] | |||
| [3] | ||||
| [8] | ||||
| Infectious Disease | Pulmonary | Coccidioidomycosis | [22] | |
| ✓ | Community-associated (CA)-MRSA | [34] | ||
| ✓ | Community-onset (CO)-MRSA | [7] | ||
| COVID-19 | [21] | |||
| [51] | ||||
| Maternal & Newborn Care | ✓ | Cytomegalovirus (CMV) | [39] | |
| ✓ | [38] | |||
| Nontuberculous mycobacterial Infection | [42] | |||
| Oncology | Radiation treatment interruption (RTI) | [56] | ||
| [28] | ||||
| Colorectal cancer screening | [60] | |||
| Prostate cancer | [29] | |||
| Triple-negative breast cancer (TNBC) | [52] | |||
| ✓ | Disparities in Genomic Answers for Kids (GA4K) | [36] | ||
| Maxillofacial | ✓ | Cleft lip/palate | [50] | |
| Nephrology | Chronic kidney disease | [30] | ||
| Rheumatology | Neurology; Derm. | Dermatomyositis | [47] | |
| All domains | Geospatial variation of disease incidence | [16] | ||
*Publications with more than one clinical domain and those with a pediatric component are noted as such.
§The Condition/Problem of focus column displays the general condition addressed by the article and may not directly correspond to the phenotype.
The relationship between the clinical domains and the “conditions/problems of focus” in each article was examined (Table 3). In some cases, direct correspondence was observed, while in other instances, the “condition/problems of focus” differed from the phenotype of the patient cohort. In many articles, one or more overlapping domains were observed (e.g., rheumatology, neurology, and dermatology for the study of dermatomyositis). Asthma (n = 5), hypertension (n = 5), and diabetes (n = 4) were studied most frequently. Three articles did not focus on any health condition but rather on examining disparities in either a data source or a specific domain or cohort (e.g., disparities in the use of pediatric intensive care units).
Each article was assigned at least one prominent theme, and some were assigned several. SDOH themes were prevalent. To organize and present them, we used the five domains defined by the Healthy People 2030 framework [77]; Table 4 lists the number of articles in which each domain appeared as a theme. Most articles had one or more SDOH themes (n = 42). Many articles addressed either all the domains or SDOH as a whole, without a particular focus on any specific domain (n = 32). Some articles also contained prominent themes that were not directly related to SDOH: phenotypic features (n = 4), environmental (n = 3), and ecological (n = 2) themes, with climate, genomics, and microbiome each appearing in one article.
Clinical Phenotype Features
For each publication, clinical phenotype definitions were extracted (Supplementary Appendix S1). In almost all studies, phenotype definitions included demographic details such as patient age, race, and gender, along with some diagnostic characteristics (e.g., asthma diagnosis). Few phenotypes were validated (n = 8). The most frequent validation method was a manual chart review of all matches or of a sample of matched charts. None of the articles that used chart review for validation reported the match rate. Additionally, only two articles [20, 59] used validated eMERGE Network computable phenotypes from the Phenotype Knowledgebase (PheKB) [78–80].
Discussion
This systematic review is the first comprehensive investigation of spatial methodologies applied to EHR-derived data in the US. Spatial clustering and descriptive analysis were the most commonly used methods, while space-time modeling, whether frequentist or Bayesian, remained under-explored. The diverse use of spatial analysis for EHR-derived data in different health domains highlights the potential to incorporate spatial methods to enhance the context of individual patients for future biomedical research. We found limited use of EHR-derived data for spatial analysis, probably due to the challenge of safeguarding patient privacy. Address data, crucial for spatial analysis, is highly confidential and often restricted from sharing. Researchers and institutions often use geographic masking techniques [53, 81] to balance data utility and privacy protection by altering the precise geographic coordinates while preserving the overall spatial characteristics of the data. Encouraging the adoption of spatial analysis could promote biomedical knowledge sharing and collaboration.
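As an illustration of the geographic masking idea described above, the following is a minimal Python sketch of one common variant, donut masking, in which each geocoded point is displaced in a random direction by a random distance between a minimum and a maximum radius. It is not taken from any of the reviewed articles. The function names (`donut_mask`, `haversine_m`), the 100–500 m radii, and the example coordinates are assumptions chosen for illustration, and the planar offset is an approximation that is only reasonable for short displacements.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in meters


def donut_mask(lat, lon, r_min_m=100.0, r_max_m=500.0, rng=None):
    """Displace a point by a random distance in [r_min_m, r_max_m] meters.

    Hypothetical helper: the radii are illustrative defaults, not recommendations.
    """
    if not 0.0 <= r_min_m < r_max_m:
        raise ValueError("require 0 <= r_min_m < r_max_m")
    if rng is None:
        rng = random.Random()
    # Sample the distance so points are uniform over the annulus area.
    u = rng.random()
    dist = math.sqrt(u * (r_max_m ** 2 - r_min_m ** 2) + r_min_m ** 2)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # Convert the north/east offsets (meters) to degrees (small-offset approximation).
    d_north = dist * math.cos(bearing)
    d_east = dist * math.sin(bearing)
    new_lat = lat + math.degrees(d_north / EARTH_RADIUS_M)
    new_lon = lon + math.degrees(
        d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    )
    return new_lat, new_lon


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    d_phi = p2 - p1
    d_lam = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(d_lam / 2) ** 2
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Example with made-up coordinates and a seeded generator for repeatability.
rng = random.Random(42)
original = (32.7765, -79.9311)
masked = donut_mask(*original, rng=rng)
print(masked, round(haversine_m(*original, *masked)), "m displaced")
```

In practice, the displacement radii would be set from local population density and institutional privacy policy rather than fixed values, and the masked coordinates, not the originals, would be shared for analysis.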
The application of spatiotemporal analysis to EHR-derived data was mainly limited to exploring spatiotemporal clusters, with no spatiotemporal modeling. This might be due to the technical expertise required for analysis, data complexity, availability of longitudinal data, and computational challenges. The Bayesian approach offers a more adaptable framework to handle complex spatial and temporal dependencies, control confounding variables [82], and incorporate prior information, such as existing medical literature and expert opinions, resulting in more interpretable results [83, 84]. Moreover, spatiotemporal Bayesian modeling can aid in understanding disease trends and progression, seasonality, and long-term shifts at the local level [85]. Bayesian modeling can also better account for uncertainty in parameter estimates and predictions, so that the reliability of findings can be assessed before interventions are implemented [86]. Future research should delve into spatial and spatiotemporal modeling, focusing on Bayesian approaches.
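How a Bayesian analysis combines prior information with observed data and reports uncertainty can be illustrated with a deliberately simple, non-spatial example: a conjugate beta-binomial update of an area-level prevalence. This sketch is not a spatiotemporal model and is not drawn from the reviewed articles. The Beta(2, 18) prior (mean 0.10), the tract names, and the counts are hypothetical values chosen for illustration.

```python
import math


def beta_binomial_posterior(cases, patients, prior_a=2.0, prior_b=18.0):
    """Return (mean, sd) of the Beta posterior for a prevalence.

    With a Beta(prior_a, prior_b) prior and `cases` out of `patients`,
    the posterior is Beta(prior_a + cases, prior_b + patients - cases).
    """
    if not 0 <= cases <= patients:
        raise ValueError("require 0 <= cases <= patients")
    a = prior_a + cases
    b = prior_b + (patients - cases)
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(var)


# Hypothetical tracts with the same raw rate but very different sample sizes.
tracts = {"tract_A": (2, 10), "tract_B": (200, 1000)}
posteriors = {
    name: beta_binomial_posterior(cases, n) for name, (cases, n) in tracts.items()
}

for name, (mean, sd) in posteriors.items():
    cases, n = tracts[name]
    print(f"{name}: raw={cases / n:.3f} posterior mean={mean:.3f} sd={sd:.3f}")
```

Both hypothetical tracts have the same raw rate (0.20), but the estimate for the small tract is pulled further toward the prior mean and carries a larger posterior standard deviation, which is the kind of uncertainty statement the paragraph above refers to. Full spatial or spatiotemporal models add structured dependence between neighboring areas and time periods on top of this basic idea.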
Among the health conditions studied, chronic and infectious diseases were the most frequently investigated domains. This disparity may be attributed to the pressing public health concerns posed by diseases with immediate impacts, which often attract more funding and resources for research initiatives [87, 88]. The historically high mortality rates of these conditions have likely also sustained continuous research. Surprisingly, despite the plethora of funding for cancer research, we found only a small number of articles within the cancer domain, which may likewise reflect the pressing needs of other domains, such as infectious disease.
We observed recurring and prominent themes related to the SDOH. This emphasis may result from the growing maturity and increased awareness within the biomedical informatics community regarding the significant influence of social, economic, and environmental factors on health outcomes. Understanding the roles of SDOH in health disparities will likely lead to the implementation of integrative health interventions that address the needs of individuals affected by these health disparities. These interventions can likewise be enhanced by incorporating spatial perspectives.
Another missed opportunity is the underutilization of computable phenotypes, which are automated algorithms designed for characterizing diseases and enrolling patients in studies. Most studies depended on the manual application of inclusion and exclusion criteria to define phenotypes. While this method may be suitable in certain scenarios, it often lacks the depth and granularity needed to capture the intended patient cohorts consistently and accurately, and its accuracy and precision can vary depending on the data sources and clinical domains. Notably, only two of the studies in our review used computable phenotypes, indicating significant underutilization of this potentially transformative approach. Furthermore, only five articles carried out any form of chart review validation. Validation methods, including chart reviews, genetic markers, and clinical variables, are indispensable in phenotyping to ensure the accurate characterization of the desired cohorts. This applies even to computable phenotypes within specific medical domains [89].
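The contrast between a computable phenotype and its chart review validation can be sketched in a few lines of Python. This is a toy illustration only: the rule (at least two ICD-10-CM diagnosis codes in the J45 asthma category), the patient records, and the chart review labels are hypothetical, and the rule is not a PheKB or eMERGE algorithm nor one used by any reviewed article.

```python
def meets_phenotype(dx_codes, prefix="J45", min_codes=2):
    """Toy rule: at least `min_codes` diagnosis codes starting with `prefix`."""
    return sum(code.startswith(prefix) for code in dx_codes) >= min_codes


def positive_predictive_value(flagged_ids, chart_review):
    """Share of flagged, chart-reviewed patients confirmed by the reviewer."""
    reviewed = [pid for pid in flagged_ids if pid in chart_review]
    if not reviewed:
        raise ValueError("no flagged patients were chart-reviewed")
    confirmed = sum(chart_review[pid] for pid in reviewed)
    return confirmed / len(reviewed)


# Hypothetical patients mapped to their recorded diagnosis codes.
patients = {
    "p1": ["J45.909", "J45.20"],
    "p2": ["J45.909"],
    "p3": ["J45.40", "J45.40", "I10"],
    "p4": ["E11.9"],
}
# Hypothetical reviewer decisions for the flagged charts (True = confirmed).
chart_review = {"p1": True, "p3": False}

flagged = [pid for pid, codes in patients.items() if meets_phenotype(codes)]
ppv = positive_predictive_value(flagged, chart_review)
print(flagged, ppv)
```

Because the rule is explicit code, it can be rerun unchanged on another site's data, and reporting the resulting positive predictive value is one way to share the match rate that chart review produces.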
This study has several limitations. First, we only considered English-language articles, possibly introducing language bias. Second, selection bias is possible due to database availability. We mitigated these limitations by searching Google Scholar and conducting backward reference checking to identify relevant studies that might not have been identified through our initial search strategy. Lastly, we used a query search strategy with limited keywords, which inherently restricted the scope of articles we could retrieve, potentially omitting studies that did not use these specific terms in their title or abstract.
Conclusion
This systematic review provided a comprehensive overview of the current utilization of spatial analysis in EHR-based research in the US and underscored the pivotal role that spatial analysis can play in clinical decision support and interventions. The utilization of EHR-derived spatial analysis is on an upward trajectory, parallel with the widespread adoption of EHR systems. The volume of articles on this topic is anticipated to continue to grow. The primary health outcomes investigated were asthma, hypertension, and diabetes. Notably, patient phenotypes involving genomics, imaging, and notes were rarely utilized. This review also highlighted the need for additional exploration of spatial analysis techniques, including but not limited to spatiotemporal Bayesian analysis and modeling, particularly in the cancer domain.
Table 4.
SDOH themes examined within the framework of Healthy People 2030 SDOH domains.
| Labels | SDOH Domains | Counts |
|---|---|---|
| SDOH1 | Economic Stability (Employment, Food Insecurity, Housing Instability, Poverty) | 2 |
| SDOH3 | Health Access & Quality (Access to Health Services, Access to Primary Care, Health Literacy) | 5 |
| SDOH4 | Neighborhood and Built Environment (Access to Foods that Support Healthy Dietary Patterns, Crime and Violence, Environmental Conditions, Quality of Housing) | 14 |
| SDOH5 | Social and Community Context (Civic Participation, Discrimination, Incarceration, Social Cohesion) | 5 |
| | All 5 SDOH domains or SDOH as a whole | 36 |
Acknowledgements
We would like to express our gratitude to Professor Gregory Glass from the University of Florida for his constructive review of the earlier version of the manuscript. We would also like to thank Clemson University librarian Karen Burton and MUSC librarian Ayaba Logan, MPH, MLIS, whose expertise in library and information sciences facilitated our systematic review.
Funding
AM, BH, and AVA are supported by the South Carolina SmartState Endowed Center for Environmental and Biomedical Panomics (CEABP); AVA is supported by South Carolina Cancer Disparities Research Center (SC CADRE) from NIH/NCI U54 CA210962; BH is a trainee supported by the SC Biomedical Informatics & Data Science for Health Equity Research Training (SC BIDS4HEALTH) from NIH/NLM T15 LM013977.
Conflicts of interest
None declared.
Additional Declarations: The authors declare no competing interests.
Contributor Information
Abolfazl Mollalo, Medical University of South Carolina.
Bashir Hamidi, Medical University of South Carolina.
Leslie Lenert, Medical University of South Carolina.
Alexander V. Alekseyenko, Medical University of South Carolina.
References
- 1. Kuo A. and Dang S., Secure Messaging in Electronic Health Records and Its Impact on Diabetes Clinical Outcomes: A Systematic Review. Telemed J E Health, 2016. 22(9): p. 769–77.
- 2. Dash S., et al., Big data in healthcare: management, analysis and future prospects. Journal of big data, 2019. 6(1): p. 1–25.
- 3. Xie S., et al., Enhancing Electronic Health Record Data with Geospatial Information. AMIA Jt Summits Transl Sci Proc, 2017. 2017: p. 123–132.
- 4. He J., et al., Evaluation of associations between asthma exacerbations and distance to roadways using geocoded electronic health records data. BMC Public Health, 2020. 20(1): p. 1626.
- 5. Schooley B.L., et al., Rural veteran access to healthcare services: investigating the role of information and communication technologies in overcoming spatial barriers. Perspect Health Inf Manag, 2010. 7(Spring): p. 1f.
- 6. Soares N., Dewalle J., and Marsh B., Utilizing patient geographic information system data to plan telemedicine service locations. J Am Med Inform Assoc, 2017. 24(5): p. 891–896.
- 7. Ali F., et al., A Spatial Analysis of Health Disparities Associated with Antibiotic Resistant Infections in Children Living in Atlanta (2002–2010). EGEMS (Wash DC), 2019. 7(1): p. 50.
- 8. Chang T.S., et al., Sparse modeling of spatial environmental variables associated with asthma. J Biomed Inform, 2015. 53: p. 320–9.
- 9. Mollalo A., et al., Spatial analysis of COVID-19 vaccination: a scoping review. International journal of environmental research and public health, 2021. 18(22): p. 12024.
- 10. Anselin L., Varga A., and Acs Z., Geographical spillovers and university research: A spatial econometric perspective. Growth and change, 2000. 31(4): p. 501–515.
- 11. Mollalo A. and Tatar M., Spatial modeling of COVID-19 vaccine hesitancy in the United States. International journal of environmental research and public health, 2021. 18(18): p. 9488.
- 12. Shivade C., et al., A review of approaches to identifying patient phenotype cohorts using electronic health records. J Am Med Inform Assoc, 2014. 21(2): p. 221–30.
- 13. Schinasi L.H., et al., Using electronic health record data for environmental and place based population health research: a systematic review. Ann Epidemiol, 2018. 28(7): p. 493–502.
- 14. Simpson C.L. and Novak L.L., Place matters: the problems and possibilities of spatial data in electronic health records. in AMIA Annual Symposium Proceedings. 2013. American Medical Informatics Association.
- 15. Hamidi B., et al., Not all phenotypes are created equal: covariates of success in e-phenotype specification. Journal of the American Medical Informatics Association, 2023. 30(2): p. 213–221.
- 16. Patterson M.T. and Grossman R.L., Detecting Spatial Patterns of Disease in Large Collections of Electronic Medical Records Using Neighbor-Based Bootstrapping. Big Data, 2017. 5(3): p. 213–224.
- 17. Beck A.F., et al., Pervasive Income-Based Disparities In Inpatient Bed-Day Rates Across Conditions And Subspecialties. Health Aff (Millwood), 2018. 37(4): p. 551–559.
- 18. Bravo M.A., et al., Residential Racial Isolation and Spatial Patterning of Type 2 Diabetes Mellitus in Durham, North Carolina. Am J Epidemiol, 2018. 187(7): p. 1467–1476.
- 19. Bravo M.A., Batch B.C., and Miranda M.L., Residential Racial Isolation and Spatial Patterning of Hypertension in Durham, North Carolina. Prev Chronic Dis, 2019. 16: p. E36.
- 20. Bravo M.A., Anthopolos R., and Miranda M.L., Characteristics of the built environment and spatial patterning of type 2 diabetes in the urban core of Durham, North Carolina. J Epidemiol Community Health, 2019. 73(4): p. 303–310.
- 21. Brooks M., et al., Mapping the ChristianaCare response to COVID-19: Clinical insights from the Value Institute’s Geospatial Analytics Core. Dela J Public Health, 2020. 6(2): p. 66–70.
- 22. Carey A., et al., Epidemiology, clinical features, and outcomes of coccidioidomycosis, Utah, 2006–2015. Emerging Infectious Diseases, 2021. 27(9): p. 2269.
- 23. Casey J.A., et al., Greenness and Birth Outcomes in a Range of Pennsylvania Communities. Int J Environ Res Public Health, 2016. 13(3).
- 24. Cobert J., et al., Geospatial Variations and Neighborhood Deprivation in Drug-Related Admissions and Overdoses. J Urban Health, 2020. 97(6): p. 814–822.
- 25. Davidson A.J., et al., Monitoring Depression Rates in an Urban Community: Use of Electronic Health Records. J Public Health Manag Pract, 2018. 24(6): p. E6–E14.
- 26. DeMass R., et al., Emergency department use and geospatial variation in social determinants of health: a pilot study from South Carolina. BMC Public Health, 2023. 23(1): p. 1527.
- 27. Epstein D., et al., The effect of neighborhood and individual characteristics on pediatric critical illness. Journal of community health, 2014. 39: p. 753–759.
- 28. Gaudio E., et al., Defining Radiation Treatment Interruption Rates During the COVID-19 Pandemic: Findings From an Academic Center in an Underserved Urban Setting. Int J Radiat Oncol Biol Phys, 2023. 116(2): p. 379–393.
- 29. Georgantopoulos P., et al., Patient- and area-level predictors of prostate cancer among South Carolina veterans: a spatial analysis. Cancer Causes Control, 2020. 31(3): p. 209–220.
- 30. Ghazi L., Drawz P.E., and Berman J.D., The association between fine particulate matter (PM(2.5)) and chronic kidney disease using electronic health record data in urban Minnesota. J Expo Sci Environ Epidemiol, 2022. 32(4): p. 583–589.
- 31. Garg G., et al., Supermarket Proximity and Risk of Hypertension, Diabetes, and CKD: A Retrospective Cohort Study. Am J Kidney Dis, 2023. 81(2): p. 168–178.
- 32. Grunwell J.R., et al., Geospatial Analysis of Social Determinants of Health Identifies Neighborhood Hot Spots Associated With Pediatric Intensive Care Use for Life-Threatening Asthma. J Allergy Clin Immunol Pract, 2022. 10(4): p. 981–991.e1.
- 33. Hanna-Attisha M., et al., Elevated Blood Lead Levels in Children Associated With the Flint Drinking Water Crisis: A Spatial Analysis of Risk and Public Health Response. Am J Public Health, 2016. 106(2): p. 283–90.
- 34. Immergluck L.C., et al., Geographic surveillance of community associated MRSA infections in children using electronic health record data. BMC Infect Dis, 2019. 19(1): p. 170.
- 35. Jilcott S.B., et al., The association between the food environment and weight status among eastern North Carolina youth. Public Health Nutr, 2011. 14(9): p. 1610–7.
- 36. Kane N.J., et al., Committing to genomic answers for all kids: Evaluating inequity in genomic research enrollment. Genetics in Medicine, 2023. 25(9): p. 100895.
- 37. Kersten E.E., et al., Neighborhood Child Opportunity and Individual-Level Pediatric Acute Care Use and Diagnoses. Pediatrics, 2018. 141(5).
- 38. Lantos P.M., et al., Neighborhood Disadvantage is Associated with High Cytomegalovirus Seroprevalence in Pregnancy. J Racial Ethn Health Disparities, 2018. 5(4): p. 782–786.
- 39. Lantos P.M., et al., Geographic Disparities in Cytomegalovirus Infection During Pregnancy. J Pediatric Infect Dis Soc, 2017. 6(3): p. e55–e61.
- 40. Lê-Scherban F., et al., Identifying neighborhood characteristics associated with diabetes and hypertension control in an urban African-American population using geo-linked electronic health records. Prev Med Rep, 2019. 15: p. 100953.
- 41. Lieu T.A., et al., Geographic clusters in underimmunization and vaccine refusal. Pediatrics, 2015. 135(2): p. 280–9.
- 42. Lipner E.M., et al., A Geospatial Epidemiologic Analysis of Nontuberculous Mycobacterial Infection: An Ecological Study in Colorado. Ann Am Thorac Soc, 2017. 14(10): p. 1523–1532.
- 43. Liu L., et al., Understanding Pediatric Surgery Cancellation: Geospatial Analysis. J Med Internet Res, 2021. 23(9): p. e26231.
- 44. Mayne S.L., Pellissier B.F., and Kershaw K.N., Neighborhood Physical Disorder and Adverse Pregnancy Outcomes among Women in Chicago: a Cross-Sectional Analysis of Electronic Health Record Data. J Urban Health, 2019. 96(6): p. 823–834.
- 45. Mayne S.L., et al., Racial Residential Segregation and Hypertensive Disorder of Pregnancy Among Women in Chicago: Analysis of Electronic Health Record Data. Am J Hypertens, 2018. 31(11): p. 1221–1227.
- 46. Oyana T.J., et al., Spatiotemporal patterns of childhood asthma hospitalization and utilization in Memphis Metropolitan Area from 2005 to 2015. J Asthma, 2017. 54(8): p. 842–855.
- 47. Pearson D.R. and Werth V.P., Geospatial Correlation of Amyopathic Dermatomyositis With Fixed Sources of Airborne Pollution: A Retrospective Cohort Study. Front Med (Lausanne), 2019. 6: p. 85.
- 48. Samuels E.A., et al., Mapping emergency department asthma visits to identify poor-quality housing in New Haven, CT, USA: a retrospective cohort study. The Lancet Public Health, 2022. 7(8): p. e694–e704.
- 49. Schwartz B.S., et al., Body mass index and the built and social environments in children and adolescents using electronic health records. Am J Prev Med, 2011. 41(4): p. e17–28.
- 50. Sharif-Askary B., et al., Geospatial Analysis of Risk Factors Contributing to Loss to Follow-up in Cleft Lip/Palate Care. Plast Reconstr Surg Glob Open, 2018. 6(9): p. e1910.
- 51. Sidell M.A., et al., Ambient air pollution and COVID-19 incidence during four 2020–2021 case surges. Environ Res, 2022. 208: p. 112758.
- 52. Siegel S.D., et al., A Population Health Assessment in a Community Cancer Center Catchment Area: Triple-Negative Breast Cancer, Alcohol Use, and Obesity in New Castle County, Delaware. Cancer Epidemiol Biomarkers Prev, 2022. 31(1): p. 108–116.
- 53. Soares N., Dewalle J., and Marsh B., Utilizing patient geographic information system data to plan telemedicine service locations. Journal of the American Medical Informatics Association, 2017. 24(5): p. 891–896.
- 54. Sun Y., et al., Exposure to air pollutant mixture and gestational diabetes mellitus in Southern California: Results from electronic health record data of a large pregnancy cohort. Environ Int, 2022. 158: p. 106888.
- 55. Tabano D.C., et al., The Spatial Distribution of Adult Obesity Prevalence in Denver County, Colorado: An Empirical Bayes Approach to Adjust EHR-Derived Small Area Estimates. EGEMS (Wash DC), 2017. 5(1): p. 24.
- 56. Wakefield D.V., et al., Location as Destiny: Identifying Geospatial Disparities in Radiation Treatment Interruption by Neighborhood, Race, and Insurance. Int J Radiat Oncol Biol Phys, 2020. 107(4): p. 815–826.
- 57. Wilson W.W., et al., Association Between Acute Exposure to Crime and Individual Systolic Blood Pressure. Am J Prev Med, 2022. 62(1): p. 87–94.
- 58. Winckler B., et al., Geographic Variation in Acute Pediatric Mental Health Utilization. Acad Pediatr, 2023. 23(2): p. 448–456.
- 59. Xie S.J., et al., Geospatial divide in real-world EHR data: Analytical workflow to assess regional biases and potential impact on health equity. AMIA Jt Summits Transl Sci Proc, 2023. 2023: p. 572–581.
- 60. Zhan F.B., et al., Spatial Insights for Understanding Colorectal Cancer Screening in Disproportionately Affected Populations, Central Texas, 2019. Prev Chronic Dis, 2021. 18: p. E20.
- 61. Zhao Y.-Q., Norton D., and Hanrahan L., Small area estimation and childhood obesity surveillance using electronic health records. Plos one, 2021. 16(2): p. e0247476.
- 62. Zhao P., Kwan M.P., and Zhou S., The Uncertain Geographic Context Problem in the Analysis of the Relationships between Obesity and the Built Environment in Guangzhou. Int J Environ Res Public Health, 2018. 15(2).
- 63. Yu W., Spatial co-location pattern mining for location-based services in road networks. Expert Systems with Applications, 2016. 46: p. 324–335.
- 64. Moazeni M., et al., Spatiotemporal analysis of COVID-19, air pollution, climate, and meteorological conditions in a metropolitan region of Iran. Environ Sci Pollut Res Int, 2022. 29(17): p. 24911–24924.
- 65. Diggle P.J., Statistical analysis of spatial and spatio-temporal point patterns. 2013: CRC press.
- 66. Okabe A., Satoh T., and Sugihara K., A kernel density estimation method for networks, its computational method and a GIS–based tool. International Journal of Geographical Information Science, 2009. 23(1): p. 7–32.
- 67. Fu W.J., et al., Using Moran’s I and GIS to study the spatial pattern of forest litter carbon density in a subtropical region of southeastern China. Biogeosciences, 2014. 11(8): p. 2401–2409.
- 68. Anselin L., Local indicators of spatial association—LISA. Geographical analysis, 1995. 27(2): p. 93–115.
- 69. Lee S.-I., Developing a bivariate spatial association measure: an integration of Pearson’s r and Moran’s I. Journal of geographical systems, 2001. 3: p. 369–385.
- 70. Ord J.K. and Getis A., Local spatial autocorrelation statistics: distributional issues and an application. Geographical analysis, 1995. 27(4): p. 286–306.
- 71. Kulldorff M., A spatial scan statistic. Communications in Statistics-Theory and methods, 1997. 26(6): p. 1481–1496.
- 72. Sheehan T.J., et al., The geographic distribution of breast cancer incidence in Massachusetts 1988 to 1997, adjusted for covariates. Int J Health Geogr, 2004. 3(1): p. 17.
- 73. Dormann C.F., et al., Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography, 2007. 30(5): p. 609–628.
- 74. Kumar V.S., et al., Spatial mapping of acute diarrheal disease using GIS and estimation of relative risk using empirical Bayes approach. Clinical epidemiology and global health, 2017. 5(2): p. 87–96.
- 75. Wah W., Ahern S., and Earnest A., A systematic review of Bayesian spatial-temporal models on cancer incidence and mortality. Int J Public Health, 2020. 65(5): p. 673–682.
- 76. Shiffrin R.M., et al., A survey of model evaluation approaches with a tutorial on hierarchical bayesian methods. Cogn Sci, 2008. 32(8): p. 1248–84.
- 77. McCarty C.A., et al., The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC medical genomics, 2011. 4: p. 1–11.
- 78. McCarty C.A., et al., The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med Genomics, 2011. 4: p. 13.
- 79. National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and National Institutes of Health. Overweight & Obesity Statistics. 2021. September 2023 [cited 2023 9/18/2023]; Available from: https://www.niddk.nih.gov/health-information/health-statistics/overweight-obesity.
- 80. KPWA/UW Depression (Phenotype ID 1095). 2018. 10/1/2018 [cited 2023 9/18/2023]; Available from: https://phekb.org/phenotype/depression.
- 81. Zandbergen P.A., Ensuring confidentiality of geocoded health data: assessing geographic masking strategies for individual-level data. Advances in medicine, 2014. 2014.
- 82. Aswi A., et al., Bayesian spatial and spatio-temporal approaches to modelling dengue fever: a systematic review. Epidemiol Infect, 2018. 147: p. e33.
- 83. Bharadiya J.P., A Review of Bayesian Machine Learning Principles, Methods, and Applications. International Journal of Innovative Science and Research Technology, 2023. 8(5): p. 2033–2038.
- 84. Walsh A.S., Louis T.A., and Glass G.E., Detecting multiple levels of effect during survey sampling using a Bayesian approach: Point prevalence estimates of a hantavirus in hispid cotton rats (Sigmodon hispidus). Ecological modelling, 2007. 205(1–2): p. 29–38.
- 85. Hanzlicek G.A., et al., Bayesian Space-Time Patterns and Climatic Determinants of Bovine Anaplasmosis. PLoS One, 2016. 11(3): p. e0151924.
- 86. Wintle B.A., et al., The use of Bayesian model averaging to better represent uncertainty in ecological models. Conservation biology, 2003. 17(6): p. 1579–1590.
- 87. Carter A.J. and Nguyen C.N., A comparison of cancer burden and research spending reveals discrepancies in the distribution of research funding. BMC public health, 2012. 12(1): p. 1–12.
- 88. Varnousfaderani S.D., et al., Alleviating effects of coenzyme Q10 supplements on biomarkers of inflammation and oxidative stress: results from an umbrella meta-analysis. Frontiers in Pharmacology, 2023. 14.
- 89. Brown J.S., et al., Using and improving distributed data networks to generate actionable evidence: the case of real-world outcomes in the Food and Drug Administration’s Sentinel system. J Am Med Inform Assoc, 2020. 27(5): p. 793–797.


