Enhancing Catchment Area Tools: A De-Identification Method for Integrating Clinical Trial Data with Cancer InFocus

Daniel Antonio; Todd Burus; Tarneka M Manning; Michael J Gurley; Giorgio Di Salvo; Jorge Andres Heneche; Carolyn Passaglia; Masha Kocherginsky; Melissa A Simon

doi:10.1080/28322134.2024.2388564

Author manuscript; available in PMC: 2025 Aug 17.

Published in final edited form as: Prev Oncol Epidemiol. 2024 Aug 17;2(1):2388564. doi: 10.1080/28322134.2024.2388564

PMCID: PMC11870640 NIHMSID: NIHMS2029624 PMID: 40027469

Abstract

Background:

National Cancer Institute (NCI) designated cancer centers are entrusted with assessing the cancer burden within their catchment areas and using this information to guide research and outreach efforts. Data visualizations, like Cancer InFocus, have emerged as essential tools for facilitating this effort. Integrating clinical trial accrual data can further enhance our understanding of the catchment area. However, these data must be de-identified in accordance with the Health Insurance Portability and Accountability Act (HIPAA). This study introduces a de-identification method through geographic aggregation, ensuring HIPAA compliance and enabling comprehensive catchment area surveillance.

Methods:

Home addresses of patients enrolled in clinical trials at an NCI-designated Comprehensive Cancer Center were geocoded to census tracts. Tracts with fewer than 20 accruals were merged using the R geographic aggregation tool. A risk assessment was conducted to ensure low re-identification risk. Accrual rates were calculated and integrated into Cancer InFocus.

Results:

After aggregation, all merged tracts met the 20-accrual threshold, and the risk of re-identification was low. Disparities between clinical trial accruals and social determinants of health were identified.

Discussion:

The geographic aggregation method, compliant with HIPAA standards and integrated with Cancer InFocus, can enhance catchment area surveillance, furthering cancer research and outreach by pinpointing area-specific needs.

Keywords: NCI Cancer Center, Catchment Area, Data Visualization, Clinical Trial Accrual, Geographic Aggregation

INTRODUCTION

Catchment Area Data and Tools

Beginning in 2012, the National Cancer Institute (NCI) established a directive mandating comprehensive cancer centers to define their catchment areas as a population-based geographic region, focusing on research aimed at reducing the burden of cancer in these populations.¹ Expanding in 2016 to include community outreach and engagement activities, the catchment area grew to encompass the various community members impacted by cancer center initiatives and programs.² With the latest NCI guidance, cancer centers, informed by a comprehensive understanding of the cancer burden in their catchment areas, are expected to meaningfully engage communities and conduct research tailored to the needs of these populations.³ With these expectations comes the need for comprehensive data and tools that can help translate catchment area surveillance into actionable community outreach and engagement strategies.

Current approaches typically utilize publicly available data sources such as cancer registries and population surveys to describe the burden of cancer.⁴,⁵ These data help inform community outreach and engagement strategies by fostering dialog with community members around addressing specific cancer needs and forming partnerships to provide interventions like targeted education, evidence-based prevention services, and improved resource navigation.⁵ In addition to these approaches, data visualizations have emerged as essential tools for catchment area surveillance and information dissemination across various cancer centers.⁵,⁶

In particular, Cancer InFocus, developed by the University of Kentucky Markey Cancer Center as a comprehensive solution for data gathering and visualization, is an appealing tool for rapid implementation through its open-source format. The data gathering component of Cancer InFocus uses the Python programming language to scrape publicly available data from dozens of sources and curates them at the county and census tract levels to the specific catchment areas of over 70 US cancer centers. These data can then be used with a series of R programs to build and deploy multiple interactive R Shiny visualization applications. Catchment area datasets are updated monthly and made available through CancerInFocus.org. The R Shiny application code can be obtained by any cancer center through a no-cost licensing agreement with the University of Kentucky. As of March 2024, 23 cancer centers have completed agreements to adopt Cancer InFocus for their catchment area cancer surveillance needs.

Clinical Trial Accruals and HIPAA

In addition to publicly available data, cancer centers often use clinical trial data to describe the alignment between community engagement, research efforts, and the populations served by cancer centers.⁵ Some have utilized clinical trial data to evaluate how well their accrual proportions match their catchment area demographics.⁷ Others, like Stanford Comprehensive Cancer Center, applied spatial analysis to identify accrual disparities across race/ethnicity groups throughout their catchment area.⁸ However, these data are subject to county-level reporting and may overlook important variations across the populations within these counties. This is especially relevant for densely urban areas like Chicago and other metropolitan regions. Thus, more precision could better inform targeted community outreach and research.

Efforts to provide more granular data must balance utility with ensuring patient privacy and confidentiality, as mandated by the Health Insurance Portability and Accountability Act (HIPAA).⁹ For example, state cancer registries, although excluded from HIPAA regulations under the public health exemption due to their role in public health surveillance, usually report cancer statistics at the county level. Registries also suppress rates with small counts to protect patient confidentiality and ensure reliability.¹⁰⁻¹² On the other hand, NCI-designated cancer centers, classified as covered entities under HIPAA, must adhere to de-identification standards for the use and disclosure of protected health information in situations where an IRB or Privacy Board Waiver is not in place.¹⁰,¹³,¹⁴

Given the limitation of county-level reporting and potential benefit to cancer center initiatives, our study aims to enhance catchment area surveillance by using a HIPAA-compliant approach to geographic data aggregation. This method seeks to overcome the limitation of county-level reporting by providing finer geographic detail to visualizations in Cancer InFocus that can reveal important variations within a densely populated catchment area. Also, by implementing this approach, we explore disparities between clinical trial accruals and social determinants of health at a more granular level.

METHODS AND MATERIALS

Expert Determination

The de-identification standard outlined in the HIPAA Privacy Rule offers two methods for covered entities to render health information non-identifiable: expert determination and safe harbor.¹⁴ The safe harbor method involves the removal of 18 types of identifiers, while expert determination applies statistical or scientific principles to minimize the risk of re-identification.¹⁴ Commonly used techniques to satisfy expert determination requirements include generalization, suppression, randomization, and subsampling.¹⁵ In addition to applying these techniques, the statistical or scientific methods and results that justify a minimal risk of re-identification must be documented.¹⁴

In this study, the clinical trial data underwent de-identification using expert determination, utilizing the generalization technique. Generalization was chosen for its ease of implementation, transparent nature, and capacity to preserve data utility. This process involved aggregating individual years into a five-year period (e.g., 2018–2023) and merging census tracts into larger geographic areas to meet a minimum required value or count per geography. Census tracts were merged using the R geographic aggregation tool (GAT).¹⁶ A conservative risk threshold of 20 clinical trial accruals per geography was selected as the minimum required value for data to be considered non-identifiable. This decision was based on previous literature indicating its high level of de-identification.¹⁵,¹⁷ Any geography with counts falling below this threshold would be suppressed.
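The two generalization steps described above, pooling years into one period and suppressing geographies below the minimum count, can be illustrated with a minimal Python sketch. The study itself used R and GAT; the function names and the use of `None` to mark a suppressed value are assumptions made for illustration only.

```python
from typing import Optional

MIN_ACCRUALS = 20  # minimum required count per geography, as stated in the text


def generalize_year(year: int, start: int = 2018, end: int = 2023) -> str:
    """Replace an individual enrollment year with the pooled study period."""
    if not start <= year <= end:
        raise ValueError(f"year {year} is outside the study period {start}-{end}")
    return f"{start}-{end}"


def suppress_small_count(count: int, minimum: int = MIN_ACCRUALS) -> Optional[int]:
    """Return the count if it meets the minimum; otherwise None (suppressed)."""
    return count if count >= minimum else None
```

In this sketch a geography with 19 accruals would be suppressed, while one with exactly 20 would be reported.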

In line with HIPAA guidelines, a risk assessment was conducted after the aggregation process to ensure minimal risk of re-identification. The method and results of the expert determination were documented and shared with the data steward. Figure 1 provides an overview of the expert determination process utilized in this study.

Figure 1. Expert Determination Process for De-identifying Clinical Trial Accrual Data

Clinical Trial Dataset

To comprehensively assess clinical trials across our institution, this analysis encompassed various trials conducted at Northwestern Medicine (NM) facilities over multiple years. Home addresses of patients enrolled in cancer clinical trials at Northwestern University’s Robert H. Lurie Comprehensive Cancer Center (LCC) and affiliated NM sites from 2018–2023 were geocoded to their respective census tract using the Decentralized Geomarker Assessment for Multi-Site Studies.¹⁸ This included interventional, non-interventional, treatment, and non-treatment trials. Data were approved for release from the NM data steward under the condition that the data be de-identified in accordance with HIPAA standards.

Statistical analysis was conducted using R.¹⁹ Data were filtered to include only accruals within the Robert H. Lurie Comprehensive Cancer Center Catchment Area (LCC-CA), which encompasses the greater Chicago regions of Cook, DuPage, Lake, Will, McHenry, Kane, Kendall, Grundy, and Kankakee counties. Over 90% of the LCC patient population resides within this catchment area. Direct identifiers such as address, patient ID, and protocol ID were removed. United States Census Bureau 2022 TIGER/Line census tract shapefiles were joined and utilized to calculate the number of clinical trial accruals per census tract.²⁰
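As a rough illustration of the filtering and counting steps, the following Python sketch drops direct identifiers and tallies accruals per census tract for the nine catchment counties. It is not the authors' R code. The record field names (`address`, `patient_id`, `protocol_id`, `tract_geoid`) and the example records are hypothetical, and the county FIPS codes should be verified against Census Bureau reference files. The sketch relies on the fact that an 11-digit tract GEOID begins with the 2-digit state and 3-digit county codes.

```python
from collections import Counter

# State+county FIPS prefixes for the LCC-CA counties (Illinois = 17).
# Listed here for illustration; verify against Census reference files.
CATCHMENT_COUNTY_FIPS = {
    "17031": "Cook", "17043": "DuPage", "17097": "Lake",
    "17197": "Will", "17111": "McHenry", "17089": "Kane",
    "17093": "Kendall", "17063": "Grundy", "17091": "Kankakee",
}

# Hypothetical field names for the direct identifiers named in the text.
DIRECT_IDENTIFIERS = ("address", "patient_id", "protocol_id")


def strip_identifiers(record):
    """Return a copy of the record without direct identifiers."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def count_accruals_by_tract(records):
    """Count accruals per tract, keeping only tracts in catchment counties."""
    counts = Counter()
    for record in records:
        geoid = record["tract_geoid"]
        if geoid[:5] in CATCHMENT_COUNTY_FIPS:
            counts[geoid] += 1
    return counts


# Hypothetical example records; the last one lies outside the catchment area.
example_records = [
    {"patient_id": "P1", "protocol_id": "T1", "address": "n/a", "tract_geoid": "17031010100"},
    {"patient_id": "P2", "protocol_id": "T1", "address": "n/a", "tract_geoid": "17031010100"},
    {"patient_id": "P3", "protocol_id": "T2", "address": "n/a", "tract_geoid": "17043840000"},
    {"patient_id": "P4", "protocol_id": "T2", "address": "n/a", "tract_geoid": "18089010100"},
]
tract_counts = count_accruals_by_tract(strip_identifiers(r) for r in example_records)
```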

To assess the relationship between social determinants of health and clinical trial accrual rates, population-level characteristics of sociodemographic and healthcare equity factors were joined at the census tract level. Census tract sociodemographic factors, including sex, age, race/ethnicity, household income, educational attainment, poverty rate, and unemployment rate, were joined with data from the United States Census Bureau 2018–2022 American Community Survey.²¹ Additionally, healthcare equity factors, including health insurance, cancer screening, and cancer diagnosis history, were joined with PLACES 2023 census tract data from the Centers for Disease Control and Prevention.²²

Geographic Aggregation

GAT safeguards confidentiality by merging areas to meet population and case count criteria.¹⁶ For example, if an area has fewer cases than a specified threshold, it is merged with adjacent areas until it reaches the required threshold. The tool offers the flexibility to customize various parameters, such as minimum and maximum aggregation values, exclusions, aggregation algorithms, aggregation boundaries, and rate calculations.

This study used GAT to aggregate census tracts within the clinical trial enrollment dataset to meet a minimum requirement of 20 accruals per geography. Tracts with zero accruals and those with accruals surrounded by non-accruing neighbors were excluded from the aggregation process. The least value algorithm, which prioritizes merging areas with the lowest values first, was selected for its ability to create the most granular geographies possible. Counties were utilized as boundaries for aggregating census tracts.
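The merging logic described above can be approximated by a simple greedy procedure. The Python sketch below is a simplified illustration and not GAT's actual implementation: it repeatedly takes the group with the lowest count below the minimum and merges it into its lowest-count adjacent group within the same county. Zero-accrual tracts are left out, and groups with no eligible neighbor remain below the minimum (and would be suppressed). The input structures (`counts`, `county_of`, `neighbors`) and the tract labels in the example are hypothetical.

```python
def aggregate_least_value(counts, county_of, neighbors, minimum=20):
    """Greedy least-value merge of tracts within county boundaries.

    counts:    tract id -> number of accruals
    county_of: tract id -> county label
    neighbors: tract id -> iterable of adjacent tract ids
    Returns a sorted list of (member tracts, total accruals) tuples.
    """
    # Zero-accrual tracts are excluded from aggregation.
    groups = {t: {t} for t, c in counts.items() if c > 0}
    totals = {t: counts[t] for t in groups}
    group_of = {t: t for t in groups}
    stuck = set()  # groups below the minimum with no eligible neighbor

    while True:
        candidates = [g for g in groups if totals[g] < minimum and g not in stuck]
        if not candidates:
            break
        # Pick the lowest-count group first (ties broken by id).
        g = min(candidates, key=lambda x: (totals[x], x))
        adjacent = set()
        for tract in groups[g]:
            for n in neighbors.get(tract, ()):
                if n in group_of and county_of[n] == county_of[tract]:
                    adjacent.add(group_of[n])
        adjacent.discard(g)
        if not adjacent:
            stuck.add(g)
            continue
        # Merge into the adjacent group with the lowest count.
        target = min(adjacent, key=lambda x: (totals[x], x))
        groups[target] |= groups.pop(g)
        totals[target] += totals.pop(g)
        for tract in groups[target]:
            group_of[tract] = target
        stuck.discard(target)

    return sorted((tuple(sorted(m)), totals[g]) for g, m in groups.items())


# Hypothetical example: A-B-C-D form a chain in county X, E has no accruals,
# and F sits in county Y with only a cross-county neighbor.
example_counts = {"A": 5, "B": 8, "C": 12, "D": 25, "E": 0, "F": 3}
example_county = {"A": "X", "B": "X", "C": "X", "D": "X", "E": "X", "F": "Y"}
example_neighbors = {
    "A": ["B", "F"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"], "F": ["A"],
}
merged = aggregate_least_value(example_counts, example_county, example_neighbors)
```

In this example, tracts A, B, and C combine into one geography with 25 accruals, D already meets the minimum on its own, and F cannot be merged across the county boundary, so it stays below the threshold.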

Subsequently, we calculated the clinical trial accrual rate within each geography as the number of accruals per 10,000 residents. By default, GAT summarizes numerical variables by summing the values from all aggregated areas. To control for this and standardize comparisons of social determinants of health across geographies, we calculated the weighted average value for each social determinant of health across the aggregated geographies. Weights were based on the proportion of the population attributed to an individual census tract in an aggregated geography. These weights were then applied to the social determinants of health values in each census tract, and the resulting weighted values were summed to calculate the weighted average. Finally, we compared the newly aggregated areas to the original dataset to evaluate the risk of re-identification and distribution of clinical trial accruals. We used Pearson’s Chi-square test to compare the proportion of clinical trial accruals falling below the minimum required value before and after aggregation. The Wilcoxon rank sum test was utilized to compare non-normally distributed accrual counts and average risk levels before and after aggregation.
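The rate and weighting calculations described above reduce to two short formulas. The Python sketch below is illustrative rather than the authors' R code, and the function names and example numbers are hypothetical.

```python
def accrual_rate_per_10k(accruals, population):
    """Accruals per 10,000 residents for one geography."""
    if population <= 0:
        raise ValueError("population must be positive")
    return accruals / population * 10_000


def population_weighted_mean(values, populations):
    """Weighted average of tract-level values within an aggregated geography.

    Each tract's weight is its share of the geography's total population.
    """
    if len(values) != len(populations):
        raise ValueError("values and populations must have the same length")
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    return sum(v * (p / total) for v, p in zip(values, populations))


# Hypothetical geography built from two tracts with poverty rates of
# 10% and 20% and populations of 1,000 and 3,000.
example_rate = accrual_rate_per_10k(26, 13_000)
example_poverty = population_weighted_mean([10, 20], [1_000, 3_000])
```

Here the larger tract carries three quarters of the weight, so the weighted poverty rate sits closer to 20% than a simple mean of the two tracts would.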

Risk Assessment

A risk assessment evaluated the probability of re-identifying an individual from the de-identified dataset by linking external information, such as voter registration or vital statistics, to ensure compliance with HIPAA regulations. Estimating the risk of re-identification involves considering both the re-identification risk threshold and the extent of overlap between potential external linkage data sources.¹⁷ Our risk assessment conservatively assumes the highest-risk scenario where an external data source completely overlaps with the de-identified data. Under this scenario, the risk of re-identification for any individual enrolled in a clinical trial can be estimated as 1/k, where k represents the number of accruals in an aggregated or non-aggregated census tract.¹⁷ Therefore, in an attempt to identify an individual, there would be a probability of 1/k for a correct match within the de-identified data. This probability can be averaged across all groups and defined as average re-identification risk.¹⁵ Maximum risk defines k as the minimum required value or smallest group of k in the data, setting the upper bound of re-identification risk.¹⁵ Our assessment utilized both of these calculations to evaluate re-identification risk.
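The two risk measures can be computed directly from the accrual counts. The Python sketch below assumes the worst-case scenario stated above, in which each individual's risk is 1/k; the function name and example counts are hypothetical.

```python
def reidentification_risks(group_sizes):
    """Return (average risk, maximum risk) under the 1/k assumption.

    group_sizes: accrual counts k for each geography; geographies with
    no accruals are ignored because they contain no individuals.
    """
    sizes = [k for k in group_sizes if k > 0]
    if not sizes:
        raise ValueError("at least one geography with accruals is required")
    risks = [1 / k for k in sizes]
    average_risk = sum(risks) / len(risks)
    maximum_risk = 1 / min(sizes)  # the smallest group sets the upper bound
    return average_risk, maximum_risk


# Hypothetical counts before and after aggregation.
before_avg, before_max = reidentification_risks([1, 5, 10])
after_avg, after_max = reidentification_risks([20, 25, 40])
```

With a minimum group size of 20, the maximum risk cannot exceed 1/20 = 0.05, whereas a single tract with one accrual yields a maximum risk of 1.0.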

Cancer InFocus Integration and Analysis

After confirming that the risk of re-identification for the de-identified clinical trial data was equal to or below the maximum risk threshold, the aggregated shapefile output by GAT was joined with the existing Cancer InFocus shapefiles for catchment area visualization. To compare clinical trial accrual rates across the LCC-CA, rates per 10,000 were visualized using quintile breaks. NM facility locations were overlaid across the catchment area, and drive time from the nearest facility was calculated to analyze geographic patterns of access to enrollment sites. The relationship between social determinants of health and clinical trial accrual rates was assessed by comparing the averages of health determinants across quintiles of clinical trial accrual rates and among areas with no accruals. The Kruskal-Wallis rank sum test was used to account for non-normal distributions.
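Quintile breaks of the kind used for the map can be sketched with the Python standard library. This is an illustration under stated assumptions, not the Cancer InFocus code: cut points are computed only from geographies with accruals, areas with no accruals are kept as a separate class labeled 0, and the `inclusive` quantile method is an assumed choice.

```python
from bisect import bisect_left
from statistics import quantiles


def quintile_cut_points(rates):
    """Four cut points splitting the positive rates into five classes."""
    positive = [r for r in rates if r > 0]
    return quantiles(positive, n=5, method="inclusive")


def assign_quintile(rate, cut_points):
    """Return 0 for no accruals, otherwise a quintile from 1 (lowest) to 5."""
    if rate <= 0:
        return 0
    # A rate equal to a cut point falls in the lower class.
    return bisect_left(cut_points, rate) + 1


# Hypothetical accrual rates per 10,000, including two no-accrual areas.
example_rates = [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cuts = quintile_cut_points(example_rates)
labels = [assign_quintile(r, cuts) for r in example_rates]
```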

RESULTS

De-identification

Before aggregation, the majority (85%) of the 2,110 census tracts across the LCC-CA fell below the minimum required value of at least 20 clinical trial accruals (Table 1). Following aggregation, all aggregated census tracts met the minimum required value. There were 214 census tracts, representing approximately 7% of the catchment area population, with no clinical trial accruals. The risk assessment showed a substantial decrease in average and maximum risk after aggregation (average risk: 0.20 to 0.04; maximum risk: 1.0 to 0.05). The maximum risk post aggregation equates to a re-identification probability of 5% across all geographies, successfully meeting the de-identification risk threshold.

Table 1.

Clinical Trial Accruals and Re-Identification Risk per Census Tract Before and After Aggregation

	Before Aggregation, N = 2,110¹	After Aggregation, N = 692¹	p-value²
Number of Accruals	5 (3 – 9)	26 (23 – 31)	<0.001
Minimum Required Value			<0.001
Less than 20	1,793 (85%)	0 (0%)
More than 20	103 (5%)	478 (69%)
No Accruals	214 (10%)	214 (31%)
Average Risk	0.20 (0.11 – 0.33)	0.04 (0.03 – 0.04)	<0.001
Maximum Risk	1.00	0.05

¹ Median (IQR); n (%)

² Wilcoxon rank sum test; Pearson’s Chi-squared test

Cancer InFocus Visualization

Figure 2 depicts clinical trial accrual rates per 10,000 across aggregated census tracts in the LCC-CA. NM facilities are represented as white points, while county borders are outlined in black. Census tracts with no accruals are colored grey. The visualization illustrates the clustering of accruals near NM facilities and the concentration of tracts with no accruals within Cook County. The southern counties of Kankakee, Will, Grundy, and Kendall, which lack an NM facility, exhibit relatively low accrual rates compared to other areas close to NM facilities. Notably, despite Cook County’s multiple facilities, areas with low accrual rates persist, particularly in the south and west regions, suggesting that distance may not be the only factor impacting accruals.

Social Determinants of Health and Clinical Trial Accruals

Demographic Factors

Significant disparities emerged across geographies with no accruals and across quintiles. We observed a pronounced inverse relationship between accrual rates and proportions of Black (non-Hispanic) and Hispanic or Latino individuals. In areas with the highest accruals, these minority populations accounted for only 12% of the total population, whereas in areas with no accruals, they accounted for 47% (Table 2). Among White (non-Hispanic) individuals, the opposite trend was observed between regions with the highest accruals and those with none (81% vs. 10%, respectively). Additionally, age distribution varied significantly by accrual rates, with an increase in the proportion of individuals over the age of 64 and a decrease in those under 18 as accrual rates increased.

Table 2.

Distribution of Sociodemographic and Healthcare Equity Factors by Quintiles of Clinical Trial Accrual Rates Across Geographies, Including No-Accrual Areas

	Quintiles of Clinical Trial Accrual Rates per 10,000
	No Accruals, N = 214¹	5 to 11, N = 95¹	11 to 16, N = 96¹	16 to 23, N = 94¹	23 to 38, N = 97¹	38 to 387, N = 96¹	p-value ²
Population	2,572 (1,589 – 3,667)	4,740 (4,364 – 5,261)	4,710 (3,963 – 5,115)	4,252 (3,763 – 4,814)	3,792 (3,078 – 5,195)	4,318 (2,999 – 5,863)	<0.001
Sex
Percent Male	50 (45 – 52)	50 (48 – 50)	49 (48 – 51)	49 (47 – 51)	50 (48 – 51)	50 (46 – 52)	0.7
Percent Female	50 (48 – 55)	50 (50 – 52)	51 (49 – 52)	51 (49 – 53)	50 (49 – 52)	50 (48 – 54)	0.7
Age
Under 18	24 (20 – 29)	24 (22 – 26)	23 (21 – 25)	22 (20 – 24)	20 (15 – 24)	14 (7 – 23)	<0.001
18 – 64	62 (58 – 66)	62 (60 – 63)	61 (59 – 64)	61 (57 – 64)	62 (56 – 74)	70 (60 – 79)	<0.001
Over 64	13 (9 – 17)	14 (12 – 16)	16 (13 – 19)	17 (13 – 21)	15 (9 – 20)	15 (9 – 20)	<0.001
Race/Ethnicity
White (non-Hispanic)	10 (2 – 36)	53 (22 – 72)	67 (49 – 81)	73 (46 – 84)	81 (69 – 87)	81 (72 – 87)	<0.001
Black (non-Hispanic)	18 (4 – 77)	8 (4 – 19)	5 (2 – 13)	4 (2 – 13)	3 (2 – 7)	4 (2 – 9)	<0.001
Hispanic or Latino	29 (10 – 69)	24 (15 – 43)	15 (9 – 25)	12 (7 – 22)	9 (6 – 15)	8 (6 – 13)	<0.001
Asian (non-Hispanic)	0 (0 – 4)	3 (1 – 9)	6 (3 – 18)	9 (3 – 16)	7 (3 – 10)	7 (4 – 15)	<0.001
Other (non-Hispanic)	1 (0 – 3)	3 (2 – 4)	3 (3 – 5)	3 (2 – 5)	4 (2 – 5)	4 (3 – 7)	<0.001
Education
Less than High School	25 (14 – 39)	13 (7 – 20)	9 (6 – 13)	6 (4 – 10)	3 (2 – 6)	2 (1 – 5)	<0.001
High School	44 (32 – 53)	32 (25 – 37)	24 (19 – 31)	20 (15 – 25)	10 (5 – 18)	5 (3 – 10)	<0.001
Above High School	56 (47 – 68)	68 (63 – 75)	76 (69 – 81)	80 (75 – 85)	90 (82 – 95)	95 (90 – 97)	<0.001
Household Income	50,723 (34,302 – 66,237)	82,128 (66,352 – 104,501)	93,788 (80,816 – 114,262)	104,292 (83,694 – 123,949)	125,804 (99,678 – 165,784)	127,297 (93,056 – 159,385)	<0.001
Living Below Poverty	16 (7 – 26)	7 (4 – 10)	6 (3 – 9)	5 (3 – 8)	3 (1 – 5)	3 (1 – 6)	<0.001
Unemployment Rate	9 (5 – 16)	5 (4 – 8)	5 (4 – 6)	5 (4 – 7)	4 (3 – 5)	3 (2 – 5)	<0.001
Uninsured	12 (7 – 17)	9 (5 – 12)	6 (4 – 10)	5 (3 – 8)	3 (2 – 6)	3 (2 – 4)	<0.001
Medicaid	34 (24 – 45)	19 (12 – 26)	14 (10 – 21)	13 (8 – 18)	7 (4 – 11)	5 (3 – 10)	<0.001
Drive Time to NM Facility, Minutes	20 (16 – 24)	23 (18 – 28)	19 (15 – 22)	16 (12 – 21)	13 (10 – 17)	9 (5 – 14)	<0.001
Breast Cancer Screening	79 (77 – 82)	78 (77 – 79)	78 (77 – 80)	79 (78 – 80)	80 (79 – 81)	81 (80 – 82)	<0.001
Cervical Cancer Screening	78 (76 – 79)	80 (79 – 82)	81 (79 – 83)	82 (80 – 84)	84 (82 – 85)	84 (81 – 85)	<0.001
Colorectal Cancer Screening	62 (58 – 65)	66 (64 – 67)	67 (65 – 69)	69 (67 – 71)	71 (69 – 72)	72 (71 – 73)	<0.001
History of Cancer Diagnosis	5 (4 – 5)	6 (5 – 7)	6 (5 – 7)	6 (5 – 7)	6 (4 – 8)	5 (4 – 7)	<0.001

¹ Median (IQR)

² Kruskal-Wallis rank sum test

Socioeconomic Factors

Across socioeconomic factors, increased rates of accruals were linked to higher household income and lower rates of poverty and unemployment. For example, compared to areas within the highest quintile of accruals, those within the lowest quintile exhibited more than twice the poverty rate and over a 50% higher unemployment rate (Figure 3). Areas with high accrual rates also demonstrated higher education levels, with nearly all individuals in the highest quintile having an above-high school education. In contrast, only 56% within the lowest quintile and 46% in areas with no accruals achieved the same education level.

Healthcare Equity Factors

Higher accruals were associated with better health insurance coverage, more cancer screenings, and closer drive time to enrollment facilities. In areas with zero accruals, the uninsured rate was four times greater, and the Medicaid rate was almost seven times greater than in areas within the highest quintile of accruals. Areas with the highest accrual rates were an average of 9 minutes away by car from the nearest NM facility, compared to 23 and 20 minutes in areas with the lowest accruals and those with none, respectively. Among cancer screenings, the largest difference was observed in colorectal cancer between regions with zero accruals and those in the top quintile (62% vs. 72%, respectively).

DISCUSSION

This study successfully implemented a geographic aggregation method to de-identify clinical trial accrual data, facilitating visualization of rates across the catchment area and revealing significant disparities in clinical trial enrollment and social determinants of health. Before aggregating census tracts, 85% of the data used for the visualization would have been suppressed, severely limiting the ability to identify sub-county geographies with low accrual rates. The conservative risk assessment demonstrates that the risk of re-identification would be at most 5 percent. These findings indicate that this method not only maintains patient confidentiality but also provides a more nuanced understanding of clinical trial accrual rates across the catchment area.

Our analysis of social determinants of health revealed that clinical trial enrollment predominantly occurred in regions characterized by higher socioeconomic status, greater insurance coverage, increased cancer screenings, proximity to facilities, and less diverse populations. Interestingly, the history of cancer diagnosis did not vary as extensively as other factors across accrual rates or in geographies with no clinical trial accruals. This could indicate that the differences in accrual rates are more influenced by disparities in social determinants of health rather than differences in inherent cancer burden.

These findings are consistent with existing research indicating the significant role these factors play in influencing cancer clinical trial enrollment. In their analysis of disparities, the Stanford Comprehensive Cancer Center found that accrual rates are highest in counties closest to the institute, underscoring the importance of proximity to enrollment sites.⁸ Studies of clinical trials for brain cancer and myelodysplastic syndromes (MDS) also observed that closer distance to enrollment sites is associated with increased participation.²³,²⁴ Addressing distance as a barrier can be achieved by implementing evidence-based strategies, such as patient navigators and community partnerships that provide patient-centered support.²⁵ Patient navigators can play a critical role in arranging transportation by connecting patients to hospital services or external support systems, including Medicaid, social services, and religious organizations.²⁶

Socioeconomic status is also a determinant of cancer clinical trial participation. In a systematic review of 16 studies of socioeconomic status and enrollment in cancer clinical trials, 13 showed an association between high socioeconomic status and trial enrollment.²⁷ The increased financial cost associated with trial participation, like travel and lodging, coupled with concerns regarding these expenses and the financial strain associated with cancer treatment, are all probable drivers contributing to the under-enrollment of low-income patients.²⁷,²⁸ However, successful interventions providing financial assistance have overcome these barriers, increasing enrollment numbers and decreasing financial concerns related to clinical trial participation.²⁹,³⁰ A three-component approach to improving cancer health equity consisting of outreach and education, patient navigation, and financial assistance saw increased enrollment in clinical trials, supporting patients with the greatest financial need.²⁹ Those enrolled in the program were more likely to report financial concerns related to travel, lodging, and medical costs associated with clinical trial participation.²⁹

Often connected to issues stemming from socioeconomic status, health insurance coverage is likewise widely cited as a significant barrier to clinical trial participation.³¹⁻³³ With the enactment of the Clinical Treatment Act in 2020, Medicaid, now alongside Medicare, covers routine costs associated with clinical trial participation.³⁴ Supporting enrollment into Medicare and Medicaid for eligible patients could help bridge the gap in enrollment among these underinsured populations. Patient navigators often cited navigating the health insurance landscape as an issue among cancer patients participating in clinical trials.²⁶ Providing patients with enrollment assistance alongside financial assistance for treatment could help address concerns relating to insurance.

The lack of diversity in clinical trial participation has been a long-standing observation, attributed in part to disparities in the aforementioned social determinants of health.²⁵,³² Apart from engaging in strategies to address the disparities in social determinants among minority populations, developing community partnerships and engaging in community-focused communications and education could further decrease disparities in accruals.²⁵ In an intervention trial focused on African American breast cancer survivors, retention and recruitment strategies focused on community engagement and access enhancement resulted in a dramatic 373% increase in accruals in 11 months.³⁵ Intentionally engaging in these strategies throughout all clinical trials could have a similarly dramatic impact on closing the gap of under-enrolled minority populations.

Important limitations must be considered in this study. First, the health determinants data used are population-based, representing statistics across communities where patients reside rather than individual patient statuses. Second, the nature of aggregation introduces the modifiable areal unit problem (MAUP), potentially obscuring spatial variations in clinical trial accrual rates and leading to biased or imprecise estimates of enrollment determinants. To mitigate this, we opted for the least value algorithm to provide the most granular geographies while balancing de-identification standards. Future research could address this limitation by deploying GAT to aggregate geographies with similar relevant population characteristics.³⁶

Additionally, using a general dataset of various clinical trial types and facilities and grouping multiple years of data limits the ability to differentiate and track enrollments across these characteristics. However, the scalability, flexibility, and data privacy of this approach allows for further exploration of these attributes by implementing the same aggregation method. Likewise, other NCI-designated centers could readily apply this approach to these and other valuable data sources to optimize their research and outreach strategies.

It’s worth acknowledging that low accrual rates in southwest Cook County and the surrounding areas near the Orland Park Cancer Center, which opened in 2022, could be attributed to enrollments only increasing in this region post-opening. Also, low accruals in these regions and elsewhere in the catchment area may not necessarily indicate low enrollment. Overlapping catchment areas with other institutions, such as the University of Chicago Comprehensive Cancer Center and the University of Illinois Cancer Center, could suggest that patients are enrolling at these and other organizations. Finally, to fully understand the determinants of clinical trial enrollment, it is essential to explore these factors among our patient population and specific cancer sites. Future research will focus on identifying the factors influencing pre- and post-enrollment disparities.³⁷

The successful de-identification of clinical trial accrual data through geographic aggregation enabled the visualization of accrual rates across the LCC-CA and the identification of low enrollment areas and influential social determinants of health. These findings provide an opportunity to implement targeted community outreach and research efforts to address these influential factors in low-enrollment communities. De-identifying data through geographic aggregation can facilitate more effective community outreach and research by improving catchment area surveillance, ultimately enabling a more efficient approach to addressing the burden of cancer within the catchment area.

Supplementary Material

NIHMS2029624-supplement-1.pdf (1MB, pdf)

Footnotes

Disclosure of Interest

The authors report no conflicts of interest to disclose.

References

1. National Institutes of Health, National Cancer Institute. Cancer Center Support Grants (CCSGs) for NCI Designated Cancer Centers (P30). PAR-12-298 (2012).
2. National Institutes of Health, National Cancer Institute. Cancer Center Support Grants (CCSGs) for NCI Designated Cancer Centers (P30). PAR-17-095 (2016).
3. National Institutes of Health, National Cancer Institute. Cancer Center Support Grants (CCSGs) for NCI-designated Cancer Centers (P30). PAR-21-321 (2021).
4. Tai CG & Hiatt RA. The Population Burden of Cancer: Research Driven by the Catchment Area of a Cancer Center. Epidemiol. Rev. 39, 108–122 (2017).
5. Manne SL et al. Current Approaches to Serving Catchment Areas in Cancer Centers: Insights from the Big Ten Cancer Research Consortium Population Science Working Group. Cancer Epidemiol. Biomarkers Prev. 32, 465–472 (2023).
6. Geyer NR & Lengerich EJ. LionVu: A Data-Driven Geographical Web-GIS Tool for Community Health and Decision-Making in a Catchment Area. Geographies 3, 286–302 (2023).
7. Hawk ET et al. Five National Cancer Institute–designated cancer centers’ data collection on racial/ethnic minority participation in therapeutic trials: A current view and opportunities for improvement. Cancer 120, 1113–1121 (2014).
8. Holguin D. Spatial Analysis of Clinical Trial Accrual Within an NCI Comprehensive Cancer Center Catchment Area by Race and Ethnicity. (2022).
9. Office for Civil Rights, HHS. Standards for privacy of individually identifiable health information. Final rule. Fed Regist. 45 CFR, Parts 160–4 (2002).
10. Department of Health and Human Services. Uses and disclosures for which an authorization or opportunity to agree or object is not required. 45 CFR 164.512 (2000).
11. National Cancer Institute & Center for Disease Control. Suppression of Incidence (and Death) Rates (and Trends) and Case Counts. (2024).
12. McLaughlin CC. Confidentiality Protection in Publicly Released Central Cancer Registry Data. J. Regist. Manag. 29, 84–88 (2002).
13. Department of Health and Human Services. Definitions. 45 CFR 160.103 (2000).
14. Department of Health and Human Services. Other requirements relating to uses and disclosures of protected health information. 45 CFR 164.514 (2000).
15. Committee on Strategies for Responsible Sharing of Clinical Trial Data, Board on Health Sciences Policy, & Institute of Medicine. Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk. 18998 (National Academies Press, Washington, D.C., 2015). doi: 10.17226/18998.
16. Stamm A & Babcock G. Gatpkg: Geographic Aggregation Tool (GAT). (2023).
17. Simon G et al. Toolkit for Assessing and Mitigating Risk of Re-identification when Sharing Data Derived from Health Records.
18. Brokamp C, Wolfe C, Lingren T, Harley J & Ryan P. Decentralized and reproducible geocoding and characterization of community and environmental exposures for multisite studies. J. Am. Med. Inform. Assoc. 25, 309–314 (2018).
19. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing (2023).
20. United States Census Bureau. 2022 TIGER/Line Shapefiles: Census Tracts. (2022).
21. U.S. Census Bureau. 2018–2022 American Community Survey 5-Year Estimates. (2022).
22. Centers for Disease Control and Prevention. PLACES: Local Data for Better Health, Census Tract Data 2023 release. (2023).
23. Brierley CK et al. Low participation rates and disparities in participation in interventional clinical trials for myelodysplastic syndromes. Cancer 126, 4735–4743 (2020).
24. Morshed RA et al. The influence of race and socioeconomic status on therapeutic clinical trial screening and enrollment. J. Neurooncol. 148, 131–139 (2020).
25. Vuong I et al. Overcoming Barriers: Evidence-Based Strategies to Increase Enrollment of Underrepresented Populations in Cancer Therapeutic Clinical Trials—a Narrative Review. J. Cancer Educ. 35, 841–849 (2020).
26. Cartmell KB et al. Patient barriers to cancer clinical trial participation and navigator activities to assist. in Advances in Cancer Research vol. 146 139–166 (Elsevier, 2020).
27. Donzo MW et al. Effects of socioeconomic status on enrollment in clinical trials for cancer: A systematic review. Cancer Med. 13, e6905 (2024).
28. Deng L et al. The financial cost of cancer clinical trial participation.
29. Nipp RD et al. Financial Burden of Cancer Clinical Trial Participation and the Impact of a Cancer Care Equity Program. The Oncologist 21, 467–474 (2016).
30. Nipp RD et al. Addressing the Financial Burden of Cancer Clinical Trial Participation: Longitudinal Effects of an Equity Intervention. The Oncologist 24, 1048–1055 (2019).
31. Ford JG et al. Barriers to recruiting underrepresented populations to cancer clinical trials: A systematic review. Cancer 112, 228–242 (2008).
32. Rivers D, August EM, Sehovic I, Lee Green B & Quinn GP. A systematic review of the factors influencing African Americans’ participation in cancer clinical trials. Contemp. Clin. Trials 35, 13–32 (2013).
33. Unger JM et al. “When Offered to Participate”: A Systematic Review and Meta-Analysis of Patient Agreement to Participate in Cancer Clinical Trials. JNCI J. Natl. Cancer Inst. 113, 244–257 (2021).
34. Consolidated Appropriations Act, 2021. Public and Private Laws 116th Congress. H.R. 133 (2020).
35. Germino BB et al. Engaging African American breast cancer survivors in an intervention trial: culture, responsiveness and community. J. Cancer Surviv. 5, 82–91 (2011).
36. Tatalovich Z et al. Developing Geographic Areas for Cancer Reporting Using Automated Zone Design. Am. J. Epidemiol. 191, 2109–2119 (2022).
37. Hantel A et al. Racial and ethnic associations with comprehensive cancer center access and clinical trial enrollment for acute leukemia. JNCI J. Natl. Cancer Inst. djae067 (2024). doi: 10.1093/jnci/djae067.
