Pancreatic cancer is a leading cause of cancer-related mortality in the United States.1 Due to its aggressive nature and the absence of an effective population-based screening strategy, risk prediction models have been developed to help identify high-risk patients for early detection.2,3,4 While these models hold promise, their performance may vary across population subgroups, potentially exacerbating health-care disparities.5,6 This study aimed to assess whether previously developed population-based machine learning and regression models for pancreatic cancer risk2,3 perform differently across diverse racial and ethnic groups.
We previously developed and externally validated 2 machine learning models, a random survival forest (RSF) and eXtreme gradient boosting (XGB), and 1 regression model, Cox proportional hazards regression (COX), to predict the 18-month risk of pancreatic ductal adenocarcinoma (PDAC) following a clinic-based health-care visit (2007–2017) using data from Kaiser Permanente Southern California (KPSC) and the Veterans Affairs health-care system.2,3 The eligibility criteria, data sources, and predictor selection for these models were previously reported.2,3 In this study, we applied these 3 models to 6 independent validation datasets2,3 representing 4 racial and ethnic groups (Hispanic; Asian/Pacific Islander (PI); Black/African American; non-Hispanic White), with data sourced exclusively from KPSC.
Model performance was evaluated using multiple metrics, including the c-index, sensitivity, positive predictive value (PPV), false positive rate, and false negative rate. Measures of fairness, such as equalized odds, predictive parity, and predictive equality, were also assessed. To enable meaningful comparisons, specificity was fixed at 97.5%, except for the c-index, which is threshold-independent. Calibration plots further illustrated model performance across 5 risk groups (<50th, 50–74th, 75–89th, 90–94th, 95–100th percentiles), offering insights into the alignment between predicted risks and observed outcomes. Detailed metric definitions are available in the Online Supplemental Document. The study was approved by the Institutional Review Board of KPSC.
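The threshold-dependent measures above can be computed once a risk cutoff is chosen to achieve the fixed 97.5% specificity. A minimal sketch of that calculation is shown below; it assumes a binary 18-month outcome and ignores censoring, so it illustrates the metric definitions rather than the study's exact survival-based estimation.

```python
import numpy as np

def metrics_at_fixed_specificity(risk, event, specificity=0.975):
    """Threshold predicted risks so the chosen specificity is achieved among
    non-cases, then compute sensitivity, PPV, and false positive rate.
    Illustrative only: assumes a binary outcome with no censoring."""
    risk = np.asarray(risk, dtype=float)
    event = np.asarray(event, dtype=bool)
    # The cutoff is the specificity-quantile of the non-case risk distribution,
    # so ~2.5% of non-cases fall above it (false positives).
    threshold = np.quantile(risk[~event], specificity)
    flagged = risk > threshold
    tp = np.sum(flagged & event)
    fp = np.sum(flagged & ~event)
    fn = np.sum(~flagged & event)
    tn = np.sum(~flagged & ~event)
    return {
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "false_positive_rate": fp / (fp + tn),
    }
```

Because the threshold is set on the non-case distribution, the false positive rate is pinned near 2.5% in every group, which is why the table's false positive rates are nearly uniform while sensitivity and PPV vary.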
The validation datasets varied slightly in size, reflecting the demographic composition of the KPSC patient population. Non-Hispanic White patients comprised the largest group (approximately 163,000), followed by Hispanic (107,000), Asian/PI (39,000), and Black or African American patients (34,000–35,000). PDAC cases within 18 months ranged from 167 to 186 for non-Hispanic White patients, 46–64 for Black or African American patients, 26–32 for Asian/PI patients, and 86–91 for Hispanic patients. Incidence rates were notably higher among Black or African American patients (1.01–1.41 per 1000 person-years), nearly double those observed in Asian/PI (0.51–0.64/1000 person-years) and Hispanic patients (0.58–0.66/1000 person-years). Mean ages were 63, 62, 61, and 60 years for non-Hispanic White, Black or African American, Asian/PI, and Hispanic patients, respectively. Female representation ranged from 52% in non-Hispanic White to 58% in Black or African American patients (Table A1).
The analysis revealed several key findings. C-index measures, reflecting the models’ discriminative ability, were comparable across racial and ethnic groups, suggesting overall consistency in identifying high-risk patients. Similarly, false positive rates remained relatively uniform, indicating similar error rates across populations. However, notable differences emerged in PPV, particularly among Black or African American patients, who exhibited the highest PPV values across all models, with predictive parity values of 0.55%–0.78% when compared to non-Hispanic White patients. This elevation in PPV was driven by the higher PDAC incidence rates in this group, underscoring a fairness challenge. Adjusting risk thresholds during implementation could help address this issue if achieving equality in PPV across racial and ethnic groups is deemed important. Sensitivity, another critical measure, was slightly higher for Hispanic patients when using the RSF and XGB models, although this trend was not observed with the COX model (Table).
Table.
Model Performance Measures And Model Fairness Measures Based on the Average of the 6 Validation Datasets for Each Modeling Method, Mean (SD)
| | Hispanic | Asian/PI Only, Non-Hispanic | Black or African American Only, Non-Hispanic | White Only, Non-Hispanic |
|---|---|---|---|---|
| Model performance measures (%), mean (SD) | ||||
| RSF | ||||
| c-index | 0.79 (0.03) | 0.78 (0.02) | 0.77 (0.04) | 0.78 (0.02) |
| Sensitivity | 22.55 (3.68) | 21.03 (5.29) | 20.83 (2.93) | 18.45 (1.58) |
| PPV | 1.07 (0.22) | 0.90 (0.25) | 1.88 (0.43) | 1.10 (0.13) |
| False positive rate | 2.48 (0.02) | 2.48 (0.01) | 2.46 (0.02) | 2.50 (0.05) |
| XGB | ||||
| c-index | 0.79 (0.02) | 0.78 (0.02) | 0.75 (0.03) | 0.76 (0.01) |
| Sensitivity | 24.28 (3.68) | 22.81 (1.98) | 19.87 (4.39) | 20.44 (1.85) |
| PPV | 1.15 (0.20) | 0.97 (0.10) | 1.80 (0.54) | 1.25 (0.10) |
| False positive rate | 2.47 (0.01) | 2.48 (0.01) | 2.46 (0.01) | 2.47 (0.01) |
| COX | ||||
| c-index | 0.77 (0.02) | 0.77 (0.01) | 0.75 (0.03) | 0.76 (0.01) |
| Sensitivity | 18.41 (2.40) | 20.57 (5.68) | 20.38 (5.41) | 19.04 (1.58) |
| PPV | 0.87 (0.12) | 0.88 (0.25) | 1.85 (0.66) | 1.17 (0.08) |
| False positive rate | 2.48 (0.01) | 2.48 (0.01) | 2.46 (0.02) | 2.48 (0.01) |
| Model fairness assessment using non-Hispanic White as the reference (%), mean (95% CI) | ||||
| RSF | ||||
| Equalized odds | 4.10 (0.91, 7.29) | 2.58 (−2.72, 7.88) | 2.38 (−0.52, 5.29) | Ref |
| Predictive parity | −0.03 (−0.25, 0.18) | −0.20 (−0.47, 0.07) | 0.78 (0.39, 1.17) | Ref |
| Predictive equality | −0.02 (−0.04, 0.01) | −0.02 (−0.06, 0.02) | −0.04 (−0.10, 0.01) | Ref |
| XGB | ||||
| Equalized odds | 3.84 (0.77, 6.92) | 2.38 (−0.12, 4.87) | −0.57 (−4.06, 2.92) | Ref |
| Predictive parity | −0.10 (−0.29, 0.09) | −0.28 (−0.43, −0.13) | 0.55 (0.08, 1.01) | Ref |
| Predictive equality | 0.00 (0.00, 0.01) | 0.01 (0.00, 0.01) | −0.01 (−0.02, −0.00) | Ref |
| COX | ||||
| Equalized odds | −0.06 (−2.67, 1.40) | 1.53 (−3.63, 6.69) | 1.34 (−3.12, 5.80) | Ref |
| Predictive parity | −0.30 (−0.39, −0.20) | −0.29 (−0.54, −0.03) | 0.68 (0.11, 1.26) | Ref |
| Predictive equality | 0.00 (0.00, 0.01) | 0.01 (0.00, 0.02) | −0.01 (−0.03, 0.00) | Ref |
Please refer to the Online Supplemental Document for the definitions of the model performance measures and model fairness measures. Except for the c-index, all other measures were estimated at a fixed specificity (97.5%).
CI, confidence interval; SD, standard deviation.
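The fairness values in the table are consistent with computing each measure as a group-minus-reference difference in the corresponding performance measure (equalized odds from sensitivity, predictive parity from PPV, predictive equality from the false positive rate, all at the fixed 97.5% specificity). A sketch under that assumption, checked against the table's RSF row for Hispanic patients:

```python
def fairness_vs_reference(group_metrics, reference="non-Hispanic White"):
    """Assumed construction: each fairness measure is a group's performance
    measure minus the reference group's (non-Hispanic White)."""
    ref = group_metrics[reference]
    out = {}
    for group, m in group_metrics.items():
        if group == reference:
            continue
        out[group] = {
            "equalized_odds": m["sensitivity"] - ref["sensitivity"],
            "predictive_parity": m["ppv"] - ref["ppv"],
            "predictive_equality": m["false_positive_rate"] - ref["false_positive_rate"],
        }
    return out

# RSF performance measures (%) taken from the table above.
rsf = {
    "Hispanic": {"sensitivity": 22.55, "ppv": 1.07, "false_positive_rate": 2.48},
    "non-Hispanic White": {"sensitivity": 18.45, "ppv": 1.10, "false_positive_rate": 2.50},
}
# Reproduces the table's RSF fairness row for Hispanic patients:
# equalized odds 4.10, predictive parity -0.03, predictive equality -0.02.
```

Under this construction, the elevated predictive parity for Black or African American patients (0.55–0.78) follows directly from their higher PPV, which in turn reflects their higher PDAC incidence.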
In terms of calibration, the models demonstrated general consistency across groups; however, the RSF and COX models tended to underestimate PDAC risk among Black/African American patients in the highest risk categories, whereas the XGB model overestimated risk for most groups. Interestingly, among Black or African American patients, the XGB model demonstrated more accurate risk predictions, as calibration plots showed alignment with observed outcomes (Figure).
Figure.
Model calibration by racial and ethnic group for each of the 3 risk prediction models. x-axis: predicted risk; y-axis: observed risk. The 5 clusters represent the 5 risk groups defined by the ranges of predicted risks. Non-Hispanic White: green; Black or African American: orange; Asian/PI: black; Hispanic: blue.
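The calibration points in such a plot can be constructed by binning patients at the stated predicted-risk percentiles and comparing mean predicted with mean observed risk per bin. The sketch below illustrates this with a binary outcome; the study's observed 18-month risks would instead come from censoring-aware survival estimates.

```python
import numpy as np

def calibration_points(risk, event, cuts=(0.50, 0.75, 0.90, 0.95)):
    """Bin patients by predicted-risk percentile (<50th, 50-74th, 75-89th,
    90-94th, 95-100th) and return (mean predicted, observed event rate) per
    bin: the x and y coordinates of a calibration plot. Illustrative only:
    uses a binary outcome, ignoring censoring."""
    risk = np.asarray(risk, dtype=float)
    event = np.asarray(event, dtype=float)
    edges = np.quantile(risk, cuts)             # percentile cutpoints
    bins = np.searchsorted(edges, risk, side="right")
    points = []
    for b in range(len(cuts) + 1):
        mask = bins == b
        points.append((risk[mask].mean(), event[mask].mean()))
    return points
```

On this plot, points above the diagonal in a bin indicate underestimation of risk (as seen for RSF and COX in the highest-risk bins among Black/African American patients) and points below it indicate overestimation (as seen for XGB in most groups).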
Our findings provide important context for understanding disparities in risk prediction. For instance, the elevated PPV observed among Black or African American patients highlights a need for tailored strategies to address fairness challenges. The disparities observed in our risk prediction model have significant implications for clinical decision making and health outcomes. If a model underestimates risk in certain racial or ethnic groups, these patients may be less likely to be flagged for early detection, leading to delays in diagnoses, and consequently, poorer survival outcomes. Conversely, overestimation of risk in other groups may lead to unnecessary diagnostic workups, increased patient anxiety, and inefficient use of health-care resources. These disparities in model predictions could further exacerbate existing inequities in early detection and treatment of cancer. To mitigate these issues, it is critical to consider bias-aware implementation strategies when deploying predictive models in clinical settings. Potential approaches include adjusting risk thresholds, incorporating socioeconomic and access-to-care variables, and developing fairness-aware risk prediction models, which aim to mitigate bias during the model training process.7,8
Despite the XGB model’s tendency to overestimate PDAC risk, its utility in identifying high-risk patients through relative ranking, such as the top 5% of patients, remains valid. This adaptability underscores the importance of selecting appropriate implementation strategies based on the specific goals of risk prediction.
Several limitations of this study warrant consideration. First, the validation datasets were derived from the same population and the same time frame as the development datasets. In addition, the study was conducted in an integrated health-care system with relatively low PDAC incidence,2 potentially limiting generalizability to populations with higher PDAC prevalence. Finally, other crucial factors contributing to health inequities, such as socioeconomic status, were not considered in our analysis.
In conclusion, the 3 previously validated risk prediction models demonstrated comparable overall discrimination and calibration across all racial and ethnic groups. However, Black or African American patients experienced slightly compromised predictive parity compared to other racial and ethnic groups due to their elevated PDAC incidence rates, highlighting the need for further refinement to enhance fairness in predictive modeling. Future research incorporating multicenter data and diverse health-care settings can improve the models' generalizability. In addition, future efforts should prioritize fairness alongside model performance and explore disparities across broader dimensions of health equity, including socioeconomic factors. Finally, attention should also be given to disparities in cancer surveillance and strategies to improve access for underserved populations, as these challenges have been well-documented in prior studies.9
Footnotes
Conflicts of Interest: The authors disclose no conflicts.
Funding: Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number R01CA230442. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Ethical Statement: This study was reviewed and approved by the Kaiser Permanente Southern California Institutional Review Board. Informed consent for this work was waived.
Data Transparency Statement: Anonymized data, analytic methods, and study materials that support the findings of this study may be made available from the investigative team in the following conditions: (1) agreement to collaborate with the study team on all publications, (2) provision of external funding for administrative and investigator time necessary for this collaboration, (3) demonstration that the external investigative team is qualified and has documented evidence of training for human subjects protections, and (4) agreement to abide by the terms outlined in data use agreements between institutions.
Reporting Guidelines: STROBE.
Material associated with this article can be found, in the online version, at https://doi.org/10.1016/j.gastha.2025.100657.
References
- 1. National Cancer Institute, Surveillance, Epidemiology, and End Results Program. Cancer Stat Facts: Pancreatic Cancer. Available at: https://seer.cancer.gov/statfacts/html/pancreas.html. Accessed November 17, 2024.
- 2. Chen W., et al. Am J Gastroenterol. 2023;118(1):157–167. doi: 10.14309/ajg.0000000000002050.
- 3. Chen W., et al. Pancreatology. 2023;23(4):396–402. doi: 10.1016/j.pan.2023.04.009.
- 4. Jia K., et al. EBioMedicine. 2023;98:104888. doi: 10.1016/j.ebiom.2023.104888.
- 5. Kartoun U., Khurshid S., Kwon B.C., et al. Sci Rep. 2022;12(1):12542. doi: 10.1038/s41598-022-16615-3.
- 6. Evans C.V., Johnson E.S., Lin J.S. Assessing algorithmic bias and fairness in clinical prediction models for preventive services: a health equity methods project for the U.S. Preventive Services Task Force. Rockville, MD: Agency for Healthcare Research and Quality (US); 2023. AHRQ Publication No. 23-05308-EF-1.
- 7. Zafar M.B., Valera I., Gomez-Rodriguez M., Gummadi K.P. J Mach Learn Res. 2019;20:1–42.
- 8. Agarwal A., Beygelzimer A., Dudik M., Langford J., et al. A reductions approach to fair classification. In: Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR; 2018.
- 9. Katona B.W., et al. Cancer Prev Res (Phila). 2023;16(6):343–352. doi: 10.1158/1940-6207.CAPR-22-0529.