JNCI Journal of the National Cancer Institute
Editorial. 2022 Jul 25;114(10):1317–1319. doi: 10.1093/jnci/djac143

Cancer Risk Prediction Paradigm Shift: Using Artificial Intelligence to Improve Performance and Health Equity

Christoph I Lee 1,2,3, Joann G Elmore 4
PMCID: PMC9552274  PMID: 35876797

In his 1962 book, The Structure of Scientific Revolutions, Thomas Kuhn articulated his views on how science changes over time—incremental developments of “normal science” and scientific revolutions or paradigm shifts that punctuate these more stable periods (1). We may be in an intriguing and very active era filled with scientific revolutions related to artificial intelligence (AI) in medicine. In this issue of the Journal, Lehman et al. (2) provide data to suggest a paradigm shift may be coming to breast cancer risk assessment methods.

For mammography screening benefits to outweigh harms, we believe that the paradigm must shift beyond an age-based approach to more personalized risk-based screening. However, our current risk assessment tools fall short of supporting this ideal. Thus far, several breast cancer risk prediction models have been validated and are clinically available, including the Tyrer-Cuzick model (IBIS) and National Cancer Institute Breast Cancer Risk Assessment Tool (BCRAT). These traditional risk models, which have modest predictive accuracy, rely heavily on the memory of patients and self-reported information on characteristics such as race and ethnicity, age of onset of menstruation, previous breast procedures, and family history of breast cancer (3).

These traditional models, unfortunately, are not accessible by all and do not work equally for all. For instance, these models cannot be used by those with a personal history of breast cancer or known genetic mutations, and they overestimate risk for women with a history of high-risk lesions, including atypia (4). Recently, risk models have come under fire for their worse predictive performance among Asian, Black, and Hispanic populations, likely a consequence of the models being trained and validated using mostly White populations (5).

Unlike traditional models that rely heavily on self-reported characteristics, AI-based deep learning models can be trained to predict future breast cancer risk from mammography images alone. They have the potential to be more objective and usable by any woman regardless of her personal breast or genetic history. These deep learning models can be automated to generate a risk prediction score for future cancers immediately after mammography images are obtained, potentially simplifying the risk stratification process that guides women into more or less intensive future screening regimens.

Lehman et al. (2) compared the accuracy of their open access, deep learning, 5-year breast cancer risk model with that of 2 popular traditional risk models (the IBIS and BCRAT models, each for both 5-year and lifetime risk). Their deep learning model was previously shown to have equivalent predictive accuracy across patient ages, races, and breast densities and was trained and validated on all women who routinely undergo screening regardless of their personal or genetic histories (6). The comparison used 119 139 exams from 57 617 consecutive women seen at 5 facilities across the Mass General Brigham health system. In addition to this direct 3-way comparison, the study investigators simulated outcomes from a risk-based screening program in which only the patients in the top half of predicted risk under each model (deep learning, IBIS, and BCRAT) were screened with mammography, and then compared the resulting number of cancers detected and false-positive exams avoided.
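To make the simulated program concrete, the following is a minimal sketch of how the share of cancers captured by screening only the higher-risk half could be computed. It is not the code used by Lehman et al.; the function name `capture_rate_top_half` and the 8-patient dataset are hypothetical and chosen only for illustration.

```python
from typing import List


def capture_rate_top_half(scores: List[float], developed_cancer: List[bool]) -> float:
    """Return the fraction of future cancers found among patients whose
    risk score ranks in the top half of the cohort.

    Ties are broken by original order because Python's sort is stable;
    a real analysis would need an explicit tie-handling rule.
    """
    if len(scores) != len(developed_cancer):
        raise ValueError("scores and outcomes must have the same length")
    total_cancers = sum(developed_cancer)
    if total_cancers == 0:
        raise ValueError("at least one cancer is needed to compute a capture rate")

    # Rank patients from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Only the top half of the ranking is invited to screening.
    screened = order[: len(scores) // 2]

    captured = sum(developed_cancer[i] for i in screened)
    return captured / total_cancers


# Hypothetical toy cohort: risk scores and whether cancer developed within 5 years.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_outcomes = [True, False, True, False, True, False, False, False]

rate = capture_rate_top_half(toy_scores, toy_outcomes)
print(f"Cancers captured in top half: {rate:.0%}")
```

In this toy cohort, 2 of the 3 cancers fall in the screened half. Running the same function on each model's scores for the same patients would give the kind of side-by-side capture percentages the editorial reports, although the published analysis also counted false-positive exams, which this sketch omits.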

Several key results are relevant to the burgeoning field of risk-based breast cancer screening. First, the 5-year deep learning model outperformed the 2 traditional clinical risk factor models, with an area under the receiver operating characteristic curve (AUC) for cancer prediction of 0.68 vs 0.57 for both traditional models (2). The AUC provides a measure of overall diagnostic accuracy, with 0.50 representing random chance and 1.00 representing perfect accuracy. When comparing outcomes across the 3 models for 5-year risk, the deep learning model categorized about 2 times more patients who would go on to develop cancer within 5 years than either of the traditional risk models (8.6/1000 screened vs 3.8/1000 screened with BCRAT and 4.4/1000 screened with IBIS). Second, in a simulation of a risk-based screening program where only the top 50th percentile of women based on 5-year risk predictions are invited to screen, the deep learning model captured the highest percentage of cancers of all 3 risk models (75% vs 35% for BCRAT and 39% for IBIS). The deep learning model also captured the same percentage of cancers in the simulation regardless of the woman’s race, whereas the traditional risk-based screening simulations captured more cancers among White vs non-White patients. Third, both the BCRAT and IBIS lifetime risk prediction models were no better than random chance (AUC values of 0.50), suggesting very poor discriminatory performance of lifetime cancer risk models compared with 5-year risk models. This finding is concerning because lifetime risk based on traditional risk models is currently used to determine the accepted threshold for triggering supplemental MRI screening.
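The AUC values quoted above can be read as the probability that a randomly chosen patient who develops cancer receives a higher risk score than a randomly chosen patient who does not. The sketch below computes that quantity by direct pairwise comparison. It is an illustration under stated assumptions, not the study's analysis code; the function name `pairwise_auc` and the toy data are hypothetical.

```python
from typing import List


def pairwise_auc(scores: List[float], labels: List[bool]) -> float:
    """Area under the ROC curve, computed as the share of
    (cancer, no-cancer) pairs in which the cancer case has the
    higher score. Tied scores count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy cohort (not study data).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [True, False, True, False, True, False, False, False]

toy_auc = pairwise_auc(toy_scores, toy_labels)
print(f"AUC on toy data: {toy_auc:.2f}")

# A model that gives every patient the same score cannot discriminate at all.
uninformative_auc = pairwise_auc([0.5] * 8, toy_labels)
print(f"AUC of a constant score: {uninformative_auc:.2f}")
```

The constant-score case evaluates to 0.50, the same value reported for the lifetime risk models, which is why an AUC of 0.50 is described as no better than random chance. The double loop is quadratic in cohort size; for datasets of the scale in the study, a rank-based formulation or an established statistics library would be the practical choice.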

Taken together, these key study findings portend potentially major shifts in the future practices and policies of risk-based breast cancer screening. If AI models based solely on mammography images have considerably improved accuracy over traditional risk models, shifting to AI-driven risk prediction could provide improved screening outcomes while also streamlining current workflows. Primary care providers may not need to spend valuable time during clinic visits to help patients calculate their breast cancer risk, and patients may not need to complete lengthy surveys about their family history, previous biopsies, hormone use, menopausal status, or other risk factors at the time of imaging. Instead, a patient’s most recent mammogram can be used to automatically assess the patient’s risk for future breast cancer and schedule a screening regimen that fits their risk level: more intensive screening for higher-risk patients and less intensive screening for those at average risk.

The study by Lehman et al. (2) also highlights the potential for AI to provide greater health equity in the practice of risk-based screening. Deep learning models do not require disclosure of self-reported race. At the same time, one recent report suggests that AI can distinguish race based solely on medical images and can inadvertently perpetuate racial biases in health care (7). Thus, going forward, having population diversity in external validation datasets will be critical to ensure that risk models are accurate across different populations and that any existing disparities are closed rather than widened. This study’s deep learning model was trained and validated in a more racially and ethnically diverse population than traditional risk models, and its code is open source, allowing for yet another layer of transparency and trust. However, one limitation of the current study is that non-White races and ethnicities were grouped together due to small percentages of specific minority groups. Future efforts are needed to ensure that this and other deep learning models work just as well for predicting cancers among Asian, Black, and Hispanic patients.

Moreover, it is our opinion that the threshold for more intensive screening should shift away from lifetime risk estimates and toward a shorter, more immediate 5-year risk estimate. Lehman et al. (2) found that all three 5-year risk estimates (deep learning, IBIS, and BCRAT) outperformed the 2 lifetime risk estimates (IBIS and BCRAT) for predicting future breast cancers. An individual’s cancer risk changes over time, and a more immediate timeframe for risk prediction would allow for more dynamic, personalized screening. Such a paradigm shift would have major implications for insurance coverage of services. For instance, supplemental screening MRI is currently covered by major payers only for those women with 20% or greater lifetime risk based on traditional risk models. New thresholds for supplemental screening would need to be determined for elevated 5-year risk. A movement toward this shorter timeframe for more intensive screening would align with the current timeframe used for consideration of chemoprevention (eg, considered for women with ≥1.67% or ≥3% 5-year risk).

This study by Lehman et al. (2) highlights deep learning’s promise in improving breast cancer risk estimates, workflow efficiency, and health equity to support future breast cancer screening programs. However, before widespread deployment, these deep learning tools require further refinement, performance improvement, independent external validation, and prospective assessment of comparative clinical outcomes. Of note, this study’s open-source deep learning model detected more cancers than traditional risk models in the simulated screening program but also led to many more false-positive exams, suggesting increases in both screening benefits and harms. Moreover, it is uncertain if additional cancers detected by deep learning will ultimately represent overdiagnosis of indolent cancers or represent the types of invasive cancers that can lead to lives saved if intervened upon earlier. If deep learning models can lead to more benefits and less harm, then we should embrace a paradigm shift and turn breast cancer risk prediction over to AI.

Funding

This work was supported in part by the National Cancer Institute at the National Institutes of Health (grant numbers R01 CA240403 to CIL and JGE, R01 CA262023 to CIL).

Notes

Role of the funder: The funding source had no role in the preparation, review, or approval of the manuscript, and the decision to submit the manuscript for publication.

Disclosures: The authors report no financial conflicts of interest directly related to this editorial. CIL reports personal fees from GRAIL, Inc, for service on a data safety monitoring board, textbook royalties from McGraw Hill, Inc, Oxford University Press, and UpToDate, Inc, and personal fees from the American College of Radiology for service on a journal editorial board; all outside the submitted work. JGE reports royalties from UpToDate, Inc. JGE, who is a JNCI Associate Editor and coauthor on this editorial, was not involved in the editorial review or decision to publish this manuscript.

Author contributions: CIL, JGE: writing—original draft, writing—review & editing.

Contributor Information

Christoph I Lee, Department of Radiology, University of Washington School of Medicine, Seattle, WA, USA; Department of Health Systems and Population Health, University of Washington School of Public Health, Seattle, WA, USA; Hutchinson Institute for Cancer Outcomes Research, Fred Hutchinson Cancer Center, Seattle, WA, USA.

Joann G Elmore, Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.

Data Availability

There were no new data generated in the drafting of this manuscript.

References

  • 1. Kuhn TS. The Structure of Scientific Revolutions. Chicago: University of Chicago Press; 1962.
  • 2. Lehman C, Mercaldo S, Lamb LR, et al. Deep learning vs traditional breast cancer risk models to support risk-based mammography screening. J Natl Cancer Inst. 2022;114(10):1355-1363.
  • 3. Terry MB, Liao Y, Whittemore AS, et al. 10-year performance of four models of breast cancer risk: a validation study. Lancet Oncol. 2019;20(4):504-517.
  • 4. Boughey JC, Hartmann LC, Anderson SS, et al. Evaluation of the Tyrer-Cuzick (International Breast Cancer Intervention Study) model for breast cancer risk prediction in patients with atypical hyperplasia. J Clin Oncol. 2010;28(22):3591-3596.
  • 5. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453.
  • 6. Yala A, Mikhael PG, Strand F, et al. Multi-institutional validation of a mammography-based breast cancer risk model. J Clin Oncol. 2022;40(16):1732-1740. doi: 10.1200/JCO.21.01337.
  • 7. Gichoya JW, Banerjee I, Bhimireddy AR, et al. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit Health. 2022;4(6):e406-e414. doi: 10.1016/S2589-7500(22)00063-2.



Articles from JNCI Journal of the National Cancer Institute are provided here courtesy of Oxford University Press
