Published in final edited form as: J Am Coll Radiol. 2023 Nov 8;21(2):319–328. doi: 10.1016/j.jacr.2023.10.018

Artificial Intelligence-Driven Mammography-Based Future Breast Cancer Risk Prediction: A Systematic Review

Cody M Schopf 1, Ojas A Ramwala 2, Kathryn P Lowry 1, Solveig Hofvind 3, M Luke Marinovich 4, Nehmat Houssami 4, Joann G Elmore 5, Brian N Dontchos 1, Janie M Lee 1, Christoph I Lee 1,6

Abstract

Purpose:

To summarize the literature regarding the performance of mammography image-based artificial intelligence (AI) algorithms, with and without additional clinical data, for future breast cancer risk prediction.

Materials and Methods:

A systematic literature review was performed using six databases (medRxiv, bioRxiv, Embase, Engineering Village, IEEE Xplore, and PubMed) from 2012 through September 30, 2022. Studies were included if they used real-world screening mammography exams to validate AI algorithms for future risk prediction based on images alone or in combination with clinical risk factors. The quality of studies was assessed, and predictive accuracy was recorded as the area under the receiver operating characteristic curve (AUC).

Results:

Sixteen studies met inclusion and exclusion criteria, of which 14 provided AUC values. The median AUC of AI image-only models was 0.72 (range 0.62–0.90), compared with a median of 0.61 (range 0.54–0.69) for tools based on breast density or clinical risk factors. Of the 7 studies that directly compared AI image-only performance to combined image + clinical risk factor performance, 6 demonstrated no significant improvement, while 1 demonstrated a significant improvement.

Conclusions:

Early efforts at predicting future breast cancer risk based on mammography images alone demonstrate accuracy comparable or superior to that of traditional risk tools, with little or no improvement when clinical risk factor data are added. Transitioning from clinical risk factor-based to AI image-based risk models may lead to more accurate, personalized risk-based screening approaches.

Introduction

There is a growing movement towards tailoring breast cancer screening to individual future cancer risk.1 In current clinical practice, there are multiple traditional risk assessment tools to estimate an individual’s future risk of breast cancer, including the Tyrer-Cuzick, Gail, and Breast Cancer Surveillance Consortium risk models.2–4 These clinical risk assessment tools are based on self-reported factors, including family history of breast cancer, age at first childbirth, race/ethnicity, and history of previous breast biopsy.4,5 Yet, self-reported factors are prone to recall bias. Moreover, these clinical risk assessment tools have been shown to have variable predictive accuracy.5,6

More recently, based on evidence that mammography image features have shown risk associations with breast cancer,7,8 deep learning artificial intelligence (AI) algorithms have been developed to use features from screening mammography images themselves to predict future breast cancer risk. If these image-based AI models are more accurate than traditional clinical risk assessment tools, AI models may provide a pathway to more objective and potentially improved personalized risk-based screening regimens. For instance, more accurate short-term risk models may help identify women who would benefit from more frequent (annual versus biennial) or more intensive (supplemental MRI) screening to detect breast cancers at earlier stages.9

However, prior to their clinical implementation, emerging AI risk prediction models will have to be validated on independent, real-world screening exams to demonstrate generalizability. Thus, our primary objective was to perform a systematic review of the literature regarding the predictive accuracy of AI-driven image-only breast cancer risk models that use screening mammograms for validation. Our secondary objective was to determine the incremental improvement in accuracy, if any, when adding traditional clinical risk factor data to image-only AI data for future breast cancer risk prediction in real-world screening cohorts.

Methods

We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement10 for performing this literature review. Our study used only publicly available data and was exempt from Institutional Review Board approval.

Databases and Search Terms

With the assistance of a professional medical librarian, we searched the English literature from inception through September 30, 2022, using six major databases: Embase, PubMed, IEEE Xplore, Engineering Village, medRxiv, and bioRxiv. These databases include traditional medical literature databases as well as online archives where many artificial intelligence and data science efforts are published in pre-print form.11 For each database, subject headings and free-text keywords were used to search across the following broad topics: mammography, artificial intelligence and deep learning, and breast cancer risk prediction. The detailed search terms are available in the online Supplement (see eTable 1).
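
For illustration, the sketch below shows how one of these database searches could be executed programmatically against PubMed via the NCBI E-utilities API. The query string is a simplified stand-in for the full strategy detailed in eTable 1, shown only to convey how the three topic blocks are combined; it is not the authors' actual search.

```python
# Hypothetical, simplified PubMed query via NCBI E-utilities; the term
# string below only sketches the three topic blocks (mammography, AI/deep
# learning, risk prediction) and is NOT the full strategy from eTable 1.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

term = (
    "(mammography[MeSH Terms] OR mammogram*[Title/Abstract])"
    ' AND ("deep learning"[Title/Abstract]'
    ' OR "artificial intelligence"[MeSH Terms])'
    ' AND ("risk prediction"[Title/Abstract]'
    ' OR "risk assessment"[MeSH Terms])'
)

resp = requests.get(
    ESEARCH,
    params={
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",       # filter on publication date
        "mindate": "2012",
        "maxdate": "2022/09/30",  # search cutoff used in the review
        "retmax": 0,              # we only need the hit count here
        "retmode": "json",
    },
    timeout=30,
)
print(resp.json()["esearchresult"]["count"])  # number of matching citations
```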

Study Inclusion and Exclusion

Patient population:

We included all studies that tested and reported the performance of deep learning image-based AI algorithms for future breast cancer risk prediction (both in situ and invasive cancers) on screening mammograms (both digital mammography and digital breast tomosynthesis) from real-world screening cohorts following the screening guidelines of their countries of origin. We excluded studies whose risk prediction algorithms used only publicly available mammography datasets, as these datasets are frequently used for training and development. We also excluded studies that reported only on the training and development of new AI algorithms without validation. If there were separate publications for training, testing, and validation of the same deep learning algorithm, we included only the study reporting the external validation performance of the algorithm on an independent test set of screening mammograms.

Intervention:

Studies had to include the evaluation of a deep learning AI algorithm that predicted future breast cancer risk from whole, full-view breast mammography images. Studies were excluded if they focused on the detection of breast cancer on images (versus future cancer risk prediction on negative mammograms), used image analysis techniques limited to specific imaging features (such as mass or calcification evaluation rather than the entire mammogram image), or focused on AI to improve radiology workflow rather than risk prediction.

Comparison:

We did not require comparison of AI performance based on images alone to clinical risk factors (e.g., breast density, family history) or existing traditional clinical risk prediction models. However, for those studies with comparator data available, we recorded the performance of future cancer risk prediction based on clinical risk factors alone and/or in combination with the AI algorithm.

Outcomes:

The primary outcome of interest was the overall accuracy of AI for future breast cancer risk prediction as defined by the area under the receiver operating characteristic (ROC) curve (AUC). Qualified studies that did not provide AUC values were included in the descriptive summary. The secondary outcome of interest was any incremental improvement in predictive accuracy when combining AI with traditional clinical risk factors.
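
As context for this outcome, here is a minimal sketch of how AUC is computed for a validation cohort, using scikit-learn with hypothetical outcome labels and AI risk scores (neither is from any included study).

```python
# Minimal AUC sketch: rank-order agreement between AI risk scores and
# observed future-cancer outcomes. Data are hypothetical placeholders.
from sklearn.metrics import roc_auc_score

# 1 = cancer diagnosed within the follow-up window, 0 = remained cancer-free
y_future_cancer = [0, 0, 1, 0, 1, 0, 0, 1]
# model-predicted future risk for each negative screening exam
risk_score = [0.05, 0.10, 0.80, 0.60, 0.55, 0.15, 0.30, 0.70]

auc = roc_auc_score(y_future_cancer, risk_score)
print(f"AUC = {auc:.2f}")  # ~0.93; 1.0 = perfect ranking, 0.5 = chance
```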

Data Extraction

Two reviewers (C.M.S. and C.I.L.) independently screened all titles from the literature search against the inclusion and exclusion criteria, with screening further refined by review of article abstracts. A third author (O.A.R.) resolved any conflicts during the independent review. If the same study was reported in conference proceedings or an online pre-print and then in a subsequent peer-reviewed paper, only the peer-reviewed paper was included.

We collected key study characteristics using a standardized data extraction tool (see Supplement). The two reviewers independently extracted data from each study at the time of full-text review with any disagreements of extraction parameters resolved by discussion and consensus. Data systematically collected for each study included title, author name(s), publication date, a description of the overall mammography dataset including date range, a description of the mammography datasets used for training, internal validation, and/or external validation, and a general description of the deep learning AI model.

Breast cancer risk prediction-specific data collection included a description of the clinical cohort (including number of exams, number of patients, and number of known cancers within cohort), the risk prediction period after negative screening mammography (e.g., 1-year risk, 5-year risk), imaging data type (e.g., full-field digital mammography, digital breast tomosynthesis), the follow-up period for determining future cancers (e.g., 1-year of follow-up data for negative screening exams), and comparison of AI performance with that of any traditional clinical risk factor (e.g., breast density) or traditional clinical risk prediction tool (e.g., Gail, Tyrer-Cuzick). We recorded how studies determined breast cancer ground truth (e.g., biopsy/pathology, regional cancer registry linkage).
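
As a sketch only, the extraction fields described above could be encoded per study roughly as follows; the record and field names paraphrase the text and are not the authors' actual extraction tool.

```python
# Hypothetical encoding of the standardized data extraction fields; names
# paraphrase the text above and are illustrative, not the authors' tool.
from dataclasses import dataclass
from typing import Optional

@dataclass
class StudyRecord:
    title: str
    authors: str
    publication_date: str
    study_design: str                  # e.g., "Retrospective Cohort"
    setting: str                       # e.g., "1 academic center"
    exam_modality: str                 # e.g., "DM" or "DBT"
    exam_years: str                    # e.g., "2008-2015"
    n_exams: int
    n_patients: int
    n_cancers: int
    risk_prediction_period: str        # e.g., "5 years"
    follow_up_period: str              # e.g., "1 year after negative exam"
    ground_truth_source: str           # e.g., "Registry", "Pathology"
    clinical_comparator: Optional[str] = None  # e.g., "Tyrer-Cuzick v8"
    auc_image_only: Optional[float] = None
    auc_clinical_only: Optional[float] = None
    auc_combined: Optional[float] = None
```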

Data Synthesis and Analysis

We have provided a narrative synthesis of our literature review and descriptive summary data. We reported estimates of accuracy for future cancer risk prediction by AUCs and presented descriptive plots for study-level differences in AUCs. We generated descriptive plots for the AUCs from clinical risk factor prediction tools, if reported. We also reported clinical risk predictive accuracy based on breast density (e.g., BI-RADS density determined by radiologists or a quantitative calculation of volumetric density) in the descriptive plot as clinical risk comparators, as some studies used it as the main risk-based performance comparator. For applicable studies, we generated a descriptive plot for AUCs of combined AI imaging and clinical risk factor models.
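
A sketch of how such descriptive plots can be generated is shown below (cf. Figure 3 in the Results). The AUC lists are transcribed from the AUC rows of Table 2 under the row alignment used there, so treat them as illustrative rather than authoritative.

```python
# Illustrative box-and-whisker comparison of maximum study-level AUCs
# (cf. Figure 3). Values transcribed from Table 2's AUC rows; treat as
# illustrative, since some flattened table cells required interpretation.
import matplotlib.pyplot as plt

auc_image_only = [0.67, 0.73, 0.62, 0.67, 0.65, 0.68, 0.82,
                  0.68, 0.75, 0.73, 0.68, 0.90, 0.87]
auc_combined = [0.67, 0.66, 0.82, 0.79, 0.70, 0.84]
auc_clinical = [0.62, 0.54, 0.60, 0.55, 0.65, 0.58, 0.69, 0.62, 0.66]

fig, ax = plt.subplots()
ax.boxplot(
    [auc_image_only, auc_combined, auc_clinical],
    labels=["Image-only AI", "Image + clinical AI", "Clinical factors only"],
    showmeans=True,  # mean marker in addition to the median line
)
ax.set_ylabel("Maximum reported AUC")
fig.savefig("auc_boxplot.png", dpi=200)
```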

Quality Assessment

The overall quality of the studies was determined independently by two reviewers (C.M.S. and C.I.L.) using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool.12 Briefly, this tool enables transparent rating of bias and applicability of diagnostic accuracy studies in four key domains: (1) patient selection, (2) the index test itself, (3) the reference standard used, and (4) patient flow and timing of the test. The two reviewers, with the third reviewer if needed, resolved any conflicts in quality assessment through discussion.

Results

A total of 7,633 citations resulted from our search after deduplication (Figure 1). After independent title and abstract review by two investigators, 51 studies remained. After full-text review, 16 studies met all inclusion and exclusion criteria and were included for analysis (Table 1).13–28

Figure 1. PRISMA Flow Diagram

Table 1.

Summary of AI Risk Prediction Validation Study Designs and Populations

| Author | Study Design | Study Setting | Exam Modality | Exam Years | Number of Exams | Number of Patients | Number of Cancers | Risk Prediction Period | Ground Truth Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arasu et al (2022) | Retrospective Cohort | 1 regional HMO | DM | 2016 | 13,881 | 13,881 | 4,672 | 5 years | Registry |
| Arefan et al (2020) | Retrospective Case Control | 1 academic center | DM | 2013 | 226 | 226 | 113 | NR | Pathology |
| Dadsetan et al (2021) | Retrospective Case Control | 1 academic center | DM | NR | 1,224 | 306 | 153 | 1–3 years | Biopsy |
| Dadsetan et al (2022) | Retrospective Case Control | 1 academic center | DM | 2007–2014 | 1,000 | 200 | 100 | 1 year | Pathology |
| Dembrower et al (2020) | Retrospective Cohort | 1 academic center | DM | 2008–2015 | 2,283 | 2,283 | 278 | NR | Registry |
| Gastounioti et al (2022) | Retrospective Case Control | 1 academic center | DM | 2010–2015 | 5,139 | 5,139 | 176 | 2 years | Biopsy and Registry |
| Ha et al (2019) | Retrospective Case Control | 1 academic center | DM | 2011–2017 | 1,474 | 737 | 210 | NR | NR |
| Hinton et al (2019) | Retrospective Case Control | 4 radiology facilities | DM | 2006–2015 | 1,420 | 355 | 355 | NR | Registry |
| Lang et al (2021) | Retrospective Cohort | 4 radiology facilities | DM | 2013–2017 | 429 | 429 | 429 | 1.5–2 years | Pathology |
| Lehman et al (2022) | Retrospective Cohort | 5 radiology facilities | DM | 2017–2021 | 119,139 | 57,617 | 681 | 5 years | Registry |
| Mohamed et al (2022) | Retrospective Cohort | 1 institution | DM | 2015–2019 | 271 | 271 | 141 | >1 year | Hospital Information System |
| Wanders et al (2022) | Retrospective Case Control | 2 regions of national screening program | DM | 2011–2015 | 6,883 | 6,883 | 2,222 | ≤3 years | Pathology and Registry |
| Yala et al (2019) | Retrospective Cohort | 1 academic center | DM | 2009–2012 | 88,994 | 39,571 | 3,314 | 5 years | Registry |
| Yala et al (Sci Transl Med, 2021) | Retrospective Cohort | 3 institutions | DM | 2008–2016 | 262,318* | 70,811* | 6,579* | 1–5 years | Pathology and Registry |
| Yala et al (JCO, 2022) | Retrospective Cohort | 7 facilities (multiple countries) | DM | 2008–2020 | 128,793 | 62,185 | 3,815 | 5 years | Registry or Hospital Information System |
| Zhu et al (2021) | Retrospective Case Control | 2 radiology facilities | DM | 2006–2014 | 25,096 | 6,369 | 1,609 screen; 351 interval | NR | Pathology or Registry |

DM = digital mammography. HMO = health maintenance organization. NR = not reported.

* Two additional datasets were used in this study for external validation, not included in the sample sizes above.

Study Characteristics

All included studies were published in 2019 or later and used a retrospective study design. The examination years within the datasets spanned 2006 to 2021. Sample sizes varied greatly, with the number of exams ranging from 226 to more than 260,000, and the number of known cancer cases ranging from 100 to more than 6,000. Half of these retrospective studies (8/16) provided consistent ground truth of breast cancer ascertainment through linkage to regional or state cancer registries; the other half either did not directly report their determination of ground truth or relied at least in part on institutional data rather than registry linkage. All studies reported on short-term breast cancer risk prediction (range: 1–5 years).

Of the 16 studies included in our systematic review, most (88%, 14/16) provided AUC values as the primary metric for predictive accuracy (Table 2). Of these 14 studies, 71% (10/14) provided a direct comparison to a clinical risk prediction tool, and 50% (7/14) also included an AI model that incorporated both image analysis and clinical risk factor data.

Table 2.

Maximum Predictive Accuracy of Image-Only AI Algorithms and Clinical Risk Models

| Author | Clinical Risk Prediction Comparator | Max Clinical Risk Factor Prediction Performance (95% CI) | Max Image-Only AI Prediction Performance (95% CI) | Max Image + Clinical Risk AI Prediction Performance (95% CI) | Statistic |
| --- | --- | --- | --- | --- | --- |
| Arasu et al (2022) | BCSC | 0.62 (0.58–0.66) | 0.67 (0.66–0.68) | 0.67 (0.66–0.68) | AUC |
| Arefan et al (2020) | Breast Density | 0.54 (0.49–0.59) | 0.73 (0.68–0.78) | | AUC |
| Dadsetan et al (2021) | | | 0.62 | | AUC |
| Dadsetan et al (2022) | | | 0.67 (0.59–0.75) | | AUC |
| Dembrower et al (2020) | Breast Density | 0.60 (0.58–0.61) | 0.65 (0.63–0.66) | 0.66 (0.64–0.67) | AUC |
| Gastounioti et al (2022) | Gail | 0.55 (0.50–0.60) | 0.68 (0.64–0.72) | | AUC |
| Hinton et al (2019) | Breast Density | 0.65 | 0.82 | 0.82 | AUC |
| Lehman et al (2022) | Tyrer-Cuzick v8; NCI BCRAT | 0.56 (0.53–0.59); 0.58 (0.54–0.61) | 0.68 (0.66–0.70) | | AUC |
| Mohamed et al (2022) | | | 0.75 | | AUC |
| Wanders et al (2022) | Breast Density | 0.69 (0.67–0.71) | 0.73 (0.71–0.76) | 0.79 (0.77–0.81) | AUC |
| Yala et al (2019) | Tyrer-Cuzick v8 | 0.62 (0.57–0.66) | 0.68 (0.64–0.73) | 0.70 (0.66–0.75) | AUC |
| Yala et al (STM, 2021) | Tyrer-Cuzick v8 | 0.66 (0.61–0.71) | 0.90 (0.87–0.93) | 0.84 (0.81–0.88) | AUC |
| Yala et al (JCO, 2022) | | | 0.87 (0.84–0.91) | | AUC |
| Ha et al (2019) | Breast Density | 0.62 (0.59–0.65) | 0.66 (0.63–0.69) | 0.66 (0.63–0.69) | C statistic |
| Lang et al (2021) | | | 19.3% (15.9–23.4%) | | Percent reduction in interval cancers |
| Zhu et al (2021) | Breast Density | 1.67 (1.4–1.9) | 4.42 (3.4–5.7) | | Odds ratio |

Eight studies provided demographic information on their study populations that included race and ethnicity data. Five of these studies also compared the predictive accuracy of their AI models between White and non-White populations.

Quality Assessment

All studies within the systematic review had high or unclear risk of bias or applicability concerns (Table 3), commonly related to patient selection. All studies relied on a retrospective approach, with half (8/16) using case-control or case-case methodologies that QUADAS-2 categorizes as prone to bias. Most studies also demonstrated a high or unclear risk of bias related to the reference standard, often due to a lack of registry-linkage ground truth.

Table 3.

Quality Assessment of Diagnostic Accuracy Studies Using QUADAS-212

| Author | Patient Selection | Index Test | Reference Standard | Flow and Timing |
| --- | --- | --- | --- | --- |
| Arasu et al (2022) | Low | Unclear | Low | Unclear |
| Arefan et al (2020) | High | Unclear | Unclear | Unclear |
| Dadsetan et al (2021) | High | Unclear | Unclear | Unclear |
| Dadsetan et al (2022) | High | Unclear | Unclear | Low |
| Dembrower et al (2020) | Unclear | Unclear | Low | Unclear |
| Gastounioti et al (2022) | High | Unclear | Unclear | Unclear |
| Ha et al (2019) | High | Unclear | High | Unclear |
| Hinton et al (2019) | High | Unclear | Unclear | Unclear |
| Lang et al (2021) | High | Unclear | Unclear | Unclear |
| Lehman et al (2022) | Low | Unclear | Low | Unclear |
| Mohamed et al (2022) | High | High | High | High |
| Wanders et al (2022) | Unclear | Unclear | Unclear | Unclear |
| Yala et al (2019) | Low | Unclear | Low | Low |
| Yala et al (STM, 2021) | Low | Unclear | Low | Low |
| Yala et al (JCO, 2022) | Low | Unclear | Low | Low |
| Zhu et al (2021) | High | Unclear | Unclear | Unclear |

Risk of bias and applicability in each domain are rated as High, Unclear, or Low risk.

Predictive Accuracy

The most common metric reported for predictive accuracy was AUC (Table 2, Figure 2). A single study provided C-statistic values, deemed equivalent to AUC on review because the study used a binary outcome. The maximum AUC of AI image-only models ranged from 0.62 to 0.90 with a median of 0.72, compared with clinical risk factor-based maximum AUCs of 0.54 to 0.69 with a median of 0.61 (Figure 3). For the seven studies with a combined image and clinical risk factor AI tool, the maximum AUC ranged from 0.66 to 0.84, with a median of 0.73. Two included studies did not report AUC values, instead providing an odds ratio or the percent reduction in interval cancers.
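
The equivalence invoked above is easy to verify: for a binary outcome, the C statistic (the proportion of cancer/non-cancer pairs in which the cancer case receives the higher score, with ties counted as half) equals the ROC AUC. A small sketch with hypothetical scores:

```python
# For binary outcomes the C statistic reduces to the ROC AUC; quick check
# with hypothetical labels and risk scores.
from sklearn.metrics import roc_auc_score

y = [0, 0, 1, 0, 1, 1, 0]
s = [0.2, 0.4, 0.7, 0.1, 0.6, 0.3, 0.5]

pos = [si for si, yi in zip(s, y) if yi == 1]  # scores of future-cancer exams
neg = [si for si, yi in zip(s, y) if yi == 0]  # scores of cancer-free exams

# concordant pairs score 1, ties 0.5, discordant 0
c_stat = sum(
    1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
) / (len(pos) * len(neg))

assert abs(c_stat - roc_auc_score(y, s)) < 1e-12  # identical for binary outcomes
print(round(c_stat, 3))  # 0.833
```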

Figure 2. Direct comparison of the maximum AUC reported for breast cancer risk prediction based on clinical risk factors alone (non-AI), image-only AI models, and image + clinical risk factor AI models. The 95% confidence intervals are shown when available.

Figure 3. Box and whisker plot comparison of the maximum AUC reported for breast cancer risk prediction across all studies based on image-only AI models, image + clinical risk factor AI models, and clinical risk factors alone. Plots include the interquartile range, with the median value denoted as “x” and the average denoted by a horizontal line.

Of the seven studies with AUC as a primary metric that directly compared AI image-only performance to combined image + clinical risk factor performance, six demonstrated no significant improvement, while the remaining study demonstrated a significant improvement (+0.06 AUC).
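
The included studies' own significance tests are not detailed here, but one common way to test whether a combined model significantly improves on an image-only model evaluated on the same exams is a paired bootstrap of the AUC difference; a sketch under that assumption, using synthetic data, follows.

```python
# Hedged sketch: paired bootstrap 95% CI for the AUC gain of a combined
# (image + clinical) model over an image-only model on the same exams.
# One common approach, not necessarily the test used by the included studies.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def bootstrap_auc_gain(y, score_image, score_combined, n_boot=2000):
    y = np.asarray(y)
    s1, s2 = np.asarray(score_image), np.asarray(score_combined)
    gains = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))  # resample exams with replacement
        if y[idx].min() == y[idx].max():       # AUC needs both classes present
            continue
        gains.append(roc_auc_score(y[idx], s2[idx]) - roc_auc_score(y[idx], s1[idx]))
    return np.percentile(gains, [2.5, 97.5])   # CI excluding 0 -> significant gain

# synthetic demo data
y = rng.integers(0, 2, 500)
base = rng.normal(size=500)
print(bootstrap_auc_gain(y, base + 0.8 * y, base + 1.0 * y))
```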

A descriptive subgroup analysis compared AUC values between studies that used consistent regional cancer registry linkage as their ground truth and studies without consistent registry linkage (eTable 3). The median maximum AUC of AI image-only models was slightly higher for studies with consistent registry linkage than for those without.

For all five studies that compared predictive accuracy between racial and ethnic groups, the AI models showed similar predictive accuracy between White and non-White populations (eTable 4). Four of these studies included a clinical risk factor tool comparison, of which three (75%) showed significant improvement in risk prediction for non-White women with the AI model compared with clinical risk factors.

Discussion

Our systematic review demonstrates that early AI efforts at predicting future breast cancer risk based on mammography images alone may provide accuracy comparable or superior to that of breast density or the traditional clinical risk factor-based measures used today. However, adding traditional clinical risk factor data to the AI models did not meaningfully improve performance compared with the image-only AI models across the included studies. If these early findings are supported by larger prospective studies, AI-driven risk prediction may lead to a more personalized approach to screening, with women at higher risk receiving more frequent or more intensive screening for earlier cancer detection.

Currently published reports on AI for predicting future breast cancer risk remain limited: they mostly consist of small cohorts and retrospective case-control designs prone to selection bias. In addition, determination of a robust ground truth (capture of future cancers during a specific follow-up period) was lacking or inconsistent across most studies. Both factors create uncertainty about how well these models' accuracy will generalize to real-world breast cancer screening settings. Implementing AI algorithms for risk prediction may also have health equity implications. Yala et al (2019) demonstrated equivalent performance of AI algorithms across race/ethnicity and menopausal status, whereas Tyrer-Cuzick showed significant differences in AUC across these same subgroups.25 Thus, future studies should use prospective designs and include larger, more diverse screening populations with robust long-term follow-up data for more generalizable results.29,30

Our finding that breast density and other traditional clinical risk factors largely did not improve image-only AI breast cancer risk assessment has practical implications. If subjective, self-reported clinical risk factors do not provide substantial incremental improvements to risk prediction, then their collection may not be needed.31 This could eliminate much of the effort and human subjectivity in current clinical factor-based risk prediction practices performed at the time of screening mammography or in primary care clinics. Alternatively, if not used as standalone risk assessment tools, AI image-based risk prediction may provide information complementary to risk prediction based on traditional clinical risk factors.32

Our review was comprehensive, searching the literature across multiple clinical and data science databases, including engineering databases frequented by AI data scientists. By including these databases, we ensured capture of literature in data science repositories in addition to medical journals. Our study did, however, have limitations. We limited our review to English-language publications. The available publications are also at risk of publication bias, as studies demonstrating improved risk prediction with AI models are more likely to be published. Because AI for risk prediction is a fast-moving field, more recent publications may not be included in this review. While some studies compared performance by race and ethnicity, most collapsed the comparison to White versus non-White given small minority populations. The reported literature was also sparse on comparisons to available risk models beyond the Tyrer-Cuzick model and breast density alone; future studies should compare AI risk prediction to several different traditional clinical risk models. Finally, we do not report on the impact of AI on the direct detection of breast cancer on mammography images or on radiologist workflow efficiency, as these topics are beyond the scope of our review.

Supplementary Material


Take-Home Points.

  • Early reports of mammography image-based AI breast cancer risk prediction suggest that image-only AI models rival the predictive accuracy of breast density measures or traditional clinical risk prediction models.

  • Most published external validation studies of image-based AI breast cancer risk prediction used retrospective case-control or cohort designs, and more than half of all studies lacked sufficient follow-up periods for cancer outcomes.

  • If image-based AI breast cancer risk prediction models can be validated in larger clinical studies, they offer the potential for a paradigm shift in breast cancer screening via a streamlined, potentially equitable, and accurate approach for more personalized, risk-based screening regimens.

Summary Sentence.

Early studies of image-based AI breast cancer risk prediction algorithms suggest performance comparable to breast density or traditional clinical risk models; however, the early evidence remains prone to bias.

Acknowledgements:

We thank Teresa E. Jewell, MLIS, for her assistance in designing and executing the comprehensive literature search.

Funding:

This study was funded in part by the National Cancer Institute (R01CA262023, R37CA240403). MLM is funded by a National Breast Cancer Foundation Investigator Initiated Research Scheme grant (IIRS-20-011). NH is funded via NBCF Chair in Breast Cancer Prevention grant (EC-21-001) and NHMRC Investigator Leader grant (#1194410). KPL is funded by the American Cancer Society (21-078-01-CPSH).

Declaration of interests

Christoph Lee reports financial support provided by the National Institutes of Health and receives personal fees from the American College of Radiology for JACR editorial board work.



Data statement:

The authors declare that they had full access to all of the data in this study and the authors take complete responsibility for the integrity of the data and the accuracy of the data analysis.

References

1. Engmann NJ, Golmakani MK, Miglioretti DL, Sprague BL, Kerlikowske K; Breast Cancer Surveillance Consortium. Population-Attributable Risk Proportion of Clinical Risk Factors for Breast Cancer. JAMA Oncol. Sep 1 2017;3(9):1228–1236. doi: 10.1001/jamaoncol.2016.6326
2. Lee CI, Chen LE, Elmore JG. Risk-based Breast Cancer Screening: Implications of Breast Density. Med Clin North Am. Jul 2017;101(4):725–741. doi: 10.1016/j.mcna.2017.03.005
3. McCarthy AM, Guan Z, Welch M, et al. Performance of Breast Cancer Risk-Assessment Models in a Large Mammography Cohort. J Natl Cancer Inst. May 1 2020;112(5):489–497. doi: 10.1093/jnci/djz177
4. Tice JA, Bissell MCS, Miglioretti DL, et al. Validation of the breast cancer surveillance consortium model of breast cancer risk. Breast Cancer Res Treat. Jun 2019;175(2):519–523. doi: 10.1007/s10549-019-05167-2
5. Paige JS, Lee CI, Wang PC, et al. Variability Among Breast Cancer Risk Classification Models When Applied at the Level of the Individual Woman. J Gen Intern Med. Feb 7 2023. doi: 10.1007/s11606-023-08043-4
6. Terry MB, Liao Y, Whittemore AS, et al. 10-year performance of four models of breast cancer risk: a validation study. Lancet Oncol. Apr 2019;20(4):504–517. doi: 10.1016/S1470-2045(18)30902-1
7. Anandarajah A, Chen Y, Stoll C, Hardi A, Jiang S, Colditz GA. Repeated measures of mammographic density and texture to evaluate prediction and risk of breast cancer: a systematic review of the methods used in the literature. Cancer Causes Control. Jun 20 2023. doi: 10.1007/s10552-023-01739-2
8. Acciavatti RJ, Lee SH, Reig B, et al. Beyond Breast Density: Risk Measures for Breast Cancer in Multiple Imaging Modalities. Radiology. Mar 2023;306(3):e222575. doi: 10.1148/radiol.222575
9. Lee CI, Elmore JG. Cancer Risk Prediction Paradigm Shift: Using Artificial Intelligence to Improve Performance and Health Equity. J Natl Cancer Inst. Oct 6 2022;114(10):1317–1319. doi: 10.1093/jnci/djac143
10. McInnes MDF, Moher D, Thombs BD, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA. Jan 23 2018;319(4):388–396. doi: 10.1001/jama.2017.19163
11. Anderson AW, Marinovich ML, Houssami N, et al. Independent External Validation of Artificial Intelligence Algorithms for Automated Interpretation of Screening Mammography: A Systematic Review. J Am Coll Radiol. Feb 2022;19(2 Pt A):259–273. doi: 10.1016/j.jacr.2021.11.008
12. Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. Oct 18 2011;155(8):529–536. doi: 10.7326/0003-4819-155-8-201110180-00009
13. Arasu VA, Habel LA, Achacoso NS, et al. Comparison of Mammography AI Algorithms with a Clinical Risk Model for 5-year Breast Cancer Risk Prediction: An Observational Study. Radiology. Jun 2023;307(5):e222733. doi: 10.1148/radiol.222733
14. Arefan D, Mohamed AA, Berg WA, Zuley ML, Sumkin JH, Wu S. Deep learning modeling using normal mammograms for predicting breast cancer risk. Med Phys. Jan 2020;47(1):110–118. doi: 10.1002/mp.13886
15. Dadsetan S, Arefan D, Berg WA, Zuley ML, Sumkin JH, Wu S. Deep learning of longitudinal mammogram examinations for breast cancer risk prediction. Pattern Recognit. Dec 2022;132:108919. doi: 10.1016/j.patcog.2022.108919
16. Dadsetan S, Arefan D, Zuley M, Sumkin J, Sun M, Wu S. Learning knowledge from longitudinal data of mammograms to improving breast cancer risk prediction. Proceedings of the SPIE, volume 11601, February 2021. doi: 10.1117/12.2582267
17. Dembrower K, Liu Y, Azizpour H, et al. Comparison of a Deep Learning Risk Score and Standard Mammographic Density Score for Breast Cancer Risk Prediction. Radiology. Feb 2020;294(2):265–272. doi: 10.1148/radiol.2019190872
18. Gastounioti A, Eriksson M, Cohen EA, et al. External Validation of a Mammography-Derived AI-Based Risk Model in a U.S. Breast Cancer Screening Cohort of White and Black Women. Cancers (Basel). Sep 30 2022;14(19):4803. doi: 10.3390/cancers14194803
19. Ha R, Chang P, Karcich J, et al. Convolutional Neural Network Based Breast Cancer Risk Stratification Using a Mammographic Dataset. Acad Radiol. Apr 2019;26(4):544–549. doi: 10.1016/j.acra.2018.06.020
20. Hinton B, Ma L, Mahmoudzadeh AP, et al. Deep learning networks find unique mammographic differences in previous negative mammograms between interval and screen-detected cancers: a case-case study. Cancer Imaging. Jun 22 2019;19(1):41. doi: 10.1186/s40644-019-0227-3
21. Lang K, Hofvind S, Rodriguez-Ruiz A, Andersson I. Can artificial intelligence reduce the interval cancer rate in mammography screening? Eur Radiol. Aug 2021;31(8):5940–5947. doi: 10.1007/s00330-021-07686-3
22. Lehman CD, Mercaldo S, Lamb LR, et al. Deep Learning vs Traditional Breast Cancer Risk Models to Support Risk-Based Mammography Screening. J Natl Cancer Inst. Oct 6 2022;114(10):1355–1363. doi: 10.1093/jnci/djac142
23. Mohamed A, Fakhry S, Basha T. Bilateral Analysis Boosts the Performance of Mammography-based Deep Learning Models in Breast Cancer Risk Prediction. Annu Int Conf IEEE Eng Med Biol Soc. Jul 2022;2022:1440–1443. doi: 10.1109/EMBC48229.2022.9872011
24. Wanders AJT, Mees W, Bun PAM, et al. Interval Cancer Detection Using a Neural Network and Breast Density in Women with Negative Screening Mammograms. Radiology. May 2022;303(2):269–275. doi: 10.1148/radiol.210832
25. Yala A, Lehman C, Schuster T, Portnoi T, Barzilay R. A Deep Learning Mammography-based Model for Improved Breast Cancer Risk Prediction. Radiology. Jul 2019;292(1):60–66. doi: 10.1148/radiol.2019182716
26. Yala A, Mikhael PG, Strand F, et al. Multi-Institutional Validation of a Mammography-Based Breast Cancer Risk Model. J Clin Oncol. Jun 1 2022;40(16):1732–1740. doi: 10.1200/JCO.21.01337
27. Yala A, Mikhael PG, Strand F, et al. Toward robust mammography-based models for breast cancer risk. Sci Transl Med. Jan 27 2021;13(578):eaba4373. doi: 10.1126/scitranslmed.aba4373
28. Zhu X, Wolfgruber TK, Leong L, et al. Deep Learning Predicts Interval and Screening-detected Cancer from Screening Mammograms: A Case-Case-Control Study in 6369 Women. Radiology. Dec 2021;301(3):550–558. doi: 10.1148/radiol.2021203758
29. Paulus JK, Kent DM. Predictably unequal: understanding and addressing concerns that algorithmic clinical prediction may increase health disparities. NPJ Digit Med. 2020;3:99. doi: 10.1038/s41746-020-0304-9
30. Waters EA, Colditz GA, Davis KL. Essentialism and Exclusion: Racism in Cancer Risk Prediction Models. J Natl Cancer Inst. Nov 29 2021;113(12):1620–1624. doi: 10.1093/jnci/djab074
31. Houssami N, Kerlikowske K. AI as a new paradigm for risk-based screening for breast cancer. Nat Med. Jan 2022;28(1):29–30. doi: 10.1038/s41591-021-01649-3
32. Vachon CM, Scott CG, Norman AD, et al. Impact of Artificial Intelligence System and Volumetric Density on Risk Prediction of Interval, Screen-Detected, and Advanced Breast Cancer. J Clin Oncol. Jun 10 2023;41(17):3172–3183. doi: 10.1200/JCO.22.01153
