Abstract
Importance:
Contemporary approaches to artificial intelligence (AI) based on deep learning have generated interest in the application of AI to breast cancer screening (BCS). The U.S. Food and Drug Administration (FDA) has approved several next-generation AI products with an indication for BCS in recent years. However, questions related to AI’s accuracy, appropriate use, and clinical utility remain.
Objectives:
To describe FDA’s current regulatory process for AI products, summarize the evidence used to support FDA clearance and approval of AI products indicated for BCS, consider the advantages and limitations of current regulatory approaches, and suggest ways to improve on the current system.
Evidence:
We reviewed premarket notifications and other publicly available FDA documents used in the clearance and approval of AI products with an indication for BCS from January 1, 2017 through December 31, 2021.
Findings:
Nine AI products indicated for suspicious lesion identification or mammogram triage during BCS were included. The majority of products were cleared through the 510(k) pathway, and all clearances were based on previously collected, retrospective data. Six products used multicenter designs. Enriched data were used for seven devices, and four devices lacked details on whether products were externally validated. Test performance measures, including sensitivity, specificity, and area under the curve, were the main outcomes reported. The majority of devices used tissue biopsy as the gold standard for accuracy evaluation. Other measures of clinical utility, including cancer stage at detection, interval cancer detection, or other outcomes, were not reported for any of the devices.
Conclusions and Relevance:
Important gaps in reporting of data sources, dataset type, validation approach, and clinical utility assessment were identified. As AI-assisted reading becomes more widespread in BCS and other radiologic exams, strengthened FDA evidentiary standards, development of post-marketing surveillance, a focus on clinically meaningful outcomes, and engagement of stakeholders will be critical for ensuring the safety and efficacy of these products.
Introduction
Screening mammography has a number of limitations, including imperfect sensitivity and specificity and inter-reader variability. Artificial intelligence (AI) applications have been proposed both to improve accuracy (sensitivity and specificity) and to reduce radiologist reading time and cognitive load by, for example, identifying regions of interest or triaging images with abnormal findings. While AI-based technologies may hold promise, many questions remain about AI’s accuracy, efficacy, appropriate use, and impact on health outcomes.1–3 Despite these questions, a number of AI-based products for breast cancer screening (BCS) have already been authorized by the US Food and Drug Administration (FDA) and are available for clinical use to assist radiologists.4,5
We characterized FDA’s regulatory evaluation process for marketed AI products and summarized the publicly available evidence supporting AI product clearance and approval for BCS, with particular attention to study design, population/dataset, validation, gold standards, and outcomes. Additionally, we consider the advantages and limitations of current regulatory approaches. Finally, we suggest ways to strengthen the current system to ensure that products ultimately used in routine BCS contribute to improved health.
A Brief History of Automated Image Analysis in Breast Cancer Screening
The development of computer software for radiographic image analysis began as early as the 1960s in the US.6,7 However, it was not until three decades later, in 1998, that FDA approved the first computer-aided detection (CAD) system for screening mammography, which used early AI algorithms to identify suspicious lesions for second review by radiologists.8,9 Although evidence supporting CAD was limited to small studies focused on test characteristics, Congress mandated that Medicare cover CAD.10 By 2016, about 92% of mammography facilities in the United States had adopted CAD into their clinical practice.11 However, once adopted, these CAD systems did not appear to improve diagnostic accuracy, generating more false positives and subsequent unnecessary clinical workup.11,12 CAD also increased detection of ductal carcinoma in situ (DCIS), a precancerous lesion with low risk of mortality, and increased the biopsy rate compared to interpretation without CAD.13,14 Importantly, CAD has not been shown to reduce mortality from breast cancer and has been associated with rising screening costs.12 Despite the historical shortcomings of CAD, interest in next-generation AI for BCS has grown substantially in recent years with the development of convolutional neural networks (CNNs), which use a filter methodology to identify image features and have greater accuracy than other methods.15,16 CNNs have been successfully applied to a wide range of image analysis applications and have renewed enthusiasm for automated mammography interpretation.11,16–19
Current FDA Regulation of AI
Although AI-based diagnostic products are distinct from typical therapeutic medical devices, FDA has recognized that AI has the potential to impact human health. FDA has adopted a standard known as Software as a Medical Device (SaMD) for identifying software that may be regulated under its authority. SaMD is defined as “software intended to be used for one or more medical purposes that performs these purposes without being part of a hardware medical device.”20
Products classified as SaMD may be reviewed through three existing FDA medical device pathways: 510(k), De Novo, and Premarket Approval (PMA). The specific pathway chosen for review depends on whether there is a predicate device and on the device’s risk (Figure).21 PMA is used for the highest-risk devices and requires “valid scientific evidence” of safety and effectiveness. This evidence can comprise a range of study designs, from randomized controlled trials to single-arm studies.22 The De Novo pathway provides a route to clearance for novel low- to moderate-risk devices without an established predicate on the market.23 The majority of AI software products reviewed by FDA have been cleared through 510(k) review.5
Figure. Overview of U.S. FDA Medical Device Regulatory Process with AI in BCS Product Examples.

Most FDA-approved AI products to date have gone through 510(k) clearance. Abbreviations: AI, artificial intelligence; AUC, area under the curve; BCS, breast cancer screening; DBT, digital breast tomosynthesis; MRMC, multi-reader, multi-case; MRI, magnetic resonance imaging
FDA Standards under the 510(k) Pathway
Products submitted for clearance through the 510(k) pathway must demonstrate “substantial equivalence” to a predicate device that is already FDA approved, cleared, or grandfathered in. For this to occur, the agency must deem the intended use of the new device and the predicate to be the same. The technological characteristics of the two devices must also be deemed the same, or any differences must not raise different questions of safety and effectiveness.24
FDA has also provided specific guidance for demonstrating substantial equivalence for AI applications for computer-aided detection (CADe), for which assessment of performance is central.25 A range of performance metrics may be used, with a receiver operating characteristic curve summary metric (e.g., area under the curve) recommended as the primary endpoint and sensitivity and specificity as secondary endpoints. Performance may be assessed through a variety of study designs, including retrospective reader studies, prospective reader studies, and stress tests, which include difficult-to-read images. The multi-reader, multi-case (MRMC) study design, in which radiologists read a set of mammograms with AI and then, after a washout period, read the same mammograms without it (or vice versa), is cited as commonly used to assess CADe clinical performance. Importantly, these guidelines do not apply to computer-aided diagnosis (CADx) devices, which are intended to provide more detailed diagnostic information.
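To make the endpoints named in this guidance concrete, the sketch below computes sensitivity, specificity, and ROC AUC for a small set of algorithm scores. All scores, labels, and the operating threshold are invented for illustration; they are not drawn from the guidance or from any cleared product.

```python
# Hypothetical suspicion scores and biopsy-confirmed labels -- invented data
# for illustration only.

def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity at a fixed operating point."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """Threshold-free ROC AUC via the Mann-Whitney statistic: the probability
    that a randomly chosen cancer case outscores a randomly chosen normal."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # algorithm output, 0-1
labels = [1,   1,   0,   1,   0,   0,   1,   0]    # 1 = cancer on biopsy

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(sens, spec)           # 0.75 0.75
print(auc(scores, labels))  # 0.75
```

In an MRMC study, these same metrics would be computed per reader, with and without AI assistance, and compared across readers and cases.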
Other Regulatory Approaches under Consideration:
Although FDA has used its medical device regulatory pathways for SaMD, AI products have features that create unique challenges for regulation. In particular, AI may use adaptive algorithms, which are dynamic and can change in response to new information. Adaptive algorithms are difficult to regulate in the current framework because the product initially reviewed may be different from the one a consumer ultimately encounters.26 In addition, the current regulatory framework is cumbersome even for locked algorithms because it does not allow for rapid iterative updates in response to new information about use or performance.
FDA has proposed a voluntary program known as the Software Pre-Cert Pilot Program, which is designed to address the specific challenges of regulating SaMD. The Pre-Cert program would focus on vetting software developers and processes, and would allow for streamlined review of some products from participating developers.27 The program also calls for developers to engage in “real world performance analytics” to evaluate product performance once in use. Drawing on this framework, FDA has also proposed an approach to regulating adaptive algorithms in which FDA would similarly evaluate the quality of the software developer and require a priori plans for re-review by regulators when there are significant changes in intended use and performance.4,28,29
Characteristics of Evidence Used in FDA Clearance of AI Products for Breast Cancer Screening
Because AI products for BCS may come to market via several different pathways, the data supporting the efficacy of these products may vary. We assessed the strength of evidence supporting cleared AI products for BCS and sought to identify systematic or consistent gaps in that evidence. We specifically focused on AI products with an indication for suspicious lesion identification or mammogram triage. Understanding and assessing the evidence supporting these products will be critical as more products come to market, existing products gain market share, and the presence of next-generation AI in BCS grows.
To this end, we reviewed all products cleared or approved by the FDA Center for Devices and Radiological Health via radiology panels through the 510(k), De Novo, and PMA review pathways between January 1, 2017 and December 31, 2021 to focus our review on next-generation AI products. BCS indication determination was made from product name or from the product summary if not apparent from product name. We considered products to use AI if the device classification name included “radiological computer assisted detection” or “diagnosis software for lesions suspicious for cancer.” If the device classification name included “radiological image processing system,” the product summary was consulted for possible inclusion as an AI product. We cross-referenced our product list with published databases of FDA-approved AI products.2,4,5,30 We also searched PubMed, arXiv and medRxiv for published reports of AI products and cross-referenced products with FDA listings.3 If multiple versions existed for a product, we included only the initial version of the product.
We abstracted information about the evidence used to support product clearance from publicly available data including FDA premarket notifications, decision summaries, and summaries of safety and effectiveness. We specifically evaluated the clearance or approval pathway used, study characteristics including study design, data source and type, gold standards, and outcomes. We evaluated whether products were clearly noted to be externally validated during reader testing, in which diagnostic accuracy is assessed on a sample independent from that used in development.
Product Overview:
We identified nine products with applications in BCS (Table; eTable). Of these, seven were cleared through the 510(k) pathway, one was granted through the De Novo pathway, and one was approved through full premarket review. Products were designed for use with varied imaging modalities, including full-field digital mammography (FFDM), digital breast tomosynthesis (DBT), and magnetic resonance imaging (MRI). Importantly, all products are indicated for use by radiologists in an assistive or concurrent reading capacity rather than a diagnostic capacity.
Table.
Summary of Data Used for FDA Clearance or Approval of AI Products Indicated for Suspicious Lesion Identification or Triage during Breast Cancer Screening
| Product name (Company) | Year of FDA clearance or approval | FDA pathway | Predicate device (Company) | Population | Dataset type | Validation type | Outcomes |
|---|---|---|---|---|---|---|---|
| PowerLook Tomo Detection (iCAD, Inc.) | 2017 | Pre-market approval | -- | Multicenter | Enriched | External | Sensitivity, specificity, AUC, recall rate, radiologist reading time |
| QuantX (Quantitative Insights, Inc.) | 2017 | De Novo | -- | Multicenter | Enriched | External | Sensitivity, specificity, AUC |
| Transpara (ScreenPoint Medical BV) | 2018 | 510(k) | OsteoDetect (Imagen Technologies, Inc.) | Multicenter | Enriched | Not reported | Sensitivity, specificity, AUC, radiologist reading time |
| cmTriage (CureMetrix, Inc.) | 2019 | 510(k) | ContaCT (Viz.AI) | Multicenter | Enriched | External* | Sensitivity, specificity, AUC, recall rate, processing time |
| Genius AI Detection (Hologic, Inc.) | 2020 | 510(k) | PowerLook Tomo Detection V2 Software (iCAD, Inc.) | Not reported | Enriched | Not reported | Sensitivity, AUC, recall rate, radiologist reading time |
| HealthMammo (Zebra Medical Vision, Ltd.) | 2020 | 510(k) | cmTriage (CureMetrix, Inc.) | Multicenter | Enriched | Not reported | Sensitivity, specificity, AUC, processing time |
| Mammoscreen (Therapixel) | 2020 | 510(k) | Transpara (ScreenPoint Medical BV) | Single center | Not reported | Not reported | Sensitivity, specificity, AUC, radiologist reading time |
| Lunit INSIGHT MMG (Lunit, Inc.) | 2021 | 510(k) | Transpara (ScreenPoint Medical BV) | Not reported | Not reported | External | Sensitivity, specificity, AUC, recall rate |
| Saige-Q (DeepHealth, Inc.) | 2021 | 510(k) | cmTriage (CureMetrix, Inc.) | Multicenter | Enriched | External | Sensitivity, specificity, AUC, processing time |
Asterisk (*) denotes testing in populations or datasets that were likely to be independent from development datasets, based on developers’ descriptions. Abbreviation: AUC, area under the curve
Characteristics of Studies Used to Support Clearance:
All nine products were cleared or approved on the basis of clinical studies, all of which used previously collected datasets. All six products with an indication for suspicious lesion identification used an MRMC design. The remaining three products were intended for triage (i.e., flagging of potentially abnormal exams to improve workflow) and used a blinded multicenter study design in which performance was compared to historical controls. Six products were evaluated using data from two or more clinical sites; one product was evaluated using data from a single clinical site, and the remaining two products (22%) did not report the number of sites. Seven products were examined with datasets enriched for breast cancer cases, and the remaining two (22%) did not report dataset type. Five products appeared to be externally validated.
Characteristics of Reported Outcomes:
Publicly available FDA documents primarily reported test performance measures and radiologist reading time as quantitative outcomes. All products reported sensitivity, specificity, and/or area under the curve (AUC). Seven reported processing time or radiologist reading time. Three products also reported recall rate (i.e., the proportion of exams for which follow-up imaging was recommended). Eight reported biopsy as the gold standard for cancer cases; the remaining one (11%) did not report how cancer cases were ascertained. Additionally, six reported a follow-up period for non-cancer cases; the remaining three (33%) did not report how non-cancer cases were ascertained. None of the studies examined clinical outcomes, such as long-term cancer outcomes, cancer stage, or interval cancer detection.
Limitations:
We note that although we reviewed all publicly available data, companies may have collected or submitted more extensive materials supporting product clearance, with greater detail on study design, validation, and populations studied.
Advantages and Limitations of Current Standards for Device Approval
Currently, the majority of FDA-approved AI products for BCS have reported test accuracy for identifying breast cancer as the key metric for demonstrating substantial equivalence. Accuracy is an essential feature for any AI product designed to detect breast cancer, and a product that is less accurate than the current standard would clearly be detrimental. Evaluating accuracy also has other advantages. Test characteristics are relatively simple to measure, can be captured in a short time horizon, and are simple to compare between a device and the predicate technology or status quo approach. A focus on performance can be aligned with outcomes that improve care for patients. For example, improved specificity reduces false positives, thereby limiting the number of women who have to return for additional workup.
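The arithmetic behind this point can be made concrete. The inputs below are assumed, illustrative values (a screening prevalence of roughly 5 cancers per 1,000 exams and specificities of 88% versus 91%), not figures from any cleared product:

```python
# Back-of-the-envelope sketch with assumed inputs -- illustrative numbers
# only, not figures reported for any FDA-cleared product.

def screening_counts(n_screened, prevalence, sensitivity, specificity):
    """Expected true positives and false positives in one screening round."""
    cancers = n_screened * prevalence
    normals = n_screened - cancers
    return cancers * sensitivity, normals * (1 - specificity)

tp_base, fp_base = screening_counts(1000, 0.005, 0.85, 0.88)
tp_new,  fp_new  = screening_counts(1000, 0.005, 0.85, 0.91)

# A 3-point specificity gain spares roughly 30 of every 1,000 women screened
# a false-positive recall, with no change in cancers detected.
print(round(fp_base - fp_new, 1))  # 29.9
```

Under these assumptions, a seemingly modest specificity difference translates into tens of avoided callbacks per thousand screens, which is why performance metrics, properly measured, are not merely abstract.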
Current approaches to demonstrating substantial equivalence, though, have several important weaknesses. First, studies supporting the approval of AI products currently on the market for BCS incorporate features that increase the risk of bias when assessing performance. For example, all data submitted for product approval relied on previously assembled annotated datasets. This approach risks potential ascertainment bias if breast cancer is only detected in cases when a human reader has recommended biopsy.1 A number of studies also used enriched datasets, which contain more true positive cases than would be seen in a typical screening population. While they are convenient and increase statistical power, enriched datasets do not reflect the typical spectrum and prevalence of breast cancer and can lead to an AUC higher than what would be observed in real-world conditions.31–33 Thus, a number of common features of study design may lead to bias that could result in overestimation or underestimation of test accuracy.
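The AUC inflation from enrichment can be illustrated with a simple simulation. The score distributions below are assumptions chosen for illustration (easier-to-read cancers score higher on average), not estimates from any real product or dataset:

```python
# Simulated spectrum bias with assumed score distributions: an enriched test
# set that over-samples obvious cancers inflates the apparent AUC even though
# the algorithm itself is unchanged.
import random

def auc(pos, neg):
    """ROC AUC as the probability a cancer case outscores a normal case."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)
normals = [random.gauss(0.0, 1.0) for _ in range(2000)]
subtle  = [random.gauss(0.5, 1.0) for _ in range(200)]  # hard-to-read cancers
obvious = [random.gauss(3.0, 1.0) for _ in range(50)]   # easy-to-read cancers

# Representative screening mix: mostly subtle cancers.
auc_screening = auc(subtle + obvious, normals)

# Enriched test set: same algorithm, but half the cancers are obvious.
auc_enriched = auc(subtle[:50] + obvious, normals)

print(round(auc_screening, 2), round(auc_enriched, 2))
# With this case mix, the enriched estimate comes out higher -- only the
# spectrum of cases changed, not the algorithm.
```

This is one mechanism; ascertainment bias from biopsy-confirmed labels can distort estimates in the same direction or the opposite one, which is why the direction of net bias in any given submission is hard to know from summary metrics alone.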
Second, studies of test performance may incorporate design features that limit generalizability. Ideally, AI products are externally validated using a distinct dataset. Products that reuse data for development and validation may be overfit and perform poorly when applied to other populations. Relatedly, not all products were studied using data from multiple clinical sites. Reliance on a single center or population may mean that results do not generalize across different patient populations with different demographics, or different sites where clinical practice differs.4
Lastly, and perhaps most importantly, focusing on cancer detection does not necessarily translate to improved health. First, increased sensitivity may mean that some breast cancers are diagnosed earlier, but simply diagnosing a breast cancer earlier does not always improve outcomes. Second, greater sensitivity could actually result in harm by way of more false-positive tests, resulting in more callbacks, biopsies, costs, and associated distress. AI may also contribute to overdiagnosis of indolent breast cancers that would never have affected women in their lifetimes. Third, AI may miss clinically important, aggressive breast cancers while finding slower-growing tumors or DCIS, a shift that would go unnoticed if evaluation focused only on overall performance. Finally, AI may influence radiologist practice patterns in unpredictable ways that ultimately change diagnostic outcomes but are not captured in limited, controlled studies of test accuracy.
Recommendations and Future Directions
Improving Evidentiary Standards
In the next decade, AI products are likely to become more common for BCS applications and for many other diagnostic and therapeutic uses in radiology and other fields. It is crucial to consider whether current regulatory standards offer sufficient protection for patients against products that ultimately provide limited utility or are even harmful. Within its current statutory authority, FDA can alter evidentiary standards for AI product clearance by including specific requirements for study design, outcomes, populations included in studies, and approaches to validation. FDA could also consider modifying its voluntary guidance, to which AI product manufacturers are not bound but have strong incentive to adhere.
Should FDA require randomized trials of AI products for breast cancer screening prior to clearance or approval? Prospective randomized trials would address many of the shortcomings of current approaches to evaluation of AI products. Trials, however, are typically expensive and complex, and require significant expertise and engagement from multiple groups. In addition, trials may not be well suited to products like AI, which may have frequent updates or change over time. Proposals from FDA for alternate approaches to clearance, such as the Pre-Cert program or the proposed framework for regulating AI/machine learning, shift emphasis toward evaluating processes rather than products in response to concerns that AI products are not well suited to trials and traditional evaluation frameworks.27,28,34
However, without trials, we may implement technology based on evidence that incorporates bias and without a full understanding of potential risks and benefits. Pragmatic trial designs, such as stepped wedge trials, point-of-care trials, or other designs emphasizing provider or practice-level randomization, can address some clinical trial challenges. Health system networks that leverage common data models for electronic health record data infrastructure, such as PCORnet or NESTcc, might be useful to support trials in real-world clinical environments to assess effectiveness, given questions about how AI products might perform once adopted. However, even if feasible, clinical trials would require significant coordination, along with substantial investment from developers.
In addition, FDA could consider strengthening requirements for and reporting of other study design features. Importantly, greater diversity of study settings and participants is needed. Most of the studies that supported FDA clearance of AI products for BCS did not report the race and ethnicity of the patients included. However, most available training datasets used in AI development contain mammograms largely from White women.35–37 The inclusion of diverse populations when developing AI products is critical to ensure performance is maintained across populations.38,39 Clinical diversity is also important. Mammographic performance changes with age and breast density, and including women across the spectrum of age, breast density, and breast cancer risk will be essential. Recommending or requiring diverse populations for algorithm training and testing would improve generalizability. In addition, reporting the characteristics of these populations would allow end users to fully evaluate products.
The Role of Post-Marketing Surveillance
In addition to strengthening evidentiary standards, a robust system of post-marketing surveillance is needed. Post-marketing surveillance could help detect unintended consequences of AI when applied by physicians, deviations in performance compared to what was reported in controlled studies, or changes in intended use. Post-marketing surveillance by developers is already a central feature of proposed FDA frameworks for regulating AI.27,28,34 Post-marketing surveillance could leverage existing registries, such as the Breast Cancer Surveillance Consortium, registries developed by professional societies, or reporting structures used for quality assurance. Health system networks also have the potential to contribute to post-marketing surveillance infrastructure. Developing post-marketing surveillance systems will rely on engagement and investment from developers. However, public-sector investment will also be important for developing and maintaining infrastructure for surveillance. Post-marketing surveillance will be increasingly important as adaptive products are adopted into clinical practice, as the initial product approved may change over time.
The Role of Modeling Studies
In addition to empirical studies, modeling studies can also play a role in evaluation. Modeling studies, using established simulation models such as those maintained by CISNET, have historically been used to evaluate new BCS technologies. Simulation models can project the long-term impact of a screening technology on outcomes like morbidity, mortality, and cost, based on short-term parameters like sensitivity and specificity. Models have some important limitations—they make assumptions about the natural history of breast cancer and the modeled population. Modeling results are also dependent on the parameters associated with the screening technology. Thus, models can be useful but require accurate inputs to be informative. Improving the strength of evidence around AI for BCS would help improve the value of modeling approaches.
Focusing on Patient-Centered Outcomes
Across the spectrum of evidence generation, a key question is what outcomes should be evaluated. No single outcome measure is paramount, and different outcomes may suit different purposes. While mortality is the most meaningful long-term outcome for studies of BCS, its use as an endpoint is challenging and increasingly impractical given the rapid pace of new technology development, especially in the field of AI. Mortality is also a relatively rare outcome and requires very large sample sizes to evaluate. More contemporary studies of BCS have focused on surrogate endpoints that can be measured in a shorter time scale, such as interval cancers and late-stage diagnosis.40–43 However, whether reducing interval cancers improves mortality is uncertain, and important questions remain about the value of focusing on these surrogate outcomes. Quality of life measures have also been noted as helpful, patient-centered adjunctive outcome measures.44 Finally, although focusing on test accuracy alone may not be sufficient to understand whether a product improves health outcomes, measuring characteristics of performance will still be important and meaningful. These measures could be extended in the post-marketing surveillance phase to understand whether products perform as expected with diverse patients and in varied settings.
The Role of Other Interested Parties
Increasing regulatory requirements for approval has important tradeoffs. More stringent regulatory standards would likely mean that fewer products enter the market and that the pace of new product entry slows. Fundamentally, FDA’s goal is to ensure the safety and efficacy of medical devices. Demonstrating that an AI product has performance similar to a predicate may satisfy current regulatory standards but may fall short of ensuring that a product improves health, reflecting a broader criticism of the 510(k) pathway by the National Academy of Medicine.45 If FDA is viewed as responsible only for maintaining the minimum bar for AI product approval, other parties, including payers, the medical and scientific community, and patients, may play a role in either encouraging or constraining AI adoption. Payers can set coverage criteria and determine reimbursement rates for the use of medical technologies, including AI, in practice. Currently, CAD is not separately reimbursed for screening mammography. Going forward, if reimbursement were to be considered, payers could request additional evidence from AI product manufacturers to better inform coverage decisions. The professional community can also influence practice and adoption of AI through guidelines, opinion leaders, and use by influential organizations. Finally, patients are of central importance and should be involved in discussions about new BCS technologies. Patients deserve to know when medical technologies such as AI influence their care, and more work must be done to inform them of these changes and empower them to make informed decisions.
Conclusions
The role of AI in breast cancer screening is only likely to grow over the next decade. Strengthened FDA evidentiary standards, development of improved post-marketing surveillance and trials, a focus on clinically meaningful outcomes, and engagement of key stakeholders can all help ensure that new AI products meet the goal of improving breast cancer screening outcomes and, ultimately, advancing health for all women.
Supplementary Material
Funding:
Dr. Richman received funding from the National Institutes of Health (NCI K08 CA248725). The funder had no role in any aspect of the study including design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Footnotes
Conflict of Interest Disclosures
Mr. Potnis has received research funding from the National Institutes of Health (T35HL007649) and the American Society of Hematology, both outside of this work. Dr. Ross currently receives research support through Yale University from Johnson and Johnson to develop methods of clinical trial data sharing, from the Medical Device Innovation Consortium as part of the National Evaluation System for Health Technology (NEST), from the Food and Drug Administration for the Yale-Mayo Clinic Center for Excellence in Regulatory Science and Innovation (CERSI) program (U01FD005938), from the Agency for Healthcare Research and Quality (R01HS022882), from the National Heart, Lung and Blood Institute of the National Institutes of Health (NIH) (R01HS025164, R01HL144644), and from the Laura and John Arnold Foundation to establish the Good Pharma Scorecard at Bioethics International; in addition, Dr. Ross is an expert witness at the request of Relator’s attorneys, the Greene Law Firm, in a qui tam suit alleging violations of the False Claims Act and Anti-Kickback Statute against Biogen Inc. Dr. Aneja receives research funding from the MedNet, Inc, American Cancer Society, National Science Foundation, Agency for Healthcare Research and Quality, National Cancer Institute, American Society for Clinical Oncology (ASCO), Patterson Foundation, and Amazon Web Services. Dr. Gross has received research funding from the NCCN Foundation (Astra-Zeneca) and Genentech, as well as funding from Johnson and Johnson to help devise and implement new approaches to sharing clinical trial data. Dr. Richman receives salary support to develop health care quality measures from the Centers for Medicare and Medicaid Services outside of this work.
References
- 1. Freeman K, Geppert J, Stinton C, et al. Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy. BMJ. 2021;374:n1872.
- 2. Lamb LR, Lehman CD, Gastounioti A, Conant EF, Bahl M. Artificial Intelligence (AI) for Screening Mammography, From the AJR Special Series on AI Applications. AJR Am J Roentgenol. 2022:1–12.
- 3. Anderson AW, Marinovich ML, Houssami N, et al. Independent External Validation of Artificial Intelligence Algorithms for Automated Interpretation of Screening Mammography: A Systematic Review. J Am Coll Radiol. 2022;19(2 Pt A):259–273.
- 4. Wu E, Wu K, Daneshjou R, Ouyang D, Ho DE, Zou J. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat Med. 2021;27(4):582–584.
- 5. Benjamens S, Dhunnoo P, Meskó B. The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database. NPJ Digit Med. 2020;3(1):118.
- 6. Meyers PH, Nice CM Jr, Becker HC, Nettleton WJ Jr, Sweeney JW, Meckstroth GR. Automated Computer Analysis of Radiographic Images. Radiology. 1964;83(6):1029–1034.
- 7. Winsberg F, Elkin M, Macy J Jr, Bordaz V, Weymouth W. Detection of Radiographic Abnormalities in Mammograms by Means of Optical Scanning and Computer Analysis. Radiology. 1967;89(2):211–215.
- 8. Muralidhar GS, Haygood TM, Stephens TW, Whitman GJ, Bovik AC, Markey MK. Computer-aided detection of breast cancer - have all bases been covered? Breast Cancer (Auckl). 2008;2:5–9.
- 9. U.S. Food and Drug Administration. M1000 ImageChecker: Premarket Approval. 2022. Available at: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpma/pma.cfm?id=P970058. Accessed 5 February 2022.
- 10. Fenton JJ, Foote SB, Green P, Baldwin L-M. Diffusion of Computer-Aided Mammography After Mandated Medicare Coverage. Arch Intern Med. 2010;170(11):987–989.
- 11. Geras KJ, Mann RM, Moy L. Artificial Intelligence for Mammography and Digital Breast Tomosynthesis: Current Concepts and Future Perspectives. Radiology. 2019;293(2):246–259.
- 12. Gross CP, Long JB, Ross JS, et al. The Cost of Breast Cancer Screening in the Medicare Population. JAMA Intern Med. 2013;173(3):220–226.
- 13. Lehman CD, Wellman RD, Buist DS, Kerlikowske K, Tosteson AN, Miglioretti DL. Diagnostic Accuracy of Digital Screening Mammography With and Without Computer-Aided Detection. JAMA Intern Med. 2015;175(11):1828–1837.
- 14. Fenton JJ, Taplin SH, Carney PA, et al. Influence of Computer-Aided Detection on Performance of Screening Mammography. N Engl J Med. 2007;356(14):1399–1409.
- 15. Soffer S, Ben-Cohen A, Shimon O, Amitai MM, Greenspan H, Klang E. Convolutional Neural Networks for Radiologic Images: A Radiologist’s Guide. Radiology. 2019;290(3):590–606.
- 16. Kohli A, Jha S. Why CAD Failed in Mammography. J Am Coll Radiol. 2018;15(3 Pt B):535–537.
- 17. Chan HP, Samala RK, Hadjiiski LM. CAD and AI for breast cancer-recent development and challenges. Br J Radiol. 2020;93(1108):20190580.
- 18. Joel MZ, Umrao S, Chang E, et al. Using Adversarial Images to Assess the Robustness of Deep Learning Models Trained on Diagnostic Images in Oncology. JCO Clin Cancer Inform. 2022;6:e2100170.
- 19. Kann BH, Hicks DF, Payabvash S, et al. Multi-Institutional Validation of Deep Learning for Pretreatment Identification of Extranodal Extension in Head and Neck Squamous Cell Carcinoma. J Clin Oncol. 2020;38(12):1304–1311.
- 20. U.S. Food and Drug Administration. Software as a Medical Device (SaMD). 2018. Available at: https://www.fda.gov/medical-devices/digital-health-center-excellence/software-medical-device-samd. Accessed 18 November 2021.
- 21. U.S. Food and Drug Administration. Overview of Device Regulation. 2020. Available at: https://www.fda.gov/medical-devices/device-advice-comprehensive-regulatory-assistance/overview-device-regulation. Accessed 18 November 2021.
- 22. U.S. Food and Drug Administration. PMA Clinical Studies. 2020. Available at: https://www.fda.gov/medical-devices/premarket-approval-pma/pma-clinical-studies#data. Accessed 22 January 2022.
- 23. U.S. Food and Drug Administration. De Novo Classification Request. 2022. Available at: https://www.fda.gov/medical-devices/premarket-submissions-selecting-and-preparing-correct-submission/de-novo-classification-request. Accessed 30 July 2022.
- 24. U.S. Food and Drug Administration. The 510(k) Program: Evaluating Substantial Equivalence in Premarket Notifications [510(k)]. 2014. Available at: https://www.fda.gov/media/82395/download. Accessed 1 August 2022.
- 25. U.S. Food and Drug Administration. Clinical Performance Assessment: Considerations for Computer-Assisted Detection Devices Applied to Radiology Images and Radiology Device Data in Premarket Notification (510(k)) Submissions. 2020. Available at: https://www.fda.gov/media/77642/download. Accessed 1 August 2022.
- 26. Pew Charitable Trusts. How FDA Regulates Artificial Intelligence in Medical Products. 2021.
- 27. U.S. Food and Drug Administration. Digital Health Software Precertification (Pre-Cert) Program. 2021. Available at: https://www.fda.gov/medical-devices/digital-health-center-excellence/digital-health-software-precertification-pre-cert-program. Accessed 3 August 2022.
- 28. U.S. Food and Drug Administration. Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan. 2021. Available at: https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf. Accessed 5 February 2022.
- 29. U.S. Food and Drug Administration. Artificial Intelligence and Machine Learning in Software as a Medical Device. 2021. Available at: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device. Accessed 5 February 2022.
- 30. American College of Radiology. AI Central. 2022. Available at: https://aicentral.acrdsi.org/. Accessed 28 January 2022.
- 31. Adhikari S, Normand S-L, Bloom J, Shahian D, Rose S. Revisiting performance metrics for prediction with rare outcomes. Stat Methods Med Res. 2021;30(10):2352–2366.
- 32. Freeman K, Geppert J, Stinton C, et al. Use of artificial intelligence for mammographic image analysis in breast cancer screening. UK National Screening Committee; 2022.
- 33. Marinovich ML, Wylie E, Lotter W, et al. Artificial intelligence (AI) to enhance breast cancer screening: protocol for population-based cohort study of cancer detection. BMJ Open. 2022;12(1):e054005.
- 34. U.S. Food and Drug Administration. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device. 2019. Available at: https://www.fda.gov/media/122535/download. Accessed 1 August 2022.
- 35. Schaffter T, Buist DSM, Lee CI, et al. Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms. JAMA Netw Open. 2020;3(3):e200265.
- 36. Lee RS, Gimenez F, Hoogi A, Miyake KK, Gorovoy M, Rubin DL. A curated mammography data set for use in computer-aided detection and diagnosis research. Sci Data. 2017;4(1):170177.
- 37. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020;577(7788):89–94.
- 38. Kaushal A, Altman R, Langlotz C. Geographic Distribution of US Cohorts Used to Train Deep Learning Algorithms. JAMA. 2020;324(12):1212–1213.
- 39. U.S. Food and Drug Administration. Diversity Plans to Improve Enrollment of Participants from Underrepresented Racial and Ethnic Populations in Clinical Trials: Guidance for Industry. U.S. Department of Health and Human Services; 2022.
- 40. Kerlikowske K, Su Y-R, Sprague BL, et al. Association of Screening With Digital Breast Tomosynthesis vs Digital Mammography With Risk of Interval Invasive and Advanced Breast Cancer. JAMA. 2022;327(22):2220–2230.
- 41. Lee C, McCaskill-Stevens W. Tomosynthesis Mammographic Imaging Screening Trial (TMIST): An Invitation and Opportunity for the National Medical Association Community to Shape the Future of Precision Screening for Breast Cancer. J Natl Med Assoc. 2020;112(6):613–618.
- 42. McCarthy AM, Barlow WE, Conant EF, et al. Breast Cancer With a Poor Prognosis Diagnosed After Screening Mammography With Negative Results. JAMA Oncol. 2018;4(7):998–1001.
- 43. Niraula S, Biswanger N, Hu P, Lambert P, Decker K. Incidence, Characteristics, and Outcomes of Interval Breast Cancers Compared With Screening-Detected Breast Cancers. JAMA Netw Open. 2020;3(9):e2018179.
- 44. Jatoi I, Pinsky PF. Breast Cancer Screening Trials: Endpoints and Overdiagnosis. J Natl Cancer Inst. 2020;113(9):1131–1135.
- 45. Institute of Medicine. Medical Devices and the Public’s Health: The FDA 510(k) Clearance Process at 35 Years. 2011.