Abstract
Current regulatory frameworks for artificial intelligence-based clinical decision support (AICDS) are insufficient to ensure safety, effectiveness, and equity at the bedside. The oversight of clinical laboratory testing, which requires federal- and hospital-level involvement, offers instructive lessons on balancing safety and innovation, as well as warnings about the fragility of that balance. We propose an AICDS oversight framework, modeled after clinical laboratory regulation, that is deliberative, inclusive, and collaborative.
Subject terms: Health policy, Policy, Computational science, Information technology, Software, Laboratory techniques and procedures
With the growing availability of large electronic health record (EHR) datasets and powerful machine learning methods, the promise of effective and reliable artificial intelligence-based clinical decision support (AICDS) may soon be attainable. However, because of biases in EHR data, shifts in model performance over space and time, and the lack of evidence about the benefits and harms of AICDS in practice, this promise comes with considerable risk1–4. As AICDS regulatory approaches continue to evolve, there is still no clear solution that balances safety with innovation and federal with local oversight. In this Comment, we review the oversight of clinical laboratory testing and discuss several lessons for developing novel regulatory approaches for AICDS to achieve these goals5.
Oversight of AICDS
In the United States (US), although the Food and Drug Administration (FDA) and the Office of the National Coordinator for Health Information Technology (ONC) continue to develop regulatory frameworks for AICDS, these approaches alone will not guarantee the safety, effectiveness, and equity of all AICDS when deployed at the bedside6–10. First, many AICDS tools do not currently fall under the FDA or ONC purview, but still need oversight to ensure basic standards of quality and safety. Second, the predictive performance of many AICDS tools varies considerably in new settings and over time1,3. Thus, there is a need for a regulatory framework that ensures there is sufficient oversight of AICDS as it is deployed locally in individual hospitals and clinics. Many health systems are working to figure out how to oversee AICDS, but these efforts are mostly at well-resourced academic centers and vary widely in their maturity11. Emerging medical AI networks9 are taking on important roles in developing standards and assessing AI performance, but the networks themselves cannot directly fulfill the need for local oversight in every clinical practice to ensure clinical effectiveness and safety12.
Thus, the optimal oversight of AICDS requires both rigorous federal standards and local evaluations. The former will ensure minimum safety criteria are met and applied equitably across the US. The latter will ensure that AICDS tools function safely and effectively everywhere they are used. Traditional regulatory pathways for medical devices serve as a useful model for these federal efforts but lack a locally applicable counterpart. The oversight of clinical laboratory testing, however, is a mature oversight model that better parallels the regulatory needs of AICDS.
The laboratory medicine regulatory model
For over 50 years, the Centers for Medicare & Medicaid Services (CMS) has overseen the performance of clinical laboratory testing, complementing the role of the FDA in authorizing devices for laboratory testing. This oversight approach ensures the safety and efficacy of laboratory testing in every practice via close partnerships between federal regulators, organizations with laboratory testing expertise, and local hospital experts.
The performance of clinical laboratory testing, which often involves the use of FDA-authorized manufactured devices, is regulated by CMS under the Clinical Laboratory Improvement Amendments (CLIA) in Title 42, Part 493 of the Code of Federal Regulations13. Because of the large number of laboratories and the complexity of laboratory testing, CMS delegates some oversight tasks to third-party organizations with expertise in laboratory testing, such as the College of American Pathologists.
Under CMS, the safety and efficacy of all laboratory tests are ensured through a combination of local expertise, policies and procedures, documentation, and external review. More specifically, this includes certification requirements and accreditation standards for laboratories; licensing requirements for laboratory directors and supervisors; licensing, training, and explicit competency assessments for laboratory testing staff; and requirements for comprehensive procedures and process documentation (Table 1). In addition, for many tests there are further quality systems requirements for monitoring performance, including frequent verification that evaluations of quality control specimens are consistent with pre-defined expectations; periodic evaluation for shifts or drifts in quality control or patient testing results; comparison of test results to those of other laboratories; and acceptability criteria for what specimens are fit for testing. Compliance with these federal standards and best practices is evaluated through regular laboratory inspections performed by a combination of professional and peer inspectors from CMS-approved organizations.
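As a concrete illustration of these quality-control mechanics, the following is a minimal sketch of one common pattern: accept a quality control (QC) result only if it falls within pre-defined limits, and flag a run of results on one side of the target mean as possible drift. The 2-SD acceptance limit and the six-result run are illustrative assumptions loosely patterned on Westgard-style rules, not requirements quoted from CLIA.

```python
"""Sketch of a laboratory-style QC check: pre-defined acceptability
limits plus a simple shift/drift rule. All thresholds are illustrative."""
from statistics import mean


def qc_acceptable(result: float, target_mean: float, target_sd: float) -> bool:
    """Accept a QC result only if it falls within pre-defined limits (here, +/- 2 SD)."""
    return abs(result - target_mean) <= 2 * target_sd


def drifting(recent_results: list[float], target_mean: float, run_length: int = 6) -> bool:
    """Flag a possible shift/drift: `run_length` consecutive results on one side of the mean."""
    if len(recent_results) < run_length:
        return False
    tail = recent_results[-run_length:]
    return all(r > target_mean for r in tail) or all(r < target_mean for r in tail)


# Example: a QC material with an established mean of 100 units and SD of 3.
history = [101.2, 102.0, 101.5, 103.1, 102.4, 101.9]
todays_qc = 104.0
if not qc_acceptable(todays_qc, target_mean=100.0, target_sd=3.0):
    print("QC failure: hold patient results pending investigation")
elif drifting(history + [todays_qc], target_mean=100.0):
    # Every recent result sits above the mean, so this branch fires.
    print("Drift detected: trigger further evaluation")
else:
    print("QC acceptable: release patient results")
```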
Table 1. Components of a distributed system to ensure safe, effective, and equitable artificial intelligence-based clinical decision support

| Component | | Laboratory testing | AICDS (proposed correlate) |
|---|---|---|---|
| Pre-deployment development and testing | Method validation (analytical) | All laboratory tests and devices are required to undergo a thorough validation of their analytical performance. For all marketed devices, these studies are reviewed by the FDA. **Analytical accuracy:** comparing test results in real-world conditions to those of a gold-standard method or another method with established accuracy. **Reference interval:** establishing expected “normal” results, often determined by testing a cohort of “healthy” reference subjects; for some tests, an established decision threshold may be adopted and validated. **Precision:** quantifying variability in results observed in repeat testing over time. **Reportable range:** establishing the range of reported results with sufficient accuracy and precision. **Analytical specificity:** evaluating for result errors due to similar analytes or method-specific interfering substances. **Specimen acceptability requirements:** establishing acceptable specimen collection, transport, and storage conditions. | All AICDS models should be sufficiently validated prior to deployment. **Model accuracy:** establishing AICDS model performance relative to a key outcome; the precise outcome will depend on the clinical use case (a clinical outcome, a more pragmatic process outcome, or a comparator process or model), and the precise metric will depend on the role of the AI model (e.g., area under the receiver operating characteristic curve (AUROC) or positive predictive value (PPV) at the decision threshold); this evaluation should reflect the diversity of real-world conditions, including relevant patient subgroups and practice contexts. **Precision:** quantifying variability in results in sensitivity analyses and a temporal (stability) evaluation. **Reportable range:** establishing the range of reportable results (e.g., the range of results with a sufficiently narrow confidence interval). **Analytical specificity:** evaluating for model errors associated with expected mechanisms, such as with adversarial examples. **Input predictor requirements:** determining acceptable input requirements (e.g., the space of input predictors with sufficient training data) and establishing appropriate input checks. |
| | Clinical utility | All tests must undergo some evaluation for efficacy, relative to a clinical or surrogate outcome. However, for most tests there are no concrete requirements for clinical effectiveness. | All models should undergo retrospective (and ideally prospective) evaluations for efficacy. **We believe all models should also undergo pragmatic evaluations for effectiveness** (either pre- or post-deployment, depending on the specific context), ideally incorporating the entire AICDS intervention and ideally relative to a key clinical outcome. |
| | Local verification | All non-FDA-waived tests must undergo local verification of test accuracy, precision, and reportable range for each specimen type. Reference intervals may be adopted from device validation studies or published literature or may be established locally. | As there are impactful local factors for many models, **the performance of all models should be locally verified**. This verification should include assessment of model accuracy (including general model metrics and use case-specific metrics), precision, reportable range, and input predictor checks. |
| | Change control | Any potentially impactful change to the complete (i.e., end-to-end) testing process should prompt a verification study. Relevant changes include alteration of method processes or reagents, changing of instruments, physically moving instruments, changes to specimen tubes or their processing, and major software changes. Verification studies assess for potential errors associated with specific changes and frequently include accuracy, precision, and reportable range. | Any non-minor change to the workflow, process, definition of an input predictor, or relevant information system software should prompt a verification study to demonstrate that there is no substantial impact on model performance. |
| | Registration | All marketed devices (including test reagents, calibrators, and instruments) are identifiable by a unique device identifier (UDI) that includes a device identifier and a product (instance) identifier. However, there is currently no comprehensive registration for locally developed LDTs. | **All AICDS tools should be registered** (with minimal process and expense burden) **and assigned unique tool and version identifiers.** |
| Real-world monitoring | Regular performance checks | Patient test results are reported only if quality control (QC) results are acceptable. QC specimens are tested regularly, at least every day of testing. Results are compared to pre-determined acceptability criteria. Failure to meet those criteria triggers further evaluation. | **Model results should only be reported if QC results are acceptable.** Models should be tested regularly, at least on every day of use. QC results should be compared to pre-determined acceptability criteria. Model performance evaluations could span different portions of the end-to-end process, depending on risks and capabilities. The model could be narrowly assessed using stored patient or synthetic input data. The pre-model steps of data extraction/processing could be evaluated by extracting synthetic/test patient data from the source database or, to better emulate the data-generation process, by programmatically generating test patients in the native source system. |
| | Input checks | Received patient specimens are evaluated to confirm they meet the input requirements (e.g., proper collection container maintained at an appropriate temperature). If they do not meet criteria, then the test results are not released and the ordering clinician is notified. | Predictor inputs should be compared to pre-established bounds to detect unexpected outliers, which could include deviations in data type or data values. **Failure to meet pre-established criteria should prevent the model from automatically releasing results** (see the sketch following this table). |
| | Periodic performance checks | As part of a quality assurance program, the results of QC testing and patient testing are reviewed (often weekly or monthly) to identify shifts or drifts in performance. In addition, there are periodic (often every 6 months) requirements for verification of method calibration across the measuring range. Concerning patterns trigger further evaluation. The vast majority of results are currently not labeled with UDIs, which makes it prohibitive to evaluate and monitor real-world performance beyond individual health systems. | **A quality assurance program should be implemented to detect relevant drifts or shifts** in input predictors, outcomes, or the relationship between predictors and outcomes. Results should be labeled with a unique AICDS tool identifier to facilitate quality monitoring and evidence generation. |
| | Peer performance checks | At least twice a year, patient or patient-equivalent specimens are tested (without access to the “correct” result) and results are generally sent to a third party for evaluation. Results are most commonly compared to those of other laboratories performing the same or an equivalent method. Failure to meet pre-established acceptability criteria triggers further evaluation, and serial failures require patient testing to be stopped. | For models performed at multiple sites or models with relevant comparators, synthetic patient data could be distributed and model results compared amongst peers. Failure to meet pre-established acceptability criteria would trigger further evaluation, and serial failures would require clinical application to be stopped. |
| Personnel | Directorship | The laboratory director must have a sufficient educational background (i.e., MD/DO) and pathology board certification, or a PhD in a relevant field and clinical board certification. | **There should be a medical director with sufficient educational background and certification**, likely supported by an oversight committee and delegation of some responsibilities. |
| | Testing personnel | All testing personnel must have a minimum level of education and training, depending on test complexity. Competency must be assessed for each test by a combination of direct observation, examination, and/or test specimens. | Personnel responsible for model verification, monitoring, and maintenance must have a minimum level of education and training. **Competency should be assessed** by a combination of direct observation, examination, and/or test patients. |
| | Accountability | The laboratory director oversees overall laboratory operations and is responsible for the quality of laboratory test results. This includes (a) selection of laboratory testing methods, (b) implementation of laboratory testing, (c) reporting test results promptly and accurately, (d) employment of personnel competent to perform testing, (e) regulatory compliance, and (f) delegation of authorities to ensure high-quality laboratory testing. | **The AI medical director oversees overall AICDS operations and is responsible for the safety and effectiveness of AICDS tools as they are applied for clinical use.** This includes (a) what AICDS tools are used; (b) how AICDS tools are implemented; (c) the accuracy of AICDS outputs, including predictions and recommendations; (d) employment of competent personnel; and (e) delegation of authorities to ensure high-quality AICDS tool use. |
| Certification | | Laboratories must obtain one of several CLIA certificates, depending on the complexity of the diagnostic tests they perform. | **AICDS practices should obtain a CLIA-equivalent certificate**, depending on the complexity and risk of the AICDS tools they apply. |
| Inspection | | Laboratories are regularly inspected by a deemed-status organization approved by CMS, such as the College of American Pathologists or the Joint Commission on Accreditation of Healthcare Organizations. Most inspections are carried out by peers and involve the evaluation of a long checklist that encompasses CLIA requirements and best practices. Deficiencies must be corrected to continue clinical operations. | **AICDS practices should be regularly inspected by a deemed-status organization approved by CMS.** Inspection requirements could be adaptive, such as restricting pre-deployment review to new, high-risk, high-complexity tools. Some inspections could be virtual and asynchronous. |
| Procedures | | All clinical testing practices must be specified in a version-controlled procedure sufficiently detailed that a new, sufficiently trained individual could independently perform the testing. | **The fully deployed process, including automated and manual steps, needs to be fully specified in version-controlled policies and procedures.** Procedures for automated processes should include code, environment specifications, and version control. |
| Documentation | | All impactful actions must be documented to enable traceability of the entire testing process. | **All impactful actions should be documented to enable traceability of the entire AICDS process.** This includes logging of automated process steps, such as tool versions. |
| Reimbursement | | Medicare can reimburse for the professional (i.e., oversight) and technical (i.e., performance) components of appropriate laboratory testing. For hospital-based care the reimbursement is indirect via Medicare Part A fixed payments, and for ambulatory care the reimbursement is direct via Medicare Part B payments. To perform laboratory testing and to bill Medicare, laboratories must have appropriate CLIA certification. | As the costs of safely and effectively applying AICDS are non-trivial, and for many such tools indirect cost savings can be very delayed, **reimbursement would help align short-term cost-effectiveness for practices with long-term benefits**, to drive the adoption of appropriate AICDS tools. The professional (i.e., oversight) and technical (i.e., performance) components of AICDS tool application could be reimbursed by Medicare using a combination of Part A and Part B payments that require a CLIA-equivalent certification. |

Bolded text indicates key themes and proposals.
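To make the “Input checks” and “Regular performance checks” correlates in Table 1 concrete, the following is a minimal sketch. The predictor names, training-derived bounds, QC tolerance, and callable model interface are hypothetical illustrations under assumed conventions, not an existing tool’s API or a regulatory specification.

```python
"""Sketch of two Table 1 correlates for AICDS: input predictor checks
and routine QC with stored reference cases. All names and thresholds
are hypothetical."""

TRAINING_BOUNDS = {  # assumed acceptable input space, derived from training data
    "age_years": (18.0, 100.0),
    "lactate_mmol_per_l": (0.2, 20.0),
    "heart_rate_bpm": (20.0, 250.0),
}


def input_check(patient: dict) -> list[str]:
    """Return a list of violations; any violation should block automatic release."""
    problems = []
    for name, (low, high) in TRAINING_BOUNDS.items():
        value = patient.get(name)
        if not isinstance(value, (int, float)):
            problems.append(f"missing or non-numeric predictor: {name}")
        elif not low <= value <= high:
            problems.append(f"{name}={value} outside training bounds [{low}, {high}]")
    return problems


def daily_qc(model, reference_cases: list[tuple[dict, float]], tolerance: float = 0.05) -> bool:
    """Run stored QC cases through the model; every output must stay within
    a pre-determined tolerance of its expected score."""
    return all(abs(model(case) - expected) <= tolerance
               for case, expected in reference_cases)


def toy_model(patient: dict) -> float:
    """Stand-in risk score, for illustration only."""
    return min(1.0, 0.03 * patient["lactate_mmol_per_l"] + 0.002 * patient["heart_rate_bpm"])


qc_cases = [({"age_years": 67.0, "lactate_mmol_per_l": 4.1, "heart_rate_bpm": 112.0}, 0.35)]
new_patient = {"age_years": 54.0, "lactate_mmol_per_l": 2.2, "heart_rate_bpm": 98.0}

violations = input_check(new_patient)
if violations:
    print("Withhold result:", violations)
elif not daily_qc(toy_model, qc_cases):
    print("QC failure: suspend automatic release pending review")
else:
    print("Release risk score:", round(toy_model(new_patient), 3))
```

The design choice mirrors the laboratory workflow: the input check plays the role of specimen acceptability, and the daily QC run plays the role of control-material testing, with failures blocking automatic release rather than silently degrading output quality.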
For each test, CMS’s oversight requirements depend on the test’s complexity and the risk of patient harm, which in turn depend on whether the test has been authorized by the FDA and, if so, on the FDA’s risk classification. Some tests (i.e., FDA-waived tests) are simple enough, and the risk of harm low enough, that they are authorized for performance by clinicians not specifically trained in laboratory testing or for at-home use. For all other tests, including those authorized for clinical use by the FDA, CMS mandates a more extensive set of requirements that includes verification and monitoring of performance in every clinical laboratory in which they are used.
Just as local context is critical for the performance of some AICDS tools, many laboratory tests benefit from local refinements to optimize their utility in individual clinical practices. To tailor clinical testing for local context, some clinical laboratories are permitted to modify FDA-authorized tests to improve or expand their utility, such as to enable testing of different specimen types (e.g., body fluid rather than blood) or to apply a reference (or normal) interval that is more appropriate for a practice’s population13,14. The laboratory director is then responsible for validating the performance of these modified tests13,14. In addition, appropriately certified laboratories have been empowered to innovate to address the many clinical needs that are unmet by FDA-authorized tests, such as tests for rare or emergent diseases, as well as rapidly evolving changes in clinical practice. To fulfill these clinical needs, these laboratories have been able to develop and locally deploy their own laboratory-developed tests (LDTs). CMS requires that all LDTs undergo a comprehensive validation by every laboratory in which they are performed13,14.
The CLIA regulatory framework has enabled clinical laboratories to safely and efficiently implement FDA-authorized tests developed by manufacturers and to adapt to meet local, ever-changing clinical needs. It strikes a reasonable balance of safety and innovation, effectiveness and regulatory burden, and centralized and distributed accountability and oversight.
Application of the laboratory medicine model to AICDS oversight
Adopting elements from CMS’s oversight of laboratory testing for AICDS would fill major gaps in the current federal regulatory approaches. First, AICDS tools not subject to FDA or ONC requirements, such as those not meeting device criteria, not embedded in the EHR, developed and deployed within a clinical practice, or exempt from device designation under the 21st Century Cures Act, would still be subject to rigorous oversight similar to that of LDTs. Locally developed AICDS tools are common at academic centers with data science research programs; they are also important because they can dramatically improve patient care and are essential for the development and refinement of technologies that can subsequently scale. However, current federal standards are insufficient to ensure their conscientious use. Furthermore, current regulatory frameworks do not designate any specific individual or committee as explicitly responsible for the local use and monitoring of AICDS tools. An AI medical director, akin to a hospital’s CLIA director, could be accountable for adverse clinical consequences of such AICDS tools. They would be responsible for ensuring such tools have sufficient evidence for clinical utility and are implemented with sufficient quality systems to mitigate the risk of patient harm15.
Second, federal standards of safety, effectiveness, and equity could be ensured in the application of all AICDS tools at the bedside. Even if regulatory criteria set forth by the FDA, ONC, or other regulatory bodies are fulfilled at the time of authorization or certification of a tool, a framework for compliance and monitoring will be needed locally where AICDS tools are deployed. These requirements could include fully specified procedures, pre-implementation verification of clinical performance measures, evaluation of input parameters for local use, assessments of algorithmic equity across protected demographic subgroups, and ongoing monitoring for performance drift over time. To fulfill these requirements, staff would need relevant education, which will require new curricular development, and should be explicitly assessed for competency16,17. Individual clinical practices themselves would naturally be ultimately responsible for this clinical oversight and accountability. But for tools hosted by a third-party vendor, some of the responsibility should fall upon the AI medical director of the tool’s provider. This model of shared accountability is similar to that of laboratory testing performed by an external reference laboratory.
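As a minimal sketch of what such a local pre-implementation verification could compute, the code below reports overall and subgroup AUROC plus PPV at the intended decision threshold on a held-out local dataset. The column names, the 0.03 AUROC equity-gap tolerance, and the 0.5 threshold are illustrative assumptions, not regulatory values.

```python
"""Sketch of a local verification report: overall accuracy, PPV at the
decision threshold, and a subgroup equity check. Thresholds are assumed."""
import numpy as np
from sklearn.metrics import precision_score, roc_auc_score


def local_verification(y_true, y_score, subgroup, threshold=0.5, auroc_gap_limit=0.03):
    """Verify accuracy overall and per subgroup; flag large subgroup gaps.
    Assumes every subgroup contains both outcome classes."""
    y_true, y_score, subgroup = map(np.asarray, (y_true, y_score, subgroup))
    y_pred = y_score >= threshold
    per_group = {g: roc_auc_score(y_true[subgroup == g], y_score[subgroup == g])
                 for g in np.unique(subgroup)}
    gap = max(per_group.values()) - min(per_group.values())
    return {
        "overall_auroc": roc_auc_score(y_true, y_score),
        "ppv_at_threshold": precision_score(y_true, y_pred, zero_division=0),
        "subgroup_auroc": per_group,
        "equity_flag": gap > auroc_gap_limit,  # triggers further local review
    }


# Synthetic example data standing in for a local held-out cohort.
rng = np.random.default_rng(1)
n = 1000
y = rng.integers(0, 2, n)
score = np.clip(0.3 + 0.4 * y + rng.normal(0.0, 0.2, n), 0.0, 1.0)
group = rng.choice(["site_A", "site_B"], n)
print(local_verification(y, score, group))
```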
One notable way in which some AICDS tools will differ from common laboratory tests is in their potential for very broad indications for use. This difference is foreshadowed by recent advances in generative AI, including large language models (LLMs). Such tools, and potentially someday truly general AI tools, could be useful for extremely broad indications for decision support. Such broad indications would present a challenge for current FDA review, which requires defining a narrow indication for use. Relevant correlates in laboratory testing include untargeted drug screens and genome sequencing; although not as broad, such tests produce many clinical results that have never been seen before and therefore have not been explicitly validated. While this breadth of use increases the complexity of tool validation and monitoring, the same overarching principles for ensuring quality apply. With a clear definition of the clinical indication(s), an AI medical director with sufficient understanding of the methods, the inputs and outputs, the potential risks for errors, and the potential impact of errors on downstream clinical decision-making could define a set of policies and procedures to reasonably ensure safety and efficacy. These would include performing a sufficient validation of key performance metrics and defining requirements for evaluating the appropriateness of inputs (i.e., is this a scenario for which the model has been sufficiently trained), bounds for outputs (i.e., scenarios for which the model’s prediction is sufficiently confident), and monitoring for changes in model performance.
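As a sketch of how such policies might be operationalized for a broad-indication tool, the gate below checks that a request falls within the locally validated scope and that the tool’s self-reported confidence clears a pre-set floor before a result is automatically released. The validated-scope set, confidence floor, and (prediction, confidence) tool interface are all hypothetical assumptions.

```python
"""Sketch of input-appropriateness and output-confidence gates for a
broad-indication AICDS tool. Scope, floor, and interface are assumed."""

VALIDATED_INDICATIONS = {"sepsis_risk", "readmission_risk"}  # assumed locally validated scope
CONFIDENCE_FLOOR = 0.80  # assumed minimum confidence for automatic release


def gated_decision_support(tool, indication: str, inputs: dict) -> dict:
    """Release a tool output only if the indication is in scope and the
    tool's confidence clears the pre-set floor."""
    if indication not in VALIDATED_INDICATIONS:
        return {"released": False, "reason": "indication outside locally validated scope"}
    prediction, confidence = tool(indication, inputs)
    if confidence < CONFIDENCE_FLOOR:
        return {"released": False,
                "reason": f"confidence {confidence:.2f} below floor {CONFIDENCE_FLOOR:.2f}"}
    return {"released": True, "prediction": prediction, "confidence": confidence}


def toy_tool(indication: str, inputs: dict) -> tuple[str, float]:
    """Stand-in broad-indication tool, for illustration only."""
    return ("elevated risk", 0.91)


print(gated_decision_support(toy_tool, "sepsis_risk", {"lactate": 4.1}))
print(gated_decision_support(toy_tool, "novel_indication", {}))
```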
Another notable way that AICDS tools differ from common laboratory tests is in the scale, diversity, and complexity of AICDS model inputs. Inputs to AICDS models can include unstructured data that are naturally more variable and can be of very high dimensionality. The data-generating process for these inputs is also generally much more complicated, as it involves nuanced interactions between patients, clinicians, and information systems. With this in mind, the local context of individual practices can be extremely important. For instance, if a tool gathers information from a practice’s EHR, local differences in how medicine is practiced, how that practice is documented, and how the EHR is configured could have massive implications for model accuracy and equity. In addition, these processes cannot be controlled as directly as laboratory tests, for which one can adjust for small changes in test performance using reference materials with known “correct” results. Such complexity in model inputs calls for more sophisticated approaches for validating performance and equity across the diversity of local inputs and for monitoring over time to ensure consistent, accurate performance. For these reasons, local experts who are accountable and supported by best practices, sufficient tooling, and regulatory oversight will be essential.
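One widely used screen for this kind of input drift is the population stability index (PSI), which compares the current distribution of a predictor against its distribution at validation time. The sketch below applies it to a single predictor; the 10-bin quantile scheme and 0.2 alert threshold are common conventions used here as assumptions, not prescribed values.

```python
"""Sketch of input-drift monitoring with the population stability index
(PSI) on one predictor. Bin count and alert threshold are assumptions."""
import numpy as np


def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI = sum((p_cur - p_ref) * ln(p_cur / p_ref)) over quantile bins
    of the reference distribution."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # capture out-of-range values
    p_ref = np.histogram(reference, edges)[0] / len(reference)
    p_cur = np.histogram(current, edges)[0] / len(current)
    eps = 1e-6  # avoid division by zero in empty bins
    p_ref, p_cur = p_ref + eps, p_cur + eps
    return float(np.sum((p_cur - p_ref) * np.log(p_cur / p_ref)))


rng = np.random.default_rng(0)
baseline = rng.normal(1.5, 0.5, 5000)   # predictor at validation time (synthetic)
this_month = rng.normal(1.9, 0.6, 800)  # shifted, e.g., after an EHR workflow change
if psi(baseline, this_month) > 0.2:     # 0.2 is a conventional alert level
    print("Input drift alert: trigger model performance re-evaluation")
```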
Just as organizations with laboratory expertise are approved by CMS to perform or oversee laboratory inspections and external evaluations of the acceptability of laboratories’ test results, a new cadre of organizations with expertise in clinical AI tools is needed. In light of the wide variation across hospitals in investment in local AI infrastructure, there is a need for independent, third-party entities to perform accreditation via conscientious and equitable assessments.
Lessons for a balanced regulatory approach for AICDS
There are several limitations of the CLIA framework to consider in designing approaches for the oversight of AICDS. One challenge, as exemplified by the case of Theranos18, is the difficulty of ferreting out rare cases of fraud when the inspected entity knows what is being evaluated and inspectors have variable training and experience. Another challenge is the vague requirements for clinical utility, which leave the responsibility of determining utility to individual actors who may have competing motivations and biases. A third challenge is conscientiously considering how clinicians and patients interpret results, as highlighted by scrutiny of independent laboratories whose prenatal screening LDTs for rare disorders appeared to communicate ineffectively that most positive findings are false positives19.
To address these problems, the FDA issued a final rule regarding laboratory-developed tests on May 6, 2024, indicating that it intends to begin directly overseeing all laboratory tests as manufactured devices, including LDTs20. The addition of some new regulatory processes, such as registration of LDTs and improvements to adverse event and complaint reporting processes, could facilitate better oversight with, hopefully, only modest increases in work and costs for clinical laboratories. However, the final rule includes a much more extensive set of new requirements designed for device manufacturers, including submissions for pre-deployment review. Adding expensive, labor-intensive, and often duplicative regulatory requirements will considerably increase the cost and complexity of locally developing or improving laboratory testing. This would undoubtedly stifle innovation and nimbleness in individual hospitals and clinics, which would in turn decrease access to care and patient choice and slow advances in clinical care.
A relevant correlate to the oversight of AICDS is found in the challenge of crafting regulation for laboratory testing when laboratories vary considerably in their missions and business models. This diversity spans non-profit laboratories caring for patients within a healthcare system to for-profit, independent laboratories that market testing services externally. In an attempt to adapt regulation to such different contexts, the FDA rule includes a carveout such that the FDA does not currently plan to enforce certain requirements for LDTs that are developed and performed within an integrated healthcare system for an unmet clinical need. However, as written, any FDA-authorized device would render the clinical need met, regardless of how clinically useful, practical, or expensive it is.
One reason that local laboratories currently design and deploy LDTs is to use newer methodologies, approaches, or tools that can yield laboratory test results that are demonstrably superior to those of FDA-authorized options. For example, for measuring concentrations of steroid hormones like testosterone and estradiol, the FDA-authorized tests in clinical use are immunoassays, but some clinical laboratories have validated and implemented LDTs that use mass spectrometry. The mass spectrometry LDTs can yield more accurate results, which are clinically impactful for certain patient populations, including patients with lower analyte concentrations21. Similar mass spectrometry LDTs are essential for measuring analytes for which there are no FDA-authorized tests, including for pediatric rare disease testing22. These clinical needs are being met using LDTs in part because it is more challenging and more expensive to create and market a device to be used across many, diverse clinical laboratories than to create a validated LDT that performs well in an individual laboratory. To be clear, such LDTs can achieve performance that is comparable or superior to a marketed device more easily because the design specifications are naturally different for a procedure designed and used by a single laboratory than for a set of devices that are designed by a manufacturer and then implemented by an independent laboratory. Under the FDA’s final rule, when a new technology becomes available that could be locally applied by laboratory experts to improve the quality of test results for a clinical indication that has an existing method (i.e., the clinical need is considered met), the regulatory requirements and cost of doing so will be much higher. As a consequence of equating the regulatory requirements for a marketed device and an LDT, this rule will actually result in lower-quality laboratory testing.
Under the CLIA paradigm, laboratory directors, who are either physicians board certified in pathology or PhD scientists with clinical board certification, have been empowered to apply their clinical and laboratory testing expertise and experience to provide the best available laboratory testing for their patients. The FDA’s final rule shifts what had been within the scope of their clinical practice into the realm of rigid federal oversight. While the rule is well intended, this shift towards centralized oversight and commoditization of testing would stifle local quality improvement and decrease access to high-quality care.
To understand what the application of this new, centralized LDT oversight approach to AICDS oversight would mean, consider the example of the clinical goal of optimizing early care of sepsis. The FDA has recently authorized a software device “to aid in the prediction or diagnosis of sepsis [using] advanced algorithms to analyze patient-specific data”23. Thus, the FDA would consider the clinical need for early sepsis prediction to be met. If the framework from the FDA’s LDT final rule were applied to AICDS, no non-FDA-authorized sepsis prediction AICDS tools could be used anywhere clinically without a pre-market submission to the FDA demonstrating non-inferiority. This restriction would limit the use of many commercially available proprietary models and locally developed models, many of which are already in use and some of which probably perform better locally than the FDA-authorized tool. Furthermore, it would likely stifle future development of new tools24. Local autonomy to develop and refine AICDS is crucial for enabling innovation in the form of both quality improvement and research. The FDA is focused on non-inferiority; local clinicians, empowered by best practices and tooling, are needed to realize best-in-class clinical care.
Conclusion
We believe the challenges and changes in laboratory testing oversight can teach us how to develop an effective and parsimonious regulatory framework for AICDS. It is critical both to improve the systems for facilitating centralized oversight and to strengthen distributed oversight by empowering (rather than marginalizing) appropriately certified and accountable medical practitioners. The FDA’s recent LDT final rule should serve as a cautionary tale for the AICDS community. Overseeing AI in medicine is in many ways a much harder problem than overseeing laboratory testing: the scope is bigger, the inputs are less controllable, and the methods and modes of use are more variable. This makes effective oversight more challenging, but also more important. No oversight approach will fully eliminate risk. In designing oversight approaches, it is essential to deliberatively weigh the risk that could be mitigated by additional layers of oversight against the negative impact of such additional oversight on local innovation, optimization of care, and access to care.
In summary, comprehensive approaches are needed to ensure that local implementations of AICDS tools are safe, effective, and equitable. The current oversight framework for laboratory medicine offers several lessons for how to scale oversight across the US health system while balancing regulatory burden, safety, and innovation. Recent changes to further centralize the oversight of laboratory testing under a device framework and discount the role of clinical laboratorians should caution the community against the risks of too much centralized oversight. CMS could begin this process by implementing some requirements for AICDS oversight as a condition of participation (CoP), as it has with many other policies and procedures intended to improve patient safety25. But to achieve comprehensive AICDS oversight, more fundamental change is needed. Reflecting on the oversight of laboratory medicine, we recommend Congress consider legislation that would allow CMS to create CLIA-equivalent certification for clinical practices to use AICDS and to potentially reimburse practices to drive the adoption of safe and effective AICDS. CMS should join the FDA and ONC in regulating the application of AI in medicine to realize its promise everywhere, all the time, and for everyone.
Author contributions
D.S.H. and G.E.W. conceived the idea of the manuscript. D.S.H., J.T.R., and G.E.W. wrote and reviewed the manuscript.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Wong, A. et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern. Med. 181, 1065–1070 (2021).
2. Nastasi, A. J. et al. A vignette-based evaluation of ChatGPT’s ability to provide appropriate and equitable medical advice across care contexts. Sci. Rep. 13, 17885 (2023).
3. Finlayson, S. G. et al. The clinician and dataset shift in artificial intelligence. N. Engl. J. Med. 385, 283–286 (2021).
4. Chen, S. et al. Use of artificial intelligence chatbots for cancer treatment information. JAMA Oncol. 9, 1459 (2023).
5. Schulz, W. L., Durant, T. J. S. & Krumholz, H. M. Validation and regulation of clinical artificial intelligence. Clin. Chem. 65, 1336–1337 (2019).
6. Price, W. N., Sendak, M., Balu, S. & Singh, K. Enabling collaborative governance of medical AI. Nat. Mach. Intell. 5, 821–823 (2023).
7. Panch, T. et al. A distributed approach to the regulation of clinical AI. PLoS Digit. Health 1, e0000040 (2022).
8. Gerke, S., Babic, B., Evgeniou, T. & Cohen, I. G. The need for a system view to regulate artificial intelligence/machine learning-based software as medical device. NPJ Digit. Med. 3, 53 (2020).
9. Shah, N. H. et al. A nationwide network of health AI Assurance laboratories. JAMA 331, 245 (2024).
10. Lee, J. T. et al. Analysis of devices authorized by the FDA for clinical decision support in critical care. JAMA Intern. Med. 183, 1399 (2023).
11. Nong, P., Hamasha, R., Singh, K., Adler-Milstein, J. & Platt, J. How academic medical centers govern AI prediction tools in the context of uncertainty and evolving regulation. NEJM AI 1, AIp2300048 (2024).
12. Longhurst, C. A., Singh, K., Chopra, A., Atreja, A. & Brownstein, J. S. A call for artificial intelligence implementation science centers to evaluate clinical effectiveness. NEJM AI 1, AIp2400223 (2024).
13. Graden, K. C., Bennett, S. A., Delaney, S. R., Gill, H. E. & Willrich, M. A. V. A high-level overview of the regulations surrounding a clinical laboratory and upcoming regulatory challenges for laboratory developed tests. Lab. Med. 52, 315–328 (2021).
14. Genzen, J. R. Regulation of laboratory-developed tests. Am. J. Clin. Pathol. 152, 122–131 (2019).
15. Price, W. N., Gerke, S. & Cohen, I. G. Potential liability for physicians using artificial intelligence. JAMA 322, 1765 (2019).
16. Paranjape, K. et al. The value of artificial intelligence in laboratory medicine. Am. J. Clin. Pathol. 155, 823–831 (2021).
17. Bellini, C., Padoan, A., Carobene, A. & Guerranti, R. A survey on artificial intelligence and big data utilisation in Italian clinical laboratories. Clin. Chem. Lab. Med. 60, 2017–2026 (2022).
18. Mazer, B. Theranos exploited black box medicine. BMJ 379, o3003 (2022).
19. Food & Drug Administration. Genetic Non-Invasive Prenatal Screening Tests May Have False Results: FDA Safety Communication. https://www.fda.gov/medical-devices/safety-communications/genetic-non-invasive-prenatal-screening-tests-may-have-false-results-fda-safety-communication (2022).
20. Department of Health and Human Services, FDA. Medical Devices; Laboratory Developed Tests. 21 CFR Part 809 [Docket No. FDA-2023-N-2177]. https://www.federalregister.gov/documents/2024/05/06/2024-08935/medical-devices-laboratory-developed-tests (2024).
21. French, D. Clinical utility of laboratory developed mass spectrometry assays for steroid hormone testing. J. Mass Spectrom. Adv. Clin. Lab 28, 13–19 (2023).
22. Marzinke, M. A. et al. The VALIDity of laboratory developed tests: leave it to the experts? J. Mass Spectrom. Adv. Clin. Lab 27, 1–6 (2023).
23. Prenosis Sepsis ImmunoScore. Software Device To Aid In The Prediction Or Diagnosis Of Sepsis. Regulation Number 880.6316. https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/denovo.cfm?id=DEN230036 (2024).
24. Fleuren, L. M. et al. Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med. 46, 383–400 (2020).
25. Fleisher, L. A. & Economou-Zavlanos, N. J. Artificial intelligence can be regulated using current patient safety procedures and infrastructure in hospitals. JAMA Health Forum 5, e241369 (2024).