Editor:
Interventional trials, specifically randomized controlled trials (RCTs), represent the reference standard of evidence for estimating the safety and efficacy of medical interventions, and this is no less true for the assessment of artificial intelligence (AI) systems. Historically, the preclinical testing of AI, for example in diagnostic accuracy studies, has failed to accurately estimate the clinical benefits and harms of using these systems (1). Many “real-world” factors have been implicated, including human misuse of systems and the exact nature of the deployment (eg, how the model integrates into current workflows, and the level of human oversight and quality control). Many of these factors can be evaluated directly only during an interventional trial, and they can differ considerably from the challenges faced in non-AI RCTs. It seems clear that as this exciting field matures, any welcome shift toward prospective research in the form of interventional trials will need to be accompanied by guidelines specific to the challenges of AI deployment.
To date, fewer than a handful of RCTs have been published on AI systems in clinical practice, and those that have been published are subject to significant reporting bias, lack of standardization, and obfuscation of statistical elements, potentially leading to misleading results and claims (2). In light of the massive growth of medical AI, in both development and the marketplace, this lack of clinical evidence is concerning. Although existing guidelines are being updated for AI, such as the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD-AI/ML) statement (3) and the Standards for Reporting Diagnostic Accuracy Studies (STARD-AI) (4), these standards were designed to guide the preclinical reporting of prediction models and diagnostic accuracy studies and do not identify best practices for interventional trials. Similarly, the Checklist for Artificial Intelligence in Medical Imaging (CLAIM), while specific to medical imaging applications of AI, is not particular to RCTs (5).
What has been lacking thus far is distinct guidance for RCTs involving AI systems. In October 2019, the SPIRIT-AI and CONSORT-AI working group announced that it would develop precisely that, and, as radiologists and researchers in this field, we were invited to take part.
The Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence (SPIRIT-AI) and Consolidated Standards of Reporting Trials–Artificial Intelligence (CONSORT-AI) guidelines have been developed over the past year with the EQUATOR Network, using a Delphi methodology and an international multidisciplinary consortium of academics, industry representatives, journal editors, and medical device regulators (6,7). SPIRIT-AI provides evidence-based guidance on the content of trial protocols involving AI, and CONSORT-AI provides an evidence-based minimum set of recommendations for the reporting of interventional trials. Both are harmonized with current global regulatory terminology, including reference to unique device identifiers (UDIs), intended use statements, and indications for use statements, and include a number of items addressing AI-specific concerns. These concerns include the selection and quality of input data in a real-world setting, handling of missing input data in real time, integration of the AI system into the trial setting, description of the human-AI interaction, analysis of performance errors, and, importantly, failure case analysis to identify unexpected harms arising within a real-world deployment. Each of these elements, where possible, was informed by the existing literature on the specific challenges of implementing and integrating AI models in clinical practice. The resulting combined guidance was developed with engagement from regulators, including the U.S. Food and Drug Administration (FDA), the European Medicines Agency (EMA), and the U.K. Medicines and Healthcare products Regulatory Agency (MHRA), and several major medical journals have already endorsed its use.
As we move into the next phase of prospective evaluation of AI technologies, these guidelines will be critical for ensuring the transparent and complete reporting of RCTs and thus for producing the highest-quality evidence. We believe this is hugely relevant to the field of radiology, not only because of the immense interest in this space but also because of the apparent market readiness of many algorithms despite the current lack of transparently and accurately reported prospective interventional studies in the form of RCTs.
Footnotes
Disclosures of Conflicts of Interest: H.H. Activities related to the present article: author received reimbursement of expenses for attendance at the Delphi process working group from the EQUATOR Network. Activities not related to the present article: employed at Hardian Health (managing director); stock/stock options in Smart Reporting and AlgoMedica; advisor for Segmed.ai. Other relationships: disclosed no relevant relationships. L.O.R. Activities related to the present article: author received travel reimbursement to the consensus meeting in Birmingham. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships.
References
1. Kohli A, Jha S. Why CAD failed in mammography. J Am Coll Radiol 2018;15(3 Pt B):535–537.
2. Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020;368:m689.
3. Collins GS, Moons KGM. Reporting of artificial intelligence prediction models. Lancet 2019;393(10181):1577–1579.
4. Sounderajah V, Ashrafian H, Aggarwal R, et al. Developing specific reporting guidelines for diagnostic accuracy studies assessing AI interventions: The STARD-AI Steering Group. Nat Med 2020;26(6):807–808.
5. Mongan J, Moy L, Kahn CE Jr. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): a guide for authors and reviewers. Radiol Artif Intell 2020;2(2):e200029.
6. Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK; SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med 2020;26(9):1364–1374.
7. Cruz Rivera S, Liu X, Chan AW, et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat Med 2020;26(9):1351–1363.