2024 Mar 4;1(1):ubae003. doi: 10.1093/bjrai/ubae003

Table 1.

General overview of the key considerations for acceptance testing and quality assurance of AI tools in medicine.

Stage: Preparation
Description: Information review prior to installation. Vendors must provide instructions for use with detailed guidance on system installation, AT, acceptance criteria at installation and at subsequent upgrades, proper user-interface configuration, a vendor-provided reference dataset, and the expected performance level of the AI tool along with tolerance limits. In-house teams ensure infrastructure compatibility, acquire representative local datasets, identify gaps, and establish test protocols and plans.
Critical considerations: The composition of the training data, the target variable used for training, and the dataset size must be considered, since increasingly complex AI models are at risk of overfitting.11 In addition, at this stage, the compatibility of the model with local equipment and the software environment, regulatory compliance, and stakeholder engagement should be understood. Performance metrics for efficacy and efficiency must be established.
Potential stakeholders^a: Administrators, manufacturers or vendors, and AT and IT teams; patient representatives or ethics teams may also be considered.
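
To make the idea of acceptance criteria and tolerance limits concrete, here is a minimal Python sketch of how an in-house team might record vendor-stated expected performance and check locally measured results against it. The metric names, expected values, tolerances, and local results are hypothetical placeholders, not figures from this article or from any vendor.

```python
# Minimal sketch: check locally measured performance against vendor-stated
# expected values within agreed tolerance limits.
# All metric names and numbers are hypothetical placeholders.

ACCEPTANCE_CRITERIA = {
    # metric: (vendor-stated expected value, allowed drop below that value)
    "sensitivity": (0.90, 0.05),
    "specificity": (0.85, 0.05),
}

def check_acceptance(measured):
    """Return pass/fail per metric: a metric passes if the locally measured
    value is no more than the agreed tolerance below the expected value."""
    return {
        metric: measured[metric] >= expected - tolerance
        for metric, (expected, tolerance) in ACCEPTANCE_CRITERIA.items()
    }

local_results = {"sensitivity": 0.88, "specificity": 0.78}  # hypothetical local AT results
for metric, passed in check_acceptance(local_results).items():
    print(f"{metric}: {'PASS' if passed else 'FAIL'}")
```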

Stage: Implementation
Description: Integrating the AI tool within the local setting, interoperability, cybersecurity, calibrating the system, and confirming functionality with a vendor-provided reference dataset.1
Critical considerations: IT auditing processes^b, system calibration, ensuring proper input data compatibility, verifying AI output and user interface functions, data privacy and security, vendor support.
Potential stakeholders^a: AT and IT teams
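
As an illustration of confirming functionality with a vendor-provided reference dataset, the sketch below runs the installed tool on each reference case and compares its output with the output the vendor expects, within an agreed numeric tolerance. The case IDs, scores, tolerance, and the run_ai_tool() stub are all hypothetical; the stub simulates tool output so the example runs end to end.

```python
# Minimal sketch: confirm that the installed AI tool reproduces the vendor's
# expected outputs on the vendor-provided reference dataset.
# Case IDs, scores, the tolerance, and run_ai_tool() are hypothetical.
import math

# Reference cases and the outputs the vendor expects the tool to produce.
REFERENCE_EXPECTED = {"case_001": 0.92, "case_002": 0.15, "case_003": 0.47}
TOLERANCE = 0.01  # agreed tolerance for reproducing reference outputs

def run_ai_tool(case_id: str) -> float:
    """Stand-in for invoking the locally installed tool on one reference case;
    it returns simulated outputs so the sketch is runnable."""
    simulated = {"case_001": 0.92, "case_002": 0.15, "case_003": 0.30}
    return simulated[case_id]

failures = []
for case_id, expected in REFERENCE_EXPECTED.items():
    observed = run_ai_tool(case_id)
    if not math.isclose(observed, expected, abs_tol=TOLERANCE):
        failures.append((case_id, expected, observed))

if failures:
    for case_id, expected, observed in failures:
        print(f"{case_id}: expected {expected:.2f}, got {observed:.2f} -- investigate before acceptance")
else:
    print("All reference cases reproduced within tolerance.")
```
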
Stage: Retrospective Evaluation
Description: AI performance testing with local test sets. Baseline AT results are documented to enable comparisons. Performing additional failure mode analyses or case review audits.
Critical considerations: Baseline metrics include obtaining quantitative and subjective measures from clinical users. In addition, identifying potentially unintended biases or unfairness using subgroups of patients, and performance metrics that capture ethical measures.^c
Potential stakeholders^a: Clinicians, AT and IT teams
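
The subgroup analysis mentioned above can be illustrated with a short sketch: sensitivity and specificity are computed per patient subgroup on a labeled local test set and compared against the overall values, flagging subgroups that fall more than a chosen margin below them. The records, subgroup labels, and margin are hypothetical, and a real evaluation would use far larger case counts and confidence intervals.

```python
# Minimal sketch: per-subgroup sensitivity/specificity on a local test set,
# flagging subgroups that fall well below overall performance.
# Records, subgroup labels, and the disparity margin are hypothetical.
from collections import defaultdict

# Each record: (subgroup label, ground-truth label, AI prediction); 1 = positive.
records = [
    ("site_A", 1, 1), ("site_A", 0, 0), ("site_A", 1, 0),
    ("site_B", 1, 1), ("site_B", 0, 1), ("site_B", 0, 0),
]

def sens_spec(rows):
    tp = sum(1 for _, y, p in rows if y == 1 and p == 1)
    fn = sum(1 for _, y, p in rows if y == 1 and p == 0)
    tn = sum(1 for _, y, p in rows if y == 0 and p == 0)
    fp = sum(1 for _, y, p in rows if y == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

overall_sens, overall_spec = sens_spec(records)

by_group = defaultdict(list)
for row in records:
    by_group[row[0]].append(row)

MARGIN = 0.10  # hypothetical allowed drop relative to overall performance
for group, rows in sorted(by_group.items()):
    sens, spec = sens_spec(rows)
    flagged = sens < overall_sens - MARGIN or spec < overall_spec - MARGIN
    note = "  <- review for possible bias" if flagged else ""
    print(f"{group}: sensitivity={sens:.2f}, specificity={spec:.2f}{note}")
```
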
Stage: Prospective Evaluation
Description: Evaluation of the AI tool in a real-world clinical setting to gain experience or when retrospective test sets are not readily available. In general, this step should be completed after the tool is installed but before clinical use to ensure clinical decisions are not influenced.12
Critical considerations: AI performance in the clinical workflow is recorded and analyzed by clinicians and the AT team and compared with follow-up clinical outcomes for a sufficiently large number of cases. Procedures should be established to identify and address harmful or incorrect recommendations.
Potential stakeholders^a: Clinicians, AT and IT teams, administrators, manufacturers or vendors
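
One way to picture the prospective record-keeping described here is a simple log pairing each AI recommendation with the eventual follow-up outcome, from which agreement can be tallied and potentially harmful disagreements escalated for case review. The data structure and the rule treating a missed positive finding as potentially harmful are hypothetical simplifications.

```python
# Minimal sketch: prospective monitoring log pairing each AI recommendation
# with the follow-up clinical outcome. Fields and the harm rule are hypothetical.
from dataclasses import dataclass

@dataclass
class ProspectiveCase:
    case_id: str
    ai_positive: bool       # AI flagged the finding
    outcome_positive: bool  # finding confirmed at clinical follow-up

def summarize(cases):
    agree = sum(c.ai_positive == c.outcome_positive for c in cases)
    # Hypothetical rule: a missed positive finding is treated as potentially
    # harmful and escalated for case review.
    missed = [c for c in cases if c.outcome_positive and not c.ai_positive]
    print(f"Agreement with follow-up: {agree}/{len(cases)}")
    for c in missed:
        print(f"Escalate for review: {c.case_id} (missed positive finding)")

summarize([
    ProspectiveCase("P001", ai_positive=True, outcome_positive=True),
    ProspectiveCase("P002", ai_positive=False, outcome_positive=True),
    ProspectiveCase("P003", ai_positive=False, outcome_positive=False),
])
```
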
Stage: Ethical Considerations
Description: Ensuring alignment with ethical standards, regulations, and best practices, including informed consent (if needed) and transparency.
Critical considerations: Ethical guidelines considering the need for informed consent, transparency in algorithms, accountability mechanisms, and bias assessment.
Potential stakeholders^a: Clinicians, regulators, patient advisory groups, manufacturers, administrators

Stage: User Training and Support at AT
Description: Providing comprehensive training and ongoing support to end-users, including feedback mechanisms, before the tool is deployed for routine clinical use.
Critical considerations: User training should include hands-on experience observing AI performance in real-world cases, to understand its intended use and limitations, establish proper levels of trust/confidence, and avoid off-label use or misuse. This can be conducted during the prospective evaluation period.
Potential stakeholders^a: Manufacturers or vendors, end-users

Stage: Risk Management
Description: Identifying, assessing, and mitigating potential risks associated with the AI tool, including legal and clinical risks.
Critical considerations: Identifying the risk of off-label use, inflated performance metrics,10 risk mitigation strategies, emergency protocols, liability considerations, and patient safety measures.
Potential stakeholders^a: Clinicians, administrators, risk management team

Stage: End-to-end Workflow during Installation
Description: Consideration of the entire workflow, including training.
Critical considerations: Comprehensive workflow consideration and optimization of all aspects of AI tool usage.
Potential stakeholders^a: AT and IT teams, manufacturers or vendors

Abbreviations: AT = acceptance testing, IT = information technology, QA = quality assurance.

The manufacturer creates the tool, establishes QA protocols, seeks regulatory approval, and offers product updates or technical support. A vendor may be responsible for distributing the tool and for aiding with installation, user training, and support. The testing procedures required will depend on the tool, the risk it poses, and the applicable regulatory, manufacturer, or vendor requirements.

^a There may be more stakeholders or involvement than indicated, depending on resources at the local institution. The QA team generally includes clinicians, physicists, and technologists. Other technical personnel, including AI domain experts, data scientists, statisticians, etc., may be involved if available and if needed.

^b Integration of the device within the local setting, similar to the IT auditing processes established for cloud computing or cybersecurity, should be confirmed before testing the AI tool's functionality with vendor-supplied and locally acquired test datasets. Calibration refers to how well the predicted absolute risk corresponds to the true absolute risk.11 “Vendors” refers to groups that sell the AI tools, and “manufacturers” refers to those who develop the AI tools.
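
Since calibration is defined here as agreement between predicted and true absolute risk, the sketch below shows one simple way to quantify it: predicted risks are grouped into bins, the mean predicted risk in each bin is compared with the observed event rate, and the Brier score is reported. The predicted risks and outcomes are hypothetical values for illustration only.

```python
# Minimal sketch: simple calibration check comparing predicted absolute risk
# with the observed event rate in risk bins, plus the Brier score.
# The predicted risks and outcomes are hypothetical placeholders.

predicted = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.75, 0.90]  # model risk estimates
observed = [0, 0, 0, 1, 0, 1, 1, 1]                           # 1 = event occurred

N_BINS = 4
bins = [[] for _ in range(N_BINS)]
for p, y in zip(predicted, observed):
    bins[min(int(p * N_BINS), N_BINS - 1)].append((p, y))

for i, contents in enumerate(bins):
    if not contents:
        continue
    mean_pred = sum(p for p, _ in contents) / len(contents)
    event_rate = sum(y for _, y in contents) / len(contents)
    print(f"bin {i}: mean predicted risk={mean_pred:.2f}, observed event rate={event_rate:.2f}")

# Brier score: mean squared difference between predicted risk and outcome.
brier = sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)
print(f"Brier score: {brier:.3f} (lower is better)")
```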

^c AI tools must meet predefined performance and safety tolerance limits on retrospective and prospective case reviews before being accepted for clinical use. Vendor-specified performance on the reference dataset and generalization performance on the local test sets should be documented as baseline results. Testing should also include assessing the tools’ performance on subgroups, infrequent cases, and inputs with known artifacts, which can reveal unintended biases or unfairness of the AI tool.