Table 2.
Types of AI Model Testing
Testing Type | Description | Considerations |
---|---|---|
Internal (retrospective) | Testing on a held-out portion of the same dataset used to train the model, commonly an 80/20 train/test split. | A good first step, but insufficient to prove robustness or generalizability because it can inflate accuracy and fail to represent true performance. |
External (retrospective) | Testing on an entirely different dataset. | Provides a more robust measure of performance but may be limited by the availability of appropriate datasets. |
Prospective | Real-time testing in a clinical setting. | Considered the gold standard for evaluating model performance, but requires acquiring new data from patients, which can be time-consuming and resource-intensive. |
Subgroup analysis | Evaluating AI performance across distinct subgroups, such as disease severity levels and demographic groups. | Essential in both retrospective and prospective testing; helps identify and mitigate potential biases or disparities. |
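
As a rough illustration of two rows in the table, the sketch below holds out 20% of a dataset for internal testing and then reports accuracy separately for each subgroup. Everything in it is an assumption made for illustration: the records are synthetic, the `severity` field and the helper names `train_test_split` and `accuracy_by_subgroup` are hypothetical, and the threshold rule in `predict` merely stands in for a trained model.

```python
import random
from collections import defaultdict


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle records and hold out `test_fraction` of them for internal testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy_by_subgroup(records, predict, group_key):
    """Return {subgroup: accuracy} for the given prediction function."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[group_key]
        total[group] += 1
        correct[group] += int(predict(record) == record["label"])
    return {group: correct[group] / total[group] for group in total}


# Synthetic records (not real patient data): the "severe" subgroup turns
# positive at a lower score than the "mild" subgroup.
records = []
for i in range(100):
    severity = "mild" if i % 2 == 0 else "severe"
    cutoff = 50 if severity == "mild" else 40
    records.append({"severity": severity, "score": i / 100, "label": int(i >= cutoff)})


def predict(record):
    """Stand-in for a trained model: one global threshold on the score."""
    return int(record["score"] >= 0.5)


train, test = train_test_split(records)  # 80/20 internal split
overall = sum(predict(r) == r["label"] for r in test) / len(test)
per_group = accuracy_by_subgroup(test, predict, "severity")

print(f"held-out accuracy: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"  {group}: {acc:.2f}")
```

Because the stand-in model uses a single threshold, it tends to score lower on the synthetic "severe" subgroup than on the "mild" one, which is the kind of disparity a single overall accuracy figure can hide. External testing would, under the same assumptions, pass records from an entirely different dataset to `accuracy_by_subgroup` instead of the held-out split.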