Table 2. Model Performance Characteristics in 14 230 Total Imaging Reports Among Curation Set Patients<sup>a</sup>
| Outcome (% of All Reports With Outcome) | Metric | Training Subset (11 182 Reports), CV 1 | Training Subset, CV 2 | Training Subset, CV 3 | Training Subset, CV 4 | Training Subset, CV 5 | Validation Subset (1545 Reports) | Test Subset (1503 Reports) |
|---|---|---|---|---|---|---|---|---|
| Any cancer (61.7) | AUC | 0.92 | 0.94 | 0.91 | 0.93 | 0.90 | 0.91 | 0.92 |
| | Area under PR curve | 0.95 | 0.96 | 0.92 | 0.95 | 0.93 | 0.92 | 0.94 |
| | Best F1 score | 0.90 | 0.90 | 0.86 | 0.90 | 0.87 | 0.88 | 0.88 |
| Worsening/progressing (24.7) | AUC | 0.92 | 0.94 | 0.90 | 0.92 | 0.92 | 0.92 | 0.94 |
| | Area under PR curve | 0.79 | 0.85 | 0.78 | 0.81 | 0.82 | 0.78 | 0.83 |
| | Best F1 score | 0.74 | 0.79 | 0.72 | 0.75 | 0.76 | 0.73 | 0.78 |
| Improving/responding (11.5) | AUC | 0.94 | 0.93 | 0.94 | 0.93 | 0.93 | 0.93 | 0.95 |
| | Area under PR curve | 0.78 | 0.75 | 0.76 | 0.74 | 0.72 | 0.75 | 0.82 |
| | Best F1 score | 0.74 | 0.73 | 0.73 | 0.73 | 0.70 | 0.72 | 0.76 |
| Metastasis in liver (8.1) | AUC | 0.97 | 0.95 | 0.97 | 0.97 | 0.97 | 0.96 | 0.98 |
| | Area under PR curve | 0.83 | 0.73 | 0.86 | 0.84 | 0.83 | 0.76 | 0.74 |
| | Best F1 score | 0.78 | 0.73 | 0.81 | 0.78 | 0.77 | 0.71 | 0.71 |
| Metastases in bone (17.3) | AUC | 0.96 | 0.95 | 0.96 | 0.94 | 0.95 | 0.93 | 0.95 |
| | Area under PR curve | 0.85 | 0.87 | 0.85 | 0.81 | 0.82 | 0.74 | 0.75 |
| | Best F1 score | 0.80 | 0.82 | 0.83 | 0.78 | 0.79 | 0.75 | 0.76 |
| Metastases in brain/spine (8.3) | AUC | 0.99 | 0.97 | 0.98 | 0.97 | 0.99 | 0.95 | 0.97 |
| | Area under PR curve | 0.85 | 0.77 | 0.90 | 0.77 | 0.89 | 0.79 | 0.83 |
| | Best F1 score | 0.83 | 0.78 | 0.85 | 0.77 | 0.83 | 0.75 | 0.79 |
| Metastases in lymph nodes (13.4) | AUC | 0.86 | 0.84 | 0.87 | 0.82 | 0.87 | 0.87 | 0.89 |
| | Area under PR curve | 0.54 | 0.45 | 0.47 | 0.43 | 0.41 | 0.49 | 0.49 |
| | Best F1 score | 0.55 | 0.51 | 0.53 | 0.49 | 0.48 | 0.53 | 0.55 |
| Metastases in adrenal (4.7) | AUC | 0.97 | 0.96 | 0.97 | 0.96 | 0.97 | 0.96 | 0.97 |
| | Area under PR curve | 0.76 | 0.80 | 0.75 | 0.68 | 0.82 | 0.69 | 0.73 |
| | Best F1 score | 0.70 | 0.77 | 0.72 | 0.64 | 0.75 | 0.76 | 0.73 |

Abbreviations: AUC, area under the receiver operating characteristic curve; CV, cross-validation; F1 score, harmonic mean of precision and recall; PR, precision-recall.
Cross-validation models, each of which was trained using a random sample of 80% of the training subset patients and evaluated using the remaining 20% of the training subset. Training subset patients were allowed into more than 1 cross-validation model.
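
The resampling scheme in the footnote and the F1 definition in the abbreviations can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names `patient_level_splits`, `f1_score`, and `best_f1` are hypothetical, the random seed is arbitrary, and "best F1 score" is assumed here to mean the highest F1 obtained over all candidate decision thresholds, which the table does not state explicitly.

```python
import random


def patient_level_splits(patient_ids, n_models=5, train_frac=0.8, seed=0):
    """Draw n_models independent 80/20 splits of patients (hypothetical helper).

    Each split is sampled afresh from the full patient list, so the same
    patient may appear in the training portion of more than one model.
    This differs from k-fold partitioning, where held-out folds are disjoint.
    """
    rng = random.Random(seed)
    unique_ids = sorted(set(patient_ids))
    n_train = round(train_frac * len(unique_ids))
    splits = []
    for _ in range(n_models):
        train = set(rng.sample(unique_ids, n_train))
        held_out = set(unique_ids) - train
        splits.append((train, held_out))
    return splits


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_f1(y_true, scores):
    """Highest F1 over thresholds placed at each observed score.

    Assumption: this is what "best F1 score" denotes in the table.
    A report is predicted positive when its score is >= the threshold.
    """
    best = 0.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        best = max(best, f1_score(precision, recall))
    return best
```

Splitting on patient identifiers rather than on individual reports keeps all reports from one patient on the same side of a given split; whether the original analysis enforced this in exactly this way is an assumption based on the footnote's wording.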