2022 Jul 8;82:104127. doi: 10.1016/j.ebiom.2022.104127

Table 3.

Model Performance. Performance metrics on the test set are shown for the machine learning models using manually and automatically segmented masks. Accuracy, sensitivity, and specificity for the CNN models are shown with 95% confidence intervals in parentheses. Confidence intervals were calculated using the adjusted Wald method. Performance metrics were compared between models with a McNemar test for paired proportions. Statistically significant p-values are highlighted in bold (p<0.05). CNN: convolutional neural network; RSF: random survival forest; AUC: area under the receiver operating characteristic curve.

Modality	AUC	Accuracy (Acc)	Sensitivity (Sens)	Specificity (Spec)
CNN Models – Progression (Manually segmented masks)
CT	0.888	0.723 (0.643, 0.803)	0.681 (0.587, 0.775)	0.857 (0.722, 0.992)
PET	0.669	0.664 (0.580, 0.748)	0.659 (0.563, 0.755)	0.679 (0.514, 0.843)
PET+CT Ensemble	0.876	0.790 (0.717, 0.863)	0.769 (0.683, 0.855)	0.857 (0.722, 0.992)
CNN Models – Progression (Automatically segmented masks)
CT	0.876	0.798 (0.726, 0.870)	0.791 (0.708, 0.875)	0.821 (0.678, 0.965)
PET	0.706	0.571 (0.484, 0.659)	0.495 (0.394, 0.595)	0.821 (0.678, 0.965)
PET+CT Ensemble	0.874	0.815 (0.745, 0.885)	0.813 (0.733, 0.894)	0.821 (0.678, 0.965)
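
For readers unfamiliar with the adjusted Wald interval named in the caption, the following is a minimal sketch of one common formulation (the Agresti-Coull interval). The function name `adjusted_wald_ci` and the example counts are hypothetical, and the source does not give the exact variant or per-class counts used, so the output is not expected to reproduce the intervals in the table.

```python
import math


def adjusted_wald_ci(successes, n, z=1.959963984540054):
    """Adjusted Wald (Agresti-Coull) interval for a binomial proportion.

    Assumption: this is the common Agresti-Coull form, which adds z^2/2
    pseudo-successes and z^2 pseudo-trials before applying the Wald formula.
    The default z corresponds to a two-sided 95% interval.
    """
    n_adj = n + z ** 2                      # adjusted number of trials
    p_adj = (successes + z ** 2 / 2) / n_adj  # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1] so the interval stays a valid proportion range.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Hypothetical counts for illustration only: 40 correct out of 50 cases.
lower, upper = adjusted_wald_ci(40, 50)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The interval is centered on the adjusted proportion rather than the observed one, which keeps it better behaved than the plain Wald interval for small samples or proportions near 0 or 1.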

P-values [McNemar], comparisons between CNNs (Manual Masks)				P-values [McNemar], comparisons between CNNs (Automated Masks)				P-values [McNemar], comparisons between manual (M) and automated (A) CNNs
Comp.	Acc.	Sens.	Spec.	Comp.	Acc.	Sens.	Spec.	Comp.	Acc.	Sens.	Spec.
CT vs. PET	0.371	0.864	0.227	CT vs. PET	<0.001	<0.001	1.00	CT (M vs. A)	0.049	0.006	1.00
CT vs. PET+CT	0.115	0.077	1.00	CT vs. PET+CT	0.754	0.727	1.00	PET (M vs. A)	0.090	0.004	0.344
PET vs. PET+CT	0.004	0.031	0.125	PET vs. PET+CT	<0.001	<0.001	1.00	PET+CT (M vs. A)	0.629	0.388	1.00
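
As an illustration of how a McNemar p-value for paired proportions can be obtained, here is a minimal sketch of the exact (binomial) form of the test. The source does not state which variant of the test was used, so the exact form is an assumption; the function name `mcnemar_exact` and the discordant-pair counts are hypothetical. Here `b` is the number of cases that the first model classified correctly and the second incorrectly, and `c` is the reverse.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    Under the null hypothesis, each discordant pair is equally likely to
    favor either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Probability of observing k or fewer discordant pairs on one side.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test and cap at 1.
    return min(1.0, 2 * tail)


# Hypothetical discordant counts for illustration only.
print(mcnemar_exact(0, 4))  # all four discordant pairs favor one model
print(mcnemar_exact(3, 3))  # discordant pairs split evenly
```

Only the discordant pairs enter the calculation; cases that both models classify correctly or both misclassify carry no information about which model is better.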