2024 Jan 16;37(1):268–279. doi: 10.1007/s10278-023-00909-7

Table 3.

Performance of the lung graph–based machine learning model and radiologists in the identification of f-ILD on the independent validation set

Method	Evaluation level	AUC	Accuracy	Sensitivity	Specificity	PPV	NPV
Split 1	Scan-level	0.998 (0.992, 1.000)	0.957 (0.915, 0.989)	0.929 (0.837, 1.000)	0.981 (0.939, 1.000)	0.975 (0.917, 1.000)	0.944 (0.878, 1.000)
Split 2	Scan-level	0.997 (0.989, 1.000)	0.957 (0.915, 0.989)	0.905 (0.810, 0.979)	1.000 (1.000, 1.000)	1.000 (1.000, 1.000)	0.929 (0.855, 0.984)
Split 3	Scan-level	0.998 (0.992, 1.000)	0.968 (0.926, 1.000)	0.952 (0.880, 1.000)	0.981 (0.932, 1.000)	0.976 (0.914, 1.000)	0.962 (0.902, 1.000)
Split 4	Scan-level	0.997 (0.991, 1.000)	0.957 (0.915, 0.989)	0.905 (0.814, 0.978)	1.000 (1.000, 1.000)	1.000 (1.000, 1.000)	0.929 (0.862, 0.983)
Split 5	Scan-level	0.995 (0.984, 1.000)	0.968 (0.926, 1.000)	0.929 (0.844, 1.000)	1.000 (1.000, 1.000)	1.000 (1.000, 1.000)	0.945 (0.879, 1.000)
Average	Scan-level	0.999 (0.994, 1.000)	0.968 (0.926, 1.000)	0.929 (0.844, 1.000)	1.000 (1.000, 1.000)	1.000 (1.000, 1.000)	0.945 (0.873, 1.000)
Radiologist A	Scan-level	0.933 (0.879, 0.979)	0.936 (0.883, 0.979)	0.905 (0.810, 0.970)	0.962 (0.902, 1.000)	0.950 (0.868, 1.000)	0.926 (0.849, 0.983)
Radiologist B	Scan-level	0.842 (0.769, 0.909)	0.830 (0.755, 0.904)	0.952 (0.882, 1.000)	0.731 (0.607, 0.854)	0.741 (0.621, 0.857)	0.950 (0.871, 1.000)
Radiologist C	Scan-level	0.904 (0.846, 0.953)	0.894 (0.830, 0.947)	1.000 (1.000, 1.000)	0.808 (0.692, 0.906)	0.808 (0.690, 0.906)	1.000 (1.000, 1.000)
Split 1	Patient-level	1.000 (1.000, 1.000)	0.986 (0.959, 1.000)	0.971 (0.905, 1.000)	1.000 (1.000, 1.000)	1.000 (1.000, 1.000)	0.974 (0.915, 1.000)
Split 2	Patient-level	0.997 (0.988, 1.000)	0.973 (0.932, 1.000)	0.943 (0.861, 1.000)	1.000 (1.000, 1.000)	1.000 (1.000, 1.000)	0.950 (0.881, 1.000)
Split 3	Patient-level	0.999 (0.995, 1.000)	0.986 (0.959, 1.000)	0.971 (0.903, 1.000)	1.000 (1.000, 1.000)	1.000 (1.000, 1.000)	0.974 (0.913, 1.000)
Split 4	Patient-level	0.998 (0.994, 1.000)	0.959 (0.918, 1.000)	0.914 (0.821, 1.000)	1.000 (1.000, 1.000)	1.000 (1.000, 1.000)	0.927 (0.838, 1.000)
Split 5	Patient-level	0.998 (0.992, 1.000)	0.973 (0.932, 1.000)	0.943 (0.857, 1.000)	1.000 (1.000, 1.000)	1.000 (1.000, 1.000)	0.950 (0.870, 1.000)
Average	Patient-level	1.000 (1.000, 1.000)	0.986 (0.959, 1.000)	0.971 (0.912, 1.000)	1.000 (1.000, 1.000)	1.000 (1.000, 1.000)	0.974 (0.919, 1.000)
Radiologist A	Patient-level	0.917 (0.855, 0.973)	0.918 (0.849, 0.973)	0.886 (0.774, 0.974)	0.947 (0.872, 1.000)	0.939 (0.853, 1.000)	0.900 (0.795, 0.977)
Radiologist B	Patient-level	0.828 (0.742, 0.903)	0.822 (0.726, 0.904)	0.971 (0.912, 1.000)	0.684 (0.525, 0.825)	0.739 (0.608, 0.860)	0.963 (0.880, 1.000)
Radiologist C	Patient-level	0.908 (0.844, 0.969)	0.904 (0.836, 0.973)	1.000 (1.000, 1.000)	0.816 (0.688, 0.938)	0.833 (0.705, 0.944)	1.000 (1.000, 1.000)

Values in parentheses are 95% confidence intervals (CIs). All evaluation results of the proposed method except AUC were calculated using the standard classification decision threshold of 0.5

Average: average of the five groups of models; PPV: positive predictive value; NPV: negative predictive value
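As an illustration of how the non-AUC columns follow from a decision threshold of 0.5, the sketch below thresholds predicted probabilities and derives accuracy, sensitivity, specificity, PPV, and NPV from the resulting confusion counts. The function name `classification_metrics`, the choice to treat a probability of exactly 0.5 as positive, and the example scores are assumptions made for illustration. The source does not report confusion matrices; the counts used here (39 true positives, 3 false negatives, 51 true negatives, 1 false positive) are hypothetical and were chosen only because they reproduce the point estimates of the scan-level Split 1 row.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Threshold probabilities and return the table's non-AUC metrics."""
    # Assumption: a probability >= threshold is called positive (f-ILD).
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are returned as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical labels and scores (not data from the study):
# 42 positive scans (39 above threshold), 52 negative scans (1 above threshold).
y_true = [1] * 42 + [0] * 52
y_prob = [0.9] * 39 + [0.2] * 3 + [0.1] * 51 + [0.7] * 1

metrics = classification_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is excluded because it is computed from the ranking of the probabilities and does not depend on the threshold.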