Table 3.
Performance of the lung graph–based machine learning model and radiologists in the identification of f-ILD on the independent validation set
| Method | Evaluation level | AUC | Accuracy | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|---|
| Split 1 | Scan-level | 0.998 (0.992, 1.000) | 0.957 (0.915, 0.989) | 0.929 (0.837, 1.000) | 0.981 (0.939, 1.000) | 0.975 (0.917, 1.000) | 0.944 (0.878, 1.000) |
| Split 2 | Scan-level | 0.997 (0.989, 1.000) | 0.957 (0.915, 0.989) | 0.905 (0.810, 0.979) | 1.000 (1.000, 1.000) | 1.000 (1.000, 1.000) | 0.929 (0.855, 0.984) |
| Split 3 | Scan-level | 0.998 (0.992, 1.000) | 0.968 (0.926, 1.000) | 0.952 (0.880, 1.000) | 0.981 (0.932, 1.000) | 0.976 (0.914, 1.000) | 0.962 (0.902, 1.000) |
| Split 4 | Scan-level | 0.997 (0.991, 1.000) | 0.957 (0.915, 0.989) | 0.905 (0.814, 0.978) | 1.000 (1.000, 1.000) | 1.000 (1.000, 1.000) | 0.929 (0.862, 0.983) |
| Split 5 | Scan-level | 0.995 (0.984, 1.000) | 0.968 (0.926, 1.000) | 0.929 (0.844, 1.000) | 1.000 (1.000, 1.000) | 1.000 (1.000, 1.000) | 0.945 (0.879, 1.000) |
| Average | Scan-level | 0.999 (0.994, 1.000) | 0.968 (0.926, 1.000) | 0.929 (0.844, 1.000) | 1.000 (1.000, 1.000) | 1.000 (1.000, 1.000) | 0.945 (0.873, 1.000) |
| Radiologist A | Scan-level | 0.933 (0.879, 0.979) | 0.936 (0.883, 0.979) | 0.905 (0.810, 0.970) | 0.962 (0.902, 1.000) | 0.950 (0.868, 1.000) | 0.926 (0.849, 0.983) |
| Radiologist B | Scan-level | 0.842 (0.769, 0.909) | 0.830 (0.755, 0.904) | 0.952 (0.882, 1.000) | 0.731 (0.607, 0.854) | 0.741 (0.621, 0.857) | 0.950 (0.871, 1.000) |
| Radiologist C | Scan-level | 0.904 (0.846, 0.953) | 0.894 (0.830, 0.947) | 1.000 (1.000, 1.000) | 0.808 (0.692, 0.906) | 0.808 (0.690, 0.906) | 1.000 (1.000, 1.000) |
| Split 1 | Patient-level | 1.000 (1.000, 1.000) | 0.986 (0.959, 1.000) | 0.971 (0.905, 1.000) | 1.000 (1.000, 1.000) | 1.000 (1.000, 1.000) | 0.974 (0.915, 1.000) |
| Split 2 | Patient-level | 0.997 (0.988, 1.000) | 0.973 (0.932, 1.000) | 0.943 (0.861, 1.000) | 1.000 (1.000, 1.000) | 1.000 (1.000, 1.000) | 0.950 (0.881, 1.000) |
| Split 3 | Patient-level | 0.999 (0.995, 1.000) | 0.986 (0.959, 1.000) | 0.971 (0.903, 1.000) | 1.000 (1.000, 1.000) | 1.000 (1.000, 1.000) | 0.974 (0.913, 1.000) |
| Split 4 | Patient-level | 0.998 (0.994, 1.000) | 0.959 (0.918, 1.000) | 0.914 (0.821, 1.000) | 1.000 (1.000, 1.000) | 1.000 (1.000, 1.000) | 0.927 (0.838, 1.000) |
| Split 5 | Patient-level | 0.998 (0.992, 1.000) | 0.973 (0.932, 1.000) | 0.943 (0.857, 1.000) | 1.000 (1.000, 1.000) | 1.000 (1.000, 1.000) | 0.950 (0.870, 1.000) |
| Average | Patient-level | 1.000 (1.000, 1.000) | 0.986 (0.959, 1.000) | 0.971 (0.912, 1.000) | 1.000 (1.000, 1.000) | 1.000 (1.000, 1.000) | 0.974 (0.919, 1.000) |
| Radiologist A | Patient-level | 0.917 (0.855, 0.973) | 0.918 (0.849, 0.973) | 0.886 (0.774, 0.974) | 0.947 (0.872, 1.000) | 0.939 (0.853, 1.000) | 0.900 (0.795, 0.977) |
| Radiologist B | Patient-level | 0.828 (0.742, 0.903) | 0.822 (0.726, 0.904) | 0.971 (0.912, 1.000) | 0.684 (0.525, 0.825) | 0.739 (0.608, 0.860) | 0.963 (0.880, 1.000) |
| Radiologist C | Patient-level | 0.908 (0.844, 0.969) | 0.904 (0.836, 0.973) | 1.000 (1.000, 1.000) | 0.816 (0.688, 0.938) | 0.833 (0.705, 0.944) | 1.000 (1.000, 1.000) |
Values in parentheses are 95% confidence intervals (CIs). Evaluation results of the proposed method (except AUC) were calculated using the standard classification decision threshold of 0.5.
Average: average of the five groups of models; PPV: positive predictive value; NPV: negative predictive value.
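
As an illustration of how the thresholded columns of the table (accuracy, sensitivity, specificity, PPV, NPV) follow from predicted probabilities at a decision threshold of 0.5, here is a minimal Python sketch. It is not the authors' code: the function name `threshold_metrics`, the toy labels and probabilities, and the choice to count a probability exactly equal to the threshold as positive are all assumptions made for the example. AUC and the confidence intervals are not computed here.

```python
from typing import Dict, Sequence


def threshold_metrics(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Compute thresholded binary classification metrics.

    y_true holds 0/1 labels (1 = positive class); y_prob holds the
    predicted probability of the positive class for each case.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")

    # Assumption: a probability equal to the threshold counts as positive.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN instead of raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.7, 0.45]
    for name, value in threshold_metrics(labels, probs).items():
        print(f"{name}: {value:.3f}")
```

On the toy data, three of the four positives and five of the six negatives are classified correctly, so sensitivity and PPV are both 0.75 and specificity and NPV are both 5/6. Lowering the threshold would trade specificity for sensitivity, which is why the table footnote states the threshold explicitly.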