Table 2.
Performance metrics assessed on the test datasets for DL-based classification models with different backbone networks.
| Development dataset | Test dataset | Backbone network | Accuracy | Precision | F1 score | AUC |
|---|---|---|---|---|---|---|
| All | All | VGG19 | 0.945 ± 0.004 | 0.946 ± 0.003 | 0.945 ± 0.003 | 0.983 ± 0.002 |
| All | All | VGG16 | 0.946 ± 0.007 | 0.947 ± 0.006 | 0.946 ± 0.006 | 0.984 ± 0.006 |
| All | All | Dense121 | 0.949 ± 0.004 | 0.951 ± 0.003 | 0.949 ± 0.004 | 0.982 ± 0.003 |
| All | Non-tilted disc | VGG19 | 0.951 ± 0.003 | 0.952 ± 0.002 | 0.951 ± 0.003 | 0.989 ± 0.002 |
| All | Non-tilted disc | VGG16 | 0.957 ± 0.008 | 0.958 ± 0.008 | 0.957 ± 0.008 | 0.991 ± 0.005 |
| All | Non-tilted disc | Dense121 | 0.956 ± 0.006 | 0.959 ± 0.006 | 0.956 ± 0.006 | 0.985 ± 0.003 |
| All | Tilted disc | VGG19 | 0.936 ± 0.006 | 0.940 ± 0.006 | 0.935 ± 0.006 | 0.969 ± 0.005 |
| All | Tilted disc | VGG16 | 0.930 ± 0.010 | 0.933 ± 0.010 | 0.930 ± 0.010 | 0.969 ± 0.009 |
| All | Tilted disc | Dense121 | 0.938 ± 0.004 | 0.942 ± 0.004 | 0.937 ± 0.005 | 0.977 ± 0.011 |
| Non-tilted disc | Non-tilted disc | VGG19 | 0.945 ± 0.006 | 0.946 ± 0.006 | 0.945 ± 0.007 | 0.988 ± 0.002 |
| Non-tilted disc | Non-tilted disc | VGG16 | 0.944 ± 0.009 | 0.945 ± 0.009 | 0.943 ± 0.009 | 0.991 ± 0.003 |
| Non-tilted disc | Non-tilted disc | Dense121 | 0.944 ± 0.007 | 0.945 ± 0.007 | 0.944 ± 0.007 | 0.986 ± 0.003 |
| Non-tilted disc | Tilted disc | VGG19 | 0.891 ± 0.028 | 0.903 ± 0.009 | 0.894 ± 0.020 | 0.927 ± 0.019 |
| Non-tilted disc | Tilted disc | VGG16 | 0.886 ± 0.019 | 0.902 ± 0.009 | 0.890 ± 0.011 | 0.922 ± 0.020 |
| Non-tilted disc | Tilted disc | Dense121 | 0.915 ± 0.007 | 0.926 ± 0.006 | 0.918 ± 0.006 | 0.944 ± 0.008 |
| Tilted disc | Non-tilted disc | VGG19 | 0.878 ± 0.022 | 0.874 ± 0.040 | 0.869 ± 0.031 | 0.950 ± 0.013 |
| Tilted disc | Non-tilted disc | VGG16 | 0.886 ± 0.010 | 0.888 ± 0.012 | 0.878 ± 0.010 | 0.951 ± 0.015 |
| Tilted disc | Non-tilted disc | Dense121 | 0.891 ± 0.011 | 0.904 ± 0.009 | 0.884 ± 0.013 | 0.957 ± 0.008 |
| Tilted disc | Tilted disc | VGG19 | 0.923 ± 0.009 | 0.922 ± 0.012 | 0.921 ± 0.009 | 0.924 ± 0.046 |
| Tilted disc | Tilted disc | VGG16 | 0.914 ± 0.009 | 0.913 ± 0.014 | 0.912 ± 0.011 | 0.928 ± 0.017 |
| Tilted disc | Tilted disc | Dense121 | 0.918 ± 0.007 | 0.917 ± 0.007 | 0.915 ± 0.008 | 0.935 ± 0.008 |
AUC = area under the curve.
The table reports the mean ± standard error of the accuracy, precision, F1 score, and AUC for classification models built on the VGG19, VGG16, and Dense121 backbones, for each pair of development and test datasets.
The precision, F1 score, and AUC were computed using a weighted average to address class imbalance issues.
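As an illustration of the weighted averaging mentioned above, the following minimal sketch computes support-weighted precision and F1 score from true and predicted labels. It assumes the common convention in which each class's metric is weighted by its share of the true labels (the convention scikit-learn calls `average="weighted"`); the source does not state which implementation was used, and the function name `weighted_precision_f1` and the toy labels are hypothetical. The weighted AUC is omitted because it requires predicted probabilities rather than hard labels.

```python
from collections import Counter


def weighted_precision_f1(y_true, y_pred):
    """Return (precision, F1), each averaged over classes with weights
    proportional to the number of true samples per class.

    Assumption: this is the usual 'weighted' averaging convention, not
    necessarily the exact procedure used to produce the table.
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)          # true samples per class
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))

    weighted_precision = 0.0
    weighted_f1 = 0.0
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)

        # Guard against empty denominators (class never predicted or absent).
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)

        weight = support[c] / total    # class share of the true labels
        weighted_precision += weight * precision
        weighted_f1 += weight * f1

    return weighted_precision, weighted_f1


# Toy example with an imbalanced label set (three samples of class 0, one of class 1).
precision, f1 = weighted_precision_f1([0, 0, 0, 1], [0, 0, 1, 1])
print(f"weighted precision = {precision:.3f}, weighted F1 = {f1:.3f}")
```

With this weighting, the majority class contributes most to the reported value, so a metric computed this way reflects the class distribution of the test set rather than treating all classes equally.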