Table 2.
Binary classification accuracy, absolute score error, and Krippendorff’s alpha for the ruler score from six existing methods, our best-performing model (NN-cMEON), and human professionals (radiologists). Results are compared for an optimal single-value threshold and for thresholds defined by two or nine content-dependent image rulers. The best two results are in bold. Marginal errors denote the 95% confidence interval.
| Metric | Threshold type | QAC | Chen (2015) | Liu (2012) | ILNIQE | IEDD | MEON | NN-cMEON | Radiologists |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | Single best | 61.70% | 62.77% | 68.72% | 70.85% | 69.79% | 76.38% | 85.11% | |
| Accuracy | 2 ruler defined | 66.38% | 71.06% | 74.04% | 75.74% | 78.94% | 81.06% | 88.72% | 87.23% |
| Accuracy | 9 ruler defined | 68.09% | 72.13% | 75.11% | 78.94% | 80.85% | 84.47% | 91.06% | |
| Score error | 2 ruler defined | 2.264 | 1.813 | 1.611 | 1.681 | 1.638 | 1.309 | 1.126 | - |
| Score error | 9 ruler defined | 2.213 | 1.770 | 1.553 | 1.651 | 1.432 | 1.209 | 1.066 | 0.987 |
| Krippendorff’s alpha | 9 ruler defined | 0.263±0.075 | 0.528±0.061 | 0.573±0.056 | 0.543±0.059 | 0.615±0.054 | 0.705±0.047 | 0.762±0.039 | 0.783±0.037 |
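
As a rough illustration of how the "Single best" accuracy and the score error rows could be computed, the following minimal sketch scans candidate thresholds for the one that maximizes binary accuracy and computes a mean absolute error. The function names, the `score >= threshold` decision rule, and the reading of "score error" as a mean absolute difference are assumptions made for illustration; they are not taken from the paper, and the ruler-defined thresholds are not reproduced here.

```python
from typing import Sequence, Tuple


def binary_accuracy(scores: Sequence[float], labels: Sequence[bool],
                    threshold: float) -> float:
    """Fraction of items whose thresholded score matches the binary label.

    Assumption: a score at or above the threshold is predicted positive.
    """
    hits = sum((s >= threshold) == y for s, y in zip(scores, labels))
    return hits / len(scores)


def best_single_threshold(scores: Sequence[float],
                          labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, accuracy) for the best single-value threshold.

    Only the observed scores need to be tried as candidates, because the
    predictions change only when the threshold crosses a score.
    """
    candidates = sorted(set(scores))
    # One extra candidate above the maximum allows an "all negative" rule.
    candidates.append(candidates[-1] + 1.0)
    best = max(candidates, key=lambda t: binary_accuracy(scores, labels, t))
    return best, binary_accuracy(scores, labels, best)


def mean_absolute_error(predicted: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference scores."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Toy data, purely hypothetical.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [False, False, True, True]
threshold, accuracy = best_single_threshold(toy_scores, toy_labels)
```

Note that a threshold selected on the same data it is evaluated on gives an optimistic accuracy estimate, which is one reason a "single best" figure is better read as an upper bound for that thresholding scheme.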