TABLE 4.
Model performance metric | Training set (N = 137)& | Test set (N = 58) |
--- | --- | --- |
Pearson’s correlation coefficient (95% CI) | 0.67 (0.57–0.76) | 0.67 (0.49–0.79) |
ICC(2,1) (95% CI)* | 0.604 (0.49–0.70) | 0.66 (0.49–0.79) |
Mean absolute error (SD) | 2.87 (2.36) | 2.88 (2.21) |
Root mean square error (SD) | 3.71 (5.09) | 3.62 (4.30) |
Mean bias error (SD) | −0.05 (3.72) | 0.13 (3.65) |
Receiver operating characteristic (ROC) analysis | | |
Sensitivity (true positive rate) | 0.846 | 0.692 |
Specificity (true negative rate) | 0.810 | 0.697 |
Area under the ROC curve (AUC) | 0.849 | 0.721 |
Accuracy (%)# | 83.21 | 70.69 |
& N is the number of children for whom DEEP predictions could be generated; the full dataset comprised N = 140 children in the training set and N = 60 in the test set. * Agreement levels for ICC(2,1): > 0.6 = good. # The DEEP cut-off score that maximized accuracy in classifying children as performing above or below the 25th percentile of the BSID-III cognitive score was 67.19; test-set accuracy was computed using this same cut-off.
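The continuous agreement metrics in the upper half of Table 4 can be computed from paired values. The sketch below is a minimal illustration under stated assumptions, not the authors' analysis code: the function names `pearson_r`, `icc_2_1` and `error_metrics` are hypothetical; the error is assumed to be the DEEP prediction minus the observed BSID-III score (the table does not state the sign convention for the mean bias error); and ICC(2,1) uses the Shrout and Fleiss two-way random-effects, absolute-agreement, single-measure formula, treating the prediction and the observed score as two "raters".

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def icc_2_1(x: Sequence[float], y: Sequence[float]) -> float:
    """ICC(2,1) after Shrout & Fleiss: two-way random effects, absolute
    agreement, single measurement; x and y are treated as k = 2 raters."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]  # one mean per child
    col_means = [sum(x) / n, sum(y) / n]              # one mean per "rater"
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def error_metrics(pred: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE and mean bias error (MBE).
    Assumed sign convention: error = prediction - observed."""
    errors = [p - o for p, o in zip(pred, obs)]
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MBE": sum(errors) / n,  # > 0 means predictions overestimate on average
    }
```

With a constant offset (predictions 2, 3, 4 against observed scores 1, 2, 3), `pearson_r` returns 1.0 while `icc_2_1` returns 2/3, because the absolute-agreement ICC penalizes systematic bias and Pearson's r does not. Since RMSE² equals MBE² plus the population variance of the errors, the RMSE row can be cross-checked against the mean bias error row: if the reported SD uses the n − 1 denominator, √(0.05² + 3.72² × 136/137) ≈ 3.71 for the training set and √(0.13² + 3.65² × 57/58) ≈ 3.62 for the test set.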
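The ROC rows of Table 4 and the accuracy-optimizing cut-off described in its notes can be illustrated the same way. This is a minimal sketch, not the authors' procedure, and every function name in it is hypothetical. It assumes that the binary reference label (BSID-III cognitive score below the 25th percentile) is already available for each child, that a DEEP score below the cut-off predicts that positive class, and that candidate cut-offs are midpoints between consecutive distinct training scores. The AUC uses the Mann–Whitney formulation (ties count one half), which equals the area under the empirical ROC curve.

```python
from typing import List, Sequence, Tuple


def confusion_counts(
    scores: Sequence[float], labels: Sequence[bool], cutoff: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN). Assumption: score < cutoff predicts positive."""
    tp = fp = tn = fn = 0
    for score, is_positive in zip(scores, labels):
        predicted_positive = score < cutoff
        if predicted_positive and is_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(
    scores: Sequence[float], labels: Sequence[bool], cutoff: float
) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy at a fixed cut-off."""
    tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / len(labels)
    return sensitivity, specificity, accuracy


def best_cutoff(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Midpoint cut-off that maximizes accuracy; ties go to the lowest candidate."""
    distinct = sorted(set(scores))
    candidates: List[float] = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda c: classification_metrics(scores, labels, c)[2])


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """P(a positive child scores lower than a negative child); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p < q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for illustration only; not from the study.
    train_scores = [50, 55, 60, 70, 72, 80]
    train_labels = [True, True, False, True, False, False]
    cut = best_cutoff(train_scores, train_labels)  # chosen on training data only
    print(cut)  # 57.5
    print(classification_metrics(train_scores, train_labels, cut))
    print(auc_mann_whitney(train_scores, train_labels))
    # The training cut-off is applied unchanged to held-out data.
    print(classification_metrics([52, 58, 66, 75], [True, False, True, False], cut))
```

As in the table notes, the cut-off is selected once on the training set and then reused for the test set, so test-set accuracy is not re-optimized on the held-out children.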