2023 Jul 14;17:62. doi: 10.1186/s40246-023-00508-1

Table 2.

Confusion-matrix statistics of the final deployed RF model for internal and external cross-validation

Statistic	Internal cross-validation	External cross-validation
Accuracy	0.9808	0.9512
95% CI	(0.8974, 0.9995)	(0.8347, 0.994)
No information rate	0.6346	0.6341
P-value (accuracy > no information rate)	1.664e-09	2.309e-06
Kappa	0.9581	0.8918
McNemar’s test P-value	1.0000	0.4795
Sensitivity	1.0000	0.8667
Specificity	0.9474	1.0000
Pos pred value	0.9706	1.0000
Neg pred value	1.0000	0.9286
Precision	0.9705882	1.0000000
Recall	1.0000000	0.8666667
F1	0.9850746	0.9285714
Prevalence	0.6346	0.3659
Detection rate	0.6346	0.3171
Detection prevalence	0.6538	0.3171
Balanced accuracy	0.9737	0.9333
Area under the curve (AUC)	0.9736842	0.5384848

Note that although accuracy is high in both types of validation, overfitting to the training data in internal cross-validation produced an unrealistically high AUC (0.974). By contrast, external cross-validation, which increased the sample size, gave a more realistic picture of the RF model's performance for small cohorts (AUC 0.538).

‘Positive’ class: patients with ADRs. AUC: area under the curve; RF: random forest
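As a cross-check on how the statistics in Table 2 relate to one another, the following minimal Python sketch derives them from a 2x2 confusion matrix. The cell counts are not reported in the table; they are an assumption, back-calculated from the reported rates (52 internal samples with 33 TP, 0 FN, 1 FP, 18 TN; 41 external samples with 13 TP, 2 FN, 0 FP, 26 TN). The function name `confusion_stats` is hypothetical, and the McNemar p-value uses the continuity-corrected chi-square form, which is assumed to match the software that produced the table. The confidence interval, the accuracy P-value, and the AUC are not reproduced here.

```python
import math


def confusion_stats(tp, fn, fp, tn):
    """Derive Table 2 style statistics from 2x2 confusion-matrix counts."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # same quantity as recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # same quantity as precision
    npv = tn / (tn + fn)
    prevalence = (tp + fn) / n

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)

    # McNemar's test on the discordant cells, with continuity correction.
    # For a chi-square statistic x with 1 degree of freedom,
    # the upper-tail probability is erfc(sqrt(x / 2)).
    discordant = fp + fn
    if discordant == 0:
        mcnemar_p = float("nan")
    else:
        chi2 = (abs(fp - fn) - 1) ** 2 / discordant
        mcnemar_p = math.erfc(math.sqrt(chi2 / 2))

    return {
        "accuracy": accuracy,
        "no_information_rate": max(prevalence, 1 - prevalence),
        "kappa": kappa,
        "mcnemar_p": mcnemar_p,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "prevalence": prevalence,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# ASSUMED counts, inferred from the reported rates (not given in the table).
internal = confusion_stats(tp=33, fn=0, fp=1, tn=18)
external = confusion_stats(tp=13, fn=2, fp=0, tn=26)

for name, stats in (("internal", internal), ("external", external)):
    print(name)
    for key, value in stats.items():
        print(f"  {key}: {value:.4f}")
```

Under these assumed counts, the computed values agree with the corresponding rows of the table to four decimal places, which suggests that the reported rates are internally consistent.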