2009 Sep-Oct;16(5):690–704. doi: 10.1197/jamia.M3162

Table 5. Mean AUC performance comparison of the three systems studied, averaged across all topics. Statistical significance is shown for pairs of systems at the different amounts of training data represented by differing levels of reverse cross-validation (XVAL). p-values were computed using the nonparametric paired Wilcoxon test, comparing pairs of system performance on each of the 24 SR topics.

        Average mean AUC     Topic pairwise    Average mean AUC      Topic pairwise
        across all topics    Wilcoxon test     across all topics     Wilcoxon test
XVAL    Hybrid   Baseline    p-value           Hybrid   Non-Topic    p-value
2       0.900    0.896       0.020             0.900    0.795        0.000
4       0.879    0.870       0.002             0.879    0.795        0.000
8       0.860    0.841       0.000             0.860    0.795        0.000
16      0.841    0.807       0.000             0.841    0.795        0.000
32      0.826    0.773       0.000             0.826    0.795        0.005
64      0.811    0.727       0.000             0.811    0.795        0.160
128     0.796    0.675       0.000             0.796    0.795        0.574

AUC = area under the curve; SR = systematic review.

A p-value of 0.0000 indicates p < 0.00005.
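
The size of the hybrid system's advantage at each level of reverse cross-validation can be read off Table 5 as a simple difference of mean AUCs. The following is a minimal sketch of that arithmetic in Python, using only the mean values transcribed from the table. The names `ROWS` and `auc_gaps` are illustrative assumptions and do not come from the article. The sketch does not reproduce the Wilcoxon p-values, which require the per-topic AUCs that are not listed here.

```python
# Mean AUCs transcribed from Table 5:
# (XVAL level, hybrid, baseline, non-topic)
ROWS = [
    (2, 0.900, 0.896, 0.795),
    (4, 0.879, 0.870, 0.795),
    (8, 0.860, 0.841, 0.795),
    (16, 0.841, 0.807, 0.795),
    (32, 0.826, 0.773, 0.795),
    (64, 0.811, 0.727, 0.795),
    (128, 0.796, 0.675, 0.795),
]


def auc_gaps(rows):
    """Return (xval, hybrid - baseline, hybrid - non_topic) for each row.

    Differences are rounded to three decimals, matching the table's precision.
    """
    return [
        (xval, round(hybrid - baseline, 3), round(hybrid - non_topic, 3))
        for xval, hybrid, baseline, non_topic in rows
    ]


if __name__ == "__main__":
    print("XVAL  vs. Baseline  vs. Non-Topic")
    for xval, vs_baseline, vs_non_topic in auc_gaps(ROWS):
        print(f"{xval:>4}  {vs_baseline:+.3f}        {vs_non_topic:+.3f}")
```

Read this way, the gap between the hybrid and baseline means widens as the XVAL level rises, while the gap between the hybrid and non-topic means narrows, which is consistent with the p-values in the last column rising above conventional significance thresholds at XVAL levels 64 and 128.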