Table 5. Mean AUC Performance Comparison of the Three Systems Studied, Averaged Across All Topics. Statistical Significance Is Shown for Pairs of Systems at the Different Amounts of Training Data Represented by Differing Levels of Reverse Cross-Validation (XVAL). p-Values Were Computed Using the Nonparametric Paired Wilcoxon Test, Comparing the Performance of Each Pair of Systems on Each of the 24 SR Topics.
XVAL | Hybrid (average mean AUC) | Baseline (average mean AUC) | p-value, Hybrid vs. Baseline (Wilcoxon) | Hybrid (average mean AUC) | Non-Topic (average mean AUC) | p-value, Hybrid vs. Non-Topic (Wilcoxon) |
---|---|---|---|---|---|---|
2 | 0.900 | 0.896 | 0.020 | 0.900 | 0.795 | 0.000 |
4 | 0.879 | 0.870 | 0.002 | 0.879 | 0.795 | 0.000 |
8 | 0.860 | 0.841 | 0.000 | 0.860 | 0.795 | 0.000 |
16 | 0.841 | 0.807 | 0.000 | 0.841 | 0.795 | 0.000 |
32 | 0.826 | 0.773 | 0.000 | 0.826 | 0.795 | 0.005 |
64 | 0.811 | 0.727 | 0.000 | 0.811 | 0.795 | 0.160 |
128 | 0.796 | 0.675 | 0.000 | 0.796 | 0.795 | 0.574 |
AUC = area under the curve; SR = systematic review.
A p-value of 0.0000 indicates p < 0.00005.
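
The paired Wilcoxon comparison described in the caption can be sketched as follows. This is a minimal, standard-library illustration, not the authors' implementation: the function name `wilcoxon_signed_rank` is hypothetical, the p-value is the exact two-sided value under the signed-rank null distribution, and the five per-topic AUC values in the usage lines are made up purely for demonstration (the study compares 24 topics, and its per-topic values are not given in the table).

```python
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided paired Wilcoxon signed-rank test with an exact p-value.

    Returns (W, p), where W = min(W+, W-). Zero differences are dropped,
    and tied absolute differences share their average rank.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    # Round to suppress floating-point noise before testing for zeros and ties.
    diffs = [round(a - b, 10) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank |d| in ascending order. Ranks are stored doubled so that
    # half-integer average ranks (from ties) stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # 2 * average of ranks (i+1)..(j+1)
        i = j + 1

    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)
    w_min2 = min(w_plus2, total2 - w_plus2)

    # Exact null distribution: each rank joins W+ with probability 1/2.
    # counts[s] = number of sign assignments whose doubled W+ equals s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]

    lower_tail = sum(counts[: w_min2 + 1])
    p = min(1.0, 2.0 * lower_tail / 2 ** n)
    return w_min2 / 2.0, p


# Made-up per-topic AUCs for two systems (illustrative only, not study data).
hybrid = [0.90, 0.88, 0.86, 0.84, 0.83]
baseline = [0.89, 0.86, 0.83, 0.80, 0.78]
w, p = wilcoxon_signed_rank(hybrid, baseline)
print(w, p)  # prints "0.0 0.0625"
```

With only five pairs, the smallest achievable two-sided p-value is 0.0625, which is why a larger set of paired observations, such as the 24 topics used in the table, is needed to reach the reported significance levels.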