2009 Sep-Oct;16(5):690–704. doi: 10.1197/jamia.M3162

Table 5. Mean AUC of the three systems studied, averaged across all topics. Statistical significance is shown for pairs of systems at each amount of training data, represented by the level of reverse cross-validation (XVAL). p-values were computed with the nonparametric paired Wilcoxon test, comparing the two systems' performance on each of the 24 SR topics.

XVAL	Hybrid mean AUC	Baseline mean AUC	Wilcoxon p (Hybrid vs. Baseline)	Hybrid mean AUC	Non-Topic mean AUC	Wilcoxon p (Hybrid vs. Non-Topic)
2	0.900	0.896	0.020	0.900	0.795	0.000
4	0.879	0.870	0.002	0.879	0.795	0.000
8	0.860	0.841	0.000	0.860	0.795	0.000
16	0.841	0.807	0.000	0.841	0.795	0.000
32	0.826	0.773	0.000	0.826	0.795	0.005
64	0.811	0.727	0.000	0.811	0.795	0.160
128	0.796	0.675	0.000	0.796	0.795	0.574

AUC = area under the curve; SR = systematic review.

A p-value of 0.0000 indicates p < 0.00005.
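
The paired Wilcoxon comparison named in the caption can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes an exact two-sided signed-rank test with zero differences dropped and tied differences given average ranks, whereas the software and variant actually used for the table are not stated here. The function name `wilcoxon_exact_p` and the per-topic AUC values are hypothetical and do not come from the paper. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used instead.

```python
from itertools import groupby


def wilcoxon_exact_p(a, b):
    """Two-sided exact p-value of the paired Wilcoxon signed-rank test.

    Zero differences are dropped and tied absolute differences share an
    average rank. Ranks are stored doubled so that they stay integers.
    """
    # Round to suppress floating-point noise so equal differences really tie.
    diffs = [round(x - y, 6) for x, y in zip(a, b)]
    diffs = [d for d in diffs if d != 0]
    if not diffs:
        return 1.0

    # Map each absolute difference to twice its (average) rank.
    doubled_rank = {}
    position = 1
    for magnitude, group in groupby(sorted(abs(d) for d in diffs)):
        count = len(list(group))
        doubled_rank[magnitude] = 2 * position + count - 1
        position += count

    w_plus = sum(doubled_rank[abs(d)] for d in diffs if d > 0)
    w_minus = sum(doubled_rank[abs(d)] for d in diffs if d < 0)
    observed = min(w_plus, w_minus)

    # counts[s] = number of sign assignments whose signed-rank sum equals s
    counts = {0: 1}
    for d in diffs:
        r = doubled_rank[abs(d)]
        updated = dict(counts)
        for s, c in counts.items():
            updated[s + r] = updated.get(s + r, 0) + c
        counts = updated

    # Double the lower tail for a two-sided test, capped at 1.
    tail = sum(c for s, c in counts.items() if s <= observed)
    return min(1.0, 2 * tail / 2 ** len(diffs))


# Hypothetical per-topic AUCs for two systems (illustrative only,
# NOT values from the paper, which used 24 topics).
hybrid_auc = [0.91, 0.88, 0.84, 0.93, 0.79, 0.86]
baseline_auc = [0.89, 0.85, 0.85, 0.90, 0.74, 0.80]

print(wilcoxon_exact_p(hybrid_auc, baseline_auc))
```

Under this exact formulation, the smallest attainable two-sided p-value with 24 untied, nonzero paired differences is 2/2^24, roughly 1.2e-7, so values below the 0.00005 display threshold are reachable when one system wins on nearly every topic.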