2011 Apr 21;12:112. doi: 10.1186/1471-2105-12-112

Table 5.

Comparative evaluation of cdf-based predictions with other commonly used IR methods

| Category | Best AUC | cdf | ctf | ctf-icdf | Stemming | Synonyms |
|---|---|---|---|---|---|---|
| AMH major classes | > 0.80 | 20 (100) | 20 (100) | 20 (100) | 19 (95) | 20 (100) |
| | > 0.90 | 19 (95) | 17 (85) | 18 (90) | 17 (85) | 18 (90) |
| | > 0.95 | 12 (60) | 12 (60) | 15 (75) | 12 (60) | 11 (55) |
| AMH minor classes | > 0.80 | 135 (69) | 106 (54)* | 157 (80) | 152 (77) | 156 (79) |
| | > 0.90 | 123 (62) | 100 (51) | 150 (76)* | 145 (74) | 151 (77)* |
| | > 0.95 | 114 (58) | 92 (47) | 144 (73)* | 142 (72)* | 143 (73)* |
| AMH adverse events | > 0.80 | 159 (67) | 148 (62) | 173 (73) | 153 (64) | 155 (65) |
| | > 0.90 | 86 (36) | 88 (37) | 100 (42) | 84 (35) | 84 (35) |
| | > 0.95 | 41 (17) | 42 (18) | 55 (23) | 44 (18) | 44 (18) |
| PKIS perpetrator | > 0.80 | 7 (47) | 6 (40) | 7 (47) | 12 (80) | 10 (67) |
| | > 0.90 | 5 (33) | 5 (33) | 4 (27) | 3 (20) | 4 (27) |
| | > 0.95 | 2 (13) | 2 (13) | 2 (13) | 3 (20) | 3 (20) |
| Narrow therapeutic index drugs | > 0.80 | 9 (64) | 9 (64) | 11 (79) | 10 (71) | 10 (71) |
| | > 0.90 | 8 (57) | 7 (50) | 10 (71) | 9 (64) | 10 (71) |
| | > 0.95 | 5 (36) | 6 (43) | 9 (64) | 9 (64) | 10 (71) |
| Overall | > 0.80 | 330 (68) | 289 (60)* | 368 (76)* | 346 (71) | 351 (73) |
| | > 0.90 | 241 (50) | 217 (45) | 282 (58)* | 258 (53) | 267 (55) |
| | > 0.95 | 174 (36) | 154 (32) | 225 (46)* | 210 (43) | 211 (44) |
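The Overall row is the sum of the five category rows, and each percentage is a count divided by the number of characteristics in its category. A minimal Python sketch of that arithmetic follows, using the counts exactly as printed in the table. The category sizes (20, 197, 238, 15 and 14 characteristics, 484 in total) are not stated in the table; they are inferred here from the count (percentage) pairs and should be read as an assumption, as should the helper names `overall_row` and `percent`.

```python
# Counts of characteristics whose best AUC exceeds 0.80 / 0.90 / 0.95,
# copied from Table 5. Column order: cdf, ctf, ctf-icdf, Stemming, Synonyms.
COUNTS = {
    "AMH major classes": [[20, 20, 20, 19, 20], [19, 17, 18, 17, 18], [12, 12, 15, 12, 11]],
    "AMH minor classes": [[135, 106, 157, 152, 156], [123, 100, 150, 145, 151], [114, 92, 144, 142, 143]],
    "AMH adverse events": [[159, 148, 173, 153, 155], [86, 88, 100, 84, 84], [41, 42, 55, 44, 44]],
    "PKIS perpetrator": [[7, 6, 7, 12, 10], [5, 5, 4, 3, 4], [2, 2, 2, 3, 3]],
    "Narrow therapeutic index drugs": [[9, 9, 11, 10, 10], [8, 7, 10, 9, 10], [5, 6, 9, 9, 10]],
}
REPORTED_OVERALL = [[330, 289, 368, 346, 351], [241, 217, 282, 258, 267], [174, 154, 225, 210, 211]]

# ASSUMPTION: category sizes are inferred from the count (percentage) pairs,
# e.g. 135 (69) and 157 (80) are both consistent with 197 characteristics.
CATEGORY_SIZES = {
    "AMH major classes": 20,
    "AMH minor classes": 197,
    "AMH adverse events": 238,
    "PKIS perpetrator": 15,
    "Narrow therapeutic index drugs": 14,
}


def overall_row() -> list[list[int]]:
    """Sum each method's counts over the five categories, one list per threshold."""
    return [
        [sum(COUNTS[cat][t][m] for cat in COUNTS) for m in range(5)]
        for t in range(3)
    ]


def percent(count: int, total: int) -> int:
    """Percentage of characteristics, rounded to a whole number as in the table."""
    return round(100 * count / total)


if __name__ == "__main__":
    total = sum(CATEGORY_SIZES.values())  # 484 under the assumption above
    print(total, overall_row() == REPORTED_OVERALL)
    print([percent(c, total) for c in overall_row()[0]])
```

Run as a script, this prints `484 True` followed by `[68, 60, 76, 71, 73]`, which matches the printed percentages of the Overall > 0.80 row.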

The numbers in this table give the number (percentage) of characteristics that achieved an AUC above the given threshold in stratified cross-validation evaluations. For each method, the results of the best of four algorithms were compared. The AUC thresholds can be interpreted as good (> 0.8), very good (> 0.9), and excellent (> 0.95), respectively. Entries marked with an asterisk (*) indicate significantly better or worse performance than cdf for predicting drug characteristics. Fisher's exact tests were applied to 2 × 2 tables, with α = 0.05 adjusted by the Bonferroni method for a family of four comparisons (a per-comparison threshold of 0.05/4 = 0.0125). The numbers in boldface indicate the best performing method(s) for each characteristic category above the AUC = 0.8 threshold.

Abbreviations of the method names: cdf: conditional document frequency; ctf: conditional term frequency; ctf-icdf: conditional term frequency-inverse conditional document frequency; Stemming: cdf of tokens reduced by Porter's stemming algorithm; Synonyms: cdf of tokens calculated by retrieving abstracts with both generic and trade names for a given drug.
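The significance test described in the note can be sketched in pure Python. This is an illustration under stated assumptions, not the authors' code: each 2 × 2 table is taken to be method (cdf vs. the other method) by outcome (AUC above the threshold vs. not), the two-sided p-value uses the common convention of summing the probabilities of all tables no more likely than the observed one, and the total of 484 characteristics is inferred from the count (percentage) pairs rather than printed in the table. The function names `fisher_exact_two_sided` and `compare_with_cdf` are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a
    hypergeometric distribution. The two-sided p-value sums the
    probabilities of all tables no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(k: int) -> float:
        # P(top-left cell == k | margins); comb() works on exact integers.
        return comb(row1, k) * comb(n - row1, col1 - k) / comb(n, col1)

    p_obs = pmf(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance keeps tables that tie with the observed
    # one despite floating-point rounding.
    p = sum(
        pk for pk in (pmf(k) for k in range(lo, hi + 1))
        if pk <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p)


def compare_with_cdf(hits_cdf: int, hits_other: int, total: int,
                     alpha: float = 0.05, n_comparisons: int = 4) -> tuple[float, bool]:
    """Test cdf against one other method at a single AUC threshold.

    ASSUMPTION: rows are methods, columns are (above threshold, not above).
    Bonferroni: each of the four comparisons is tested at alpha / n_comparisons.
    """
    p = fisher_exact_two_sided(hits_cdf, total - hits_cdf,
                               hits_other, total - hits_other)
    return p, p < alpha / n_comparisons


if __name__ == "__main__":
    # Overall row, AUC > 0.80; total = 484 is inferred, not printed in the table.
    for name, hits in [("ctf", 289), ("ctf-icdf", 368), ("Stemming", 346), ("Synonyms", 351)]:
        p, significant = compare_with_cdf(330, hits, 484)
        print(f"cdf vs {name}: p = {p:.4f}, significant = {significant}")
```

With these inputs, the ctf and ctf-icdf comparisons fall below 0.0125 and the Stemming and Synonyms comparisons do not, which agrees with where the Overall > 0.80 row carries asterisks. The exact test is computed directly from binomial coefficients so the sketch needs only the standard library; a statistics package would normally be used instead.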