Table 5.
Category | Best AUC | cdf | ctf | ctf-icdf | Stemming | Synonyms |
---|---|---|---|---|---|---|
AMH major classes | > 0.80 | 20 (100) | 20 (100) | 20 (100) | 19 (95) | 20 (100) |
AMH major classes | > 0.90 | 19 (95) | 17 (85) | 18 (90) | 17 (85) | 18 (90) |
AMH major classes | > 0.95 | 12 (60) | 12 (60) | 15 (75) | 12 (60) | 11 (55) |
AMH minor classes | > 0.80 | 135 (69) | 106 (54)* | 157 (80) | 152 (77) | 156 (79) |
AMH minor classes | > 0.90 | 123 (62) | 100 (51) | 150 (76)* | 145 (74) | 151 (77)* |
AMH minor classes | > 0.95 | 114 (58) | 92 (47) | 144 (73)* | 142 (72)* | 143 (73)* |
AMH adverse events | > 0.80 | 159 (67) | 148 (62) | 173 (73) | 153 (64) | 155 (65) |
AMH adverse events | > 0.90 | 86 (36) | 88 (37) | 100 (42) | 84 (35) | 84 (35) |
AMH adverse events | > 0.95 | 41 (17) | 42 (18) | 55 (23) | 44 (18) | 44 (18) |
PKIS perpetrator | > 0.80 | 7 (47) | 6 (40) | 7 (47) | 12 (80) | 10 (67) |
PKIS perpetrator | > 0.90 | 5 (33) | 5 (33) | 4 (27) | 3 (20) | 4 (27) |
PKIS perpetrator | > 0.95 | 2 (13) | 2 (13) | 2 (13) | 3 (20) | 3 (20) |
Narrow therapeutic index drugs | > 0.80 | 9 (64) | 9 (64) | 11 (79) | 10 (71) | 10 (71) |
Narrow therapeutic index drugs | > 0.90 | 8 (57) | 7 (50) | 10 (71) | 9 (64) | 10 (71) |
Narrow therapeutic index drugs | > 0.95 | 5 (36) | 6 (43) | 9 (64) | 9 (64) | 10 (71) |
Overall | > 0.80 | 330 (68) | 289 (60)* | 368 (76)* | 346 (71) | 351 (73) |
Overall | > 0.90 | 241 (50) | 217 (45) | 282 (58)* | 258 (53) | 267 (55) |
Overall | > 0.95 | 174 (36) | 154 (32) | 225 (46)* | 210 (43) | 211 (44) |
Each entry gives the number (percentage) of characteristics in a category for which the method achieved an AUC above the given threshold in stratified cross-validation. For each method, the result from the best of the 4 algorithms was used. The AUC thresholds can be interpreted as good (> 0.80), very good (> 0.90), and excellent (> 0.95). Entries marked (*) performed significantly better or worse than cdf at predicting drug characteristics. Fisher's exact tests were applied to 2 × 2 tables with α = 0.05, adjusted for a family of four comparisons using the Bonferroni method. Numbers in boldface indicate the best-performing method(s) for each characteristic category above the AUC = 0.80 threshold. Method abbreviations: cdf: conditional document frequency; ctf: conditional term frequency; ctf-icdf: conditional term frequency-inverse conditional document frequency; Stemming: cdf of tokens reduced by Porter's stemming algorithm; Synonyms: cdf of tokens calculated by retrieving abstracts with both generic and trade names for a given drug.
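
The significance marking described in the legend can be illustrated with a minimal sketch. It is not the authors' code: it assumes a two-sided Fisher's exact test (the legend does not state the sidedness), and the function name `fisher_exact_two_sided` is hypothetical. The example compares cdf with ctf-icdf for AMH major classes at AUC > 0.95, taking the category size of 20 from the "20 (100)" entries in the table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper for illustration; assumes a two-sided test.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(k_min, k_max + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Bonferroni adjustment: alpha = 0.05 split across a family of four comparisons.
ALPHA = 0.05
N_COMPARISONS = 4
alpha_adjusted = ALPHA / N_COMPARISONS

# AMH major classes, AUC > 0.95: cdf 12 of 20 vs ctf-icdf 15 of 20.
n_major = 20
cdf_hits, ctf_icdf_hits = 12, 15
p_value = fisher_exact_two_sided(
    cdf_hits, n_major - cdf_hits,
    ctf_icdf_hits, n_major - ctf_icdf_hits,
)
significant = p_value < alpha_adjusted
print(f"p = {p_value:.3f}, adjusted alpha = {alpha_adjusted}, significant: {significant}")
```

Under these assumptions the difference does not reach the adjusted threshold, which agrees with the absence of an asterisk on that entry. With SciPy available, `scipy.stats.fisher_exact` could replace the hand-written helper.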