Table 3:
Subsets of Features | AUC | Precision/PPV | Specificity | Sensitivity/Recall |
---|---|---|---|---|
From clinical notes (CNs) | 0.920 | 0.763 | 0.803 | 0.702 |
From known adverse events (KA) | 0.723 | 0.526 | 0.624 | 0.550 |
From known usage (KU) | 0.815 | 0.561 | 0.661 | 0.584 |
CN + KA | 0.932 | 0.775 | 0.714 | 0.801 |
CN + KU | 0.937 | 0.781 | 0.719 | 0.820 |
All | 0.944 | 0.796 | 0.913 | 0.839 |
We performed feature ablation to investigate the contribution of different feature sets to the performance of the random forest classifier for detecting drug–AE relationships. The first column gives the feature set used to train the classifier. Performance was evaluated on the 1.9k withheld test examples. Individually, features from clinical notes (CNs) yielded higher performance than features from known ADEs (KA) and known usages (KU) on all metrics. Adding features from KA or KU to features from CNs raised sensitivity from 0.702 to 0.801 and 0.820, respectively, while all features together resulted in a sensitivity of 0.839 and an AUC of 0.944.
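For readers less familiar with the threshold-based columns in Table 3, the following minimal sketch shows how precision (PPV), specificity, and sensitivity (recall) are conventionally derived from binary confusion-matrix counts. The `ConfusionCounts` container, the function names, and the example counts are hypothetical and chosen only for illustration; they are not the study's code or data. AUC is omitted because it is computed from ranked classifier scores rather than from a single confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container for illustration)."""
    tp: int  # true drug-AE pairs correctly flagged
    fp: int  # pairs flagged that are not true associations
    tn: int  # non-associations correctly rejected
    fn: int  # true drug-AE pairs that were missed


def precision(c: ConfusionCounts) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return c.tp / (c.tp + c.fp)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Recall, or true positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


# Invented counts for illustration only; NOT taken from the paper's test set.
example = ConfusionCounts(tp=80, fp=20, tn=180, fn=20)

if __name__ == "__main__":
    print(f"precision/PPV      = {precision(example):.3f}")
    print(f"specificity        = {specificity(example):.3f}")
    print(f"sensitivity/recall = {sensitivity(example):.3f}")
```

Because specificity and sensitivity share no counts (one is computed over true negatives and false positives, the other over true positives and false negatives), a feature set can raise one while lowering the other, which is why the table reports them separately.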