2015 Jul 31;22(6):1196–1204. doi: 10.1093/jamia/ocv102

Table 3:

Performance of the random forest classifier on the held-out test set with different feature sets.

Subsets of Features	AUC	Precision/PPV	Specificity	Sensitivity/Recall
From clinical notes (CNs)	0.920	0.763	0.803	0.702
From known adverse events (KA)	0.723	0.526	0.624	0.550
From known usage (KU)	0.815	0.561	0.661	0.584
CN + KA	0.932	0.775	0.714	0.801
CN + KU	0.937	0.781	0.719	0.820
All	0.944	0.796	0.913	0.839

We performed feature ablation to investigate the contribution of different feature sets to the performance of the random forest classifier for detecting drug–AE relationships. The first column gives the feature set used to train the classifier. Classifier performance was evaluated on the 1.9k withheld test examples. Individually, features from clinical notes (CNs) yielded higher performance than features from known adverse events (KA) or known usages (KU) on all metrics. Adding KA or KU features to CN features improved sensitivity (from 0.702 to 0.801 and 0.820, respectively) at the cost of lower specificity (0.714 and 0.719 versus 0.803), while all features together gave the best result on every metric, including a sensitivity of 0.839 and an AUC of 0.944.
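The ablation procedure described in this legend can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The group labels (`CN`, `KA`, `KU`) follow the table, but `ABLATION_SETS`, `binary_metrics`, `run_ablation`, and the `fit_predict` callback are hypothetical names introduced here. The callback stands in for training a classifier (for example a random forest) on the chosen feature groups and returning 0/1 predictions for the test set. AUC is omitted because it requires predicted scores rather than hard labels.

```python
from typing import Callable, Dict, Sequence, Tuple

# Feature-group combinations mirroring the rows of the table.
# The dictionary name and its layout are assumptions for illustration.
ABLATION_SETS: Dict[str, Tuple[str, ...]] = {
    "CN": ("CN",),
    "KA": ("KA",),
    "KU": ("KU",),
    "CN + KA": ("CN", "KA"),
    "CN + KU": ("CN", "KU"),
    "All": ("CN", "KA", "KU"),
}


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision (PPV), specificity, and sensitivity (recall) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # An undefined ratio (zero denominator) is reported as NaN.
        return num / den if den else float("nan")

    return {
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "sensitivity": ratio(tp, tp + fn),
    }


def run_ablation(
    fit_predict: Callable[[Tuple[str, ...]], Sequence[int]],
    y_test: Sequence[int],
) -> Dict[str, Dict[str, float]]:
    """Score one model per feature-group combination on the same test labels.

    `fit_predict(groups)` is a hypothetical callback: it trains on the given
    feature groups and returns predictions for the held-out test examples.
    """
    return {
        name: binary_metrics(y_test, fit_predict(groups))
        for name, groups in ABLATION_SETS.items()
    }
```

Evaluating every combination against the same held-out labels keeps the rows directly comparable, which is what allows the drop in specificity for CN + KA and CN + KU to be read alongside their gain in sensitivity.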