Table 2.
Paired classification performance on the three data sets when only ADR instances from a different corpus are added to the training data. ADR F-scores, non-ADR F-scores, accuracies, and 95% confidence intervals (CI) are shown for each train-test set combination.
Test Data | Training Data | ADR F-score | non-ADR F-score | Accuracy (%) | 95% CI |
---|---|---|---|---|---|
ADE | ADE | 0.812 | 0.914 | 88.2 | 87.3 – 89.1 |
ADE | ADE+DSADR | 0.802 | 0.912 | 87.8 | 86.8 – 88.7 |
ADE | ADE+TWADR | 0.802 | 0.909 | 87.5 | 86.5 – 88.4 |
TW | TW | 0.538 | 0.919 | 86.2 | 84.7 – 87.6 |
TW | TW+ADEADR | 0.549 | 0.946* | 90.3* | 89.0 – 91.5 |
TW | TW+DSADR | 0.565* | 0.939* | 89.3* | 87.9 – 90.5 |
DS | DS | 0.678 | 0.890 | 83.8 | 82.2 – 85.0 |
DS | DS+ADEADR | 0.682 | 0.886 | 83.2 | 82.7 – 85.8 |
DS | DS+TWADR | 0.695* | 0.897 | 84.6 | 82.8 – 86.0 |
* indicates a statistically significant improvement in performance over the highest score achieved in the single-corpus binary classification task.