Table 5.
Event and attribute classification performance with gold standard medication mentions.
| Event/attribute | Metric | MedSingleTask Macro | MedSingleTask Micro | MedMultiTask Macro | MedMultiTask Micro | MedSpan Macro | MedSpan Micro | MedIdentifiers Macro | MedIdentifiers Micro | MedIDTyped Macro | MedIDTyped Micro |
|---|---|---|---|---|---|---|---|---|---|---|---|
| event | P | 0.771 | 0.884 | 0.774 | 0.894 | 0.830 | 0.917 | 0.848 | 0.931 | 0.798 | 0.913 |
| | R | 0.697 | 0.884 | 0.739 | 0.894 | 0.818 | 0.917 | 0.844 | 0.931 | 0.851 | 0.913 |
| | F1 | 0.729 | 0.884 | 0.755 | 0.894 | 0.824 | 0.917† | 0.846 | 0.931‡ | 0.815 | 0.913† |
| action | P | 0.704 | 0.728 | 0.789 | 0.741 | 0.856 | 0.808 | 0.825 | 0.797 | 0.884 | 0.821 |
| | R | 0.482 | 0.479 | 0.570 | 0.616 | 0.671 | 0.726 | 0.685 | 0.739 | 0.706 | 0.762 |
| | F1 | 0.568 | 0.578 | 0.646 | 0.673δ | 0.739 | 0.765† | 0.742 | 0.767† | 0.775 | 0.791‡ |
| actor | P | 0.614 | 0.761 | 0.675 | 0.833 | 0.711 | 0.857 | 0.755 | 0.865 | 0.721 | 0.876 |
| | R | 0.513 | 0.707 | 0.528 | 0.684 | 0.552 | 0.782 | 0.623 | 0.811 | 0.564 | 0.805 |
| | F1 | 0.554 | 0.733 | 0.592 | 0.751 | 0.611 | 0.818† | 0.677 | 0.837† | 0.622 | 0.839† |
| temporality | P | 0.707 | 0.729 | 0.768 | 0.802 | 0.727 | 0.785 | 0.724 | 0.804 | 0.743 | 0.812 |
| | R | 0.570 | 0.622 | 0.592 | 0.645 | 0.631 | 0.691 | 0.655 | 0.749 | 0.651 | 0.746 |
| | F1 | 0.629 | 0.671 | 0.665 | 0.715δ | 0.675 | 0.735δ | 0.687 | 0.776† | 0.691 | 0.778† |
| certainty | P | 0.580 | 0.730 | 0.670 | 0.806 | 0.748 | 0.846 | 0.760 | 0.856 | 0.737 | 0.851 |
| | R | 0.656 | 0.713 | 0.555 | 0.635 | 0.684 | 0.749 | 0.782 | 0.795 | 0.718 | 0.779 |
| | F1 | 0.611 | 0.722 | 0.598 | 0.710 | 0.701 | 0.795† | 0.766 | 0.824† | 0.711 | 0.813† |
| Overall* | P | 0.675 | 0.766 | 0.735 | 0.814 | 0.774 | 0.843 | 0.782 | 0.851 | 0.777 | 0.855 |
| | R | 0.584 | 0.681 | 0.597 | 0.694 | 0.671 | 0.773 | 0.718 | 0.805 | 0.698 | 0.801 |
| | F1 | 0.618 | 0.718 | 0.651 | 0.748δ | 0.710 | 0.806† | 0.744 | 0.827† | 0.723 | 0.827† |

Overall*: the overall score is an unweighted average of the event and attribute scores.
‡ indicates a significant performance difference compared with all other models.
† indicates a significant performance difference compared with MedSingleTask and MedMultiTask.
δ indicates a significant performance improvement over MedSingleTask.
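
As a quick check of how the Overall row is derived, the sketch below recomputes the MedSingleTask overall F1 as the unweighted mean of the five per-task F1 values reported in the table. The dictionary layout and the helper name `overall` are illustrative assumptions, not part of the original work; rounding to three decimals is assumed to match the table's precision.

```python
# MedSingleTask F1 scores from Table 5, as (macro, micro) pairs.
# The data structure and function name are hypothetical, for illustration only.
f1_scores = {
    "event": (0.729, 0.884),
    "action": (0.568, 0.578),
    "actor": (0.554, 0.733),
    "temporality": (0.629, 0.671),
    "certainty": (0.611, 0.722),
}


def overall(scores):
    """Return the unweighted mean of the macro and micro columns."""
    n = len(scores)
    macro = sum(macro_f1 for macro_f1, _ in scores.values()) / n
    micro = sum(micro_f1 for _, micro_f1 in scores.values()) / n
    # Round to three decimals (assumed to match the table's precision).
    return round(macro, 3), round(micro, 3)


overall_macro, overall_micro = overall(f1_scores)
print(overall_macro, overall_micro)  # 0.618 0.718
```

Both values agree with the Overall F1 entries for MedSingleTask, which confirms that each task contributes equally regardless of how many instances it has.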