Table 3.
Performance by relation type and overall. Bold indicates micro-averaged F-measures of the SR classifier that are significantly different from the corresponding micro-averaged F-measure of the tokens-in-concepts baseline. Italic indicates micro-averaged F-measures of the SR classifier that are significantly different from the corresponding micro-averaged F-measure of the tokens-in-sentence baseline.

BIDMC corpus

Relation type | Tokens-in-concepts baseline (micro-averaged F-measure) | Tokens-in-concepts baseline (macro-averaged F-measure) | Tokens-in-sentence baseline (micro-averaged F-measure) | Tokens-in-sentence baseline (macro-averaged F-measure) | SR classifier (micro-averaged F-measure) | SR classifier (macro-averaged F-measure) |
---|---|---|---|---|---|---|
Present disease–treatment | 0.64 | 0.56 | 0.66 | 0.53 | 0.81 | 0.69 |
Possible disease–treatment | 0.67 | 0.35 | 0.75 | 0.41 | 0.74 | 0.40 |
Present symptom–treatment | 0.61 | 0.45 | 0.69 | 0.53 | 0.80 | 0.62 |
Possible symptom–treatment | 0.76 | 0.30 | 0.89 | 0.41 | 0.95 | 0.47 |
Disease–test | 0.81 | 0.67 | 0.73 | 0.68 | 0.88 | 0.80 |
Disease–symptom | 0.93 | 0.71 | 0.94 | 0.77 | 0.96 | 0.84 |
Overall performance | 0.75 | 0.49 | 0.75 | 0.54 | 0.86 | 0.63 |

Partners corpus

Relation type | Tokens-in-concepts baseline (micro-averaged F-measure) | Tokens-in-concepts baseline (macro-averaged F-measure) | Tokens-in-sentence baseline (micro-averaged F-measure) | Tokens-in-sentence baseline (macro-averaged F-measure) | SR classifier (micro-averaged F-measure) | SR classifier (macro-averaged F-measure) |
---|---|---|---|---|---|---|
Present disease–treatment | 0.69 | 0.42 | 0.75 | 0.55 | 0.78 | 0.58 |
Possible disease–treatment | 0.63 | 0.60 | 0.66 | 0.52 | 0.76 | 0.65 |
Present symptom–treatment | 0.58 | 0.46 | 0.62 | 0.48 | 0.68 | 0.57 |
Possible symptom–treatment | 0.78 | 0.41 | 0.80 | 0.47 | 0.91 | 0.65 |
Disease–test | 0.70 | 0.60 | 0.72 | 0.65 | 0.78 | 0.72 |
Disease–symptom | 0.61 | 0.56 | 0.69 | 0.65 | 0.69 | 0.66 |
Overall performance | 0.67 | 0.49 | 0.71 | 0.54 | 0.76 | 0.62 |
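
The gap between the micro- and macro-averaged columns (for example 0.95 versus 0.47 for possible symptom–treatment on the BIDMC corpus) is what one would expect when a few relation labels dominate a relation type. The sketch below illustrates the two averaging schemes in general terms; it is not the evaluation code behind Table 3. It assumes each relation type is treated as a single-label, multi-class problem, and the function name `f_measures` and the labels `"none"` and `"treats"` are hypothetical.

```python
from collections import Counter


def f_measures(gold, predicted):
    """Return (micro_f, macro_f) for parallel lists of gold and predicted labels.

    Illustrative only: assumes one label per instance and F1 (beta = 1).
    """
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # missed the gold label g

    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    # Micro-average: pool the counts over all labels, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-average: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Hypothetical imbalanced example: 8 "none" instances, 2 "treats" instances,
# and a classifier that always predicts the majority label.
gold = ["none"] * 8 + ["treats"] * 2
predicted = ["none"] * 10
micro, macro = f_measures(gold, predicted)
print(round(micro, 2), round(macro, 2))  # prints: 0.8 0.44
```

In this toy case the frequent label carries the micro-average, while the macro-average is pulled down by the rare label that is never predicted correctly, which mirrors the pattern in the rows where the two columns diverge most.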