Author manuscript; available in PMC: 2020 Jul 1.
Published in final edited form as: J Biomed Inform. 2019 May 28;95:103219. doi: 10.1016/j.jbi.2019.103219

Table 2.

Distribution of attribute values in the complete annotated dataset.

| Attribute     | Value          | Training set | Test set    | All corpus   |
|---------------|----------------|--------------|-------------|--------------|
| Semantic type | Problem        | 1504 (45.1%) | 463 (45.0%) | 1967 (45.1%) |
|               | Test           | 880 (26.4%)  | 295 (28.6%) | 1175 (26.9%) |
|               | Occurrence     | 542 (16.3%)  | 163 (15.8%) | 705 (16.2%)  |
|               | Treatment      | 409 (12.3%)  | 109 (10.6%) | 518 (11.9%)  |
| DocTimeRel    | Overlap        | 1565 (46.9%) | 400 (38.8%) | 1965 (45.0%) |
|               | Before         | 1222 (36.6%) | 437 (42.4%) | 1659 (38.0%) |
|               | After          | 517 (15.5%)  | 185 (18.0%) | 702 (16.1%)  |
|               | Before/overlap | 31 (0.9%)    | 8 (0.8%)    | 39 (0.9%)    |
| Polarity      | Positive       | 2882 (86.4%) | 915 (88.8%) | 3797 (87.0%) |
|               | Negative       | 453 (13.6%)  | 115 (11.2%) | 568 (13.0%)  |
| Modality      | Actual         | 3020 (90.6%) | 918 (89.1%) | 3938 (90.2%) |
|               | Hypothetical   | 116 (3.5%)   | 54 (5.2%)   | 170 (3.9%)   |
|               | Hedged         | 107 (3.2%)   | 35 (3.4%)   | 142 (3.3%)   |
|               | Generic        | 92 (2.8%)    | 23 (2.2%)   | 115 (2.6%)   |
| Experiencer   | Patient        | 3070 (92.1%) | 961 (93.3%) | 4031 (92.3%) |
|               | Other          | 265 (7.9%)   | 69 (6.7%)   | 334 (7.7%)   |
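
The percentages in Table 2 appear to be each value's share of the column total for its attribute (3335 training, 1030 test, and 4365 annotations overall, as obtained by summing the counts). The following minimal sketch recomputes them for two of the attributes as a sanity check. The counts are copied from the table; the names `COUNTS` and `percentages` are illustrative and do not come from the paper.

```python
# Counts copied from Table 2 as (training set, test set) per attribute value.
# COUNTS and percentages are hypothetical names used only for this sketch.
COUNTS = {
    "Semantic type": {
        "Problem": (1504, 463),
        "Test": (880, 295),
        "Occurrence": (542, 163),
        "Treatment": (409, 109),
    },
    "DocTimeRel": {
        "Overlap": (1565, 400),
        "Before": (1222, 437),
        "After": (517, 185),
        "Before/overlap": (31, 8),
    },
}


def percentages(attribute):
    """Return {value: (train %, test %, corpus %)}, rounded to one decimal."""
    rows = COUNTS[attribute]
    train_total = sum(train for train, _ in rows.values())
    test_total = sum(test for _, test in rows.values())
    corpus_total = train_total + test_total
    return {
        value: (
            round(100 * train / train_total, 1),
            round(100 * test / test_total, 1),
            round(100 * (train + test) / corpus_total, 1),
        )
        for value, (train, test) in rows.items()
    }


if __name__ == "__main__":
    for attribute in COUNTS:
        for value, shares in percentages(attribute).items():
            print(attribute, value, shares)
```

Under this reading, the recomputed shares match the tabulated ones, including the difference in the Overlap share between the training set and the test set for DocTimeRel.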