Author manuscript; available in PMC: 2020 Jul 1.

Published in final edited form as: J Biomed Inform. 2019 May 28;95:103219. doi: 10.1016/j.jbi.2019.103219

Table 2.

Distribution of attribute values in the complete annotated dataset.

Attribute	Value	Training set, n (%)	Test set, n (%)	Full corpus, n (%)

Semantic type	Problem	1504 (45.1%)	463 (45.0%)	1967 (45.1%)
	Test	880 (26.4%)	295 (28.6%)	1175 (26.9%)
	Occurrence	542 (16.3%)	163 (15.8%)	705 (16.2%)
	Treatment	409 (12.3%)	109 (10.6%)	518 (11.9%)

DocTimeRel	Overlap	1565 (46.9%)	400 (38.8%)	1965 (45.0%)
	Before	1222 (36.6%)	437 (42.4%)	1659 (38.0%)
	After	517 (15.5%)	185 (18.0%)	702 (16.1%)
	Before/overlap	31 (0.9%)	8 (0.8%)	39 (0.9%)

Polarity	Positive	2882 (86.4%)	915 (88.8%)	3797 (87.0%)
	Negative	453 (13.6%)	115 (11.2%)	568 (13.0%)

Modality	Actual	3020 (90.6%)	918 (89.1%)	3938 (90.2%)
	Hypothetical	116 (3.5%)	54 (5.2%)	170 (3.9%)
	Hedged	107 (3.2%)	35 (3.4%)	142 (3.3%)
	Generic	92 (2.8%)	23 (2.2%)	115 (2.6%)

Experiencer	Patient	3070 (92.1%)	961 (93.3%)	4031 (92.3%)
	Other	265 (7.9%)	69 (6.7%)	334 (7.7%)
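The percentages in Table 2 can be recomputed from the raw counts as a consistency check. The sketch below is illustrative and not part of the original study. It assumes that each percentage is a share of its own split's total and that the corpus-wide column is the sum of the training and test counts. The counts bear out both assumptions: every attribute sums to 3335 in the training set and 1030 in the test set. The names `COUNTS`, `split_totals` and `distribution` are hypothetical.

```python
# Hypothetical helper for re-deriving Table 2; counts are (training, test) pairs.
COUNTS = {
    "Semantic type": {"Problem": (1504, 463), "Test": (880, 295),
                      "Occurrence": (542, 163), "Treatment": (409, 109)},
    "DocTimeRel": {"Overlap": (1565, 400), "Before": (1222, 437),
                   "After": (517, 185), "Before/overlap": (31, 8)},
    "Polarity": {"Positive": (2882, 915), "Negative": (453, 115)},
    "Modality": {"Actual": (3020, 918), "Hypothetical": (116, 54),
                 "Hedged": (107, 35), "Generic": (92, 23)},
    "Experiencer": {"Patient": (3070, 961), "Other": (265, 69)},
}


def split_totals(counts):
    """Sum the training and test counts over all values of one attribute."""
    train_total = sum(train for train, _ in counts.values())
    test_total = sum(test for _, test in counts.values())
    return train_total, test_total


def distribution(counts):
    """Return {value: (train %, test %, full-corpus %)}, rounded to one decimal."""
    train_total, test_total = split_totals(counts)
    all_total = train_total + test_total  # assumes the full corpus = train + test
    return {
        value: (round(100 * train / train_total, 1),
                round(100 * test / test_total, 1),
                round(100 * (train + test) / all_total, 1))
        for value, (train, test) in counts.items()
    }


if __name__ == "__main__":
    # Print the recomputed percentages for comparison with the published table.
    for attribute, counts in COUNTS.items():
        print(attribute, split_totals(counts))
        for value, shares in distribution(counts).items():
            print(f"  {value}: {shares}")
```

Because every attribute is annotated on the same set of items, the per-attribute totals are identical. A mismatch in `split_totals` would therefore point to a transcription error in the table.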