. 2009 Feb 10;9:10. doi: 10.1186/1472-6947-9-10

Table 8.

Five Way Classification Including 'Intervention' on Manually Annotated Abstracts

	All Abstracts			Structured Subset			Unstructured Subset

	P	R	F	P	R	F	P	R	F
System 1	Accuracy = 90.14%			Accuracy = 91.39%			Accuracy = 87.35%

Aim	0.92	0.97	0.94	0.94	0.98	0.96	0.88	0.95	0.91
Method	0.85	0.81	0.83	0.86	0.83	0.84	0.80	0.76	0.78
Intervention	0.87	0.78	0.82	0.88	0.80	0.84	0.85	0.74	0.79
Results	0.91	0.97	0.92	0.91	0.95	0.93	0.89	0.92	0.90
Conclusion	0.96	0.94	0.95	0.98	0.94	0.96	0.92	0.94	0.93

System 2	Accuracy = 95.24%			Accuracy = 96.45%			Accuracy = 92.51%

Aim	0.94	0.99	0.99	0.96	1.00	0.98	0.90	0.96	0.93
Method	0.92	0.91	0.92	0.93	0.93	0.93	0.89	0.87	0.88
Intervention	0.87	0.79	0.83	0.88	0.80	0.84	0.85	0.75	0.80
Results	0.98	0.99	0.99	0.99	1.00	0.99	0.96	0.97	0.96
Conclusion	0.99	0.99	0.99	1.00	1.00	1.00	0.89	0.87	0.88

System 3	Accuracy = 95.60%			Accuracy = 96.45%			Accuracy = 94.55%

Aim	0.95	0.98	0.97	0.96	1.00	0.98	0.93	0.97	0.95
Method	0.92	0.92	0.92	0.93	0.93	0.93	0.91	0.91	0.91
Intervention	0.87	0.80	0.83	0.88	0.80	0.84	0.86	0.78	0.82
Results	0.98	0.99	0.99	0.99	1.00	0.99	0.97	0.99	0.98
Conclusion	0.99	0.99	0.99	1.00	1.00	1.00	0.99	0.98	0.99

System 4	Accuracy = 93.89%			Accuracy = 95.02%			Accuracy = 91.34%

Aim	0.95	0.97	0.96	0.96	0.99	0.97	0.92	0.93	0.93
Method	0.88	0.89	0.88	0.89	0.90	0.90	0.85	0.85	0.85
Intervention	0.77	0.71	0.74	0.78	0.73	0.75	0.76	0.69	0.72
Results	0.99	0.99	0.99	1.00	1.00	1.00	0.96	0.96	0.97
Conclusion	0.99	0.99	0.99	1.00	1.00	1.00	0.96	0.98	0.97

Sentence classification using CRFs into five classes, including Intervention. Results are reported for four systems. System 1: baseline system. System 2: feature vectors augmented with section headings drawn from the four rhetorical roles; the headings are either mapped from the original headings in structured abstracts or predicted by the four-class CRF model for unstructured abstracts. System 3 (oracle): feature vectors augmented with manually corrected section headings. System 4: same as System 2, except that the training data is also augmented with training data from Set I. Precision (P), Recall (R) and F-score (F) are reported for each label over the entire data set (318 abstracts), the structured subset (211) and the unstructured subset (107).
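
As a reading aid for the P, R and F columns, the sketch below shows how an F-score can be derived from precision and recall. It assumes F is the balanced F1 measure (the harmonic mean of P and R), which the caption does not state explicitly; the function name `f_score` is hypothetical and is not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the table's F column is F1; the caption does not confirm this.
    """
    if precision + recall == 0:
        # Avoid division by zero when a label is never predicted or never correct.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# System 1, Method label, All Abstracts: P = 0.85, R = 0.81
method_f = round(f_score(0.85, 0.81), 2)

# System 1, Intervention label, All Abstracts: P = 0.87, R = 0.78
intervention_f = round(f_score(0.87, 0.78), 2)

print(method_f, intervention_f)
```

Under that assumption, these two rows reproduce the tabulated F values of 0.83 and 0.82. Because the table lists P and R already rounded to two decimals, a value recomputed this way can differ from a printed F in the last digit.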