BMC Bioinformatics. 2021 Nov 25;22:565. doi: 10.1186/s12859-021-04479-9

Table 4.

The performance of model-term and model-code on the different modelling tasks, compared with two baselines, using Precision (P), Recall (R), and F1 measures grounded at the evidence level for each (in)consistency type, and micro-averaged Precision (P*) and Recall (R*) computed over all predicted instances in the test set

Model-term, first modelling task (evidence-level P / R / F1 per (in)consistency type):

| Model-term | Consistent (P / R / F1) | (A) (P / R / F1) | (B) (P / R / F1) | (C) (P / R / F1) |
|---|---|---|---|---|
| Basic system | 0.54 / 0.70 / 0.61 | 0.48 / 0.29 / 0.36 | 0.79 / 0.48 / 0.60 | 0.65 / 0.96 / 0.78 |
| +Training Opt | 0.74 / 0.71 / 0.72 | 0.54 / 0.33 / 0.41 | 0.76 / 0.57 / 0.65 | 0.61 / 0.93 / 0.73 |
| +SectionInfo | 0.69 / 0.65 / 0.67 | 0.46 / 0.35 / 0.40 | 0.77 / 0.52 / 0.62 | 0.52 / 0.96 / 0.68 |
| +Opt & SectionInfo | 0.69 / 0.64 / 0.66 | 0.45 / 0.31 / 0.37 | 0.75 / 0.51 / 0.61 | 0.50 / 0.96 / 0.66 |
| Baselines (first modelling task) | | | | |
| prior-biased classifier | 0.48 / 0.35 / 0.41 | 0.05 / 0.02 / 0.03 | 0.36 / 0.28 / 0.31 | 0.10 / 0.33 / 0.15 |
| rule-based model | 0.53 / 0.38 / 0.44 | 0.09 / 0.04 / 0.06 | 0.41 / 0.29 / 0.34 | 0.18 / 0.99 / 0.30 |

Overall results (model-term on the first modelling task; model-code on the second):

| System | Model-term Overall (P* / R* / F1) | Model-code Consistent (P / R / F1) | Model-code (D) (P / R / F1) | Model-code Overall (P* / R* / F1) |
|---|---|---|---|---|
| Basic system | 0.64 / 0.64 / 0.64 | 0.75 / 0.50 / 0.60 | 0.31 / 0.58 / 0.41 | 0.52 / 0.52 / 0.52 |
| +Training Opt | 0.69 / 0.69 / 0.69 | – | – | – |
| +SectionInfo | 0.65 / 0.65 / 0.65 | 0.82 / 0.48 / 0.61 | 0.21 / 0.56 / 0.31 | 0.50 / 0.50 / 0.50 |
| +Opt & SectionInfo | 0.63 / 0.63 / 0.63 | – | – | – |
| Baselines (model-term: first task; model-code: second task) | | | | |
| prior-biased classifier | 0.30 / 0.30 / 0.30 | 0.60 / 0.64 / 0.62 | 0.41 / 0.37 / 0.39 | 0.53 / 0.53 / 0.53 |
| rule-based model | 0.36 / 0.36 / 0.36 | 0.60 / 0.64 / 0.62 | 0.41 / 0.37 / 0.39 | 0.53 / 0.53 / 0.53 |

Dashes mark configurations for which no model-code score was reported.

The highest metric scores for the identification of each type of (in)consistency are shown in bold
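
To make the two kinds of scores in the caption concrete, here is a minimal sketch of how evidence-level P/R/F1 per (in)consistency type and the micro-averaged P*/R* over all predicted instances could be computed. The function names and the one-gold-label, one-prediction-per-instance data layout are assumptions for illustration, not the paper's actual evaluation code. Note that under that layout, micro precision, recall, and F1 all collapse to the same value, which is consistent with the identical P*, R*, and F1 figures in the Overall columns above.

```python
def evidence_level_prf(gold, pred, label):
    """Evidence-level precision/recall/F1 for one (in)consistency type.

    Assumes one gold label and one predicted label per evidence
    instance (a hypothetical layout, not the paper's own code).
    """
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def micro_prf(gold, pred):
    """Micro-averaged P*/R*/F1 over all predicted instances.

    With exactly one prediction per instance, summed false positives
    equal summed false negatives across labels, so micro precision,
    recall, and F1 all reduce to plain accuracy.
    """
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    acc = correct / len(gold) if gold else 0.0
    return acc, acc, acc


# Toy usage with the four model-term classes from the table above.
gold = ["Consistent", "(A)", "(B)", "(C)", "Consistent"]
pred = ["Consistent", "(B)", "(B)", "(C)", "(A)"]
print(evidence_level_prf(gold, pred, "Consistent"))  # (1.0, 0.5, 0.667)
print(micro_prf(gold, pred))                         # (0.6, 0.6, 0.6)
```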