BMC Bioinformatics. 2021 Nov 25;22:565. doi: 10.1186/s12859-021-04479-9

Table 4.

The performance of model-term and model-code on the different modelling tasks, compared with two baselines, using Precision (P), Recall (R), and F1 measures grounded at the evidence level for each (in)consistency type, and micro-averaged Precision (P*) and Recall (R*) computed over all predicted instances in the test set

Model-term, first modelling task (evidence-level P / R / F1 per (in)consistency type):

| Model-term | Consistent (P / R / F1) | (A) (P / R / F1) | (B) (P / R / F1) | (C) (P / R / F1) |
|---|---|---|---|---|
| Basic system | 0.54 / 0.70 / 0.61 | 0.48 / 0.29 / 0.36 | 0.79 / 0.48 / 0.60 | 0.65 / 0.96 / 0.78 |
| +Training Opt | 0.74 / 0.71 / 0.72 | 0.54 / 0.33 / 0.41 | 0.76 / 0.57 / 0.65 | 0.61 / 0.93 / 0.73 |
| +SectionInfo | 0.69 / 0.65 / 0.67 | 0.46 / 0.35 / 0.40 | 0.77 / 0.52 / 0.62 | 0.52 / 0.96 / 0.68 |
| +Opt & SectionInfo | 0.69 / 0.64 / 0.66 | 0.45 / 0.31 / 0.37 | 0.75 / 0.51 / 0.61 | 0.50 / 0.96 / 0.66 |
| Baselines (first modelling task) | | | | |
| prior-biased classifier | 0.48 / 0.35 / 0.41 | 0.05 / 0.02 / 0.03 | 0.36 / 0.28 / 0.31 | 0.10 / 0.33 / 0.15 |
| rule-based model | 0.53 / 0.38 / 0.44 | 0.09 / 0.04 / 0.06 | 0.41 / 0.29 / 0.34 | 0.18 / 0.99 / 0.30 |

Overall results (model-term on the first modelling task; model-code on the second):

| System | Model-term Overall (P* / R* / F1) | Model-code Consistent (P / R / F1) | Model-code (D) (P / R / F1) | Model-code Overall (P* / R* / F1) |
|---|---|---|---|---|
| Basic system | 0.64 / 0.64 / 0.64 | 0.75 / 0.50 / 0.60 | 0.31 / 0.58 / 0.41 | 0.52 / 0.52 / 0.52 |
| +Training Opt | 0.69 / 0.69 / 0.69 | – | – | – |
| +SectionInfo | 0.65 / 0.65 / 0.65 | 0.82 / 0.48 / 0.61 | 0.21 / 0.56 / 0.31 | 0.50 / 0.50 / 0.50 |
| +Opt & SectionInfo | 0.63 / 0.63 / 0.63 | – | – | – |
| Baselines (model-term: first task; model-code: second task) | | | | |
| prior-biased classifier | 0.30 / 0.30 / 0.30 | 0.60 / 0.64 / 0.62 | 0.41 / 0.37 / 0.39 | 0.53 / 0.53 / 0.53 |
| rule-based model | 0.36 / 0.36 / 0.36 | 0.60 / 0.64 / 0.62 | 0.41 / 0.37 / 0.39 | 0.53 / 0.53 / 0.53 |

Dashes mark configurations for which no model-code score was reported.

The highest metric scores for the identification of each type of (in)consistency are shown in bold
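
To make the two kinds of scores in the caption concrete, here is a minimal sketch of how evidence-level P/R/F1 per (in)consistency type and the micro-averaged P*/R* over all predicted instances could be computed. The function names and the one-gold-label, one-prediction-per-instance data layout are assumptions for illustration, not the paper's actual evaluation code. Note that under that layout, micro precision, recall, and F1 all collapse to the same value, which is consistent with the identical P*, R*, and F1 figures in the Overall columns above.

```python
def evidence_level_prf(gold, pred, label):
    """Evidence-level precision/recall/F1 for one (in)consistency type.

    Assumes one gold label and one predicted label per evidence
    instance (a hypothetical layout, not the paper's own code).
    """
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def micro_prf(gold, pred):
    """Micro-averaged P*/R*/F1 over all predicted instances.

    With exactly one prediction per instance, summed false positives
    equal summed false negatives across labels, so micro precision,
    recall, and F1 all reduce to plain accuracy.
    """
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    acc = correct / len(gold) if gold else 0.0
    return acc, acc, acc


# Toy usage with the four model-term classes from the table above.
gold = ["Consistent", "(A)", "(B)", "(C)", "Consistent"]
pred = ["Consistent", "(B)", "(B)", "(C)", "(A)"]
print(evidence_level_prf(gold, pred, "Consistent"))  # (1.0, 0.5, 0.667)
print(micro_prf(gold, pred))                         # (0.6, 0.6, 0.6)
```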