Table 3. Model performance on each measure (Experiment 1).
| Condition | Accuracy | Macro average: mean precision | Macro average: mean recall | Macro average: mean F1 score | Weighted average: mean precision | Weighted average: mean recall | Weighted average: mean F1 score |
|---|---|---|---|---|---|---|---|
| Responses/same-pair | 0.724 | 0.636 | 0.666 | 0.640 | 0.742 | 0.724 | 0.730 |
| Responses/different-pair | 0.472 | 0.454 | 0.468 | 0.404 | 0.510 | 0.472 | 0.476 |
| Classification criteria/same-pair | 0.630 | 0.498 | 0.486 | 0.486 | 0.668 | 0.630 | 0.644 |
| Classification criteria/different-pair | 0.548 | 0.466 | 0.460 | 0.438 | 0.644 | 0.548 | 0.580 |
| Combination/same-pair | **0.828** | **0.714** | **0.692** | **0.694** | **0.828** | **0.828** | **0.828** |
| Combination/different-pair | 0.814 | 0.698 | 0.690 | 0.690 | 0.818 | 0.814 | 0.814 |
The highest value for each performance measure is shown in bold.
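To make the column definitions in Table 3 concrete, the sketch below computes accuracy together with macro-averaged and support-weighted precision, recall and F1 for one set of predictions. It is an illustrative assumption, not the authors' evaluation code: the function name `classification_summary` is hypothetical, treating 0/0 as 0 is an assumed convention, and the table's "mean" values (averaged across whatever runs the experiment used) would require an additional averaging step that is not shown here. Macro averaging weights every class equally. Weighted averaging weights each class by its number of true instances, so weighted recall always reduces to accuracy. This is why the Accuracy and weighted mean recall columns match in every row of Table 3.

```python
from collections import Counter


def _safe_div(num, den):
    # Assumed convention: treat 0/0 as 0.0 (e.g. a class that is never predicted).
    return num / den if den else 0.0


def classification_summary(y_true, y_pred):
    """Hypothetical helper: accuracy, macro (P, R, F1) and weighted (P, R, F1)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)    # true instances per class
    predicted = Counter(y_pred)  # predicted instances per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives

    per_class = {}
    for lab in labels:
        precision = _safe_div(correct[lab], predicted[lab])
        recall = _safe_div(correct[lab], support[lab])
        f1 = _safe_div(2 * precision * recall, precision + recall)
        per_class[lab] = (precision, recall, f1)

    n, k = len(y_true), len(labels)
    # Macro average: unweighted mean over classes.
    macro = tuple(sum(per_class[lab][i] for lab in labels) / k for i in range(3))
    # Weighted average: each class weighted by its support (count in y_true).
    weighted = tuple(
        sum(per_class[lab][i] * support[lab] for lab in labels) / n for i in range(3)
    )
    accuracy = sum(correct.values()) / n
    return accuracy, macro, weighted


y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]
accuracy, macro, weighted = classification_summary(y_true, y_pred)
print(f"accuracy={accuracy:.3f}")
print("macro    P/R/F1:", [round(v, 3) for v in macro])
print("weighted P/R/F1:", [round(v, 3) for v in weighted])
```

In this toy example, weighted recall equals accuracy (4/6), while the macro scores differ because the single-instance class "c" counts as much as the three-instance class "a".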