Table 3: Evaluation results for total CTRS scores. M: number of utterances per segment; k: number of times the segment score estimator is processed; SQE: segment quality estimator.
| Approach | BERT fine-tune | M | k | SQE mode | RMSE/MAE | F1 score (%) |
|---|---|---|---|---|---|---|
| Frequency-based Methods | | | | | | |
| tf-idf + LR | - | - | - | - | 9.48/7.49 | - |
| tf-idf + SVM | - | - | - | - | - | 69.0 |
| Neural Network Methods | | | | | | |
| GloVe + LSTM | - | 1 | - | - | 10.05/8.09 | 59.6 |
| | - | 40 | - | - | 9.90/7.99 | 60.2 |
| doc2vec + LSTM | - | 1 | - | - | 9.88/7.91 | 62.2 |
| | - | 40 | - | - | 9.75/7.80 | 63.0 |
| Transformer Models | | | | | | |
| BERT-small | - | - | - | - | 9.89/7.93 | 61.9 |
| Longformer | - | - | - | - | 9.35/7.31 | 67.9 |
| BigBird | - | - | - | - | 9.30/7.25 | 68.5 |
| Hierarchical Framework¹ | | | | | | |
| BERT-small + LSTM² | ✗ | 1 | 0 | - | 9.78/7.82 | 62.6 |
| | ✓ | 1 | 0 | - | 9.88/7.89 | 62.2 |
| | ✗ | 40 | 0 | - | 9.68/7.70 | 63.5 |
| | ✓ | 40 | 0 | - | 8.78/6.97 | 70.7 |
| BERT-cbt-utt + LSTM | ✗ | 1 | 0 | - | 9.57/7.59 | 65.3 |
| | ✓ | 1 | 0 | - | 9.56/7.67 | 64.6 |
| | ✗ | 40 | 0 | - | 9.45/7.50 | 65.5 |
| | ✓ | 40 | 0 | - | 8.59/6.80 | 72.0 |
| BERT-cbt-segment + LSTM | ✗ | 40 | 0 | - | 9.27/7.29 | 67.9 |
| | ✓ | 40 | 0 | - | 8.47/6.59 | 73.0+ |
| BERT-cbt-segment + LSTM + SQE | ✓ | 40 | 1 | Even | 8.19/6.35 | 74.7 |
| | ✓ | 40 | 1 | Uneven | 8.25/6.40 | 74.3 |
| | ✓ | 40 | 2 | Even | 8.12/6.29 | 75.0 |
| | ✓ | 40 | 2 | Uneven | 8.22/6.38 | 74.5 |
| | ✓ | 40 | 3 | Even | 8.09/6.27 | 75.1* |
| | ✓ | 40 | 3 | Uneven | 8.22/6.37 | 74.5 |
¹ Approaches without fine-tuning correspond to the single-task models in Flemotomos et al. (2021a).
² Corresponds to the framework in Pappagari et al. (2019).
* is significantly higher than + at p < 0.05 based on Student's t-test.
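For reference, a minimal sketch of how the reported metric pair (RMSE/MAE for the regression view, F1 for the classification view) could be computed from session-level outputs. The toy scores, the 0-66 CTRS range, and the score-to-label threshold are illustrative assumptions, not the paper's evaluation protocol.

```python
# Illustrative metric computation: RMSE/MAE for predicted total CTRS scores
# and F1 for a hypothetical binary competence label derived by thresholding.
import numpy as np
from sklearn.metrics import f1_score


def rmse_mae(y_true, y_pred):
    """Root-mean-squared error and mean absolute error for score regression."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(np.abs(err)))


# Toy session-level total CTRS scores (the CTRS total is commonly 0-66).
y_true = np.array([22.0, 35.0, 41.0, 48.0, 30.0])
y_pred = np.array([25.0, 31.0, 45.0, 44.0, 33.0])

rmse, mae = rmse_mae(y_true, y_pred)

# Hypothetical binarization for the classification metric; the actual label
# definition should follow the paper's protocol.
threshold = 40.0
f1 = f1_score(y_true >= threshold, y_pred >= threshold)

print(f"RMSE/MAE: {rmse:.2f}/{mae:.2f}  F1: {100 * f1:.1f}%")
```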