
Table 3:

Evaluation results for total CTRS scores. M: number of utterances per segment; k: number of times the segment quality estimator is applied; SQE: segment quality estimator.

| Approach | BERT fine-tune | M | k | SQE mode | RMSE/MAE | F1 score (%) |
|---|---|---|---|---|---|---|
| Frequency-based Methods | | | | | | |
| tf-idf + LR | - | - | - | - | 9.48/7.49 | - |
| tf-idf + SVM | - | - | - | - | - | 69.0 |
| Neural Network Methods | | | | | | |
| glove + LSTM | - | 1 | - | - | 10.05/8.09 | 59.6 |
| | - | 40 | - | - | 9.90/7.99 | 60.2 |
| doc2vec + LSTM | - | 1 | - | - | 9.88/7.91 | 62.2 |
| | - | 40 | - | - | 9.75/7.80 | 63.0 |
| Transformer Models | | | | | | |
| BERT-small | - | - | - | - | 9.89/7.93 | 61.9 |
| Longformer | - | - | - | - | 9.35/7.31 | 67.9 |
| BigBird | - | - | - | - | 9.30/7.25 | 68.5 |
| Hierarchical Framework¹ | | | | | | |
| BERT-small + LSTM² | ✓ | 1 | 0 | - | 9.78/7.82 | 62.6 |
| | - | 1 | 0 | - | 9.88/7.89 | 62.2 |
| | - | 40 | 0 | - | 9.68/7.70 | 63.5 |
| | ✓ | 40 | 0 | - | 8.78/6.97 | 70.7 |
| BERT-cbt-utt + LSTM | ✓ | 1 | 0 | - | 9.57/7.59 | 65.3 |
| | - | 1 | 0 | - | 9.56/7.67 | 64.6 |
| | - | 40 | 0 | - | 9.45/7.50 | 65.5 |
| | ✓ | 40 | 0 | - | 8.59/6.80 | 72.0 |
| BERT-cbt-segment + LSTM | - | 40 | 0 | - | 9.27/7.29 | 67.9 |
| | ✓ | 40 | 0 | - | 8.47/6.59 | 73.0⁺ |
| BERT-cbt-segment + LSTM + SQE | ✓ | 40 | 1 | Even | 8.19/6.35 | 74.7 |
| | ✓ | 40 | 1 | Uneven | 8.25/6.40 | 74.3 |
| | ✓ | 40 | 2 | Even | 8.12/6.29 | 75.0 |
| | ✓ | 40 | 2 | Uneven | 8.22/6.38 | 74.5 |
| | ✓ | 40 | 3 | Even | 8.09/6.27 | 75.1* |
| | ✓ | 40 | 3 | Uneven | 8.22/6.37 | 74.5 |
¹ Approaches without fine-tuning correspond to the single-task models in Flemotomos et al. (2021a).

² Corresponds to the framework in Pappagari et al. (2019).

* Significantly higher than the result marked ⁺ at p < 0.05 based on Student's t-test.
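
For readers reproducing comparable numbers on their own data, the sketch below shows how the table's metrics (RMSE/MAE and F1) and the */⁺ significance comparison could be computed with NumPy, scikit-learn, and SciPy. The random data, the cut-off of 40 used to binarize total CTRS scores for F1, and the choice of per-session absolute errors as the paired unit for the t-test are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import f1_score, mean_absolute_error, mean_squared_error

# Illustrative random data standing in for per-session predictions;
# NOT the paper's actual outputs.
rng = np.random.default_rng(0)
y_true = rng.uniform(0, 66, size=100)           # total CTRS: 11 items rated 0-6
y_pred_a = y_true + rng.normal(0, 8, size=100)  # e.g., the system marked *
y_pred_b = y_true + rng.normal(0, 9, size=100)  # e.g., the system marked +

# Regression metrics, reported in the table as RMSE/MAE.
rmse = np.sqrt(mean_squared_error(y_true, y_pred_a))
mae = mean_absolute_error(y_true, y_pred_a)
print(f"RMSE/MAE: {rmse:.2f}/{mae:.2f}")

# F1 for the binarized task; the competence cut-off of 40 on the total
# CTRS is an assumption here, not taken from this table.
f1 = f1_score(y_true >= 40, y_pred_a >= 40) * 100
print(f"F1 score (%): {f1:.1f}")

# Paired Student's t-test over per-session absolute errors, mirroring the
# */+ comparison at p < 0.05 (the pairing unit is an assumption).
t_stat, p_value = stats.ttest_rel(np.abs(y_pred_a - y_true),
                                  np.abs(y_pred_b - y_true))
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```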