Table 2: ROUGE-L F-score (RL-F), sentence-embedding cosine similarity (Sent.θ), BERTScore (BS), and CUI F-score (CUI) for the rule-based baseline and for T5 and BART fine-tuned on two input settings: Assessment only (Assmt) and Assessment with Subjective sections (A+Subj). A trailing ++ denotes training with data augmentation.
| Model | Setting | Explicit Mentions RL-F | Explicit Mentions Sent.θ | Explicit Mentions BS | Explicit Mentions CUI | Direct Problems RL-F | Direct Problems Sent.θ | Direct Problems BS | Direct Problems CUI | Indirect Problems RL-F | Indirect Problems Sent.θ | Indirect Problems BS | Indirect Problems CUI | All Problems RL-F | All Problems Sent.θ | All Problems BS | All Problems CUI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rule-based | Assmt | 34.45 | 58.81 | 59.80 | 38.97 | 12.31 | 55.33 | 40.13 | 34.23 | 9.49 | 55.58 | 44.46 | 33.16 | 13.45 | 68.61 | 50.32 | 43.93 |
| T5 | Assmt | 32.77 | 59.57 | 57.75 | 41.73 | 13.68 | 53.44 | 39.72 | 36.10 | 10.40 | 54.76 | 44.16 | 35.08 | 14.82 | 67.49 | 49.89 | 44.51 |
| T5 | Assmt++ | 31.76 | 58.74 | 57.12 | 42.19 | 13.78 | 53.65 | 40.30 | 35.84 | 10.55 | 54.10 | 43.48 | 35.20 | 15.00 | 67.32 | 50.36 | 44.55 |
| T5 | A+Subj | 20.24 | 50.04 | 47.55 | 33.44 | 9.52 | 51.91 | 39.72 | 30.43 | 7.10 | 54.14 | 43.87 | 30.29 | 10.89 | 64.63 | 49.75 | 39.02 |
| T5 | A+Subj++ | 20.72 | 59.64 | 57.97 | 33.56 | 9.46 | 53.55 | 39.52 | 18.76 | 7.35 | 54.69 | 44.36 | 14.40 | 10.93 | 67.19 | 50.42 | 24.83 |
| BART | Assmt | 25.70 | 54.98 | 52.99 | 32.49 | 10.00 | 53.66 | 39.08 | 29.41 | 8.04 | 54.66 | 43.12 | 29.04 | 11.56 | 66.86 | 48.48 | 38.36 |
| BART | Assmt++ | 28.22 | 57.04 | 55.16 | 32.28 | 10.33 | 53.40 | 39.21 | 30.75 | 8.29 | 54.48 | 44.01 | 32.08 | 11.65 | 66.67 | 49.23 | 40.69 |
| BART | A+Subj | 18.80 | 49.19 | 46.77 | 26.96 | 7.04 | 51.70 | 38.24 | 25.30 | 6.00 | 54.29 | 43.71 | 26.01 | 9.25 | 64.95 | 48.19 | 34.02 |
| BART | A+Subj++ | 20.23 | 57.91 | 54.68 | 32.91 | 7.88 | 53.85 | 40.21 | 30.09 | 6.85 | 54.61 | 43.15 | 30.12 | 9.84 | 67.00 | 49.70 | 38.72 |
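To make two of the reported metrics concrete, the sketch below shows how an LCS-based ROUGE-L F-score and a set-overlap F-score over concept identifiers (as in the CUI column) can be computed. It is a minimal illustration under stated assumptions, not the evaluation code behind this table: it assumes lowercased whitespace tokenization without stemming, an F-measure with β = 1, and that CUIs have already been extracted from the generated and reference summaries into sets. The function names `lcs_length`, `rouge_l_f`, and `set_f1` and the identifiers in the example are hypothetical. The functions return values in [0, 1], whereas the table appears to report scores on a 0 to 100 scale.

```python
from typing import Iterable, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    # prev[j] holds the LCS length of the tokens processed so far in `a` vs b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend the common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token from a or b
        prev = curr
    return prev[-1]


def rouge_l_f(candidate: str, reference: str) -> float:
    """ROUGE-L F-score with beta = 1 on lowercased whitespace tokens (assumed setup)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def set_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """F1 over sets of concept identifiers (hypothetical CUI-level scoring)."""
    pred, ref = set(predicted), set(gold)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)  # identifiers present in both summaries
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # LCS = "acute kidney injury" (3 tokens): P = 3/3, R = 3/5, F = 0.75
    print(rouge_l_f("acute kidney injury", "acute on chronic kidney injury"))
    # Hypothetical identifiers: one shared out of two on each side, F = 0.5
    print(set_f1({"C001", "C002"}, {"C001", "C003"}))
```

Because ROUGE-L rewards shared token order while the set-based score ignores wording and order entirely, the two can diverge on the same output, which is one reason to report both alongside embedding-based scores such as Sent.θ and BS.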