Table 3. Automatic quality metrics for GAI-based^a^ machine translation output in a case study of an in-house translation team in a Finnish wellbeing services county.
| Measure | Target segment language | |
|---|---|---|
| | Swedish | English |
| Total number of segments, n | 1261 | 112 |
| GAI output accepted without edits | | |
| Segments, n (%) | 141 (11) | 18 (16) |
| Machine translation output quality metric, mean^b,c^ | | |
| HTER^d,e^ | 55 | 46 |
| BLEU^f^ | 43 | 38 |
| METEOR^g,h^ | 63 | 59 |
| ChrF^i^ | 68 | 57 |
^a^GAI: generative artificial intelligence.
^b^Scale: 0-100, where 100 = perfect match.
^c^The reported HTER, BLEU, METEOR, and ChrF values are aggregate machine translation evaluation metrics calculated over the entire dataset; a standard deviation is therefore not applicable.
^d^HTER: human-targeted translation edit rate.
^e^Multiplied by 100, inverted, and with negative values set to 0 to convert the metric to an equivalent 0-100 scale.
^f^BLEU: Bilingual Evaluation Understudy.
^g^METEOR: Metric for Evaluation of Translation With Explicit Ordering.
^h^Multiplied by 100 to convert the metric to an equivalent 0-100 scale.
^i^ChrF: character n-gram F-score.
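As an illustration of the scale conversions described in footnotes e and h, the following is a minimal sketch, assuming HTER and METEOR are first computed on their native 0-1 scales (HTER: lower is better; METEOR: higher is better); the function names are illustrative and not taken from the study.

```python
def rescale_hter(hter: float) -> float:
    """Footnote e: invert HTER (0-1, lower is better), multiply by 100,
    and set negative values (possible when HTER exceeds 1, i.e., more
    edits than reference words) to 0, yielding a 0-100 scale where
    100 = perfect match."""
    return max(0.0, (1.0 - hter) * 100.0)


def rescale_meteor(meteor: float) -> float:
    """Footnote h: multiply a 0-1 metric where higher is better
    (here METEOR) by 100 to obtain the equivalent 0-100 scale."""
    return meteor * 100.0


# Example: a raw HTER of 0.45 maps to about 55, consistent with the
# Swedish HTER value reported in Table 3.
print(rescale_hter(0.45))    # ≈ 55
print(rescale_meteor(0.63))  # 63.0
```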