2025 Sep 17;9:e73658. doi: 10.2196/73658

Table 3. Automatic quality metrics for GAI-baseda machine translation output in a case study of an in-house translation team in a Finnish wellbeing services county.

Measure                                               Target segment language
                                                      Swedish      English
Total number of segments, n                           1261         112
GAI output accepted without edits, segments, n (%)    141 (11)     18 (16)
Machine translation output quality metric, meanb,c
  HTERd,e                                             55           46
  BLEUf                                               43           38
  METEORg,h                                           63           59
  ChrFi                                               68           57
a GAI: generative artificial intelligence.

b Scale 0-100, where 100=perfect match.

c The reported HTER, BLEU, METEOR, and ChrF values are aggregate machine translation evaluation metrics calculated over the entire dataset; therefore, a standard deviation is not applicable in this context.

d HTER: human-targeted translation edit rate.

e Multiplied by 100, inverted, and negative values changed to 0 to convert the metric to an equivalent scale from 0 to 100.

f BLEU: Bilingual Evaluation Understudy.

g METEOR: Metric for Evaluation of Translation With Explicit Ordering.

h Multiplied by 100 to convert the metric to an equivalent scale from 0 to 100.

i ChrF: character n-gram F-score.
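The rescaling described in footnotes e and h can be sketched as below. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, and it assumes the raw HTER is an edit rate on a 0-1 scale (0 = no edits needed) and raw METEOR is a 0-1 score.

```python
def rescale_hter(raw_hter: float) -> float:
    """Footnote e: multiply by 100, invert (so 100 = perfect match),
    and clamp negative results (edit rates above 1.0) to 0."""
    return max(0.0, 100.0 - raw_hter * 100.0)


def rescale_meteor(raw_meteor: float) -> float:
    """Footnote h: multiply the 0-1 METEOR score by 100."""
    return raw_meteor * 100.0
```

With these conventions, a raw edit rate of 0.45 maps to 55 on the table's 0-100 scale, and an edit rate above 1.0 (more edits than reference tokens) is reported as 0 rather than a negative value.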