Example study of reports, and error types and categories
(A) Example study consisting of a test report and four metric-oracle reports, one each for BLEU, BERTScore, CheXbert vector similarity, and RadGraph F1, which radiologists evaluate to identify errors.
(B) Two error types and six error categories that radiologists identify for each pair of test report and metric-oracle report.