Confusion matrices for report coding with two language models (BERT-base
and RadBERT-RoBERTa) fine-tuned to assign diagnostic codes in two coding
systems (Lung Imaging Reporting and Data System [Lung-RADS] and
abnormal) (see Appendix E4 [supplement]).
(A, B) The Lung-RADS dataset consisted of six
categories: “incomplete,” “benign nodule appearance
or behavior,” “probably benign nodule,”
“suspicious nodule-a,” “suspicious
nodule-b,” and “prior lung cancer,” denoted as
numbers 1 to 6 in the figure. (C, D) The abnormal dataset
also consisted of six categories: “major abnormality, no attn
needed,” “major abnormality, physician aware,” “minor
abnormality,” “possible malignancy,” “significant
abnormality, attn needed,” and “normal.” The figures show that
RadBERT-RoBERTa improved on BERT-base by better distinguishing code
numbers 5 and 6 for Lung-RADS and by making fewer errors for code number 1
of the abnormal dataset. BERT = bidirectional encoder representations
from transformers, RadBERT = BERT-based language model adapted for
radiology, RoBERTa = robustly optimized BERT pretraining approach.
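For illustration only, the following is a minimal sketch (not from the paper) of how a confusion matrix like those in panels A and B could be computed and plotted for six code categories. It assumes hypothetical arrays of true and predicted Lung-RADS code numbers (1 to 6) and uses scikit-learn and matplotlib; the actual evaluation pipeline used by the authors is not shown here.

```python
# Illustrative sketch: confusion matrix over six Lung-RADS code numbers.
# The label arrays below are hypothetical placeholders; in practice they would
# be the test-set reference codes and the fine-tuned model's predicted codes.
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

codes = [1, 2, 3, 4, 5, 6]  # category numbers as used in the figure

y_true = np.array([1, 2, 2, 3, 4, 5, 5, 6, 6, 6])  # hypothetical reference codes
y_pred = np.array([1, 2, 3, 3, 4, 5, 6, 6, 5, 6])  # hypothetical model predictions

# Rows = true codes, columns = predicted codes.
cm = confusion_matrix(y_true, y_pred, labels=codes)

ConfusionMatrixDisplay(cm, display_labels=codes).plot(cmap="Blues")
plt.xlabel("Predicted Lung-RADS code")
plt.ylabel("True Lung-RADS code")
plt.show()
```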