2021 Jul 17;28(10):2193–2201. doi: 10.1093/jamia/ocab112

Figure 1.


Overall framework of synthetic text generation and its evaluation on clinical named entity recognition (NER) tasks. (Left box) Different language generation algorithms are compared and used to generate a synthetic corpus of History of Present Illness sections (BLEU [bilingual evaluation understudy] scores are reported in Table 1). (Right box) NER models are trained and evaluated across 4 corpora: synthetic, natural, external_1, and external_2. Yellow arrows indicate 10-fold cross validation within each corpus (performance is reported in Table 3); purple arrows indicate training on the synthetic corpus and predicting on the test sets of the natural, external_1, and external_2 corpora (performance is reported in Table 4); orange arrows indicate training on the natural corpus and predicting on the test sets of the synthetic, external_1, and external_2 corpora (performance is reported in Table 5); and green arrows indicate training on the combined natural+synthetic corpus and predicting on the test set of the natural corpus (performance is reported in Table 6).
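The evaluation design in the right box can be summarized as a set of (training corpus, test set) pairs plus a 10-fold split for the within-corpus runs. The following is a minimal sketch under stated assumptions. It only enumerates the experimental configurations and does not train any NER model. The corpus labels come from the caption. The function names `kfold_indices` and `evaluation_plan` are hypothetical. The use of contiguous, unshuffled folds is also an assumption, because the caption does not say how the folds were drawn.

```python
from typing import Dict, List, Tuple

# Corpus labels as named in the figure caption.
CORPORA = ["synthetic", "natural", "external_1", "external_2"]


def kfold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: split indices 0..n-1 into k contiguous folds.

    Each fold serves exactly once as the test set; the remaining indices form
    the training set. Fold sizes differ by at most one document.
    (Contiguous, unshuffled folds are an assumption for illustration.)
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def evaluation_plan() -> Dict[str, List[Tuple[str, str]]]:
    """Hypothetical summary of (train corpus, test set) pairs, keyed by the table reporting them."""
    return {
        # Yellow arrows: 10-fold cross validation within each corpus.
        "Table 3": [(c, c) for c in CORPORA],
        # Purple arrows: train on synthetic, predict on the other corpora's test sets.
        "Table 4": [("synthetic", c) for c in ("natural", "external_1", "external_2")],
        # Orange arrows: train on natural, predict on the other corpora's test sets.
        "Table 5": [("natural", c) for c in ("synthetic", "external_1", "external_2")],
        # Green arrows: train on natural+synthetic, predict on the natural test set.
        "Table 6": [("natural+synthetic", "natural")],
    }


if __name__ == "__main__":
    for table, pairs in evaluation_plan().items():
        for train_corpus, test_corpus in pairs:
            print(f"{table}: train={train_corpus:<18} test={test_corpus}")
```

In a real experiment, each Table 3 pair would be run once per fold from `kfold_indices`. The cross-corpus pairs in Tables 4 to 6 would use a single held-out test set per target corpus.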