2024 Jan 24;6(2):e230205. doi: 10.1148/ryai.230205

Figure 2:

Model output comparison for a sample generated radiology report with multiple errors. Errors (red) and corrections (green) detected by (A) GPT-4, (B) text-davinci-003, (C) GPT-3.5-turbo, (D) Llama-v2–70B-chat, and (E) Bard.
