2025 Mar 27;27:e65537. doi: 10.2196/65537

Table 4.

Performance comparison of GPT-4, Qwen2-72B, Llama3-70B, and GPT-3.5 models on the sepsis dataset.

Model                  Sepsis dataset
                       Precision   Recall   F1-score
Qwen2-72B              44.73       42.85    43.77
Llama3-70B             49.40       47.43    48.39
GPT-3.5                56.63       54.48    55.53
GPT-4 (zero-shot)      72.12       70.48    71.29
GPT-4 (few-shot)       77.73       75.81    76.76
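
The table does not say how the F1-score column was computed. The sketch below assumes the standard definition, the harmonic mean of precision and recall, and recomputes each row as a consistency check. The names `TABLE_4`, `f1_score`, and `max_f1_discrepancy` are illustrative and do not come from the paper.

```python
# Rows copied from Table 4: (precision, recall, reported F1-score).
TABLE_4 = {
    "Qwen2-72B": (44.73, 42.85, 43.77),
    "Llama3-70B": (49.40, 47.43, 48.39),
    "GPT-3.5": (56.63, 54.48, 55.53),
    "GPT-4 (zero-shot)": (72.12, 70.48, 71.29),
    "GPT-4 (few-shot)": (77.73, 75.81, 76.76),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


def max_f1_discrepancy(rows: dict) -> float:
    """Largest absolute gap between recomputed and reported F1 across rows."""
    return max(abs(f1_score(p, r) - f1) for p, r, f1 in rows.values())


if __name__ == "__main__":
    for model, (p, r, reported) in TABLE_4.items():
        print(f"{model}: recomputed {f1_score(p, r):.2f}, reported {reported:.2f}")
```

Under that assumption, every recomputed value agrees with the reported F1-score to within 0.01, which suggests the column is the harmonic mean of the two columns beside it.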