
Table 5. Performance of baseline models in text classification.

Model, learning strategy, and sentiment   Precision  Recall  F1-score  Standard error (F1)  Lower CI (F1)  Upper CI (F1)
Bert-base-cased
  SFT^a
    Harmful_outcome                       0.0000     0.0000  0.0000    0.0000               0.0000         0.0000
    Favorable_outcome                     0.6470     0.8800  0.7457    0.0435               0.6603         0.8310
    Ambiguous_outcome                     0.7000     0.4375  0.5384    0.0498               0.4406         0.6361
Bio_ClinicalBERT
  SFT
    Harmful_outcome                       0.0000     0.0000  0.0000    0.0000               0.0000         0.0000
    Favorable_outcome                     0.6486     0.9600  0.7741    0.0418               0.6921         0.8560
    Ambiguous_outcome                     0.8571     0.3750  0.5217    0.0499               0.4237         0.6196
gpt4-1106-preview-chat
  Zero-shot
    Harmful_outcome                       0.6667     0.6667  0.6667    0.0471               0.5743         0.7590
    Favorable_outcome                     0.7368     0.5600  0.6364    0.0481               0.5421         0.7306
    Ambiguous_outcome                     0.4545     0.6250  0.5263    0.0499               0.4284         0.6241
  Few-shot
    Harmful_outcome                       0.5000     0.6667  0.5714    0.0494               0.4744         0.6683
    Favorable_outcome                     0.6429     0.7200  0.6792    0.0466               0.5877         0.7706
    Ambiguous_outcome                     0.3333     0.2500  0.2857    0.0451               0.1971         0.3742
  Many-shot
    Harmful_outcome                       0.6667     0.6667  0.6667    0.0471               0.5743         0.7590
    Favorable_outcome                     0.8519     0.9200  0.8846    0.0319               0.8219         0.9472
    Ambiguous_outcome                     0.7857     0.6875  0.7333    0.0442               0.6466         0.8199
^a SFT: supervised fine-tuning.
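
The F1-scores in Table 5 are the harmonic mean of the per-class precision and recall, and the reported lower and upper bounds are consistent with a normal-approximation interval of F1 ± 1.96 × standard error. The snippet below is a minimal sketch under that assumption (the 1.96 multiplier, i.e., a 95% interval, is inferred from the reported bounds rather than stated in this excerpt); last-digit differences from the table stem from rounding in the published precision, recall, and standard-error values.

```python
# Minimal sketch (not from the article): recompute the derived columns of
# Table 5 from per-class precision, recall, and the reported SE(F1).
# The 95% normal-approximation interval F1 +/- 1.96 * SE is an assumption
# that matches the reported bounds up to rounding.

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def f1_confidence_interval(f1: float, se: float, z: float = 1.96) -> tuple:
    """Normal-approximation interval: F1 +/- z * SE(F1)."""
    return (f1 - z * se, f1 + z * se)

# Check against the Bert-base-cased / SFT / Favorable_outcome row.
precision, recall, se = 0.6470, 0.8800, 0.0435
f1 = f1_score(precision, recall)
low, high = f1_confidence_interval(f1, se)
print(f"F1 = {f1:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
# Prints F1 = 0.7457, 95% CI = (0.6605, 0.8310); the table reports
# (0.6603, 0.8310), the small gap being rounding in the published values.
```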