J Am Med Inform Assoc. 2024 Jun 4;31(9):1821–1832. doi: 10.1093/jamia/ocae122

Table 2.

One-shot performance of the baseline models, original LLaMA, and instructed LLaMA on the medication status extraction and coreference resolution tasks.

| Models | Precision (Med. status) | Recall (Med. status) | F1 (Med. status) | Conditional ACC | Precision (Coref.) | Recall (Coref.) | F1 (Coref.) |
|---|---|---|---|---|---|---|---|
| PMC-LLaMA 7B | 68.08 | 78 | 67.28 ± 0.3213 | 71.21 ± 0.3311 | 56.56 | 47.37 | 45.97 ± 0.1935 |
| ChatDoctor | 63.07 | 83.56 | 67.6 ± 0.0232 | 80.31 ± 0.3565 | 58.29 | 56.18 | 54.02 ± 0.0366 |
| LLaMA 1 7B | 67.61 | 71.95 | 66.58 ± 0.0601 | 78.45 ± 0.2048 | 60.14 | 39.19 | 43.8 ± 0.1914 |
| LLaMA 1 7B Instruct | 62.83 | 84.43 | 68.97 ± 0.241 | **82.52 ± 0.6844** | 67.58 | 52.52 | 55.14 ± 0.1765 |
| LLaMA 2 7B | 65.58 | **89.16** | 71.72 ± 0.2313 | 76.69 ± 0.1564 | 61.48 | 45.97 | 50.03 ± 0.3343 |
| LLaMA 2 7B Instruct | **75.44** | 82.83 | **75.63 ± 0.3701** | 82.3 ± 0.0664 | **72.58** | **58.69** | **61.24 ± 0.439** |

Conditional ACC measures the correctness of medication status classification, that is, how many statuses were correctly identified conditional on the extracted medications. Values after ± are 95% confidence intervals (P < .05). Bold values indicate the best performance in each column.
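To make the conditional accuracy metric concrete, the sketch below illustrates one way it could be computed: restrict evaluation to the medications the model actually extracted (and that appear in the gold annotations), then score the fraction of those whose status label is correct. This is a minimal illustration only; the function name, the medication-to-status dictionary format, and the example values are assumptions for demonstration and are not taken from the paper.

```python
from typing import Dict


def conditional_accuracy(gold: Dict[str, str], predicted: Dict[str, str]) -> float:
    """Status accuracy computed only over correctly extracted medications.

    gold:      medication -> gold status (e.g., "active", "discontinued")
    predicted: medication -> predicted status, for medications the model extracted
    """
    # Keep only medications that were correctly extracted (i.e., present in gold).
    shared = [med for med in predicted if med in gold]
    if not shared:
        return 0.0
    correct = sum(1 for med in shared if predicted[med] == gold[med])
    return correct / len(shared)


# Hypothetical example: 2 of the 3 correctly extracted medications also
# receive the correct status, so conditional accuracy is 2/3 (about 0.67).
gold = {"metformin": "active", "lisinopril": "discontinued", "aspirin": "active"}
pred = {"metformin": "active", "lisinopril": "active", "aspirin": "active"}
print(conditional_accuracy(gold, pred))
```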