Table 2.
One-shot performance of the baseline models, original LLaMA, and instructed LLaMA on the medication status extraction and coreference resolution tasks.
| | Medication status extraction | | | | Coreference resolution | | |
|---|---|---|---|---|---|---|---|
| Models | Precision | Recall | F1 | Conditional ACC | Precision | Recall | F1 |
| PMC-LLaMA 7B | 68.08 | 78.00 | 67.28 ± 0.3213 | 71.21 ± 0.3311 | 56.56 | 47.37 | 45.97 ± 0.1935 |
| ChatDoctor | 63.07 | 83.56 | 67.60 ± 0.0232 | 80.31 ± 0.3565 | 58.29 | 56.18 | 54.02 ± 0.0366 |
| LLaMA 1 7B | 67.61 | 71.95 | 66.58 ± 0.0601 | 78.45 ± 0.2048 | 60.14 | 39.19 | 43.80 ± 0.1914 |
| LLaMA 1 7B Instruct | 62.83 | 84.43 | 68.97 ± 0.2410 | **82.52 ± 0.6844** | 67.58 | 52.52 | 55.14 ± 0.1765 |
| LLaMA 2 7B | 65.58 | **89.16** | 71.72 ± 0.2313 | 76.69 ± 0.1564 | 61.48 | 45.97 | 50.03 ± 0.3343 |
| LLaMA 2 7B Instruct | **75.44** | 82.83 | **75.63 ± 0.3701** | 82.30 ± 0.0664 | **72.58** | **58.69** | **61.24 ± 0.4390** |
Conditional ACC measures the correctness of medication status classification, i.e., the proportion of statuses correctly identified conditional on the medications extracted. ± values denote 95% confidence intervals (P < .05). Bold values indicate the best performance in each column.
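The Conditional ACC metric described in the note can be sketched as follows: status accuracy is computed only over medications the model actually extracted, so extraction misses do not penalize the status classifier twice. This is a minimal illustrative implementation; the function name, data layout, and status labels are assumptions, not the paper's evaluation code.

```python
def conditional_accuracy(gold: dict, pred: dict) -> float:
    """Status accuracy conditional on correctly extracted medications.

    gold, pred: mappings of medication name -> status label
    (labels here are hypothetical examples, e.g. "active", "discontinued").
    """
    # Restrict evaluation to medications present in both gold and predictions,
    # i.e., the medications the model extracted correctly.
    matched = [m for m in gold if m in pred]
    if not matched:
        return 0.0
    # Fraction of matched medications whose predicted status is correct.
    correct = sum(1 for m in matched if pred[m] == gold[m])
    return correct / len(matched)


gold = {"aspirin": "active", "metformin": "discontinued", "lisinopril": "active"}
pred = {"aspirin": "active", "metformin": "active"}  # lisinopril not extracted
print(conditional_accuracy(gold, pred))  # 1 of 2 matched statuses correct -> 0.5
```

Note that the missed extraction of "lisinopril" lowers recall in the extraction columns but does not affect Conditional ACC, which only scores the two matched medications.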