Table 4:
Aggregate Accuracy, True Negative Rate, (Micro- and Macro-) Precision and Recall for MMSE and CDR scores extracted by ChatGPT and LlaMA-2.
| Metric | ChatGPT, all notes with parsed JSON (N=710) | LlaMA-2, all notes with parsed JSON (N=710) | ChatGPT, double-reviewed notes with parsed JSON (N=306) | LlaMA-2, double-reviewed notes with parsed JSON (N=306) |
|---|---|---|---|---|
| MMSE | | | | |
| Total notes without any MMSE (in ground truth) | 115 | 115 | 48 | 48 |
| Total notes without any MMSE (in model results) | 77 | 110 | 25 | 46 |
| Total correctly predicted empty MMSEs | 76 | 66 | 24 | 23 |
| MMSE True Negative Rate (%) | 98.7 | 60.0 | 96 | 50.0 |
| MMSE False Negative Rate (%) | 1.2 | 40.0 | 4 | 50.0 |
| Remaining notes with non-empty model response included in the MMSE Precision/Recall calculation | 633 | 600 | 281 | 260 |
| Total MMSE instances predicted | 831 | 957 | 366 | 410 |
| MMSE Macro Precision (mean % (sd %)) | 82.9 (sd 36.2) | 62.2 (sd 45.5) | 82.7 (sd 36.8) | 63.4 (sd 44.9) |
| MMSE Macro Recall (mean % (sd %)) | 87.8 (sd 30.4) | 69.9 (sd 43.5) | 89.7 (sd 28.3) | 71.8 (sd 42.1) |
| MMSE Micro Precision (%) | 83.8 | 57.7 | 84.1 | 59.3 |
| MMSE Micro Recall (%) | 83.7 | 68.1 | 87.5 | 69.0 |
| Total notes with any erroneous MMSE result | 121 | 238 | 52 | 98 |
| Overall accuracy of MMSE (%) | 82.9 | 66.4 | 83.0 | 68.0 |
| CDR | | | | |
| Total notes without CDR (in ground truth) | 608 | 608 | 260 | 260 |
| Total notes without CDR (in model results) | 533 | 497 | 233 | 215 |
| Total correctly predicted empty CDR | 532 | 489 | 233 | 212 |
| CDR True Negative Rate (%) | 99.8 | 98.4 | 100 | 98.6 |
| CDR False Negative Rate (%) | 0.2 | 1.6 | 0 | 1.4 |
| Remaining notes with non-empty model response included in the CDR Precision/Recall calculation | 177 | 213 | 73 | 153 |
| Total CDR instances predicted | 256 | 344 | 92 | 153 |
| CDR Macro Precision (mean % (sd %)) | 48.3 (sd 49.9) | 16.1 (sd 35.5) | 57.5 (sd 49.4) | 18.1 (sd 36.9) |
| CDR Macro Recall (mean % (sd %)) | 84.3 (sd 36.3) | 39.7 (sd 48.7) | 91.3 (sd 28.1) | 43.5 (sd 49.6) |
| CDR Micro Precision (%) | 36.3 | 12.0 | 51.0 | 13.2 |
| CDR Micro Recall (%) | 85.3 | 37.6 | 92.1 | 39.2 |
| Total notes with any erroneous CDR result | 91 | 181 | 31 | 76 |
| Overall accuracy of CDR (%) | 87.1 | 74.5 | 89.8 | 75.4 |
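
The table does not state the formulas behind its rates, so the following is only a sketch of one reading that is consistent with the reported counts. It assumes the True Negative Rate is the number of correctly predicted empty results divided by the number of notes the model left empty, that micro-averaging pools instance counts across notes, and that macro-averaging takes the mean of per-note percentages. The function names and the per-note toy counts are hypothetical and are not taken from the study.

```python
from statistics import mean


def empty_prediction_rate(correct_empty: int, predicted_empty: int) -> float:
    """Percentage of notes the model left empty that are also empty in the ground truth.

    Assumption: this ratio is what the table calls the True Negative Rate.
    """
    return 100.0 * correct_empty / predicted_empty


def micro_macro_precision(per_note):
    """Return (micro, macro) precision in percent.

    per_note is a list of (n_correct, n_predicted) pairs, one pair per note
    with a non-empty model response (hypothetical input layout).
    """
    # Macro: average the per-note precision values.
    macro = mean(100.0 * c / p for c, p in per_note)
    # Micro: pool all instances across notes, then divide once.
    micro = 100.0 * sum(c for c, _ in per_note) / sum(p for _, p in per_note)
    return micro, macro


# Counts from the MMSE rows, all notes with parsed JSON (N=710).
chatgpt_rate = round(empty_prediction_rate(76, 77), 1)   # 98.7, as in the table
llama_rate = round(empty_prediction_rate(66, 110), 1)    # 60.0, as in the table

# Hypothetical toy data: one note with 1 of 1 correct, one with 1 of 4 correct.
micro, macro = micro_macro_precision([(1, 1), (1, 4)])   # micro 40.0, macro 62.5
```

In the toy example the note with many wrong predictions pulls the micro value below the macro value, which is one way the micro precision for CDR could fall below the macro precision in the table.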