Table 2.
Results for BERT and Llama 2 and 3 in the test of
| Cohort | BERT: Recall (accuracy) | BERT: Precision | BERT: Precision (outside) | BERT: F1 | BERT: F1 (outside) | Llama 2: Exact match | Llama 2: 80% match | Llama 3: Exact match | Llama 3: 80% match |
|---|---|---|---|---|---|---|---|---|---|
| Superhero | 1.00 | 0.54 | 1.00 | 0.70 | 1.00 | 0.58 | 0.92 | 0.60 | 0.95 |
| Dinosaur | 0.96 | 0.52 | 0.98 | 0.67 | 0.97 | 0.50 | 0.90 | 0.58 | 0.92 |
| Mammal | 1.00 | 0.54 | 0.94 | 0.70 | 0.97 | 0.60 | 0.92 | 0.58 | 0.92 |
| Bird | 0.98 | 0.53 | 0.96 | 0.69 | 0.97 | 0.52 | 0.94 | 0.56 | 0.95 |
“Precision (outside)” is the precision measured when the negative samples contain names outside the union of A, B, and C. “F1 (outside)” is the F1 score computed from Recall and Precision (outside).
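
The two F1 columns can be recomputed from the reported recall and precision values. The sketch below assumes the standard definition of F1 as the harmonic mean of recall and precision; the function name `f1_score` and the `rows` dictionary are illustrative, and the numbers are copied from the BERT columns of Table 2. Because the inputs are already rounded to two decimals, a recomputed value could in principle differ from a published one in the last digit.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision; returns 0.0 if both are zero."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision, precision_outside) per cohort, copied from Table 2 (BERT).
rows = {
    "Superhero": (1.00, 0.54, 1.00),
    "Dinosaur": (0.96, 0.52, 0.98),
    "Mammal": (1.00, 0.54, 0.94),
    "Bird": (0.98, 0.53, 0.96),
}

for cohort, (recall, precision, precision_outside) in rows.items():
    f1 = f1_score(recall, precision)
    f1_outside = f1_score(recall, precision_outside)
    print(f"{cohort}: F1={f1:.2f}, F1 (outside)={f1_outside:.2f}")
```

For the four cohorts listed, the recomputed values agree with the F1 and F1 (outside) columns to two decimals.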