Table 2.
Evaluation metrics for different classification schemes applied to GPT-3 synonyms, using manual labels as a proxy for ground truth.
| Index Term | GPT-3 Synonym Criteria | Precision | Recall | F1 Score | F2 Score |
|---|---|---|---|---|---|
| Alprazolam | All generated terms | 0.264 | 1.000 | 0.418 | 0.642 |
| Fentanyl | All generated terms | 0.220 | 1.000 | 0.361 | 0.585 |
| Alprazolam | All RedMed terms | 1.000 | 0.178 | 0.302 | 0.213 |
| Fentanyl | All RedMed terms | 1.000 | 0.115 | 0.206 | 0.140 |
| Alprazolam | Drug name filter | 0.285 | 0.996 | 0.443 | 0.664 |
| Fentanyl | Drug name filter | 0.232 | 1.000 | 0.377 | 0.602 |
| Alprazolam | Drug name & frequency filters | 0.567 | 0.487 | 0.524 | 0.501 |
| Fentanyl | Drug name & frequency filters | 0.521 | 0.465 | 0.491 | 0.475 |
| Alprazolam | Drug name & Google filters | 0.698 | 0.859 | 0.770 | 0.821 |
| Fentanyl | Drug name & Google filters | 0.568 | 0.793 | 0.662 | 0.735 |
| Alprazolam | Drug name, frequency, & Google filters | 0.859 | 0.431 | 0.574 | 0.479 |
| Fentanyl | Drug name, frequency, & Google filters | 0.770 | 0.395 | 0.522 | 0.438 |
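
The F1 and F2 columns appear to follow the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where beta = 2 weights recall more heavily than precision. The table itself does not state the formula, so this is an assumption. The minimal sketch below reproduces one row of the table from its precision and recall values. The function name `f_beta` is illustrative and not from the source.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score (assumed definition; not stated in the table).

    Returns 0.0 when the denominator is zero, i.e. both inputs are zero.
    """
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / denom


# Alprazolam, "Drug name & Google filters" row of Table 2
p, r = 0.698, 0.859
f1 = f_beta(p, r, beta=1)
f2 = f_beta(p, r, beta=2)
print(f"F1={f1:.3f} F2={f2:.3f}")  # F1=0.770 F2=0.821
```

Because F2 favors recall, the high-recall, low-precision "All generated terms" scheme scores much better on F2 than on F1, while the high-precision, low-recall "All RedMed terms" scheme shows the opposite pattern.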