2022 Dec 22;22:338. doi: 10.1186/s12911-022-02085-0

Table 3.

Precision (P), Recall (R), and F1-score measured on the test sets for the three NLP tasks implemented in the pipeline

Model              AE-Drug relationship   Named Entity         Seriousness
                   classification         Recognition          classification
                   P     R     F1         P     R     F1       P     R     F1
UMLSBERT           0.94  0.93  0.93       0.94  0.96  0.95     0.89  0.87  0.88
bioBERT            0.91  0.93  0.92       0.96  0.95  0.95     0.89  0.90  0.89
blueBERT           0.93  0.89  0.91       0.96  0.93  0.94     0.73  0.83  0.78
sciBERT            0.94  0.92  0.93       0.95  0.95  0.95     0.92  0.81  0.86
Bio_ClinicalBERT   0.94  0.92  0.93       0.97  0.92  0.94     0.68  0.93  0.79
BERT               0.90  0.89  0.90       0.95  0.92  0.93     0.76  0.74  0.75
PubMedBERT         0.95  0.90  0.92       0.96  0.95  0.96     0.87  0.91  0.89

The best value per column is in bold. For the drug/AE entity recognition task, the displayed metrics concern only the AE class. The best model was selected for each task: PubMedBERT for NER and seriousness classification, and UMLSBERT for AE-Drug relationship classification.
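The F1-scores in the table are the harmonic mean of precision and recall, which is why a model with a high score on one metric but a low score on the other (e.g. Bio_ClinicalBERT on seriousness classification) ends up with a modest F1. A minimal sketch of the computation (the helper name `f1_score` is illustrative, not from the paper):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: UMLSBERT on AE-Drug relationship classification (P=0.94, R=0.93)
print(round(f1_score(0.94, 0.93), 2))  # 0.93, matching the table
```

Note that rounding the reported two-decimal P and R before combining them can shift the recomputed F1 by ±0.01 relative to the published value.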