Table 3.
Average results of the different models on the test data, restricted to codes that occur at least 50 times (frequency threshold).
| Method | Weighted precision | Weighted specificity | Weighted recall | Weighted F1 |
| Binary Relevance (SGD^a classifier) | 0.69 | 0.93 | 0.52 | 0.59 |
| BERTje | 0.77 | 0.97 | 0.68 | 0.70 |
| BERTje (domain adaptation) | 0.74 | 0.96 | 0.62 | 0.67 |
^a SGD: Stochastic Gradient Descent.
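The table does not spell out how the "weighted" scores are averaged. As a minimal sketch, assuming each per-code score is weighted by the number of true occurrences of that code (support weighting, which is an assumption here and not confirmed by the source), the four columns could be computed as follows. The function name `weighted_metrics` and the indicator-matrix input format are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[Sequence[int]],
                     y_pred: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Support-weighted precision, specificity, recall and F1 over codes.

    y_true and y_pred are binary indicator matrices (rows = documents,
    columns = codes). ASSUMPTION: "weighted" means each per-code score is
    weighted by that code's number of true occurrences.
    """
    names = ("precision", "specificity", "recall", "f1")
    totals = {name: 0.0 for name in names}
    total_support = 0

    for j in range(len(y_true[0])):
        # Confusion counts for code j across all documents.
        tp = fp = tn = fn = 0
        for true_row, pred_row in zip(y_true, y_pred):
            t, p = true_row[j], pred_row[j]
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1

        # Per-code scores; undefined ratios are treated as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)

        # Accumulate each score weighted by the code's support.
        support = tp + fn
        total_support += support
        for name, value in zip(names, (precision, specificity, recall, f1)):
            totals[name] += support * value

    if total_support == 0:
        return totals
    return {name: value / total_support for name, value in totals.items()}
```

Under this weighting, the weighted F1 is an average of per-code F1 scores, so it need not equal the harmonic mean of the weighted precision and weighted recall reported in the same row.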