. 2021 Apr 30;9(4):e24020. doi: 10.2196/24020

Table 3.

Evaluation results on Observation concepts in the test set for different intermediate-task pretraining and domain-adaptive pretraining combinations[a].

ITPT[b] and BERT[c] model  Precision, mean (SD)  Recall, mean (SD)  F1 score, mean (SD)
NCBI[d]-disease
  BERT                     75.0 (1.8)            85.3 (1.1)         79.8 (0.6)
  +DAPT[e] (BioBERT)       77.7 (2.6)            85.1 (2.8)         81.1 (1.1)
  +DAPT (ClinicalBERT)     78.6 (3.2)            84.4 (1.5)         81.3 (1.2)
i2b2[f] 2010
  BERT                     71.6 (3.4)            88.9 (2.4)         79.2 (1.5)
  +DAPT (BioBERT)          75.6 (1.9)            86.2 (1.4)         80.5 (1.4)
  +DAPT (ClinicalBERT)     73.2 (2.0)            89.0 (1.8)         80.3 (0.7)
ShARe-CLEF[g] 2013
  BERT                     70.7 (2.7)            88.7 (1.5)         78.6 (1.3)
  +DAPT (BioBERT)          72.9 (2.5)            88.3 (2.3)         79.8 (0.8)
  +DAPT (ClinicalBERT)     74.2 (2.6)            86.5 (3.8)         79.8 (0.9)

[a] Document-level precision, recall, and F1 score are reported using the official evaluation scripts.

[b] ITPT: intermediate-task pretraining.

[c] BERT: Bidirectional Encoder Representations from Transformers.

[d] NCBI: National Center for Biotechnology Information.

[e] DAPT: domain-adaptive pretraining.

[f] i2b2: Integrating Biology and the Bedside.

[g] ShARe-CLEF: Shared Annotated Resources-Conference and Labs of the Evaluation Forum.
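The table defers to the official evaluation scripts, which are not reproduced here. As a rough illustration only, the Python sketch below shows the usual definitions behind the three score columns and the "mean (SD)" format. It assumes, without support from the source, that document-level scoring compares each document's gold and predicted concepts as sets and pools true-positive, false-positive, and false-negative counts over all documents, and that "mean (SD)" is the sample mean and sample standard deviation over repeated fine-tuning runs. The function names `doc_level_counts`, `f1_score`, `prf`, and `mean_sd` are hypothetical.

```python
# Hypothetical sketch of document-level P/R/F1 and "mean (SD)" reporting.
# Assumptions (not taken from the official evaluation scripts):
#   - gold/pred map a document ID to the set of concepts found in it;
#   - TP/FP/FN counts are pooled over all documents (micro-averaging);
#   - "mean (SD)" uses the sample standard deviation over repeated runs.
from statistics import mean, stdev


def doc_level_counts(gold, pred):
    """Pool true positives, false positives, and false negatives over documents."""
    tp = fp = fn = 0
    for doc_id in gold.keys() | pred.keys():
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)  # concepts both annotated and predicted
        fp += len(p - g)  # predicted but not annotated
        fn += len(g - p)  # annotated but missed
    return tp, fp, fn


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(tp, fp, fn):
    """Precision, recall, and F1 as percentages; undefined ratios become 0.0."""
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


def mean_sd(scores):
    """Format per-run scores in the table's 'mean (SD)' style, one decimal place."""
    return f"{mean(scores):.1f} ({stdev(scores):.1f})"


if __name__ == "__main__":
    # Toy example with made-up concepts, not data from the study.
    gold = {"doc1": {"cancer", "diabetes"}, "doc2": {"asthma"}}
    pred = {"doc1": {"cancer"}, "doc2": {"asthma", "obesity"}}
    tp, fp, fn = doc_level_counts(gold, pred)
    print(tp, fp, fn)  # 2 1 1
    print(prf(tp, fp, fn))
    print(mean_sd([79.0, 80.0, 81.0]))  # 80.0 (1.0)
```

If F1 is computed per run and then averaged, as the separate SD column suggests, applying `f1_score` to a row's mean precision and recall only approximates its mean F1. For example, `f1_score(77.7, 85.1)` for NCBI-disease +DAPT (BioBERT) gives about 81.2, whereas the table reports 81.1.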