Table 4. Data sets used for the iterative intermediate-training approach using multi-task learning, with the Pearson correlation coefficient on the internal test set.

| Experiment and language model | STS-B^a | RQE^b | MedNLI^c | Topic | MedNER^d | QQP^e | Pearson correlation coefficient on internal test |
|---|---|---|---|---|---|---|---|
| **BL^f** | | | | | | | |
| 1: BERT^g | —^h | — | — | — | — | — | 0.834 |
| 2: ClinicalBERT^i | — | — | — | — | — | — | 0.848 |
| **Iter^j** | | | | | | | |
| 1: ClinicalBERT | ✓^k | — | — | — | — | — | 0.852 |
| 2: ClinicalBERT | ✓ | ✓ | ✓ | — | — | — | 0.862 |
| 3: ClinicalBERT | ✓ | ✓ | ✓ | ✓ | — | — | 0.866 |
| 4: ClinicalBERT | ✓ | ✓ | ✓ | ✓ | ✓ | — | *0.870*^l |
| 5: ClinicalBERT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.856 |
^a STS-B: semantic textual similarity benchmark.
^b RQE: Recognizing Question Entailment.
^c MedNLI: Natural Language Inference data set for the clinical domain.
^d MedNER: Medication-NER data set.
^e QQP: Quora Question Pairs data set.
^f BL: baseline.
^g BERT: bidirectional encoder representations from transformers.
^h Indicates the data set was not used for this experiment.
^i ClinicalBERT: bidirectional encoder representations from transformers on clinical text mining.
^j Iter: iteration.
^k Indicates data sets that were trained together in multi-task learning.
^l Italics signify the highest Pearson correlation coefficient obtained on the internal test data set.
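The evaluation metric reported in the rightmost column, the Pearson correlation coefficient between model-predicted and gold similarity scores, can be sketched as below. This is a minimal standard-library implementation; the `gold` and `pred` values are illustrative placeholders, not scores from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation terms
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical gold vs. predicted semantic-similarity scores (0-5 scale)
gold = [0.0, 1.5, 2.0, 3.5, 5.0]
pred = [0.2, 1.1, 2.4, 3.0, 4.8]
print(pearson(gold, pred))
```

In practice a library routine such as `scipy.stats.pearsonr` would typically be used; the hand-rolled version above just makes the computation behind the table's metric explicit.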