Table 3. Pretraining corpora and domains of the language models.
| Language model | Pretraining corpora | Domain |
| --- | --- | --- |
| MT-DNN^a | Wikipedia + BookCorpus | General |
| RoBERTa^b | Wikipedia + BookCorpus + CC-News + OpenWebText + Stories | General |
| BioBERT^c | Wikipedia + BookCorpus + PubMed + PMC^d | Biomedical |
| IIT-MTL-ClinicalBERT^e | Wikipedia + BookCorpus + MIMIC-III^f | Clinical |
^a MT-DNN: multi-task deep neural networks.
^b RoBERTa: robustly optimized bidirectional encoder representations from transformers approach.
^c BioBERT: bidirectional encoder representations from transformers for biomedical text mining.
^d PMC: PubMed Central.
^e IIT-MTL-ClinicalBERT: iterative intermediate training using multi-task learning on ClinicalBERT.
^f MIMIC-III: Medical Information Mart for Intensive Care.
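For reference, pretrained checkpoints of several of these encoders can be loaded with the Hugging Face transformers library. The sketch below is illustrative only: the model identifiers (roberta-base, dmis-lab/biobert-v1.1, emilyalsentzer/Bio_ClinicalBERT) are commonly used public checkpoints and are assumptions on our part, not necessarily the exact models listed in Table 3; MT-DNN and IIT-MTL-ClinicalBERT are omitted because they have no standard Hub checkpoint.

```python
# Minimal sketch: loading pretrained encoders comparable to those in Table 3.
# The model IDs below are assumed public checkpoints, not the exact models
# evaluated in this work.
from transformers import AutoModel, AutoTokenizer

CHECKPOINTS = {
    "RoBERTa (general)": "roberta-base",
    "BioBERT (biomedical)": "dmis-lab/biobert-v1.1",
    "ClinicalBERT variant (clinical)": "emilyalsentzer/Bio_ClinicalBERT",
}

for name, model_id in CHECKPOINTS.items():
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    # Encode a sample sentence and inspect the contextual embedding shape.
    inputs = tokenizer("Patient denies chest pain.", return_tensors="pt")
    outputs = model(**inputs)
    print(name, outputs.last_hidden_state.shape)
```

Each checkpoint ships with its own tokenizer and vocabulary, so tokenizer and model must always be loaded from the same identifier.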