JMIR Med Inform. 2020 Nov 27;8(11):e22508. doi: 10.2196/22508

Table 3. Pretrained language models used in the ensemble module and their training corpora.

| Language model | Corpora for language model pretraining | Domain |
| --- | --- | --- |
| MT-DNN^a | Wikipedia + BookCorpus | General |
| RoBERTa^b | Wikipedia + BookCorpus + CC-News + OpenWebText + Stories | General |
| BioBERT^c | Wikipedia + BookCorpus + PubMed + PMC^d | Biomedical |
| IIT-MTL-ClinicalBERT^e | Wikipedia + BookCorpus + MIMIC-III^f | Clinical |

^a MT-DNN: multi-task deep neural networks.

^b RoBERTa: robustly optimized bidirectional encoder representations from transformers approach.

^c BioBERT: bidirectional encoder representations from transformers for biomedical text mining.

^d PMC: PubMed Central.

^e IIT-MTL-ClinicalBERT: iteratively trained using multi-task learning on ClinicalBERT.

^f MIMIC-III: Medical Information Mart for Intensive Care.
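The ensemble pairs general-domain encoders with biomedical and clinical ones. As a minimal sketch of how predictions from several pretrained transformers can be combined, the Python snippet below averages class probabilities (soft voting) across publicly available Hugging Face checkpoints. The checkpoint names are illustrative stand-ins (MT-DNN and IIT-MTL-ClinicalBERT are not assumed to be available on the Hub), and the soft-voting rule and classification head are assumptions for illustration, not the paper's exact ensemble method.

```python
# Sketch: soft-voting ensemble over several pretrained transformer encoders.
# Assumptions: stand-in checkpoint names, a sequence-classification head, and
# probability averaging; none of these are confirmed by the table above.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINTS = [
    "roberta-base",                      # general domain
    "dmis-lab/biobert-base-cased-v1.1",  # biomedical domain
    "emilyalsentzer/Bio_ClinicalBERT",   # clinical domain (stand-in)
]

def ensemble_predict(text: str, num_labels: int = 2) -> int:
    """Average softmax probabilities across models and return the argmax label."""
    probs = torch.zeros(num_labels)
    for name in CHECKPOINTS:
        tokenizer = AutoTokenizer.from_pretrained(name)
        # Note: the classification head is randomly initialized here; each
        # model would be fine-tuned on the target task before ensembling.
        model = AutoModelForSequenceClassification.from_pretrained(
            name, num_labels=num_labels
        )
        model.eval()
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        probs += torch.softmax(logits, dim=-1).squeeze(0)
    probs /= len(CHECKPOINTS)
    return int(probs.argmax())

print(ensemble_predict("Patient's mother had type 2 diabetes."))
```

Soft voting lets the general, biomedical, and clinical encoders each contribute complementary domain knowledge to a single prediction; hard majority voting over per-model labels is a common alternative.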