
Table 1. Summary of pretraining details for the various Bidirectional Encoder Representations from Transformers (BERT) models used in our experiments.

| Model | Vocabulary | Pretraining | Corpus | Text size |
|---|---|---|---|---|
| BERT | Wikipedia+Books | N/A^a | Wikipedia+Books | 3.3B words (16 GB) |
| Clinical BERT | Wikipedia+Books | Continual pretraining | MIMIC^b (subset)+MIMIC-III | 0.5B words (3.7 GB) |
| BioBERT | Wikipedia+Books | Continual pretraining | PubMed+PMC^c | 4.5B words |
| BlueBERT | Wikipedia+Books | Continual pretraining | PubMed+MIMIC-III | 4.5B words |

^a N/A: not applicable.

^b MIMIC: Medical Information Mart for Intensive Care.

^c PMC: PubMed Central.
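
All four models in Table 1 share the original BERT WordPiece vocabulary and can be loaded through the Hugging Face transformers library. The sketch below shows one way to do this; the Hub checkpoint identifiers are assumptions (commonly used public releases of each model), not necessarily the exact weights used in these experiments.

```python
# Minimal sketch: loading the four BERT variants from Table 1 with
# Hugging Face transformers. The checkpoint IDs below are assumptions
# (widely used public Hub releases), not confirmed to be the exact
# weights evaluated in the paper.
from transformers import AutoModel, AutoTokenizer

CHECKPOINTS = {
    "BERT": "bert-base-uncased",
    "Clinical BERT": "emilyalsentzer/Bio_ClinicalBERT",
    "BioBERT": "dmis-lab/biobert-v1.1",
    "BlueBERT": "bionlp/bluebert_pubmed_mimic_uncased_L-12_H-768_A-12",
}

def load(name: str):
    """Return (tokenizer, model) for one of the Table 1 models."""
    ckpt = CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt)
    return tokenizer, model

if __name__ == "__main__":
    tok, model = load("BioBERT")
    # All four variants expose the same encoder interface, so downstream
    # fine-tuning code can treat them interchangeably.
    print(model.config.hidden_size)
```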