
Table 1. Transformer models’ sizes and configurations.

Model                   Layers  Attention heads  Embedding dimension  Parameters (millions)  Pretraining corpus size (GB)
CamemBERT-base-CCNET^a  12      12               768                  110                    135
FlauBERT-base-cased     12      12               768                  138                    71
BelGPT2                 12      12               768                  117                    57.9
GPTanam                 12      12               768                  117                    58.6

^a CCNET: Common Crawl Net, the filtered Common Crawl corpus on which this CamemBERT variant was pretrained (not the computer-vision model of the same name).
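
The architecture columns in Table 1 correspond directly to fields of each model's configuration object in the Hugging Face transformers library. The following is a minimal sketch of how those values and the parameter counts could be checked; the Hub identifiers below are assumptions (the publicly hosted CamemBERT, FlauBERT, and BelGPT2 checkpoints), and GPTanam, the authors' in-house model, is not assumed to be publicly available.

# Sketch: read layer/head/dimension settings and count parameters for the
# public checkpoints listed in Table 1 (Hub IDs are assumed, not from the paper).
from transformers import AutoConfig, AutoModel

MODEL_IDS = [
    "camembert/camembert-base-ccnet",  # CamemBERT-base-CCNET
    "flaubert/flaubert_base_cased",    # FlauBERT-base-cased
    "antoiloui/belgpt2",               # BelGPT2
]

def first_attr(config, names):
    """Return the first config attribute that exists; names differ by architecture
    (BERT-style: num_hidden_layers; XLM/FlauBERT: n_layers; GPT-2: n_layer)."""
    for name in names:
        if hasattr(config, name):
            return getattr(config, name)
    return None

for model_id in MODEL_IDS:
    config = AutoConfig.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    n_params = sum(p.numel() for p in model.parameters())
    layers = first_attr(config, ["num_hidden_layers", "n_layers", "n_layer"])
    heads = first_attr(config, ["num_attention_heads", "n_heads", "n_head"])
    dim = first_attr(config, ["hidden_size", "emb_dim", "n_embd"])
    print(f"{model_id}: layers={layers}, heads={heads}, dim={dim}, "
          f"params={n_params / 1e6:.0f}M")

Note that reported parameter counts can differ slightly depending on whether embedding matrices and tied output heads are included, so small deviations from the table's "Parameters (millions)" column would not be surprising.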