Sci Rep. 2024 Jun 3;14:12731. doi: 10.1038/s41598-024-63380-6

Table 2.

Description of the studied pre-trained lightweight LLMs. Model size grows with the number of layers, attention heads, and trainable parameters.

Model          Layers  Attention heads  Parameters  Size on disk
BERT Base      12      12               110M        440 MB
BERT Large     24      16               340M        1.2 GB
ALBERT Base    12      12               11M         63 MB
ALBERT Large   24      16               17M         87 MB
RoBERTa Base   12      12               82M         499 MB
RoBERTa Large  24      16               355M        1.6 GB
XLNet Base     12      12               110M        565 MB
XLNet Large    24      16               340M        1.57 GB
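
The parameter counts (and, approximately, the on-disk sizes) in Table 2 can be checked directly from the public checkpoints. The sketch below is a minimal example, assuming the standard Hugging Face checkpoint names (e.g. bert-base-uncased), which the table does not specify; the fp32 estimate of four bytes per parameter is an approximation of the stored file size, not an exact figure.

```python
# Minimal sketch (not part of the original article) for reproducing the
# parameter counts in Table 2 with the Hugging Face `transformers` library.
# The checkpoint names below are assumptions about which public checkpoints
# the table refers to; running this downloads each model's weights.
from transformers import AutoModel

CHECKPOINTS = {
    "BERT Base": "bert-base-uncased",
    "BERT Large": "bert-large-uncased",
    "ALBERT Base": "albert-base-v2",
    "ALBERT Large": "albert-large-v2",
    "RoBERTa Base": "roberta-base",
    "RoBERTa Large": "roberta-large",
    "XLNet Base": "xlnet-base-cased",
    "XLNet Large": "xlnet-large-cased",
}

for name, ckpt in CHECKPOINTS.items():
    model = AutoModel.from_pretrained(ckpt)
    # Trainable parameters: tensors that receive gradients during fine-tuning.
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    # Rough on-disk size assuming float32 weights (4 bytes per parameter).
    size_mb = n_params * 4 / 1e6
    print(f"{name:14s} {n_params / 1e6:7.1f}M params  ~{size_mb:7.0f} MB (fp32)")
```

The contrast visible in the table, where ALBERT Large has more layers than BERT Base but far fewer parameters, reflects ALBERT's cross-layer parameter sharing, which keeps the checkpoint small even as depth increases.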