Table 2.
Description of the studied pre-trained lightweight LLMs. On-disk model size is largely determined by the number of trainable parameters, which in turn grows with the number of layers and attention heads.
Model | Layers | Att. heads | Parameters | Size |
---|---|---|---|---|
BERT Base | 12 | 12 | 110M | 440 MB |
BERT Large | 24 | 16 | 340M | 1.2 GB |
ALBERT Base | 12 | 12 | 11M | 63 MB |
ALBERT Large | 24 | 16 | 17M | 87 MB |
RoBERTa Base | 12 | 12 | 82M | 499 MB |
RoBERTa Large | 24 | 16 | 355M | 1.6 GB |
XLNet Base | 12 | 12 | 110M | 565 MB |
XLNet Large | 24 | 16 | 340M | 1.57 GB |
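
The parameter counts in Table 2 can be checked by loading each model and summing its trainable parameters. The sketch below uses the Hugging Face `transformers` library; the checkpoint identifiers (e.g. `bert-base-uncased`) are assumptions, since the table does not name the exact checkpoints, and counts may differ slightly depending on which head or pooling layers a given checkpoint includes.

```python
# Minimal sketch: count trainable parameters of the base checkpoints.
# Checkpoint names are illustrative assumptions, not taken from the paper.
from transformers import AutoModel

CHECKPOINTS = {
    "BERT Base": "bert-base-uncased",
    "ALBERT Base": "albert-base-v2",
    "RoBERTa Base": "roberta-base",
    "XLNet Base": "xlnet-base-cased",
}

for name, ckpt in CHECKPOINTS.items():
    model = AutoModel.from_pretrained(ckpt)
    # Sum only parameters that require gradients, matching the
    # "Parameters" column (trainable parameters) in Table 2.
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"{name}: {n_params / 1e6:.0f}M trainable parameters")
```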