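Table 4. Details of the models' architecture. L: number of layers; A: attention heads; H: hidden size; FFN: feed-forward (intermediate) size; V: vocabulary size; Params: total number of parameters.
Model | Tokenization | L | A | H | FFN | V | Params |
---|---|---|---|---|---|---|---|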
BETO | WordPiece | 12 | 12 | 768 | 3,072 | 31 K | 110 M |
ALBETO | SentencePiece | 12 | 12 | 768 | 3,072 | 31 K | 12 M |
DistilBETO | WordPiece | 6 | 12 | 768 | 3,072 | 31 K | 67 M |
MarIA | Byte-level BPE | 12 | 12 | 768 | 3,072 | 50 K | 125 M |
BERTIN | Byte-level BPE | 12 | 12 | 768 | 3,072 | 50 K | 125 M |
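The Params column follows from the other columns for the standard BERT/RoBERTa-style encoders. The sketch below counts weights and biases for token, position, and segment embeddings, the attention and feed-forward blocks, the LayerNorms, and the optional pooler. It rests on several assumptions. The exact vocabulary sizes are taken to be 31,002 (BETO, DistilBETO) and 50,262 (MarIA, BERTIN), as in the publicly released configurations, which the table rounds to 31 K and 50 K. RoBERTa-style models are assumed to use 514 positions and a single segment type. DistilBETO is assumed to drop segment embeddings and the pooler. The function name `bert_encoder_params` is hypothetical.

```python
def bert_encoder_params(vocab, hidden, layers, ffn,
                        max_pos=512, type_vocab=2, pooler=True):
    """Count weights + biases of a BERT-style encoder (hypothetical helper)."""
    # Embedding tables (token, position, segment) plus one LayerNorm (gain + bias).
    emb = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    # Self-attention: Q, K, V and output projections, each hidden x hidden + bias.
    attn = 4 * (hidden * hidden + hidden)
    # Feed-forward block: hidden -> ffn -> hidden, with biases.
    ff = hidden * ffn + ffn + ffn * hidden + hidden
    # Two LayerNorms per layer (after attention and after feed-forward).
    norms = 2 * 2 * hidden
    total = emb + layers * (attn + ff + norms)
    if pooler:
        # Dense pooler applied to the [CLS] representation.
        total += hidden * hidden + hidden
    return total


# Assumed configurations (see lead-in); ALBETO is omitted on purpose.
beto = bert_encoder_params(31002, 768, 12, 3072)
distilbeto = bert_encoder_params(31002, 768, 6, 3072, type_vocab=0, pooler=False)
maria = bert_encoder_params(50262, 768, 12, 3072, max_pos=514, type_vocab=1)

for name, n in [("BETO", beto), ("DistilBETO", distilbeto), ("MarIA/BERTIN", maria)]:
    print(f"{name}: {n:,} (~{round(n / 1e6)} M)")
```

Under these assumptions the counts round to 110 M, 67 M, and 125 M, matching the table. ALBETO is left out because ALBERT factorizes the embedding matrix into a smaller embedding size and shares weights across layers, which explains its much smaller 12 M total despite the same L, A, H, and FFN.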