Table 4. Tokenizer, embedding vocabulary size, and training hyperparameters of the compared models.
| Models | Tokenizer | Embedding vocabulary size | Number of parameters | Initial learning rate | Batch size |
| --- | --- | --- | --- | --- | --- |
| Convolutional neural network | MeCab-ko | 100,000 | 32 million | 1e-3 | 64 |
| Unidirectional long short-term memory | MeCab-ko | 100,000 | 46 million | 2e-4 | 32 |
| Bidirectional long short-term memory | MeCab-ko | 100,000 | 40 million | 2e-4 | 32 |
| Bidirectional encoder representations from transformers | WordPiece | 8,002 | 92 million | 2e-5 | 8 |
| ELECTRA^a-version 1 | WordPiece | 32,200 | 110 million | 2e-5 | 8 |
| ELECTRA-version 2 | MeCab-ko & WordPiece | 35,000 | 112 million | 2e-5 | 8 |

^a ELECTRA: efficiently learning an encoder that classifies token replacements accurately.
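For convenience, the settings in Table 4 can also be read as a configuration object. The short Python sketch below mirrors the table values one-to-one; the dictionary keys and model labels are illustrative assumptions for readability, not the authors' training code.

```python
# Illustrative sketch: the Table 4 settings collected in a plain dictionary.
# Key names and model labels are assumptions, not taken from the original code.
MODEL_CONFIGS = {
    "cnn":        {"tokenizer": "MeCab-ko",             "vocab_size": 100_000, "params": "32M",  "lr": 1e-3, "batch_size": 64},
    "uni_lstm":   {"tokenizer": "MeCab-ko",             "vocab_size": 100_000, "params": "46M",  "lr": 2e-4, "batch_size": 32},
    "bi_lstm":    {"tokenizer": "MeCab-ko",             "vocab_size": 100_000, "params": "40M",  "lr": 2e-4, "batch_size": 32},
    "bert":       {"tokenizer": "WordPiece",            "vocab_size": 8_002,   "params": "92M",  "lr": 2e-5, "batch_size": 8},
    "electra_v1": {"tokenizer": "WordPiece",            "vocab_size": 32_200,  "params": "110M", "lr": 2e-5, "batch_size": 8},
    "electra_v2": {"tokenizer": "MeCab-ko + WordPiece", "vocab_size": 35_000,  "params": "112M", "lr": 2e-5, "batch_size": 8},
}

if __name__ == "__main__":
    # Example lookup: print the settings used for the BERT model.
    cfg = MODEL_CONFIGS["bert"]
    print(f"BERT: lr={cfg['lr']}, batch size={cfg['batch_size']}, vocab={cfg['vocab_size']}")
```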