2020 Dec 15;8(12):e22982. doi: 10.2196/22982

Table 2.

The optimized hyperparameters of BERT-based models for various tasks.

Task	Pretrained model	Number of epochs	Batch size	Learning rate
NER^a	BERT^b-LARGE	30	4	1.00 × 10^–05
Negation classification	BERT-LARGE	5	8	1.00 × 10^–05
Side of family classification	BERT-LARGE	10	4	1.00 × 10^–05
Role of family classification	BERT-LARGE	5	8	1.00 × 10^–05
Living status classification	BERT-LARGE	6	8	1.00 × 10^–05
Relation identification	BERT-LARGE	12	16	2.00 × 10^–05

^aNER: named entity recognition.

^bBERT: bidirectional encoder representations from transformers.
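
For readers who want to reuse these settings, the values in Table 2 can be written down as a small configuration mapping. The following is a minimal sketch, not the authors' code: the names `FineTuneConfig`, `HYPERPARAMS`, and `total_updates` are hypothetical, the task keys are shortened forms of the table's row labels, and the step count assumes one optimizer update per batch with no gradient accumulation, which the table does not state. The model string is copied from the table as-is because the exact checkpoint (for example, cased or uncased) is not specified there.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """One row of Table 2 (hypothetical container, not from the paper)."""
    pretrained_model: str
    num_epochs: int
    batch_size: int
    learning_rate: float


# Values transcribed from Table 2; keys are illustrative task names.
HYPERPARAMS = {
    "ner": FineTuneConfig("BERT-LARGE", 30, 4, 1e-5),
    "negation": FineTuneConfig("BERT-LARGE", 5, 8, 1e-5),
    "side_of_family": FineTuneConfig("BERT-LARGE", 10, 4, 1e-5),
    "role_of_family": FineTuneConfig("BERT-LARGE", 5, 8, 1e-5),
    "living_status": FineTuneConfig("BERT-LARGE", 6, 8, 1e-5),
    "relation_identification": FineTuneConfig("BERT-LARGE", 12, 16, 2e-5),
}


def total_updates(cfg: FineTuneConfig, num_examples: int) -> int:
    """Optimizer updates for a full run, assuming one update per batch
    (no gradient accumulation) and a final partial batch that is kept."""
    batches_per_epoch = math.ceil(num_examples / cfg.batch_size)
    return cfg.num_epochs * batches_per_epoch


if __name__ == "__main__":
    # The example count below is a placeholder, not a dataset size from the paper.
    for task, cfg in HYPERPARAMS.items():
        print(task, cfg, total_updates(cfg, num_examples=1000))
```

Keeping the settings in one frozen structure makes it easy to pass a single row to whichever training loop is used, and the step count helps when sizing a learning rate schedule.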