2022 Oct 4;62(20):4852–4862. doi: 10.1021/acs.jcim.2c00715

Table 2. Model Configuration and Hyperparameters.

hyperparameter                     value
---------------------------------  ------
batch size per GPU (11 GB)         8
gradient accumulation steps        32
effective batch size               4096
peak learning rate                 0.0006
hidden size                        768
intermediate size                  3072
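The table lists the number of GPUs only indirectly. Under the usual data-parallel convention, which is an assumption here because the table does not define it, effective batch size = per-GPU batch size × gradient accumulation steps × number of GPUs. The sketch below copies the tabulated values into a hypothetical `TrainConfig` dataclass and derives the implied GPU count, 4096 / (8 × 32) = 16. It also checks the intermediate-to-hidden ratio, 3072 / 768 = 4. The helper `implied_num_gpus` and the field names are illustrative and are not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from Table 2. Field names are hypothetical."""
    per_gpu_batch_size: int = 8
    grad_accum_steps: int = 32
    effective_batch_size: int = 4096
    peak_learning_rate: float = 6e-4
    hidden_size: int = 768
    intermediate_size: int = 3072


def implied_num_gpus(cfg: TrainConfig) -> int:
    """Number of data-parallel GPUs implied by the table.

    Assumes effective = per_gpu * accumulation_steps * num_gpus,
    which is the common convention but is not stated in the table.
    """
    samples_per_gpu_step = cfg.per_gpu_batch_size * cfg.grad_accum_steps
    if cfg.effective_batch_size % samples_per_gpu_step != 0:
        raise ValueError(
            "effective batch size is not a multiple of "
            "per-GPU batch size x accumulation steps"
        )
    return cfg.effective_batch_size // samples_per_gpu_step


cfg = TrainConfig()
print(implied_num_gpus(cfg))                       # 16
print(cfg.intermediate_size // cfg.hidden_size)    # 4
```

The 4× ratio between intermediate and hidden size matches the feed-forward width conventionally used in BERT-base-sized Transformer encoders. The 16-GPU figure is an inference from the arithmetic, not a value reported in the table.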