Table 2. Optimal hyper-parameters for models with and without reader embeddings.
| Hyper-parameter | ResNet18, without reader embeddings | ResNet18, with reader embeddings | ResNet34, without reader embeddings | ResNet34, with reader embeddings | ResNet50, without reader embeddings | ResNet50, with reader embeddings |
|---|---|---|---|---|---|---|
| Activation function for projected reader embeddings | n/a | identity | n/a | identity | n/a | identity |
| Batch size | 8 | 32 | 16 | 16 | 8 | 16 |
| Dropout | 0.22 | 0.28 | 0.35 | 0.05 | 0.36 | 0.01 |
| L2 regularization of convolutional layers | 0.199886 | 1.9E-05 | 0.000163 | 5E-06 | 0.256886 | 0.00443 |
| L2 regularization of fully connected layer | 4.8E-05 | 5.1E-05 | 0.000242 | 5E-06 | 1E-05 | 2.7E-05 |
| L2 regularization of fully connected layer projecting the reader embeddings | n/a | 4E-06 | n/a | 0.291381 | n/a | 2E-06 |
| Learning rate for convolutional layers | 2.1E-05 | 0.000346 | 0.000474 | 0.000282 | 2E-05 | 3.6E-05 |
| Learning rate for fully connected layer | 0.002909 | 0.049604 | 0.00163 | 0.023335 | 2.6E-05 | 0.029704 |
| Learning rate for fully connected layer projecting the reader embeddings | n/a | 0.000923 | n/a | 0.000141 | n/a | 0.020499 |
| Learning rate for reader embeddings | n/a | 0.001818 | n/a | 0.007738 | n/a | 0.009301 |
| Max L2-norm of reader embeddings | n/a | 1 | n/a | 4 | n/a | 1 |
| Proportion of images with color brightness and contrast augmentation | 0.2 | 0.5 | 0 | 0 | 0.5 | 1 |
| Proportion of training images with affine transformation augmentation | 0.8 | 0.2 | 1 | 0.2 | 1 | 0.5 |
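
Table 2 gives separate learning rates and L2 coefficients for the convolutional layers, the fully connected layer, and the layer projecting the reader embeddings. The sketch below shows one way such settings might be organized as per-group optimizer specifications, using the ResNet18 model with reader embeddings. It is illustrative only and rests on several assumptions that the table does not state: that the three reader-specific values in each row belong to the ResNet18, ResNet34, and ResNet50 models with reader embeddings, in that order; that L2 regularization is applied as a per-group weight-decay coefficient; that the reader embeddings carry no L2 term of their own; and that the max L2-norm is enforced by rescaling any embedding vector that exceeds it. The names `GroupSetting`, `RESNET18_WITH_READER`, `optimizer_groups`, and `clip_to_max_norm` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupSetting:
    """Learning rate and L2 coefficient for one group of parameters."""
    learning_rate: float
    l2: float


# Values copied from Table 2 for ResNet18 with reader embeddings.
# ASSUMPTION: the reader-specific rows map to the "with reader embeddings" models.
RESNET18_WITH_READER = {
    "batch_size": 32,
    "dropout": 0.28,
    "reader_embedding_activation": "identity",
    "reader_embedding_max_l2_norm": 1,
    "reader_embedding_learning_rate": 0.001818,
    "color_augmentation_proportion": 0.5,
    "affine_augmentation_proportion": 0.2,
    "groups": {
        "convolutional": GroupSetting(learning_rate=0.000346, l2=1.9e-05),
        "fully_connected": GroupSetting(learning_rate=0.049604, l2=5.1e-05),
        "reader_projection": GroupSetting(learning_rate=0.000923, l2=4e-06),
    },
}


def optimizer_groups(config):
    """Return one {name, lr, weight_decay} spec per parameter group.

    ASSUMPTION: L2 regularization is expressed as a weight-decay coefficient.
    """
    groups = [
        {"name": name, "lr": setting.learning_rate, "weight_decay": setting.l2}
        for name, setting in config["groups"].items()
    ]
    # ASSUMPTION: no L2 term for the embeddings themselves, since Table 2
    # lists only a learning rate and a max-norm for them.
    groups.append({
        "name": "reader_embeddings",
        "lr": config["reader_embedding_learning_rate"],
        "weight_decay": 0.0,
    })
    return groups


def clip_to_max_norm(vector, max_norm):
    """Rescale a vector so that its L2 norm does not exceed max_norm.

    ASSUMPTION: this is how the max L2-norm constraint is enforced.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    if norm <= max_norm:
        return list(vector)
    scale = max_norm / norm
    return [x * scale for x in vector]


if __name__ == "__main__":
    for group in optimizer_groups(RESNET18_WITH_READER):
        print(group)
    limit = RESNET18_WITH_READER["reader_embedding_max_l2_norm"]
    print(clip_to_max_norm([3.0, 4.0], limit))
```

Keeping each group's learning rate next to its L2 coefficient mirrors the layout of the table, where both are tuned independently per layer type.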