Table 3.
CL model | ESS-MB | CL backbone | Pretext task: training loss ↓ | Downstream ImageNet classification: training loss ↓ | Downstream ImageNet classification: test loss ↓ | Downstream ImageNet classification: test accuracy (%) ↑ |
---|---|---|---|---|---|---|
SimCLR | – | ResNet-50 | 0.15 ± 0.01 | 4.88 ± 0.01 | 4.84 ± 0.01 | 16.81 ± 0.05 |
SimCLR | ✓ | ResNet-50 | 0.59 ± 0.01 | 4.70 ± 0.01 | 4.79 ± 0.01 | 17.71 ± 0.13∗ |
DCL | – | ResNet-50 | 3.75 ± 0.06 | 4.67 ± 0.01 | 4.71 ± 0.01 | 17.62 ± 0.11 |
DCL | ✓ | ResNet-50 | 3.86 ± 0.00 | 4.66 ± 0.01 | 4.69 ± 0.02 | 18.15 ± 0.10∗ |
CLSA | – | ResNet-50 | 11.44 ± 0.00 | 4.16 ± 0.03 | 4.06 ± 0.03 | 24.77 ± 0.33 |
CLSA | ✓ | ResNet-50 | 11.23 ± 0.00 | 3.89 ± 0.01 | 3.83 ± 0.01 | 27.77 ± 0.22∗ |
NNCLR | – | ResNet-18 | 3.39 ± 0.24 | 1,555 ± 8.26 | 7.03 ± 0.15 | 3.55 ± 0.03 |
MoCo v.2 | ✓ | ResNet-18 | 3.89 ± 0.10 | 5.75 ± 0.01 | 5.71 ± 0.01 | 7.96 ± 0.08∗ |
MoCo v.3 | – | ViT | 1.87 ± 0.01 | 4.58 ± 0.02 | 4.47 ± 0.02 | 19.27 ± 0.21 |
MoCo v.3 | ✓ | ViT | 2.11 ± 0.00 | 4.57 ± 0.01 | 4.46 ± 0.01 | 19.84 ± 0.13∗ |
CL stands for contrastive learning. A ✓ indicates that ESS-MB is applied to the given contrastive learning model. We compare NNCLR with ESS-MB on MoCo v.2 because NNCLR’s different definition of positive pairs complicates applying ESS-MB to NNCLR directly. For each model type, the better downstream classification result is marked with an asterisk (∗).
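
As a reading aid, the absolute change in downstream test accuracy for each pairing can be recomputed from the table. The sketch below is illustrative only: the values are the mean test accuracies copied from Table 3, while the dictionary layout and the helper name `accuracy_gain` are assumptions made for this example. It ignores the reported ± terms, so it says nothing about statistical significance.

```python
# Mean test accuracy (%) copied from Table 3: (without ESS-MB, with ESS-MB).
# The "NNCLR vs. MoCo v.2" entry pairs the NNCLR baseline with ESS-MB on
# MoCo v.2, mirroring the comparison made in the table.
accuracy = {
    "SimCLR": (16.81, 17.71),
    "DCL": (17.62, 18.15),
    "CLSA": (24.77, 27.77),
    "NNCLR vs. MoCo v.2": (3.55, 7.96),
    "MoCo v.3": (19.27, 19.84),
}


def accuracy_gain(baseline, with_ess_mb):
    """Hypothetical helper: absolute gain in percentage points, rounded to 2 decimals."""
    return round(with_ess_mb - baseline, 2)


gains = {name: accuracy_gain(*pair) for name, pair in accuracy.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} percentage points")
```

The gains are absolute differences in percentage points, not relative improvements; dividing each gain by its baseline would give the relative change instead.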