Table 1.
An overview of the recent typical self-supervised learning methods
| Model | FoS | Framework | Encoder | Pseudo-labels | Loss | Negative samples: source | Negative samples: strategy |
|---|---|---|---|---|---|---|---|
| TCN embedding45 (2018) | CV | (d) | inception network + CNN | different but simultaneous viewpoints | triplet loss | images of different time | end to end |
| SimCLR15 (2020) | CV | (d) | ResNet | data augmentation | NT-Xent loss | other images | end to end |
| SimCLR v.2193 (2020, semi) | CV | (d) | variants of ResNet | data augmentation | NT-Xent loss | other images | end to end |
| MoCo29 (2020) | CV | (d) | ResNet | data augmentation | InfoNCE loss | other images | momentum |
| MoCo v.2194 (2020) | CV | (d) | ResNet | data augmentation | InfoNCE loss | other images | momentum |
| MoCo v.3168 (2021) | CV | (d) | vision transformers | data augmentation | InfoNCE loss | other images | end to end |
| RotNet32 (2018) | CV | (a) | ConvNet | rotation directions | prediction loss | – | – |
| Colorization31 (2017) | CV | (a) | AlexNet, VGG-16, ResNet-152 | color of missing patch | regression loss, KL divergence | – | – |
| DIM46 (2018) | CV | (d) | – | – | JSD, DV, or InfoNCE loss | – | end to end |
| Word2Vec64 (2019) | NLP | (a) | auto-encoder | context words | prediction loss | – | – |
| BERT67 (2019) | NLP | (a) | MPC | masked words | prediction loss | – | – |
| ALBERT36 (2020) | NLP | (a) | MPC | masked words, sentence order | prediction loss | – | – |
| BYOL54 (2020) | CV | (b) | ResNet | data augmentation | MSE loss | – | – |
| Barlow Twins55 (2021) | CV | (b) | ResNet | data augmentation | Equation 3 | – | – |
| SimSiam50 (2021) | CV | (b) | ResNet | data augmentation | negative cosine similarity | – | – |
| DeepCluster57 (2018) | CV | (c) | AlexNet, VGG-16 | clustering centroids | negative log-softmax loss | – | – |
| Local Aggregation59 (2019) | CV | (c) | AlexNet, VGG-16 | soft-clustering centroids | negative log-softmax loss | – | – |
| SwAV60 (2020) | CV | (c) | variants of ResNet-50 | online-clustering centroids | modified cross-entropy | – | end to end |
| CPC42 (2018) | CV, audio, NLP | (d) | APC | – | InfoNCE loss | other images | end to end |
| CPC v.271 (2020) | CV | (d) | APC | – | InfoNCE loss | other images | end to end |
Model, field of study (FoS), framework type (referring to Figure 1), encoder, pseudo-labels, and loss are given, together with the source of and training strategy for the negative samples. "Other images," in the Source column, refers to the other images of the mini-batch.
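Several of the contrastive methods in the table (SimCLR, MoCo, CPC) share the same family of losses. As an illustration, the NT-Xent loss used by SimCLR can be sketched as follows; this is a simplified NumPy version, not the authors' implementation, and the function name and temperature default are our own choices. It assumes a batch of 2N embeddings in which rows i and i+N are two augmented views of the same image, with all other rows of the mini-batch serving as negative samples:

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z   : array of shape (2N, d); rows i and i+N are the two augmented
          views of the same image (the positive pair).
    tau : temperature scaling the cosine similarities.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize -> dot = cosine sim
    two_n = z.shape[0]
    n = two_n // 2
    sim = (z @ z.T) / tau                  # pairwise scaled similarities
    np.fill_diagonal(sim, -np.inf)         # exclude self-similarity from the softmax
    # index of the positive partner for each row: i <-> i + N
    pos = np.concatenate([np.arange(n, two_n), np.arange(n)])
    log_prob = sim[np.arange(two_n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()
```

The InfoNCE loss of MoCo and CPC has the same softmax-over-negatives form; the methods differ mainly in where the negatives come from (the current mini-batch for SimCLR, a momentum-updated queue for MoCo).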