Table 1.
Method | Description |
AutoEncoder | First embed the input images into a lower-dimension space and then use the embedding to reconstruct the input |
PredNet | Predict the next frame as well as some of the network responses to the next frame using previous frames |
CPC | Predict the embedding of one image crop using the embeddings of its spatial neighbors |
Depth prediction | Predict the per-pixel relative depth image from the corresponding RGB image |
Relative position | Predict the relative position of two image crops sampled from an image grid |
Colorization | Predict the down-sampled color information from the grayscale image |
Deep cluster | Embed all images into a lower-dimension space and then use unsupervised clustering results on these embeddings as “category” labels to train the networks |
CMC | Embed grayscale and color information of one image into two embedding spaces and push together two corresponding embeddings while separating them from all of the other embeddings |
Instance recognition | Make the embedding of one image unchanged under data augmentations while separating it from the embeddings of all of the other images |
SimCLR | Aggregate the embeddings of two data-augmented crops from one image while separating them from the embeddings of other images in one large batch |
Local aggregation | Aggregate the embeddings of one image to its close neighbors in the embedding space while separating them from further neighbors |
CPC stands for contrastive predictive coding (33).
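
The contrastive rows of the table (Instance recognition, SimCLR, CMC) share one idea: pull together the embeddings that belong to the same image and push them away from all others. As a rough illustration of the SimCLR row only, the sketch below computes a batch-wise contrastive loss in NumPy. It is not the implementation used for any method in the table; the function name `nt_xent_loss`, the temperature value, and the use of cosine similarity with a softmax over the batch are assumptions made for this example.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.1):
    """Batch-wise contrastive loss (illustrative sketch, hypothetical helper).

    z_a[i] and z_b[i] are embeddings of two augmented crops of image i.
    Every other embedding in the batch is treated as a negative.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit-length rows
    sim = (z @ z.T) / temperature                       # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                      # an embedding is not its own pair

    # Row i's positive is the other crop of the same image.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Usage: matched crops should give a lower loss than unrelated embeddings.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = nt_xent_loss(z, z + 0.01 * rng.normal(size=z.shape))
loss_random = nt_xent_loss(z, rng.normal(size=z.shape))
```

The loss is small when the two crops of each image land close together relative to the rest of the batch, which is the "aggregate while separating" behavior described in the table. The assumed temperature controls how sharply near neighbors are weighted.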