2021 Jan 11;118(3):e2014196118. doi: 10.1073/pnas.2014196118

Table 1.

Short descriptions of optimization goals of unsupervised learning tasks

| Method | Description |
|---|---|
| AutoEncoder | First embed the input images into a lower-dimensional space and then use the embedding to reconstruct the input |
| PredNet | Predict the next frame, as well as some of the network responses to the next frame, using previous frames |
| CPC | Predict the embedding of one image crop using the embeddings of its spatial neighbors |
| Depth prediction | Predict the per-pixel relative depth image from the corresponding RGB image |
| Relative position | Predict the relative position of two image crops sampled from a 2×2 image grid |
| Colorization | Predict the down-sampled color information from the grayscale image |
| Deep cluster | Embed all images into a lower-dimensional space and then use the results of unsupervised clustering on these embeddings as “category” labels to train the networks |
| CMC | Embed the grayscale and color information of one image into two embedding spaces and push the two corresponding embeddings together while separating them from all of the other embeddings |
| Instance recognition | Make the embedding of one image invariant to data augmentations while separating it from the embeddings of all of the other images |
| SimCLR | Aggregate the embeddings of two data-augmented crops from one image while separating them from the embeddings of the other images in one large batch |
| Local aggregation | Aggregate the embedding of one image with those of its close neighbors in the embedding space while separating it from those of farther neighbors |

CPC stands for contrastive predictive coding (33).
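The CMC, instance recognition, and SimCLR rows share a contrastive pattern: pull matching embeddings together and push all the others apart. As an illustration only, here is a minimal pure-Python sketch of the normalized temperature-scaled cross-entropy (NT-Xent) loss used by SimCLR. It assumes cosine similarity, a temperature `tau=0.5`, and a batch given as a list of 2N embeddings in which `z[2k]` and `z[2k + 1]` are the two augmented crops of image k. The function names and this batch layout are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z, tau=0.5):
    """SimCLR-style NT-Xent loss averaged over all 2N anchors.

    Assumed (hypothetical) layout: z[2k] and z[2k + 1] are the two
    augmented crops of image k. tau=0.5 is an assumed temperature.
    """
    if len(z) % 2 != 0:
        raise ValueError("expected an even number of embeddings (2N)")
    total = 0.0
    for i in range(len(z)):
        j = i ^ 1  # positive partner: 0<->1, 2<->3, ...
        # Scaled similarities to every other embedding (self excluded);
        # the positive partner is one of these candidates.
        logits = [cosine(z[i], z[k]) / tau for k in range(len(z)) if k != i]
        positive = cosine(z[i], z[j]) / tau
        # -log softmax probability of the positive, with a max shift for stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - positive
    return total / len(z)


# Two images, each with two nearly identical crops placed as partners.
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Same vectors, but each crop's designated partner is now the dissimilar one.
shuffled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(nt_xent(aligned) < nt_xent(shuffled))  # True
```

The loss is lowest when each crop's most similar embedding is its own partner. A practical implementation would compute the full similarity matrix with a tensor library instead of looping in Python.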
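The Deep cluster row turns unsupervised clustering of the embeddings into “category” labels. A minimal sketch of that labeling step, assuming plain k-means with squared Euclidean distance, is shown below; the `kmeans_pseudo_labels` helper is hypothetical and is not the authors' code. As described by its authors, DeepCluster also reduces features with PCA before clustering and reassigns empty clusters, both of which this sketch omits.

```python
import random


def kmeans_pseudo_labels(embeddings, k, iters=20, seed=0):
    """Assign each embedding an integer cluster id via plain k-means.

    Hypothetical helper: no PCA, whitening, or empty-cluster reassignment.
    """
    rng = random.Random(seed)
    # Initialize centroids with k distinct embeddings sampled at random.
    centroids = [list(c) for c in rng.sample(embeddings, k)]
    labels = [0] * len(embeddings)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, e in enumerate(embeddings):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(e, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [e for e, lab in zip(embeddings, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
print(kmeans_pseudo_labels(points, k=2))
```

The returned integers would then serve as class targets for an ordinary classification loss on the network, with the clustering recomputed periodically as the embeddings change.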