2021 Jan 11;118(3):e2014196118. doi: 10.1073/pnas.2014196118

Table 1.

Short descriptions of optimization goals of unsupervised learning tasks

Method	Description
AutoEncoder	First embed the input images into a lower-dimensional space and then use the embedding to reconstruct the input
PredNet	Predict the next frame as well as some of the network responses to the next frame using previous frames
CPC	Predict the embedding of one image crop using the embeddings of its spatial neighbors
Depth prediction	Predict the per-pixel relative depth image from the corresponding RGB image
Relative position	Predict the relative position of two image crops sampled from a $2 \times 2$ image grid
Colorization	Predict the down-sampled color information from the grayscale image
Deep cluster	Embed all images into a lower-dimensional space and then use unsupervised clustering results on these embeddings as “category” labels to train the networks
CMC	Embed grayscale and color information of one image into two embedding spaces and push together two corresponding embeddings while separating them from all of the other embeddings
Instance recognition	Make the embedding of one image unchanged under data augmentations while separating it from the embeddings of all of the other images
SimCLR	Aggregate the embeddings of two data-augmented crops from one image while separating them from the embeddings of other images in one large batch
Local aggregation	Aggregate the embeddings of one image to its close neighbors in the embedding space while separating them from further neighbors

CPC represents contrastive predictive coding (33).
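
Several rows of Table 1 (CMC, instance recognition, SimCLR) share one pattern: pull together two embeddings that come from the same image and separate them from the embeddings of other images. The following NumPy code is a minimal sketch of that pattern, not the loss used by any of the cited methods. The function names, the `temperature` value, and the choice of an InfoNCE-style softmax over cosine similarities are assumptions made for illustration; SimCLR, for example, also uses negatives drawn from within the same view, which this sketch omits.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def contrastive_loss(view_a, view_b, temperature=0.1):
    """Illustrative InfoNCE-style loss (hypothetical helper, not from the paper).

    Row i of view_a and row i of view_b are assumed to be embeddings of the
    same image (the positive pair); every other row of view_b acts as a negative.
    """
    a = l2_normalize(np.asarray(view_a, dtype=float))
    b = l2_normalize(np.asarray(view_b, dtype=float))
    logits = a @ b.T / temperature  # shape (N, N)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
# Two slightly perturbed copies of the same embeddings: positives agree.
aligned = contrastive_loss(base, base + 0.01 * rng.normal(size=base.shape))
# Unrelated embeddings: positives are no closer than negatives.
unrelated = contrastive_loss(base, rng.normal(size=base.shape))
```

In this toy setting the loss is lower when the paired rows agree (`aligned`) than when they are unrelated (`unrelated`), which is the behavior the table descriptions refer to as aggregating corresponding embeddings while separating them from the rest.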