a, Gloss values in the unsupervised networks well predict human gloss ratings for 50 novel rendered surfaces of random gloss levels. The scatter plot shows the decision value on a glossy-versus-matte SVM classifier for each image in the PixelVAE networks on the x axis (averaged over ten training instances) and human gloss ratings on the y axis (N = 20). The error bars indicate the s.e.m. over 20 observers (vertical) or 10 model training instances (horizontal). The bar plot shows the average error in predicting each individual observer’s ratings, after normalizing ratings and model predictions into the range 0–1, for the unsupervised model and for diverse alternative predictors (see the text and Methods for details). The vertical grey line indicates how well individual human ratings can be predicted from the data of other observers, giving the minimum possible model error. The data points show the values for each observer. RMSE, root mean square error. b, Moving along the gloss-discriminating axis in a network’s latent code as shown in c successfully modulates perceived gloss, both when adding gloss to initially matte surfaces (pale blue line; F4,76 = 649.82; P < 0.001; η2 = 0.97; 95% CI for r, 0.97–0.98) and when removing it from initially glossy surfaces (dark blue line; F4,76 = 244.11; P < 0.001; η2 = 0.93; 95% CI, 0.87–0.93). The dots show each observer’s gloss rankings of each of five modulation steps, averaged over four test sequences starting from low-gloss initial seed images and four starting from high-gloss seed images. The lines indicate the mean over 20 observers. c, Example gloss-modulated sequences. A glossy or matte rendered image (left) is input to a network, and its 10D latent representation is recorded. New images are generated from the model, conditioned first on the original (‘seed’) point in latent space, and after stepping by small amounts along the model’s gloss-discriminating axis (that is, the axis orthogonal to the decision plane of a glossy-versus-matte SVM classifier), in either the ‘matte’ (top) or ‘glossy’ (bottom) direction.