Inputs are first encoded in a V1-like model. Its first layer (simple cells) corresponds to the S1 layer of the HMAX model; its second layer (complex cells) corresponds to the C1 layer of HMAX [14]. In the view-based model, the V1-like encoding $x$ is then projected onto stored frames $g_i w_k$ at orientation $i$, taken from videos of transforming faces $k = 1, \dots, K$. Finally, the last layer is computed as $\mu_k = \sum_i \eta(\langle x, g_i w_k \rangle)$. That is, the $k$th element of the output is computed by summing the responses of all cells tuned to views of the $k$th template face.
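For concreteness, here is a minimal NumPy sketch of the view-based pooling stage. It takes the V1-like (C1) encoding as given; the function name, the array layout of `G`, and the choice of `tanh` for the nonlinearity $\eta$ are illustrative assumptions, not the model's actual implementation:

```python
import numpy as np

def view_based_signature(x, G, eta=np.tanh):
    """Compute mu_k = sum_i eta(<x, g_i w_k>) for each template face k.

    x   : (d,) V1-like (C1) encoding of the input image.
    G   : (K, I, d) array; G[k, i] holds the stored frame g_i w_k, i.e.
          the V1-like encoding of the i-th view of template face k.
    eta : pointwise nonlinearity (tanh here is an assumption; the paper's
          exact choice of eta is not specified in this passage).
    """
    # Inner products of the input with every stored view: shape (K, I)
    dots = np.einsum('d,kid->ki', x, G)
    # Pool (sum) over views i belonging to the same template identity k
    return eta(dots).sum(axis=1)
```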
In the PCA model, the V1-like encoding is instead projected onto templates $w_k^i$ describing the $i$th PC of the $k$th template face's transformation video. The pooling in the final layer is then over all PCs derived from the same identity; that is, it is computed as $\mu_k = \sum_i \eta(\langle x, w_k^i \rangle)$. In both the view-based and PCA models, units in the output layer pool over all units in the previous layer corresponding to projections onto the same template individual's views (view-based model) or PCs (PCA model).
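A corresponding sketch for the PCA model, under the same caveats; mean-centering the frames before the SVD and the number of retained PCs (`n_pcs`) are illustrative assumptions not fixed by the text above:

```python
import numpy as np

def pca_signature(x, videos, n_pcs=5, eta=np.tanh):
    """Compute mu_k = sum_i eta(<x, w_k^i>), pooling over the PCs of
    each template face's transformation video.

    x      : (d,) V1-like encoding of the input image.
    videos : list of (T_k, d) arrays; the V1-like encodings of the frames
             of template face k's transformation video.
    n_pcs  : number of principal components kept per face (an assumption).
    """
    mu = np.empty(len(videos))
    for k, frames in enumerate(videos):
        # PCs of face k's video: rows of Vt are the templates w_k^i
        # (centering before the SVD is an assumption)
        _, _, Vt = np.linalg.svd(frames - frames.mean(axis=0),
                                 full_matrices=False)
        # Project the input onto each PC and pool over PCs of identity k
        mu[k] = eta(Vt[:n_pcs] @ x).sum()
    return mu
```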