eLife. 2023 May 31;12:e81499. doi: 10.7554/eLife.81499

Figure 4. Low-dimensional embedding of network layers reveals structure.

(A) Similarity in representations (centered kernel alignment [CKA]) between the trained and untrained models for each of the five instantiations of the best-performing spatial-temporal models (left and center). CKA between models trained on recognition vs. decoding (right). (B) t-Distributed stochastic neighbor embedding (t-SNE) for each layer of one instantiation of the best-performing spatial-temporal model trained on both tasks. Each data point is a random stimulus sample (N=2000, 50 per stimulus). (C) Representational dissimilarity matrices (RDMs). Character-level representations are visualized using percentile RDMs for proprioceptive inputs (left) and final-layer features (right) of one instantiation of the best-performing spatial-temporal model trained on the recognition task. (D) Similarity in stimulus representations between the RDMs of an Oracle (ideal observer) and each layer for the five instantiations of the action recognition task (ART)-trained models and their untrained counterparts. (E) Decoding error (in cm) along the hierarchy for each model type on the trajectory decoding task. (F) CKA between models trained on recognition vs. decoding for the five instantiations of all network types.
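For readers who want to reproduce the CKA comparisons in (A) and (F), the sketch below shows one common formulation, linear CKA, computed from layer activations stored as stimulus-by-unit matrices. The function name and the choice of the linear (rather than a kernelized) variant are illustrative assumptions and are not taken from the authors' code.

import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment (CKA) between two activation matrices.

    X: (n_stimuli, n_units_x) activations of one layer/model.
    Y: (n_stimuli, n_units_y) activations of another layer/model,
       computed on the same stimuli in the same order.
    """
    # Center each unit's activations across stimuli.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-based formulation for linear kernels (Kornblith et al., 2019).
    cross_cov = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross_cov / (norm_x * norm_y)

# Example: similarity between a trained and an untrained layer on the same stimuli
# (synthetic activations used here as a stand-in).
rng = np.random.default_rng(0)
acts_trained = rng.standard_normal((2000, 256))
acts_untrained = rng.standard_normal((2000, 256))
print(linear_cka(acts_trained, acts_untrained))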


Figure 4—figure supplement 1. Extended analysis of network models.


(A) t-Distributed stochastic neighbor embedding (t-SNE) for each layer of the best-performing spatiotemporal and long short-term memory (LSTM) models. Each data point is a random stimulus sample (N=4000, 200 per character). (B) Representational dissimilarity matrices (RDMs) of an ideal observer ‘Oracle’, which by definition has low dissimilarity for different samples of the same character and high dissimilarity for samples of different characters. Character-level representations are calculated as percentile RDMs for proprioceptive inputs and final-layer features of one instantiation each of the best-performing spatiotemporal and LSTM models trained on the recognition task. (C) Centered kernel alignment (CKA) between models trained on recognition vs. decoding for all network types (N=50 per network type).
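As a rough guide to how the Oracle comparison in panel (B) here and in Figure 4D can be carried out, the sketch below builds a percentile RDM from layer features, constructs an ideal-observer RDM from character labels, and compares the two through a rank correlation of their upper triangles. The helper names, the correlation-distance metric, and the Spearman similarity measure are assumptions for illustration; the paper's exact pipeline may differ.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import rankdata, spearmanr

def percentile_rdm(features):
    """Pairwise correlation-distance RDM, rank-transformed to percentiles for display."""
    dissim = pdist(features, metric="correlation")        # condensed upper triangle
    percentiles = 100.0 * rankdata(dissim) / dissim.size  # map ranks onto 0-100
    return squareform(percentiles)

def oracle_rdm(char_labels):
    """Ideal-observer RDM: 0 for pairs of the same character, 1 otherwise."""
    labels = np.asarray(char_labels)
    return (labels[:, None] != labels[None, :]).astype(float)

def rdm_similarity(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return spearmanr(rdm_a[iu], rdm_b[iu]).correlation

# Example: 20 characters with 10 samples each (labels and features are synthetic).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(20), 10)
layer_features = rng.standard_normal((labels.size, 128))
print(rdm_similarity(percentile_rdm(layer_features), oracle_rdm(labels)))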