PLoS Comput Biol. 2020 Jul 21;16(7):e1008017. doi: 10.1371/journal.pcbi.1008017

Fig 6.


(a) Ideal representations: After training, we expected the primary capsules to detect single shapes of different types (here: squares, circles, and verniers), and the secondary capsules to group these shapes into groups of 1, 3, or 5. If 3 squares are presented, the primary square capsules detect squares at 3 locations. Through routing-by-agreement, the secondary squares capsule detects this group of 3 squares. If 5 circles are presented, the primary circle capsules detect circles at 5 locations. After routing, the secondary circles capsule represents a group of 5 circles. If a vernier is presented, it is detected and routed to the secondary vernier capsule.

(b) Network architecture: We used capsule networks with two convolutional layers, one primary capsule layer with m primary capsule types and n primary capsule dimensions, and one secondary capsule layer with m secondary capsule types. The primary capsules were implemented by a third convolutional layer whose output is reshaped into m primary capsule types, each outputting an n-dimensional activation vector (this is the standard way to implement primary capsules). In this example, there are seven primary and secondary capsule types, matching the seven shape types used in experiment 1 (see panel (a)). The primary and secondary capsule layers communicate via routing-by-agreement. Small fully connected decoders are applied to the secondary capsules to compute different loss terms (see main text). The network is trained end-to-end through backpropagation to minimize all losses simultaneously. We used different network hyperparameters in the main text results and in S1 Appendix to ensure that our results are stable against hyperparameter changes (see Table A in S1 Appendix for all hyperparameter values).
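The "standard way to implement primary capsules" mentioned in panel (b) can be sketched as follows: a convolutional layer whose output channels are reshaped into m capsule types of n-dimensional vectors, followed by the squashing nonlinearity of Sabour et al. (2017). This is a minimal illustrative sketch in PyTorch, not the paper's code; the layer sizes, kernel size, and stride below are hypothetical placeholders (the paper's actual hyperparameters are in Table A of S1 Appendix), and the routing-by-agreement step between primary and secondary capsules is omitted.

```python
import torch
import torch.nn as nn


def squash(s, dim=-1, eps=1e-8):
    # Squashing nonlinearity: shrinks vector norms into [0, 1) while
    # preserving direction, so the norm can encode detection probability.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)


class PrimaryCapsules(nn.Module):
    """Primary capsule layer: a conv layer reshaped into m capsule types,
    each producing n-dimensional activation vectors at every spatial
    location. All sizes here are illustrative, not the paper's values."""

    def __init__(self, in_channels=32, m=7, n=8, kernel_size=9, stride=2):
        super().__init__()
        self.m, self.n = m, n
        # One convolution produces all m*n capsule channels at once.
        self.conv = nn.Conv2d(in_channels, m * n,
                              kernel_size=kernel_size, stride=stride)

    def forward(self, x):
        u = self.conv(x)                       # (B, m*n, H', W')
        batch, _, h, w = u.shape
        # Reshape channels into m capsule types of n dimensions each,
        # then flatten spatial positions into individual capsules.
        u = u.view(batch, self.m, self.n, h, w)
        u = u.permute(0, 1, 3, 4, 2).reshape(batch, self.m * h * w, self.n)
        return squash(u)                       # (B, num_capsules, n)


# Usage: feature maps from the two preceding conv layers go in,
# squashed capsule activation vectors come out.
features = torch.randn(2, 32, 20, 20)
caps = PrimaryCapsules()(features)
```

The reshape-then-squash pattern is what turns ordinary convolutional feature maps into capsules: each n-dimensional vector can represent the pose of one detected shape, and its norm the confidence of the detection, which is what routing-by-agreement then operates on.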