All illustrative architectures in this figure are based on a greatly simplified model of the cortex that consists of three regions (red, blue, green), each of which contains three pixels. Each AN in the first layer of every architecture corresponds to a cortical pixel: 9 for the single image input architectures (1 and 4) and 18 for the dual image input architectures (2, 3, 5, 6 and 7), of which 9 are within the first (‘T1’) input image and 9 are within the second (‘T2’) input image. The weights between the input layer and the first hidden layer of every architecture are initialized by previously trained RBMs (i.e. pre-training): ‘R1’ for the single image input architectures (1 and 4) and ‘R2’ for the dual image input architectures (2, 3, 5 and 6). The first hidden layers of the single image input architectures (1 and 4) consist of 3 ANs, each of which is connected only to the correspondingly colored ANs in the input layer. The first hidden layers of the dual image input architectures (2, 3, 5 and 6) consist of two sets of ANs, originating in the first (‘T1’) and second (‘T2’) input images, respectively; each AN connects only to the correspondingly colored ANs of its own input image. The final layer of every architecture consists of a single AN that is trained to activate when the input lies within an avalanche. (Architecture 1) A schematic of the first and simplest architecture, in which the signals are not spatially mixed before layer 3. (Architecture 2) A schematic of the second ANN architecture, in which the signals are neither spatially nor temporally mixed before layer 3. (Architecture 3) A schematic of the third ANN architecture, in which the third layer consists of two ANs, the first and second of which are connected only to the ANs in the second layer that originate from ‘T1’ and ‘T2’, respectively. The signals are not spatially mixed before layer 3 and not temporally mixed before layer 4.
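The region-restricted connectivity between the input layer and the first hidden layer can be written as a binary mask. The Python sketch below is illustrative only and rests on assumptions not stated in the caption: pixels are stored region by region (three consecutive pixels per colored region), the ‘T1’ block precedes the ‘T2’ block in the dual input, and each image's set of first-hidden-layer ANs holds one AN per region. The function names are hypothetical. Multiplying such a mask elementwise with the RBM-initialized weights is one common way to keep absent connections at zero, not necessarily the method used here.

```python
# Hypothetical sketch of region-restricted first-layer connectivity.
# Assumptions: pixels ordered region by region; T1 inputs before T2 inputs;
# one first-hidden-layer AN per region per input image.

N_REGIONS = 3                              # red, blue, green
PIXELS_PER_REGION = 3
N_PIXELS = N_REGIONS * PIXELS_PER_REGION   # 9 cortical pixels per image


def region_of(pixel):
    """Region index of a pixel, assuming pixels are stored region by region."""
    return pixel // PIXELS_PER_REGION


def first_layer_mask(n_images):
    """Return mask[h][i] = 1 if first-hidden-layer AN h connects to input AN i.

    Inputs: n_images blocks of N_PIXELS (T1 first, then T2 for dual input).
    Hidden: n_images blocks of N_REGIONS ANs, one per region per image.
    """
    n_in = n_images * N_PIXELS
    n_hidden = n_images * N_REGIONS
    mask = [[0] * n_in for _ in range(n_hidden)]
    for h in range(n_hidden):
        img_h, reg_h = divmod(h, N_REGIONS)
        for i in range(n_in):
            img_i, pix = divmod(i, N_PIXELS)
            # Connect only within the same image and the same colored region.
            if img_i == img_h and region_of(pix) == reg_h:
                mask[h][i] = 1
    return mask


single = first_layer_mask(1)   # single image input: 9 inputs -> 3 ANs
dual = first_layer_mask(2)     # dual image input: 18 inputs -> 6 ANs
print(len(single), len(single[0]), sum(map(sum, single)))
print(len(dual), len(dual[0]), sum(map(sum, dual)))
```

Every row of either mask sums to 3: each first-hidden-layer AN sees exactly the three pixels of one colored region of one image, so no spatial or temporal mixing occurs at this stage.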
(Architecture 4) A schematic of the fourth ANN architecture, in which the third layer consists of 3 ANs, each of which connects to a unique pair of ANs in the second layer. The signals become spatially mixed across pairs of regions (‘s(p)’) between layers two and three, and then fully spatially mixed (‘s’) between layers three and four. (Architecture 5) A schematic of the fifth ANN architecture, in which the third layer consists of 6 ANs, each of which connects to a unique pair of second-layer ANs; the pairs of the first half originate in ‘T1’ and those of the second half in ‘T2’. The signals become spatially mixed across pairs of cortical regions (‘s(p)’) between the second and third layers, and then fully spatially and temporally mixed (‘s+t’) between layers three and four. (Architecture 6) A schematic of the sixth ANN architecture, in which the third layer consists of 6 ANs, each of which connects to a unique pair of second-layer ANs; the pairs of the first half originate in ‘T1’ and those of the second half in ‘T2’. The fourth layer consists of two ANs, the first and second of which are connected to the third-layer ANs that originate in ‘T1’ and ‘T2’, respectively. The signals become spatially mixed across pairs of regions (‘s(p)’) between the second and third layers, fully spatially mixed (‘s’) between the third and fourth layers, and temporally mixed (‘t’) between the fourth and fifth layers. (Architecture 7) A schematic of the seventh ANN architecture, in which the third layer consists of 9 ANs, each of which connects to two ANs in the second layer, one originating within ‘T1’ and the other within ‘T2’. The signals become spatially mixed across pairs of cortical regions as well as temporally mixed (‘s(p)+t’) between the second and third layers, and then fully spatially mixed (‘s’) between the third and fourth layers.
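The pairwise mixing in the third layer of architectures 4 to 7 can be enumerated in the same way. This hedged Python sketch assumes one second-layer AN per region per image (ANs 0 to 2 from ‘T1’, ANs 3 to 5 from ‘T2’), takes the "unique pairs" of architectures 4 to 6 to be all 2-combinations of ANs within one image, and, as a further assumption, takes the nine third-layer ANs of architecture 7 to be all 3 × 3 pairings of a ‘T1’ AN with a ‘T2’ AN. The helper `mask_from_groups` is hypothetical.

```python
from itertools import combinations, product

# Assumption: one second-layer AN per region per input image.
N_REGIONS = 3


def mask_from_groups(groups, n_in):
    """Build mask[o][i] = 1 where third-layer AN o reads second-layer AN i."""
    mask = [[0] * n_in for _ in groups]
    for o, group in enumerate(groups):
        for i in group:
            mask[o][i] = 1
    return mask


regions = range(N_REGIONS)
t1 = list(regions)                      # second-layer ANs from T1: 0, 1, 2
t2 = [N_REGIONS + r for r in regions]   # second-layer ANs from T2: 3, 4, 5

# Architecture 4 (single image): one third-layer AN per unique pair of regions.
arch4 = mask_from_groups(list(combinations(t1, 2)), N_REGIONS)

# Architectures 5 and 6: unique region pairs within T1, then within T2.
arch5 = mask_from_groups(
    list(combinations(t1, 2)) + list(combinations(t2, 2)), 2 * N_REGIONS)

# Architecture 7 (assumed): every (T1 AN, T2 AN) pairing, 3 x 3 = 9 ANs.
arch7 = mask_from_groups(list(product(t1, t2)), 2 * N_REGIONS)

for name, m in [("arch4", arch4), ("arch5", arch5), ("arch7", arch7)]:
    print(name, len(m), "x", len(m[0]))
```

Each row reads exactly two second-layer ANs. In architectures 4 to 6 both come from the same image (spatial mixing only, ‘s(p)’), whereas in architecture 7 each row combines one ‘T1’ AN with one ‘T2’ AN, which is why temporal mixing appears already between the second and third layers (‘s(p)+t’).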