2016 Feb 11;6:20410. doi: 10.1038/srep20410

Figure 2. Overall architecture of the model.

After saliency detection, a 227 × 227 crop of the localized image is presented as the input. The first convolutional layer (Conv1) convolves it with local receptive fields at a fixed stride. The result then passes through four further convolutional layers (Conv2–5), together with three max-pooling layers, and two fully connected layers (FC6, FC7), which represent it in vector form. The final layer is a 12-way softmax for classification. Original image courtesy of Junfeng Gao.
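As a rough illustration of how a 227 × 227 input shrinks through a network of this shape, the sketch below traces the spatial size layer by layer and applies a 12-way softmax at the end. The caption does not give kernel sizes, strides, padding, or the positions of the pooling layers; the values in `LAYERS` are assumed AlexNet-style settings chosen only for illustration, and the names `conv_out`, `trace_shapes`, and `softmax` are hypothetical helpers, not part of the original model code.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, pad). ASSUMED AlexNet-style hyperparameters;
# the figure caption does not state these values or where pooling occurs.
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]


def trace_shapes(input_size=227):
    """Return (layer name, spatial size) pairs from the input to the last pool."""
    sizes = [("input", input_size)]
    size = input_size
    for name, kernel, stride, pad in LAYERS:
        size = conv_out(size, kernel, stride, pad)
        sizes.append((name, size))
    return sizes


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    for name, size in trace_shapes():
        print(f"{name}: {size} x {size}")
    # Twelve equal scores stand in for the FC7 output fed to the classifier.
    print(softmax([0.0] * 12))
```

Under these assumed settings the feature map after the last pooling layer is small enough to be flattened into a vector for FC6 and FC7; a different choice of strides or padding would change the intermediate sizes but not the overall structure described in the caption.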