Table 1.
Layer name | Output size | Original 50-layer | Off-the-shelf | Fine-tuned |
---|---|---|---|---|
conv1 | 112 × 112 | 7 × 7, 64-d, stride 2 | same | **fine-tuned** |
pooling1 | 56 × 56 | 3 × 3, 64-d, max pool, stride 2 | same | same |
conv2_x | 56 × 56 | [1 × 1, 64; 3 × 3, 64; 1 × 1, 256] × 3 | same | **fine-tuned** |
conv3_0 | 28 × 28 | [1 × 1, 128; 3 × 3, 128; 1 × 1, 512] × 1 | same | **fine-tuned** |
conv3_x | 28 × 28 | [1 × 1, 128; 3 × 3, 128; 1 × 1, 512] × 3 | same | **fine-tuned** |
conv4_0 | 14 × 14 | [1 × 1, 256; 3 × 3, 256; 1 × 1, 1024] × 1 | same | **fine-tuned** |
conv4_x | 14 × 14 | [1 × 1, 256; 3 × 3, 256; 1 × 1, 1024] × 5 | same | **fine-tuned** |
conv5_0 | 7 × 7 | [1 × 1, 512; 3 × 3, 512; 1 × 1, 2048] × 1 | same | **fine-tuned** |
conv5_x | 7 × 7 | [1 × 1, 512; 3 × 3, 512; 1 × 1, 2048] × 2 | same | **fine-tuned** |
pooling2 | 1 × 1 | 7 × 7, 2048-d, average pool, stride 1 | same | same |
dense | 1 × 1 | 1000-d, dense-layer | **15-d, dense-layer** | **15-d, dense-layer** |
loss | 1 × 1 | 1000-d, softmax | **15-d, sigmoid, BCE** | **15-d, sigmoid, BCE** |
In our experiments, we use the ResNet-50 architecture, and this table shows the differences between the original architecture and our two variants (off-the-shelf and fine-tuned ResNet-50). Where a layer is unchanged from the original network, the entry reads "same". Bold text highlights which parts of the network are changed for our application. All layers employ automatic padding (i.e., dependent on the kernel size) to keep the spatial size unchanged. The conv3_0, conv4_0, and conv5_0 layers down-sample the spatial size with a stride of 2.
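The head replacement in the table translates directly into code. The following PyTorch sketch is our own illustration (the framework, weight source, and all variable names are assumptions, not taken from the paper): it swaps the 1000-d softmax head of an ImageNet-pre-trained ResNet-50 for a 15-d dense layer with a sigmoid/BCE loss, and freezes the backbone to obtain the off-the-shelf variant.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-50 pre-trained on ImageNet (torchvision >= 0.13 weights API).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# "dense" row: replace the original 1000-d dense layer with a 15-d one.
model.fc = nn.Linear(model.fc.in_features, 15)

# Off-the-shelf variant: freeze every pre-trained layer so that only the
# new 15-d dense layer receives gradient updates.
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

# "loss" row: 15-d sigmoid with binary cross-entropy (BCE).
# BCEWithLogitsLoss fuses the sigmoid into the loss for numerical stability.
criterion = nn.BCEWithLogitsLoss()

# Toy forward/backward pass with one 224 x 224 RGB image and a multi-hot
# 15-d label vector.
x = torch.randn(1, 3, 224, 224)
y = torch.zeros(1, 15)
loss = criterion(model(x), y)
loss.backward()
```

For the fine-tuned variant, the freezing loop is simply omitted so that gradients flow through all convolutional layers as well; the 15-d head and the BCE loss remain the same.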