2022 Jan 26;32(7):4749–4759. doi: 10.1007/s00330-021-08532-2

Table 2.

Layer-by-layer description of the CNNs used in the two ensemble models SEG and noSEG

Name	Layer	Filter kernel, main branch (shape, count)	Filter kernel, shortcut (shape, count)	Output size, noSEG	Output size, SEG
in	Input	-	-	50 × 50 × 50 × 1	50 × 50 × 50 × 2
res1a	3D convolution	3 × 3 × 3, 16	3 × 3 × 3, 1	25 × 25 × 25 × 16	25 × 25 × 25 × 16
res1b	3D convolution	3 × 3 × 3, 16	id	25 × 25 × 25 × 16	25 × 25 × 25 × 16
add1	Add	-	-	25 × 25 × 25 × 16	25 × 25 × 25 × 16
res2a	3D convolution	3 × 3 × 3, 32	3 × 3 × 3, 1	13 × 13 × 13 × 32	13 × 13 × 13 × 32
res2b	3D convolution	3 × 3 × 3, 32	id	13 × 13 × 13 × 32	13 × 13 × 13 × 32
add2	Add	-	-	13 × 13 × 13 × 32	13 × 13 × 13 × 32
res3a	3D convolution	3 × 3 × 3, 64	3 × 3 × 3, 1	7 × 7 × 7 × 64	7 × 7 × 7 × 64
res3b	3D convolution	3 × 3 × 3, 64	id	7 × 7 × 7 × 64	7 × 7 × 7 × 64
add3	Add	-	-	7 × 7 × 7 × 64	7 × 7 × 7 × 64
pool	Global average pooling	-	-	64	64
drop	Dropout	-	-	64	64
out	Fully connected layer	-	-	1	1

The convolutional part of each network (up to layer “add3”) consisted of a main branch, containing three-dimensional convolutions, and a shortcut branch, containing either a single convolution kernel for downscaling or an identity mapping (“id”). At each add layer (“add1”, “add2”, “add3”), the outputs of the main branch and the shortcut branch were added. After add1 and add2, the summed output was split again into a main branch and a shortcut branch.
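
The output sizes listed in the table can be checked with a short shape calculation. The sketch below is illustrative only: the table does not state the stride or padding, so it assumes that the first convolution of each residual block uses a stride of 2 with "same" padding, which reproduces the 50 → 25 → 13 → 7 progression. The helper names `conv_same_out` and `trace_shapes` are hypothetical and do not come from the article.

```python
import math


def conv_same_out(size, stride):
    """Spatial output length of a convolution with 'same' padding.

    Assumption: 'same' padding, so the output length is ceil(size / stride).
    """
    return math.ceil(size / stride)


def trace_shapes(input_size=50, filters=(16, 32, 64), stride=2):
    """Return (layer name, output shape) pairs for the add, pool and out layers.

    Assumption: only the first convolution of each block ("res1a", "res2a",
    "res3a") downsamples; the second one keeps the spatial size unchanged.
    """
    shapes = []
    size = input_size
    for i, n_filters in enumerate(filters, start=1):
        size = conv_same_out(size, stride)
        shapes.append((f"add{i}", (size, size, size, n_filters)))
    # Global average pooling collapses the three spatial axes.
    shapes.append(("pool", (filters[-1],)))
    # A single output unit follows dropout.
    shapes.append(("out", (1,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

Because the number of input channels (1 for noSEG, 2 for SEG) only affects the first convolution, the traced output sizes are the same for both models under these assumptions.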