The localization network predicts, for each hemisphere, the coordinates of the bounding box's center voxel together with the box's width, height, and depth. Each block in the figure lists the parameters of the corresponding network layer. For the convolutional layers (red), the parameters are the number of convolutional filters, the kernel size, the stride, and the dilation. The global average pooling layer (green) has no learnable parameters. The fully connected layer outputs a total of 12 values, which give the coordinates of the two bounding boxes for a given image (6 per hemisphere).
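
As a minimal sketch of how the 12 output values could be interpreted, the following Python snippet splits them into two boxes and converts each from center-and-size form to corner coordinates. The ordering of the values is an assumption made here for illustration (first six values for one hemisphere, last six for the other, each as center x, y, z followed by width, height, depth); the source does not specify it. The names `Box3D` and `split_hemisphere_boxes` are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Box3D(NamedTuple):
    """Axis-aligned 3D box given by its center voxel and its size."""
    cx: float
    cy: float
    cz: float
    width: float
    height: float
    depth: float

    def corners(self) -> Tuple[tuple, tuple]:
        """Return (min_corner, max_corner) in voxel coordinates."""
        center = (self.cx, self.cy, self.cz)
        half = (self.width / 2, self.height / 2, self.depth / 2)
        lo = tuple(c - h for c, h in zip(center, half))
        hi = tuple(c + h for c, h in zip(center, half))
        return lo, hi


def split_hemisphere_boxes(output: Sequence[float]) -> Tuple[Box3D, Box3D]:
    """Split a 12-value network output into two 6-value boxes.

    Assumed layout (not stated in the source): values 0-5 describe one
    hemisphere and values 6-11 the other, each ordered as
    (cx, cy, cz, width, height, depth).
    """
    if len(output) != 12:
        raise ValueError(f"expected 12 values, got {len(output)}")
    return Box3D(*output[:6]), Box3D(*output[6:])


if __name__ == "__main__":
    # Made-up numbers, purely to show the call pattern.
    first, second = split_hemisphere_boxes(
        [10, 20, 30, 4, 6, 8, 50, 20, 30, 4, 6, 8]
    )
    print(first.corners())
    print(second.corners())
```

Converting to corner coordinates is convenient for cropping each hemisphere out of the volume, although the network itself, as described, works in center-and-size form.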