Fig. 5.
Architecture of the segmentation head in the 3D EfficientDet network. The output features from the EfficientNetB0 backbone (see Fig. 2) are first convolved to a common channel dimension of 48 with filters (in white), then iteratively fused with outputs from lower levels by fast normalized feature fusion (Eq. 6). The upscaling (purple blocks) is done by a nearest-neighbor resize followed by an anti-aliasing convolution. All convolutions are performed depth-wise and are followed by batch normalization and a swish activation function, apart from the last block, which is a single convolution with a kernel and a sigmoid activation function.
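
As an illustration of the fusion and upscaling steps named in the caption, the following is a minimal NumPy sketch, not the authors' implementation. Eq. 6 is not reproduced here, so the sketch assumes it has the fast normalized fusion form of the original EfficientDet paper, O = sum_i(w_i * I_i) / (eps + sum_j w_j) with w_i >= 0. The function names, the value of eps, and the tensor shapes are hypothetical; the anti-aliasing convolution, batch normalization, and swish activation are omitted.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted sum of same-shaped feature maps.

    Weights are clipped to be non-negative and divided by their sum
    (plus eps), following the assumed EfficientDet form of Eq. 6.
    """
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # ReLU on the weights
    w = w / (eps + w.sum())
    return sum(wi * f for wi, f in zip(w, features))

def upsample_nearest_3d(x, factor=2):
    """Nearest-neighbor resize of a (C, D, H, W) array along its three spatial axes."""
    for axis in (1, 2, 3):
        x = np.repeat(x, factor, axis=axis)
    return x

# Hypothetical inputs: 48 channels, a coarse 2x2x2 level and a finer 4x4x4 level.
coarse = np.ones((48, 2, 2, 2))
fine = np.full((48, 4, 4, 4), 3.0)

# Upscale the coarse level to the finer resolution, then fuse the two maps.
fused = fast_normalized_fusion([fine, upsample_nearest_3d(coarse)], weights=[1.0, 1.0])
print(fused.shape)  # (48, 4, 4, 4)
```

In a trained network the fusion weights would be learned parameters; here they are fixed to equal values only to keep the example self-contained.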