. 2023 Feb 11;23:32. doi: 10.1186/s12880-023-00974-y

Fig. 5.

Architecture of the segmentation head in the 3D EfficientDet network. The P3-P7 output features from the EfficientNetB0 backbone (see Fig. 2) are first convolved to a common channel dimension of 48 with 1×1×1 filters (in white), then iteratively combined with the outputs from lower levels by fast normalized feature fusion (Eq. 6). Upscaling (purple blocks) is done by a nearest-neighbor resize followed by a 3×3×1 anti-aliasing convolution. All convolutions are depth-wise and are followed by batch normalization and a swish activation function, apart from the last block, which is a single convolution with a 1×1×1 kernel and a sigmoid activation function.
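The fusion and resize steps named in the caption can be illustrated with a minimal NumPy sketch. Eq. 6 is not reproduced on this page, so the sketch assumes it follows the fast normalized fusion of the original EfficientDet formulation, where each input is scaled by a non-negative weight divided by the sum of all weights plus a small epsilon. The function names, the `eps` value, the tensor shapes, and the (2, 2, 1) upscaling factors are illustrative assumptions, and the 3×3×1 anti-aliasing convolution, batch normalization, and swish activation are omitted.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature maps with non-negative, normalized scalar weights.

    Assumed form: out = sum_i (w_i / (eps + sum_j w_j)) * features[i], with w_i >= 0.
    """
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # ReLU keeps weights >= 0
    w = w / (eps + w.sum())                                     # normalize without softmax
    return sum(wi * f for wi, f in zip(w, features))

def upsample_nearest(x, factors=(2, 2, 1)):
    """Nearest-neighbor resize of a (D0, D1, D2, C) volume by integer factors.

    The (2, 2, 1) default is a hypothetical choice, not taken from the paper.
    """
    for axis, k in enumerate(factors):
        x = np.repeat(x, k, axis=axis)
    return x

# Hypothetical shapes: one coarse level and the next finer level, 48 channels each.
coarse = np.ones((4, 4, 2, 48))
fine = np.full((8, 8, 2, 48), 3.0)

# Upscale the coarse level to the finer grid, then fuse the two maps.
fused = fast_normalized_fusion([fine, upsample_nearest(coarse)], weights=[1.0, 1.0])
print(fused.shape)
```

In a trained network the fusion weights would be learnable parameters; fixed values are used here only to keep the example self-contained.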