Convolutional neural network architecture. A, Hybrid 3D-contracting (bottom-up) and 2D-expanding (top-down) fully convolutional feature-pyramid network architecture used for the Mask R-CNN backbone. The architecture incorporates both traditional 3 × 3 filters (blue) and bottleneck 1 × 1–3 × 3–1 × 1 modules (orange). The contracting arm is composed of 3D operations and convolutional kernels. Subsampling in the x- and y-directions is implemented via 1 × 2 × 2 strided convolutions (marked by s2). Subsampling in the z-direction is performed by a 2 × 1 × 1 convolutional kernel with valid padding. The expanding arm is composed entirely of 2D operations. B, Connections between the contracting and expanding arms are made by residual addition operations between corresponding layers. 3D layers in the contracting arm are mapped to 2D layers in the expanding arm by projection operations, which are designed to match both the input (N) and output (1) z-dimension shapes and the input (C) and output (128) feature map sizes. Ops indicates operations; Conv, convolutions; BN-ReLU, batch normalization-rectified linear unit; Proj-Res, projection-residual; Z, z-axis; I and J, in-plane axes.
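The shape bookkeeping described in the caption can be traced with a short Python sketch. It is illustrative only and rests on several assumptions that are not stated in the caption: tensors are laid out as (Z, I, J, C), the strided convolutions use same padding, the 2 × 1 × 1 kernel is applied with stride 1, the projection is realized as an N × 1 × 1 convolution with valid padding, and "feature map size" means channel count. All function names and the example input shape are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding="same"):
    """Output length along one axis of a convolution."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride); kernel size does not matter
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")


def subsample_xy(shape):
    """1 x 2 x 2 strided convolution (s2): halves I and J, leaves Z unchanged.

    Assumes same padding, so the in-plane kernel size is irrelevant here.
    """
    z, i, j, c = shape
    return (z, conv_output_size(i, 3, 2), conv_output_size(j, 3, 2), c)


def subsample_z(shape):
    """2 x 1 x 1 kernel with valid padding (stride 1 assumed): Z shrinks by one."""
    z, i, j, c = shape
    return (conv_output_size(z, 2, 1, "valid"), i, j, c)


def project_to_2d(shape, out_channels=128):
    """Projection from a 3D layer to a 2D layer.

    Assumed here to be an N x 1 x 1 valid convolution, which collapses
    Z from N to 1 and maps C input channels to 128 output channels.
    """
    z, i, j, c = shape
    return (conv_output_size(z, z, 1, "valid"), i, j, out_channels)


if __name__ == "__main__":
    shape = (4, 256, 256, 32)  # hypothetical (Z, I, J, C) input
    for step in (subsample_xy, subsample_z, project_to_2d):
        shape = step(shape)
        print(step.__name__, shape)
```

After the projection, the Z dimension has length 1 and the channel count is 128, so under these assumptions the result can be added to the corresponding 2D layer of the expanding arm.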