Figure 2.

Mask R-CNN architecture. A hybrid 3D-contracting and 2D-expanding fully convolutional feature-pyramid network is used as the backbone. The architecture combines traditional 3 × 3 filters with bottleneck 1 × 1–3 × 3–1 × 1 modules (left block). The network takes 3 input channels, built from the pre-contrast image and the subtraction images of the left and right breasts, to exploit symmetry.
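
To illustrate why bottleneck 1 × 1–3 × 3–1 × 1 modules are commonly paired with plain 3 × 3 filters, the following minimal sketch counts convolution weights for both. The channel widths (256 outer, 64 inner), the use of 2D kernels, and the helper names `conv_params` and `bottleneck_params` are illustrative assumptions; the caption does not specify them, and biases and normalization layers are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a 2D convolution with a k x k kernel (no bias)."""
    return c_in * c_out * k * k


def bottleneck_params(c_outer: int, c_inner: int) -> int:
    """Weight count of a 1x1 -> 3x3 -> 1x1 bottleneck module (no bias).

    The 1x1 layers reduce and then restore the channel width, so the
    3x3 convolution only operates on c_inner channels.
    """
    reduce = conv_params(c_outer, c_inner, 1)
    spatial = conv_params(c_inner, c_inner, 3)
    expand = conv_params(c_inner, c_outer, 1)
    return reduce + spatial + expand


# Hypothetical widths, chosen only for illustration.
C_OUTER, C_INNER = 256, 64

plain = conv_params(C_OUTER, C_OUTER, 3)
bottleneck = bottleneck_params(C_OUTER, C_INNER)

print(f"plain 3x3 conv: {plain} weights")
print(f"bottleneck:     {bottleneck} weights")
print(f"ratio:          {plain / bottleneck:.2f}x")
```

Under these assumed widths, the bottleneck uses 69,632 weights against 589,824 for a single full-width 3 × 3 convolution, which is the usual motivation for the design.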