Architecture of the 3D Mask R-CNN. The diagram shows the key components of the 3D Mask R-CNN model with a ConvNeXt-V2 backbone. Input CT images pass through the ConvNeXt-V2 backbone, which extracts feature maps. The Region Proposal Network (RPN) uses these maps to generate object proposals. The RoI Align layer then extracts spatially aligned features for each proposal and feeds them into two branches: the segmentation branch, which generates binary masks to localize tumor regions, and the classification branch, which predicts the class (e.g., lung cancer metastasis) and refines the bounding box for precise localization.
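To make the RoI Align step concrete, the following is a minimal pure-Python sketch of how a 3D RoI Align could pool a proposal box into a fixed-size grid by trilinear interpolation at fractional coordinates, rather than by rounding to the nearest voxel. It is an illustration under stated assumptions, not the implementation used in this work: the function names `trilinear_sample` and `roi_align_3d`, the `(z, y, x)` index-coordinate convention, the output size, and the number of samples per bin are all hypothetical choices.

```python
from typing import List, Tuple

Volume = List[List[List[float]]]  # single-channel features, indexed volume[z][y][x]


def trilinear_sample(volume: Volume, z: float, y: float, x: float) -> float:
    """Sample the volume at a fractional (z, y, x) location (hypothetical helper)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    # Clamp to the valid range so the eight neighbours below always exist.
    z = min(max(z, 0.0), depth - 1)
    y = min(max(y, 0.0), height - 1)
    x = min(max(x, 0.0), width - 1)
    z0, y0, x0 = int(z), int(y), int(x)
    z1 = min(z0 + 1, depth - 1)
    y1 = min(y0 + 1, height - 1)
    x1 = min(x0 + 1, width - 1)
    dz, dy, dx = z - z0, y - y0, x - x0
    value = 0.0
    # Weighted sum over the eight surrounding voxels.
    for zi, wz in ((z0, 1.0 - dz), (z1, dz)):
        for yi, wy in ((y0, 1.0 - dy), (y1, dy)):
            for xi, wx in ((x0, 1.0 - dx), (x1, dx)):
                value += wz * wy * wx * volume[zi][yi][xi]
    return value


def roi_align_3d(
    volume: Volume,
    box: Tuple[float, float, float, float, float, float],
    output_size: Tuple[int, int, int] = (2, 2, 2),
    samples_per_bin: int = 2,
) -> Volume:
    """Average-pool a box (z0, y0, x0, z1, y1, x1) into a fixed-size grid.

    Box corners are fractional feature-map coordinates and are never rounded;
    this is an assumed convention chosen for illustration.
    """
    z_start, y_start, x_start, z_end, y_end, x_end = box
    out_d, out_h, out_w = output_size
    bin_d = (z_end - z_start) / out_d
    bin_h = (y_end - y_start) / out_h
    bin_w = (x_end - x_start) / out_w
    n = samples_per_bin
    pooled: Volume = []
    for i in range(out_d):
        plane = []
        for j in range(out_h):
            row = []
            for k in range(out_w):
                total = 0.0
                # Regularly spaced sample points inside bin (i, j, k).
                for a in range(n):
                    for b in range(n):
                        for c in range(n):
                            total += trilinear_sample(
                                volume,
                                z_start + (i + (a + 0.5) / n) * bin_d,
                                y_start + (j + (b + 0.5) / n) * bin_h,
                                x_start + (k + (c + 0.5) / n) * bin_w,
                            )
                row.append(total / n ** 3)
            plane.append(row)
        pooled.append(plane)
    return pooled
```

In a full model, the pooled grid for each proposal (computed per feature channel) would be the shared input to the segmentation and classification branches described above.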