Figure 1.
Architecture of the modified UNet used for simultaneous outcome prediction and segmentation. A sigmoid activation (dark green) is applied to the output of the UNet decoder to obtain the segmentation predictions. For outcome prediction, a convolutional layer (orange) with 32 filters of kernel size one is applied to the feature maps of each encoder block, followed by global average pooling (yellow), which reduces the spatial dimensions of each filter output to a single number. The pooled outputs are flattened (light green) into 32-dimensional vectors and concatenated. In the multi-outcome setting, the resulting 128-dimensional feature vector is passed in parallel to two fully connected layers (purple): one with a tanh activation predicts the log-hazard of the Cox model, and one with a sigmoid activation predicts the conditional survival probabilities of the Gensheimer model. Optionally, a DenseNet composed of three DenseNet blocks can be integrated into the model; it takes the image volume and the predicted segmentation mask as input. Additional features are extracted from each DenseNet block by applying a convolutional layer with 32 filters and a ReLU activation function; these features are then pooled, flattened, and concatenated with the UNet encoder features.
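
The outcome-prediction branch described in the caption can be traced shape by shape. The following is a minimal NumPy sketch with random weights, not the authors' implementation. It assumes four encoder blocks (inferred from 128 = 4 × 32); the channel counts, spatial sizes, number of time intervals, and all function and variable names are hypothetical. The caption does not state whether an activation follows the kernel-size-one convolution on the encoder features, so none is applied here, and the optional DenseNet branch is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, weight, bias):
    """Kernel-size-one 3D convolution: a per-voxel linear map over channels.

    x: (C_in, D, H, W), weight: (C_out, C_in), bias: (C_out,)
    """
    return np.einsum("oc,cdhw->odhw", weight, x) + bias[:, None, None, None]

def global_average_pool(x):
    """Average over the spatial axes, leaving one number per channel."""
    return x.mean(axis=(1, 2, 3))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumption: four encoder blocks; these (channels, D, H, W) shapes are made up.
encoder_shapes = [(16, 32, 32, 32), (32, 16, 16, 16), (64, 8, 8, 8), (128, 4, 4, 4)]
n_filters = 32    # from the caption
n_intervals = 10  # assumption: number of discrete time intervals

# Stand-ins for the feature maps produced by each encoder block.
encoder_maps = [rng.standard_normal(shape) for shape in encoder_shapes]

pooled = []
for fmap in encoder_maps:
    w = rng.standard_normal((n_filters, fmap.shape[0])) * 0.1
    b = np.zeros(n_filters)
    # 32 filters of kernel size one, then global average pooling -> (32,)
    pooled.append(global_average_pool(conv1x1(fmap, w, b)))

# Concatenate the per-block vectors: 4 blocks x 32 filters = 128 features.
features = np.concatenate(pooled)

# Two fully connected heads applied in parallel to the same feature vector.
w_cox = rng.standard_normal((1, features.size)) * 0.1
w_gen = rng.standard_normal((n_intervals, features.size)) * 0.1

log_hazard = np.tanh(w_cox @ features)  # Cox head, shape (1,)
cond_surv = sigmoid(w_gen @ features)   # Gensheimer head, shape (n_intervals,)

print(features.shape, log_hazard.shape, cond_surv.shape)
```

Because both heads read the same concatenated vector, adding the optional DenseNet features would only lengthen `features` by 32 entries per DenseNet block; the heads themselves would be unchanged apart from their input size.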
