Figure 2.
The U-Net architecture used for both inpainting and segmentation. Its layers fall into three groups: the “encoder” (in red), the “decoder” (in blue), and the “post-processing” layer (the final convolutional layer). Each dotted rectangular box represents an encoder feature map that is concatenated with the first feature map in the decoder at the same level.
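
The skip connections described in the caption can be illustrated by tracing feature-map shapes through a generic U-Net. The sketch below is only an illustration under stated assumptions: the function name `unet_shapes`, the depth of 3, the base width of 16 channels, the halving and doubling of resolution, and the doubling of channels per level are all hypothetical and are not taken from the figure. It shows only how concatenating an encoder feature map with the first decoder feature map at the same level adds their channel counts.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def unet_shapes(in_shape: Shape,
                base_channels: int = 16,   # assumed, not from the paper
                depth: int = 3,            # assumed, not from the paper
                out_channels: int = 1) -> List[Tuple[str, Shape]]:
    """Trace feature-map shapes through a hypothetical U-Net."""
    _, h, w = in_shape
    if h % (2 ** depth) or w % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")

    trace: List[Tuple[str, Shape]] = []
    skips: List[Shape] = []

    # Encoder: record one feature map per level, then downsample by 2.
    ch = base_channels
    for level in range(depth):
        trace.append((f"enc{level}", (ch, h, w)))
        skips.append((ch, h, w))      # kept for the skip connection
        h, w = h // 2, w // 2
        ch *= 2

    trace.append(("bottleneck", (ch, h, w)))

    # Decoder: upsample by 2, then concatenate the encoder feature map
    # from the same level along the channel axis.
    for level in reversed(range(depth)):
        h, w = h * 2, w * 2
        ch //= 2
        skip_ch, skip_h, skip_w = skips[level]
        assert (skip_h, skip_w) == (h, w)  # same level, same resolution
        trace.append((f"dec{level}_concat", (ch + skip_ch, h, w)))
        trace.append((f"dec{level}", (ch, h, w)))  # after convolutions

    # Post-processing: a final convolution maps to the output channels.
    trace.append(("post", (out_channels, h, w)))
    return trace


if __name__ == "__main__":
    for name, shape in unet_shapes((1, 64, 64)):
        print(f"{name:12s} {shape}")
```

In this sketch, each `decN_concat` entry has twice the channels of the matching `encN` entry, because the upsampled decoder map and the encoder map at that level have equal channel counts under the assumed doubling scheme.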