Table 3.
Performance with different backbone networks on the NYUv2 test set. VGG16 and ResNet50 were re-implemented and trained from scratch to keep the encoder structure consistent. PPM denotes the pyramid pooling module, which was added before the decoder. The number of training epochs was set to 250.
| Models | Global acc (%) | Mean IoU (%) | Mean acc (%) |
|---|---|---|---|
| Depth (VGG16)-RGB (VGG16) | 61.1 | 25.7 | 35.1 |
| Depth (ResNet50)-RGB (VGG16) | 59.7 | 24.4 | 33.2 |
| Depth (VGG16)-RGB (ResNet50) | 61.7 | 26.1 | 35.7 |
| Depth (ResNet50)-RGB (ResNet50) | 60.6 | 25.1 | 34.3 |
| Depth (VGG16)-RGB (VGG16)-PPM | 61.7 | 29.8 | 41.6 |
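
For readers who want to reproduce the three columns above, the following is a minimal sketch of how global accuracy, mean IoU, and mean accuracy are commonly derived from a pixel-level confusion matrix. It assumes the standard definitions of these metrics, which this table does not spell out. The function names `confusion_matrix` and `segmentation_scores` are hypothetical, and excluding classes that have no ground-truth pixels from the means is also an assumption.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Build a (num_classes x num_classes) matrix; rows are ground-truth classes."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Assumption: labels outside [0, num_classes) mark unlabeled pixels and are skipped.
    valid = (y_true >= 0) & (y_true < num_classes)
    idx = num_classes * y_true[valid] + y_pred[valid]
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_scores(cm):
    """Return (global acc, mean IoU, mean acc) in percent."""
    cm = cm.astype(np.float64)
    tp = np.diag(cm)              # correctly classified pixels per class
    gt = cm.sum(axis=1)           # ground-truth pixels per class
    pred = cm.sum(axis=0)         # predicted pixels per class
    union = gt + pred - tp

    global_acc = tp.sum() / cm.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        class_acc = tp / gt       # NaN for classes absent from the ground truth
        iou = tp / union          # NaN for classes absent from both
    # Assumption: absent classes (NaN entries) are left out of the means.
    mean_acc = np.nanmean(class_acc)
    mean_iou = np.nanmean(iou)
    return 100 * global_acc, 100 * mean_iou, 100 * mean_acc


# Toy example with three classes and six pixels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
global_acc, mean_iou, mean_acc = segmentation_scores(cm)
```

Mean IoU penalizes false positives as well as false negatives for each class, which is one reason it can move differently from global accuracy when a module mainly changes per-class behavior.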