Table 5.
Experiments on joint learning of shape and texture.
| Backbone layers | Structure | Acc.↑ | Pre.↑ | Rec.↑ | F1↑ |
|---|---|---|---|---|---|
| 50 | Single-stream ᵃ | 0.950 | 0.872 | 0.945 | 0.907 |
| 50 | Cascade Cls. and Seg. ᵇ | 0.950 | 0.886 | 0.928 | 0.907 |
| 50 | Two-stream without joint learning ᶜ | 0.960 | 0.909 | 0.939 | 0.924 |
| 50 | Two-stream with joint learning ᵈ | 0.967 | 0.904 | 0.977 | 0.939 |
| 101 | Single-stream ᵃ | 0.952 | 0.877 | 0.950 | 0.912 |
| 101 | Cascade Cls. and Seg. ᵇ | 0.955 | 0.888 | 0.945 | 0.916 |
| 101 | Two-stream without joint learning ᶜ | 0.961 | 0.911 | 0.944 | 0.927 |
| 101 | Two-stream with joint learning ᵈ | 0.971 | 0.916 | 0.978 | 0.946 |

ᵃ Single-stream: uses only the texture encoder of the proposed method for feature extraction.
ᵇ Cascade Cls. and Seg.: a segmentation network cascaded in front of the classification network.
ᶜ Two-stream without joint learning: our method with the feature decoder of the shape-biased stream removed.
ᵈ Two-stream with joint learning: the proposed framework.
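
As a quick sanity check on the table, the F1 column can be recomputed from the Pre. and Rec. columns. The sketch below assumes F1 is the standard harmonic mean of precision and recall, which the table does not state explicitly. The names `f1_score`, `ROWS`, and `check_rows` are illustrative and not part of the original work. Because the reported precision and recall are themselves rounded to three decimals, the comparison uses a small tolerance.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    return 2 * precision * recall / (precision + recall)


# (backbone layers, structure, Pre., Rec., reported F1), copied from Table 5.
ROWS = [
    (50, "Single-stream", 0.872, 0.945, 0.907),
    (50, "Cascade Cls. and Seg.", 0.886, 0.928, 0.907),
    (50, "Two-stream without joint learning", 0.909, 0.939, 0.924),
    (50, "Two-stream with joint learning", 0.904, 0.977, 0.939),
    (101, "Single-stream", 0.877, 0.950, 0.912),
    (101, "Cascade Cls. and Seg.", 0.888, 0.945, 0.916),
    (101, "Two-stream without joint learning", 0.911, 0.944, 0.927),
    (101, "Two-stream with joint learning", 0.916, 0.978, 0.946),
]


def check_rows(rows, tol=0.0015):
    """Return the rows whose recomputed F1 differs from the reported F1 by more than tol."""
    return [
        row for row in rows
        if abs(f1_score(row[2], row[3]) - row[4]) > tol
    ]


if __name__ == "__main__":
    for layers, structure, pre, rec, reported in ROWS:
        print(f"{layers:>3} {structure:<36} recomputed F1 = {f1_score(pre, rec):.3f} (reported {reported:.3f})")
    print("inconsistent rows:", check_rows(ROWS))
```

Under that assumption, every reported F1 value agrees with the value recomputed from the rounded precision and recall, so the rows in the table are internally consistent.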