. 2018 May 10;18(5):1506. doi: 10.3390/s18051506

Table 1.

Layer layout of our proposed network. “Out-F”: number of feature maps at the layer’s output; “Out-Res”: output resolution for 640 × 480 RGB input images from the smart glasses; “C”: number of semantic prediction classes. Encoder: layers 1–16; decoder: layers 17–19.

| Layer | Type | Out-F | Out-Res |
|-------|------|-------|---------|
| 0 | Scaling 640 × 480 | 3 | 320 × 240 |
| 1 | Down-sampler block | 16 | 160 × 120 |
| 2 | Down-sampler block | 64 | 80 × 60 |
| 3–7 | 5 × Non-bt-1D | 64 | 80 × 60 |
| 8 | Down-sampler block | 128 | 40 × 30 |
| 9 | Non-bt-1D (dilated 2) | 128 | 40 × 30 |
| 10 | Non-bt-1D (dilated 4) | 128 | 40 × 30 |
| 11 | Non-bt-1D (dilated 8) | 128 | 40 × 30 |
| 12 | Non-bt-1D (dilated 16) | 128 | 40 × 30 |
| 13 | Non-bt-1D (dilated 2) | 128 | 40 × 30 |
| 14 | Non-bt-1D (dilated 4) | 128 | 40 × 30 |
| 15 | Non-bt-1D (dilated 8) | 128 | 40 × 30 |
| 16 | Non-bt-1D (dilated 2) | 128 | 40 × 30 |
| 17a | Original feature map | 128 | 40 × 30 |
| 17b | Pooling and convolution | 32 | 40 × 30 |
| 17c | Pooling and convolution | 32 | 20 × 15 |
| 17d | Pooling and convolution | 32 | 10 × 8 |
| 17e | Pooling and convolution | 32 | 5 × 4 |
| 17 | Up-sampler and concatenation | 256 | 40 × 30 |
| 18 | Convolution | C | 40 × 30 |
| 19 | Up-sampler | C | 640 × 480 |
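
The shape bookkeeping in Table 1 can be checked with a short script. The following is only a minimal sketch, not the authors' implementation: the function name `trace_shapes`, the default `num_classes=20`, the pooling factors 1, 2, 4 and 8, and the round-up rule for odd sizes are all assumptions, chosen because they reproduce the resolutions listed in the table (for example 30 / 4 = 7.5 gives 8). It also illustrates why layer 17 has 256 feature maps: the 128 original maps are concatenated with four pooled branches of 32 maps each.

```python
import math


def trace_shapes(width=640, height=480, num_classes=20):
    """Return (layer, out_features, (w, h)) rows mirroring Table 1.

    num_classes stands in for "C"; the default of 20 is a hypothetical value.
    """
    rows = []

    # Layer 0: the input image is scaled to half resolution.
    w, h = width // 2, height // 2
    feats = 3
    rows.append(("0", feats, (w, h)))

    # Layers 1, 2 and 8 are down-sampler blocks: each halves the resolution
    # and sets a new feature-map count. All other encoder layers (Non-bt-1D)
    # keep both the resolution and the feature-map count.
    downsampler_feats = {1: 16, 2: 64, 8: 128}
    for layer in range(1, 17):
        if layer in downsampler_feats:
            w, h = w // 2, h // 2
            feats = downsampler_feats[layer]
        rows.append((str(layer), feats, (w, h)))

    # Layer 17a-17e: the original map plus four pooled branches.
    # Pooling factors and ceil rounding are assumptions that match the table.
    rows.append(("17a", feats, (w, h)))
    branch_feats = 32
    for name, factor in zip("bcde", (1, 2, 4, 8)):
        pooled = (math.ceil(w / factor), math.ceil(h / factor))
        rows.append(("17" + name, branch_feats, pooled))

    # Layer 17: branches are up-sampled back to (w, h) and concatenated.
    concat_feats = feats + 4 * branch_feats
    rows.append(("17", concat_feats, (w, h)))

    # Layer 18: convolution to C class maps; layer 19: up-sample to input size.
    rows.append(("18", num_classes, (w, h)))
    rows.append(("19", num_classes, (width, height)))
    return rows


if __name__ == "__main__":
    for layer, out_f, (out_w, out_h) in trace_shapes():
        print(f"{layer:>3}  Out-F={out_f:<4} Out-Res={out_w} x {out_h}")
```

Unlike the table, the script lists layers 3 to 7 individually rather than as one grouped row; the values are the same for each of them.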