Table 3.
| Stage | Output Sizes | Slow Pathway | Fast Pathway |
|---|---|---|---|
| raw clip | 64 × 224 × 224 | - | |
| data layer | Slow: 8 × 224 × 224; Fast: 32 × 224 × 224 | stride 8 × 1 × 1 | stride 2 × 1 × 1 |
| conv1 | Slow: 8 × 112 × 112; Fast: 32 × 112 × 112 | 1 × 7 × 7, 64; stride 1 × 2 × 2 | 5 × 7 × 7, 8; stride 1 × 2 × 2 |
| pool1 | Slow: 8 × 56 × 56; Fast: 32 × 56 × 56 | 1 × 3 × 3, max; stride 1 × 2 × 2 | |
| res2 | Slow: 8 × 56 × 56; Fast: 32 × 56 × 56 | | |
| res3 | Slow: 8 × 28 × 28; Fast: 32 × 28 × 28 | | |
| res4 | Slow: 8 × 14 × 14; Fast: 32 × 14 × 14 | | |
| res5 | Slow: 8 × 7 × 7; Fast: 32 × 7 × 7 | | |
| | 1 × 1 × 1 | spatiotemporal pooling, concatenate, fc layer with softmax | |
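
The Output Sizes column can be checked with simple stride bookkeeping. The following is a minimal sketch, not the authors' implementation: it assumes a raw clip of 64 frames at 224 × 224, that each layer divides a dimension by its stride (rounding up), and that pool1 applies to both pathways. The per-stage strides for res2 to res5 are not listed in the table; the values used here are inferred from the output sizes and are an assumption, as are the helper names `apply_stride` and `pathway_shapes`.

```python
# Shape bookkeeping for the Slow and Fast pathways.
# Shapes are (frames, height, width).
RAW_CLIP = (64, 224, 224)  # assumed raw clip size


def apply_stride(shape, stride):
    """Divide each dimension by its stride, rounding up."""
    return tuple(-(-dim // s) for dim, s in zip(shape, stride))


def pathway_shapes(data_stride):
    """Return the output shape after each stage for one pathway."""
    stages = [
        ("data layer", data_stride),  # temporal subsampling from the table
        ("conv1", (1, 2, 2)),         # from the table
        ("pool1", (1, 2, 2)),         # from the table
        ("res2", (1, 1, 1)),          # inferred from output sizes (assumption)
        ("res3", (1, 2, 2)),          # inferred from output sizes (assumption)
        ("res4", (1, 2, 2)),          # inferred from output sizes (assumption)
        ("res5", (1, 2, 2)),          # inferred from output sizes (assumption)
    ]
    shape = RAW_CLIP
    shapes = {}
    for name, stride in stages:
        shape = apply_stride(shape, stride)
        shapes[name] = shape
    return shapes


slow = pathway_shapes((8, 1, 1))  # Slow pathway: temporal stride 8
fast = pathway_shapes((2, 1, 1))  # Fast pathway: temporal stride 2

for stage in slow:
    print(f"{stage:10s} Slow: {slow[stage]}  Fast: {fast[stage]}")
```

Both pathways keep their temporal length fixed after the data layer (8 and 32 frames), and only the spatial resolution shrinks, from 224 × 224 down to 7 × 7 at res5.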