Table 3.
Performance of cuDNN SGEMM versus that of the 3D WMFA on 3D convolution layers. Performance is measured in effective TFLOPS.
| Layer | C × D × H × W × N | K | cuDNN SGEMM (TFLOPS) | 3D WMFA (TFLOPS) | Speedup |
|---|---|---|---|---|---|
| conv2 | 32 × 16 × 56 × 56 × 32 | 64 | 1.21 | 1.28 | 1.05 |
| conv3 | 64 × 8 × 28 × 28 × 32 | 256 | 2.38 | 3.31 | 1.39 |
| conv4 | 256 × 4 × 14 × 14 × 32 | 256 | 2.4 | 4.72 | 1.96 |
| conv5 | 256 × 2 × 7 × 7 × 32 | 256 | 1.46 | 2.1 | 1.44 |