Table 3.
Reference | [19] | [20] | [21] | [31] | This Paper |
---|---|---|---|---|---|
Platform | Arria10 | ZU2 | Arria10 | Virtex7 | XC7Z020 |
CNN Model | MobileNetV2 | MobileNetV2 | MobileNetV2 | MobileNetV2 | MobileNetV2 |
Frequency | 133 MHz | 430 MHz | 200 MHz | 150 MHz | 150 MHz |
DSP Usage | 1278 | 212 | 1220 | 2160 | 176 (248 ¹) |
On-Chip Memory Usage | 1844 M20K (3.07 MB) | 145 BRAM | 15.3 Mb (1.91 MB) | 941.5 BRAM | 119.5 BRAM (524.25 KB) |
Speed | 266.2 FPS | 205.3 FPS | 1050 FPS (Throughput) | 302.3 FPS | 70.94 FPS |
FPS/MHz per DSP per KB (×10⁻⁶) | 0.50 | 3.54 | 2.20 | 0.23 | 5.13 (3.64 ¹) |

¹ As mentioned above, we use LUTs to perform the multiplications in DWC because of insufficient resources. Counting only the DSPs actually used would therefore make the comparison unfair, so the values in parentheses assume that every multiplication is performed by DSPs and are recalculated accordingly.
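
The normalized metric in the last row can be reproduced from the other rows of the table. The following is a minimal sketch, not the authors' script: the function name `normalized_speed`, the 1024 KB-per-MB conversion, and the KB-per-BRAM factor (inferred from the "119.5 BRAM (524.25 KB)" entry and applied to the other BRAM-based designs) are assumptions made for illustration.

```python
# Assumed conversion factors (not stated explicitly in the table).
KB_PER_MB = 1024.0
KB_PER_BRAM = 524.25 / 119.5  # inferred from "119.5 BRAM (524.25 KB)"


def normalized_speed(fps, mhz, dsp, mem_kb):
    """Return FPS / MHz / DSP / KB, scaled by 1e6 to match the table's unit."""
    return fps / mhz / dsp / mem_kb * 1e6


# Inputs are taken directly from the rows of Table 3.
rows = {
    "[19]": normalized_speed(266.2, 133, 1278, 3.07 * KB_PER_MB),
    "[20]": normalized_speed(205.3, 430, 212, 145 * KB_PER_BRAM),
    "[21]": normalized_speed(1050, 200, 1220, 1.91 * KB_PER_MB),
    "[31]": normalized_speed(302.3, 150, 2160, 941.5 * KB_PER_BRAM),
    "This Paper": normalized_speed(70.94, 150, 176, 524.25),
    "This Paper (all DSP)": normalized_speed(70.94, 150, 248, 524.25),
}

for name, value in rows.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the computed values agree with the tabulated ones to two decimal places, which also indicates that the metric is expressed in units of 10⁻⁶.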