2018 Mar 12;14:1176934318760543. doi: 10.1177/1176934318760543

Table 2.

Profiling results of the intertask implementations on synthetic data set 8.

| Implementation | Performance limitation | Global memory bandwidth, GB/s | Registers per thread | Shared memory per block, bytes | Theoretical occupancy, % | Achieved occupancy, % |
|---|---|---|---|---|---|---|
| Naive | Memory bandwidth | 207 | 48 | 0 | 62.5 | 62.1 |
| Tile = 2 | Memory bandwidth | 201 | 72 | 4096 | 43.8 | 43.3 |
| Tile = 4 | Memory bandwidth | 193 | 73 | 8192 | 37.5 | 37.1 |
| Tile = 6 | Instruction and memory latency | 154 | 104 | 12 288 | 25 | 24.9 |
| Tile = 8 | Instruction and memory latency | 101 | 142 | 16 384 | 18.8 | 18.2 |
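
The drop in theoretical occupancy as registers per thread grow can be illustrated with a register-limited occupancy estimate. The sketch below is not taken from the article: the table states neither the GPU model nor the launch configuration, so the hardware limits and the block size (`REGS_PER_SM`, `MAX_WARPS_PER_SM`, `REG_GRANULARITY`, `THREADS_PER_BLOCK`) are assumptions chosen for illustration, and the function name `theoretical_occupancy` is hypothetical. Shared memory and per-multiprocessor block limits are ignored.

```python
# Assumed hardware limits and launch configuration (NOT stated in the table).
REGS_PER_SM = 65536       # assumed 32-bit registers per multiprocessor
MAX_WARPS_PER_SM = 64     # assumed maximum resident warps per multiprocessor
WARP_SIZE = 32            # threads per warp
REG_GRANULARITY = 8       # assumed per-thread register allocation granularity
THREADS_PER_BLOCK = 128   # assumed block size


def theoretical_occupancy(regs_per_thread):
    """Register-limited theoretical occupancy in percent.

    Only the register file is considered; shared memory and the limit on
    resident blocks per multiprocessor are ignored in this sketch.
    """
    # Round the per-thread register count up to the allocation granularity.
    regs = -(-regs_per_thread // REG_GRANULARITY) * REG_GRANULARITY
    # Whole warps that fit into the register file.
    warps_by_regs = min(REGS_PER_SM // (regs * WARP_SIZE), MAX_WARPS_PER_SM)
    # Warps are scheduled in whole blocks, so round down to full blocks.
    warps_per_block = THREADS_PER_BLOCK // WARP_SIZE
    resident_blocks = warps_by_regs // warps_per_block
    resident_warps = resident_blocks * warps_per_block
    return 100.0 * resident_warps / MAX_WARPS_PER_SM


if __name__ == "__main__":
    # Registers per thread as listed in Table 2.
    for name, regs in [("Naive", 48), ("Tile = 2", 72), ("Tile = 4", 73),
                       ("Tile = 6", 104), ("Tile = 8", 142)]:
        print(name, regs, theoretical_occupancy(regs))
```

Under these assumed limits the function returns 62.5, 43.75, 37.5, 25.0, and 18.75 for the five register counts, which agrees with the rounded theoretical occupancy column. This agreement suggests, but does not prove, that register pressure is the factor limiting occupancy in the tiled kernels.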