Table 3. Access list examples for a configuration with one CPU and two GPUs (three memory nodes in total).
(A) Distance matrix from Eq. (12).

| | CPU | GPU-0 | GPU-1 |
|---|---|---|---|
| CPU | 0 | 0.5 | 1 |
| GPU-0 | 0.5 | 0 | 1 |
| GPU-1 | 0.5 | 1 | 0 |
Note:
We use four buckets, but the tasks of bucket zero are active only on the CPU. The priorities (the order in which the buckets are accessed) are reversed for the GPU workers. S, the size of the closed memory-node subgroup, is set to two for the CPU and to one for the GPUs. Finally, the locality factor l is two for both the CPU and the GPU workers.
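
To make the configuration in the note concrete, the following minimal Python sketch encodes the distance matrix of panel (A) and the per-node settings. It is an illustration under stated assumptions, not the scheduler's implementation: the names `NODES`, `DISTANCE`, `SUBGROUP_SIZE`, `closest_nodes`, and `bucket_order` are hypothetical, S is read here as "the S memory nodes nearest to a worker's own node", and the CPU is assumed to visit buckets in ascending order. The locality factor l is not modeled.

```python
# Node order matches the rows and columns of panel (A).
NODES = ["CPU", "GPU-0", "GPU-1"]

# Distance matrix copied from panel (A): row = source node, column = target node.
DISTANCE = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 1.0],
    [0.5, 1.0, 0.0],
]

# Subgroup size S per node, from the note: two for the CPU, one for each GPU.
SUBGROUP_SIZE = {"CPU": 2, "GPU-0": 1, "GPU-1": 1}

NUM_BUCKETS = 4


def closest_nodes(node):
    """Return the S nodes nearest to `node` (assumed interpretation of S)."""
    row = DISTANCE[NODES.index(node)]
    # sorted() is stable, so equal distances keep the column order of panel (A).
    ranked = sorted(range(len(NODES)), key=lambda j: row[j])
    return [NODES[j] for j in ranked[:SUBGROUP_SIZE[node]]]


def bucket_order(node):
    """Return the order in which a worker on `node` visits the buckets."""
    if node == "CPU":
        # Assumption: the CPU visits buckets in ascending order.
        return list(range(NUM_BUCKETS))
    # GPU workers use the reversed order; bucket 0 is CPU-only, so it is skipped.
    return [b for b in reversed(range(NUM_BUCKETS)) if b != 0]


if __name__ == "__main__":
    for node in NODES:
        print(node, closest_nodes(node), bucket_order(node))
```

Under these assumptions, the CPU's subgroup contains its own node and GPU-0, while each GPU's subgroup contains only its own node.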