Table 3. Multi-GPU Throughput Timings (ns/day) for AMBER GB Simulations with a Time Step of 2 fs Using the Parallel CPU Version (12 Intel X5670 Cores or 32 AMD Opteron 3136 Cores on Each Node) and the Parallel GPU Version with the SPDP Precision Model (One Intel X5670 Core and One GPU Per Node).ᵃ
CPU/GPU | apo-myoglobin (2,492 atoms) | nucleosome (25,095 atoms) |
---|---|---|
GPU version | ||
8 × M2090 | 135.1 | 3.95 |
4 × M2090 | 115.0 | 2.71 |
2 × M2090 | 93.1 | 1.80 |
1 × M2090 | 78.1 | 1.42 |
CPU version | ||
2048 × Opteron 3136 | – | 0.53 |
1024 × Opteron 3136 | – | 0.78 |
512 × Opteron 3136 | – | 0.65 |
256 × Opteron 3136 | – | 0.55 |
128 × Opteron 3136 | 29.8 | 0.31 |
64 × Opteron 3136 | 18.3 | 0.17 |
32 × Opteron 3136 | 10.3 | 0.08 |
12 × X5670 | 6.6 | 0.07 |
ᵃ For details on the hardware and software stack, see the text. A dash indicates lower throughput than with fewer nodes.
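
As a reading aid, the multi-GPU scaling implied by the table can be worked out with a few lines of arithmetic. The sketch below is not part of the benchmark itself: it copies the GPU-version throughputs from Table 3 and computes speedup relative to one M2090 and parallel efficiency (speedup divided by GPU count). The names `gpu_throughput` and `scaling` are illustrative only.

```python
# Throughput in ns/day copied from the GPU-version rows of Table 3,
# keyed by the number of M2090 GPUs.
gpu_throughput = {
    "apo-myoglobin": {1: 78.1, 2: 93.1, 4: 115.0, 8: 135.1},
    "nucleosome": {1: 1.42, 2: 1.80, 4: 2.71, 8: 3.95},
}


def scaling(throughput_by_n):
    """Return (n_gpus, speedup, efficiency) tuples relative to the 1-GPU run."""
    base = throughput_by_n[1]
    rows = []
    for n in sorted(throughput_by_n):
        speedup = throughput_by_n[n] / base
        rows.append((n, speedup, speedup / n))  # efficiency = speedup / n
    return rows


for system, data in gpu_throughput.items():
    for n, speedup, eff in scaling(data):
        print(f"{system:14s} {n} GPU(s): speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

The same function can be applied to the CPU rows by replacing the dictionary keys with core counts and dividing by the smallest core count instead of 1; that variant is left out here to keep the sketch short.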