TABLE III.
Training performance (ms/step) for the water, Cu, HEA, OC2M, dipeptides, and SPICE systems. “FP64” denotes double-precision floating point, “FP32” denotes single-precision floating point, and “FP64c” and “FP32c” denote compressed training (Ref. 109) in double and single precision, respectively. The hardware labels denote where each benchmark was run: “EPYC” on 128 AMD EPYC 7742 cores, “3080 Ti” on an NVIDIA GeForce RTX 3080 Ti card, “V100” on an NVIDIA Tesla V100 card, “A100” on an NVIDIA Tesla A100 card, and “MI250” on an AMD Instinct MI250 Graphics Compute Die (GCD).
System | Hardware | loc_frame FP64 | loc_frame FP32 | se_e2_a FP64 | se_e2_a FP32 | se_e2_a FP64c | se_e2_a FP32c | se_e2_a+se_e2_r FP64 | se_e2_a+se_e2_r FP32 | se_e2_a+se_e2_r FP64c | se_e2_a+se_e2_r FP32c | se_e2_a+se_e3 FP64 | se_e2_a+se_e3 FP32 | se_e2_a+se_e3 FP64c | se_e2_a+se_e3 FP32c | se_atten FP64 | se_atten FP32 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Water | EPYC | 14.7 | 9.20 | 97.3 | 45.0 | 28.4 | 16.2 | 63.7 | 32.5 | 29.9 | 15.4 | 141 | 85.2 | 34.0 | 20.6 | 1210 | 383 |
Water | 3080 Ti | 7.00 | 4.80 | 24.6 | 10.3 | 9.70 | 6.40 | 26.3 | 11.6 | 12.0 | 8.20 | 52.8 | 17.2 | 16.3 | 6.80 | 199 | 26.9 |
Water | V100 | 7.90 | 8.50 | 11.1 | 8.20 | 5.90 | 4.80 | 13.6 | 10.9 | 6.90 | 6.40 | 23.5 | 14.0 | 8.60 | 7.30 | 69.6 | 31.7 |
Water | A100 | 10.7 | 10.0 | 8.20 | 9.30 | 4.90 | 5.70 | 14.5 | 10.8 | 7.80 | 6.30 | 24.5 | 12.0 | 7.50 | 7.20 | 30.8 | 21.2 |
Water | MI250 | 11.7 | 10.9 | 20.3 | 13.1 | 7.70 | 7.00 | 27.3 | 19.7 | 11.5 | 10.9 | 278 | 27.7 | 12.8 | 11.2 | 125 | 31.7 |
Cu | EPYC | 4.90 | 3.30 | 33.7 | 12.8 | 8.00 | 5.40 | 19.9 | 10.0 | 10.5 | 5.30 | 45.5 | 24.2 | 9.10 | 6.50 | 226 | 89.1 |
Cu | 3080 Ti | 3.20 | 2.20 | 6.50 | 5.10 | 4.60 | 3.90 | 8.70 | 6.30 | 5.90 | 3.40 | 11.8 | 4.80 | 7.20 | 5.70 | 36.8 | 8.80 |
Cu | V100 | 3.20 | 3.80 | 4.20 | 4.80 | 3.20 | 3.70 | 6.50 | 5.30 | 5.50 | 4.10 | 7.90 | 5.60 | 6.00 | 5.80 | 15.6 | 11.9 |
Cu | A100 | 4.00 | 3.90 | 3.80 | 3.70 | 3.10 | 3.00 | 5.40 | 5.30 | 4.10 | 4.10 | 8.00 | 5.60 | 4.80 | 4.60 | 11.6 | 11.2 |
Cu | MI250 | 4.80 | 4.90 | 6.90 | 6.40 | 5.10 | 5.00 | 9.10 | 9.40 | 7.40 | 7.00 | 49.9 | 10.1 | 8.00 | 7.30 | 23.6 | 18.6 |
HEA | EPYC | ⋯ | ⋯ | 53.4 | 30.5 | 19.4 | 12.2 | 52.3 | 29.3 | 27.7 | 16.7 | 83.7 | 51.1 | 26.6 | 15.7 | 159 | 60.1 |
HEA | 3080 Ti | ⋯ | ⋯ | 38.4 | 25.2 | 11.2 | 9.10 | 71.4 | 41.8 | 16.3 | 12.7 | 93.6 | 41.0 | 19.7 | 15.0 | 35.9 | 9.10 |
HEA | V100 | ⋯ | ⋯ | 33.2 | 29.8 | 11.8 | 11.1 | 63.2 | 47.4 | 17.5 | 16.5 | 65.5 | 49.6 | 27.4 | 18.7 | 15.6 | 11.9 |
HEA | A100 | ⋯ | ⋯ | 30.5 | 28.6 | 10.9 | 10.4 | 51.6 | 67.4 | 16.9 | 21.2 | 61.7 | 52.9 | 18.6 | 18.8 | 11.7 | 11.5 |
HEA | MI250 | ⋯ | ⋯ | 48.8 | 42.7 | 18.5 | 18.0 | 72.3 | 69.3 | 28.7 | 27.3 | 134 | 88.4 | 32.7 | 32.3 | 21.6 | 19.5 |
OC2M | EPYC | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 2070 | 625 |
OC2M | 3080 Ti | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 352 | 46.0 |
OC2M | V100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 120 | 52.8 |
OC2M | A100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 51.4 | 30.9 |
OC2M | MI250 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 171 | 55.7 |
Dipeptides | EPYC | ⋯ | ⋯ | 49.7 | 30.5 | 21.2 | 19.4 | 52.0 | 35.3 | 30.1 | 21.2 | 89.5 | 61.1 | 35.0 | 21.2 | 214 | 91.5 |
Dipeptides | 3080 Ti | ⋯ | ⋯ | 54.8 | 39.5 | 17.3 | 11.3 | 90.0 | 64.3 | 19.0 | 15.3 | 131 | 67.7 | 25.4 | 19.2 | 26.1 | 12.0 |
Dipeptides | V100 | ⋯ | ⋯ | 54.1 | 52.6 | 14.8 | 14.8 | 88.0 | 84.3 | 20.5 | 21.7 | 96.2 | 103 | 30.1 | 30.8 | 14.3 | 10.6 |
Dipeptides | A100 | ⋯ | ⋯ | 50.2 | 50.8 | 14.3 | 14.3 | 89.0 | 75.9 | 20.7 | 19.9 | 91.1 | 82.7 | 26.6 | 26.7 | 13.2 | 11.1 |
Dipeptides | MI250 | ⋯ | ⋯ | 66.2 | 67.8 | 23.1 | 22.9 | 117 | 112 | 35.0 | 32.4 | 155 | 129 | 45.9 | 44.9 | 19.6 | 16.8 |
SPICE | EPYC | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 244 | 98.0 |
SPICE | 3080 Ti | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 35.4 | 15.3 |
SPICE | V100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 17.3 | 15.9 |
SPICE | A100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 11.9 | 12.2 |
SPICE | MI250 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 29.0 | 24.1 |
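
Because the entries of Table III are times per training step, the relative benefit of compression or reduced precision on a given device is the ratio of two entries in the same row. The following minimal sketch illustrates that arithmetic for the water system with the se_e2_a descriptor. The numbers are copied from the table; the dictionary layout and the `speedup` helper are illustrative assumptions and are not part of any benchmarking tool used to produce the table.

```python
# Times in ms/step, copied from Table III (water system, se_e2_a descriptor).
# The data layout and helper name below are hypothetical, chosen for illustration.
WATER_SE_E2_A = {
    "EPYC":    {"FP64": 97.3, "FP32": 45.0, "FP64c": 28.4, "FP32c": 16.2},
    "3080 Ti": {"FP64": 24.6, "FP32": 10.3, "FP64c": 9.70, "FP32c": 6.40},
    "V100":    {"FP64": 11.1, "FP32": 8.20, "FP64c": 5.90, "FP32c": 4.80},
    "A100":    {"FP64": 8.20, "FP32": 9.30, "FP64c": 4.90, "FP32c": 5.70},
    "MI250":   {"FP64": 20.3, "FP32": 13.1, "FP64c": 7.70, "FP32c": 7.00},
}


def speedup(timings, baseline, variant):
    """Return baseline time / variant time; a value above 1 means the variant is faster."""
    return timings[baseline] / timings[variant]


if __name__ == "__main__":
    for hardware, timings in WATER_SE_E2_A.items():
        compression = speedup(timings, "FP64", "FP64c")  # effect of compressed training
        precision = speedup(timings, "FP64", "FP32")     # effect of single precision
        print(f"{hardware:8s} FP64/FP64c = {compression:.2f}  FP64/FP32 = {precision:.2f}")
```

A ratio below 1 indicates that the variant was slower than the FP64 baseline in that row, as happens for FP32 on the A100 in this subset.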