2015 Aug 4;36(26):1990–2008. doi: 10.1002/jcc.24030

Table 11.

Scaling of the MEM benchmark on different node types with performance P and parallel efficiency E.

No. of nodes	Processor(s) Intel	GPUs, IB	DD grid x	DD grid y	DD grid z	N_PME/node	N_th	ht	DLB	P (ns/d)	E
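1	E3‐1270v2 (4 cores)	770, QDR^[a]	1	1	1	–	8	(ht)	–	20.5	1
2	E3‐1270v2 (4 cores)	770, QDR^[a]	2	1	1	–	8	(ht)	(✓)	27.2	0.66
4	E3‐1270v2 (4 cores)	770, QDR^[a]	4	1	1	–	8	(ht)	(✓)	22.1	0.27
8	E3‐1270v2 (4 cores)	770, QDR^[a]	8	1	1	–	8	(ht)	(✓)	68.3	0.42
16	E3‐1270v2 (4 cores)	770, QDR^[a]	16	1	1	–	8	(ht)	(✓)	85.7	0.26
32	E3‐1270v2 (4 cores)	770, QDR^[a]	8	4	1	–	8	(ht)	(✓)	119	0.18
1	E5‐1620 (4 cores)	680, QDR	1	1	1	–	8	ht	–	21	1
2	E5‐1620 (4 cores)	680, QDR	2	1	1	–	8	ht	(✓)	29	0.69
4	E5‐1620 (4 cores)	680, QDR	4	1	1	–	8	ht	(✓)	46.9	0.56
1	E5‐2670v2 (2×10 cores)	780Ti×2, QDR	10	1	1	–	4	ht	✓	56.9	1
2	E5‐2670v2 (2×10 cores)	780Ti×2, QDR	4	5	1	–	2		✓	74.2	0.65
4	E5‐2670v2 (2×10 cores)	780Ti×2, QDR	8	1	1	2	5		✗	103.4	0.45
8	E5‐2670v2 (2×10 cores)	780Ti×2, QDR	8	1	2	2	5		✗	119.1	0.26
16	E5‐2670v2 (2×10 cores)	780Ti×2, QDR	8	4	1	2	5		✗	164.8	0.18
32	E5‐2670v2 (2×10 cores)	780Ti×2, QDR	8	8	1	2	5		✗	193.1	0.11
1	E5‐2670v2 (2×10 cores)	980×2, QDR	10	1	1	–	4	ht	(✓)	58	1
2	E5‐2670v2 (2×10 cores)	980×2, QDR	4	5	1	–	2		(✓)	75.6	0.65
4	E5‐2670v2 (2×10 cores)	980×2, QDR	8	5	1	–	2		✗	96.6	0.42
1	E5‐2680v2 (2×10 cores)	–, FDR‐14	8	2	2	8	1	ht	✓	26.8	1
2	E5‐2680v2 (2×10 cores)	–, FDR‐14	4	5	3	10	1	ht	✓	42	0.78
4	E5‐2680v2 (2×10 cores)	–, FDR‐14	8	5	3	10	1	ht	✓	76.3	0.71
8	E5‐2680v2 (2×10 cores)	–, FDR‐14	8	7	2	6	2	ht	✓	122	0.57
16	E5‐2680v2 (2×10 cores)	–, FDR‐14	8	8	4	4	1		✓	162	0.38
32	E5‐2680v2 (2×10 cores)	–, FDR‐14	8	8	8	4	1		✓	209	0.24
64	E5‐2680v2 (2×10 cores)	–, FDR‐14	10	8	6	2.5	2		✓	240	0.14
1	E5‐2680v2 (2×10 cores)	K20X×2 (732 MHz), FDR‐14	8	1	1	–	5	ht	✓	55.2	1
2	E5‐2680v2 (2×10 cores)	K20X×2 (732 MHz), FDR‐14	4	5	1	–	4	ht	✓	74.5	0.67
4	E5‐2680v2 (2×10 cores)	K20X×2 (732 MHz), FDR‐14	8	1	2	–	5		✓	118	0.53
8	E5‐2680v2 (2×10 cores)	K20X×2 (732 MHz), FDR‐14	8	1	2	2	5		✗	163	0.37
16	E5‐2680v2 (2×10 cores)	K20X×2 (732 MHz), FDR‐14	8	4	1	2	5		✗	226	0.26
32	E5‐2680v2 (2×10 cores)	K20X×2 (732 MHz), FDR‐14	8	8	1	2	5		✗	304	0.17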

Note: A black “ht” symbol indicates that using all hyperthreading cores resulted in the fastest execution; where it is absent, using only the physical core count was more advantageous. A gray “(ht)” denotes that this benchmark was done only with the hyperthreading core count (= 2 × physical).

[a] These nodes cannot use the full QDR IB bandwidth due to an insufficient number of PCIe lanes; see the “Strong Scaling” section.
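
The tabulated E values appear consistent with the usual strong-scaling definition E = P_n / (n · P_1), where P_1 is the single-node performance of the same node type. That formula is not stated in this excerpt, so the sketch below treats it as an assumption and simply checks it against a few rows. It also assumes, following common GROMACS usage, that the DD grid counts particle-particle ranks, that N_PME/node counts separate PME ranks per node, and that N_th is the number of threads per rank. The function names are illustrative only.

```python
def parallel_efficiency(p_n, n_nodes, p_1):
    """Assumed definition: E = P_n / (n * P_1), with P in ns/d."""
    return p_n / (n_nodes * p_1)


def threads_per_node(dd_grid, n_pme_per_node, n_th, n_nodes):
    """Assumed accounting: (PP ranks + PME ranks) * threads per rank / nodes."""
    x, y, z = dd_grid
    pp_ranks = x * y * z                   # ranks in the domain decomposition grid
    pme_ranks = n_pme_per_node * n_nodes   # separate PME ranks over all nodes
    return (pp_ranks + pme_ranks) * n_th / n_nodes


# Three CPU-only E5-2680v2 rows from the table:
# (nodes, DD grid, N_PME/node, N_th, P in ns/d)
rows = [
    (1, (8, 2, 2), 8, 1, 26.8),
    (8, (8, 7, 2), 6, 2, 122.0),
    (64, (10, 8, 6), 2.5, 2, 240.0),
]

p_1 = rows[0][4]  # single-node performance of this node type
for n, grid, n_pme, n_th, p in rows:
    e = parallel_efficiency(p, n, p_1)
    t = threads_per_node(grid, n_pme, n_th, n)
    print(f"{n:3d} nodes: E = {e:.2f}, threads/node = {t:.0f}")
```

Under these assumptions, the rows marked “ht” come out at 40 threads per node (all hyperthreads of 2×10 cores), and the 64-node row without the mark comes out at 20 (physical cores only), which matches the note on the “ht” symbol.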