Table 3.
| Hadoop | | | HPC random | | |
|---|---|---|---|---|---|
| Number of nodes (cores) | Mapping time T_alignment, minutes | | Number of nodes (cores) | Mapping time T_alignment, minutes | |
| 4(28) | 293.5 | 1.71 | 4(64) | 74.4 | 3.89 |
| 6(42) | 189.8 | 1.62 | 10(160) | 32.4 | 3.76 |
| 8(56) | 136.0 | 1.62 | 14(224) | 22.7 | 3.77 |
| 16(112) | 70.3 | 1.48 | 18(288) | 17.9 | 3.78 |
| 32(224) | 39.3 | 1.66 | 22(352) | 14.5 | 3.79 |
| 40(280) | 32.5 | 1.65 | 26(416) | 12.3 | 3.77 |
| | | | 30(480) | 10.7 | 3.73 |
| | | | 34(544) | 9.5 | 3.45 |
| | | | 38(608) | 8.5 | 3.16 |
| | | | 42(672) | 7.6 | 2.96 |
| | | | 46(736) | 7.0 | 2.55 |
| | | | 50(800) | 6.4 | 2.65 |
| | | | 54(864) | 5.9 | 2.34 |
| | | | 58(928) | 5.5 | 2.12 |
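As one way to read the alignment-time columns, the snippet below computes strong-scaling speedup and parallel efficiency from the reported T_alignment values, taking the smallest configuration of each approach as the baseline. This uses the standard textbook definitions (speedup = T_base/T_N, efficiency = speedup scaled by the node ratio) applied to the table, and is not necessarily the metric reported in the unlabeled column.

```python
# Illustrative only: strong-scaling speedup and efficiency derived from the
# Table 3 alignment times, relative to the smallest configuration of each
# approach. Standard definitions, not a metric taken from the paper itself.
hadoop = {4: 293.5, 6: 189.8, 8: 136.0, 16: 70.3, 32: 39.3, 40: 32.5}   # nodes -> minutes
hpc = {4: 74.4, 10: 32.4, 14: 22.7, 18: 17.9, 22: 14.5, 26: 12.3,
       30: 10.7, 34: 9.5, 38: 8.5, 42: 7.6, 46: 7.0, 50: 6.4,
       54: 5.9, 58: 5.5}                                                 # nodes -> minutes

def scaling(times):
    base_nodes = min(times)
    base_time = times[base_nodes]
    for nodes, t in sorted(times.items()):
        speedup = base_time / t
        efficiency = speedup * base_nodes / nodes
        print(f"{nodes:3d} nodes: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")

scaling(hadoop)  # e.g. 40 nodes: ~9.0x speedup on 10x the nodes (efficiency ~0.90)
scaling(hpc)     # e.g. 58 nodes: ~13.5x speedup on 14.5x the nodes (efficiency ~0.93)
```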
For the ‘HPC random’ approach, the data chunks first have to be copied to the local node disks and the resulting alignments (SAM files) copied back, whereas Hadoop keeps all of the data inside HDFS and therefore needs no such data staging. Hadoop does, however, have to ingest the data into HDFS and preprocess the reads before the actual mapping stage so that they can be processed in an MR manner, which results in what we term ‘communication costs’. Note that each HPC node has 16 cores, while each Hadoop node has seven cores available for computation (the eighth core is dedicated to running the virtual machine).
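To make the data-movement difference concrete, the sketch below contrasts the two patterns: in the ‘HPC random’ approach each node stages its read chunk to local disk, aligns it, and copies the SAM output back to shared storage, whereas Hadoop pays its cost up front by ingesting the reads into HDFS (and preprocessing them for MR), after which the map tasks read directly from HDFS. All paths, file names, and the `aligner` command are hypothetical placeholders, not taken from the study; only the overall pattern follows the text above.

```python
import subprocess

# Hypothetical paths and tool names, for illustration only.
SHARED_FASTQ_CHUNK = "/proj/shared/reads/chunk_042.fastq"
LOCAL_SCRATCH = "/scratch/chunk_042.fastq"
LOCAL_SAM = "/scratch/chunk_042.sam"
SHARED_SAM_DIR = "/proj/shared/alignments/"

def hpc_random_node_job():
    """One node's job in the 'HPC random' approach: stage in, align, stage out."""
    # 1. Copy this node's read chunk from shared storage to the local disk.
    subprocess.run(["cp", SHARED_FASTQ_CHUNK, LOCAL_SCRATCH], check=True)
    # 2. Run the aligner locally ('aligner' is a placeholder command line).
    with open(LOCAL_SAM, "w") as out:
        subprocess.run(["aligner", "--ref", "/proj/shared/ref.fa", LOCAL_SCRATCH],
                       stdout=out, check=True)
    # 3. Copy the resulting SAM file back to shared storage.
    subprocess.run(["cp", LOCAL_SAM, SHARED_SAM_DIR], check=True)

def hadoop_ingest_once():
    """The Hadoop approach: ingest the reads into HDFS before the mapping
    stage (the 'communication cost'); map tasks then read from HDFS, so no
    per-node staging to local disks is needed."""
    subprocess.run(["hdfs", "dfs", "-put",
                    "/proj/shared/reads/", "/user/analyst/reads/"], check=True)
```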