2015 Jun 4;4:26. doi: 10.1186/s13742-015-0058-5

Table 3.

Timings for mapping and the ratio T_alignment/T_comm for the HPC and Hadoop I clusters for Dataset S as a function of the number of nodes involved

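Hadoop I

| Number of nodes (cores) | Mapping time T_alignment (minutes) | T_alignment/T_comm |
|---|---|---|
| 4 (28) | 293.5 | 1.71 |
| 6 (42) | 189.8 | 1.62 |
| 8 (56) | 136.0 | 1.62 |
| 16 (112) | 70.3 | 1.48 |
| 32 (224) | 39.3 | 1.66 |
| 40 (280) | 32.5 | 1.65 |

HPC random

| Number of nodes (cores) | Mapping time T_alignment (minutes) | T_alignment/T_comm |
|---|---|---|
| 4 (64) | 74.4 | 3.89 |
| 10 (160) | 32.4 | 3.76 |
| 14 (224) | 22.7 | 3.77 |
| 18 (288) | 17.9 | 3.78 |
| 22 (352) | 14.5 | 3.79 |
| 26 (416) | 12.3 | 3.77 |
| 30 (480) | 10.7 | 3.73 |
| 34 (544) | 9.5 | 3.45 |
| 38 (608) | 8.5 | 3.16 |
| 42 (672) | 7.6 | 2.96 |
| 46 (736) | 7.0 | 2.55 |
| 50 (800) | 6.4 | 2.65 |
| 54 (864) | 5.9 | 2.34 |
| 58 (928) | 5.5 | 2.12 |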

For the ‘HPC random’ approach, data chunks first have to be copied to the local node disks, and the alignments (SAM files) then have to be copied back. Hadoop keeps all of the data inside HDFS and hence does not need data staging. However, Hadoop needs to ingest the data into HDFS and preprocess the reads before the actual mapping stage so that it can operate in a MapReduce (MR) manner, resulting in what we term ‘communication costs’. Note that each HPC node has 16 cores, while each Hadoop node has seven cores (the eighth core is dedicated to running the virtual machine).
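The core counts in parentheses and the communication time implied by each ratio can be recovered with simple arithmetic. The following is a minimal Python sketch, assuming only the per-node core counts given in the note above and a few rows copied from the table. The helper names `total_cores` and `implied_t_comm` are hypothetical, and the derived T_comm values are not reported in the table itself; they follow only from dividing T_alignment by the listed ratio.

```python
# Cores available for mapping on each node, as stated in the table note.
CORES_PER_NODE = {"Hadoop I": 7, "HPC random": 16}

# A few rows copied from the table: (nodes, T_alignment, T_alignment/T_comm).
ROWS = {
    "Hadoop I": [(4, 293.5, 1.71), (40, 32.5, 1.65)],
    "HPC random": [(4, 74.4, 3.89), (58, 5.5, 2.12)],
}


def total_cores(cluster, nodes):
    """Return the core count shown in parentheses in the table."""
    return nodes * CORES_PER_NODE[cluster]


def implied_t_comm(t_alignment, ratio):
    """Derive T_comm from T_alignment and the ratio T_alignment/T_comm.

    The result is in the same time unit as t_alignment.
    """
    return t_alignment / ratio


for cluster, rows in ROWS.items():
    for nodes, t_align, ratio in rows:
        cores = total_cores(cluster, nodes)
        t_comm = implied_t_comm(t_align, ratio)
        print(f"{cluster}: {nodes} nodes ({cores} cores), "
              f"T_alignment={t_align}, implied T_comm={t_comm:.1f}")
```

Because the ratio is rounded to two decimals in the table, the derived T_comm values are approximate.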