The Scientific World Journal. 2014 Jun 24;2014:404375. doi: 10.1155/2014/404375

A DAG Scheduling Scheme on Heterogeneous Computing Systems Using Tuple-Based Chemical Reaction Optimization

Yuyi Jiang 1, Zhiqing Shao 1,*, Yi Guo 1
PMCID: PMC4095736  PMID: 25143977

Abstract

A complex computing problem can be solved efficiently on a system with multiple computing nodes by dividing its implementation code into several parallel processing modules or tasks that can be formulated as directed acyclic graph (DAG) problems. The DAG jobs may be mapped to and scheduled on the computing nodes to minimize the total execution time. Searching for an optimal DAG scheduling solution is considered to be NP-complete. This paper proposes a tuple molecular structure-based chemical reaction optimization (TMSCRO) method for DAG scheduling on heterogeneous computing systems, based on a recently proposed metaheuristic method, chemical reaction optimization (CRO). Compared with other CRO-based algorithms for DAG scheduling, the design of the tuple reaction molecular structure and of the four elementary reaction operators of TMSCRO is more reasonable. TMSCRO also applies the concepts of constrained critical paths (CCPs), the constrained-critical-path directed acyclic graph (CCPDAG), and the super molecule for accelerating convergence. In this paper, we have also conducted simulation experiments to verify the effectiveness and efficiency of TMSCRO on a large set of randomly generated graphs and on graphs for real world problems.

1. Introduction

Modern computer systems with multiple processors working in parallel may enhance the processing capacity for an application, and the effective scheduling of the parallel modules of the application may fully exploit this parallelism. The application modules may communicate and synchronize several times during processing, and a large communication cost may limit the overall application performance on heterogeneous systems combining GPUs, multicore processors, and CELL processors, or on distributed memory systems. An effective schedule may therefore greatly improve the performance of the application.

Scheduling generally defines not only the processing order of the application modules but also the processor assignment of these modules. The makespan (i.e., the schedule length), covering the entire execution and communication cost of all the modules, is used to evaluate the quality of a scheduling solution. On heterogeneous systems [1–4], searching for optimal schedules minimizing the makespan is considered an NP-complete problem. Therefore, two classes of scheduling strategies, heuristic scheduling and metaheuristic scheduling, have been proposed to solve this problem by finding suboptimal solutions at a lower time overhead.

Heuristic scheduling strategies try to identify a good solution by exploiting heuristics. An important subclass of heuristic scheduling is list scheduling, which orders the tasks of a DAG job on the basis of greedy heuristics; the ordered tasks are then allocated to the processors that minimize their start times. In heuristic scheduling, the attempted solutions are narrowed down by the greedy heuristics to a very small portion of the entire solution space, and this limitation of the search leads to low time complexity. However, the more complex DAG scheduling problems become, the harder it is for greedy heuristics to produce consistent results on a wide range of problems, because the quality of the found solutions depends heavily on the effectiveness of the heuristics.

Metaheuristic scheduling strategies such as ant colony optimization (ACO), genetic algorithms (GA), Tabu search (TS), and simulated annealing (SA) incur a higher time cost than heuristic scheduling strategies, but they can produce consistent, high-quality results on a wide range of problems by performing directed searches of the solution space.

Chemical reaction optimization (CRO) is a recently proposed metaheuristic method that has shown its power in dealing with NP-complete problems. As far as we know, there is only one CRO-based algorithm for DAG scheduling on heterogeneous systems, called double molecular structure-based CRO (DMSCRO). DMSCRO achieves a better makespan and convergence rate than a genetic algorithm (GA) for DAG scheduling on heterogeneous systems; however, its rate of convergence as a metaheuristic method still leaves room for improvement. This paper proposes a new CRO-based algorithm, tuple molecular structure-based CRO (TMSCRO), for this problem, encoding the two basic components of DAG scheduling, the module execution order and the module-to-processor mapping, into an array of tuples. Combined with the elementary reaction operators designed in TMSCRO, this kind of molecular structure provides a better capability of intensification and diversification than DMSCRO. Moreover, in TMSCRO, the concepts of constrained critical paths (CCPs) [5] and the constrained-critical-path directed acyclic graph (CCPDAG) are applied to creating the initial population in order to speed up convergence. In addition, the first initial molecule, InitS, which is converted from the scheduling result of the constrained earliest finish time (CEFT) algorithm, is treated as a super molecule [6] for accelerating convergence.

According to the No-Free-Lunch Theorem, the performances of all metaheuristic algorithms searching for optimal solutions are alike when averaged over all possible fitness functions; in theory, a well-designed metaheuristic method will gradually approach the optimal result if it runs for long enough. We have conducted simulation experiments on graphs abstracted from two well-known real applications, Gaussian elimination and a molecular dynamics application, as well as on a large set of randomly generated graphs. The experiment results show that the proposed TMSCRO achieves makespan performance similar to that of DMSCRO in the literature and outperforms the heuristic algorithms.

There are three major contributions of this work.

  1. Developing TMSCRO on the CRO framework by designing a molecule encoding method and elementary chemical reaction operators that are more reasonable for intensification and diversification search than those of DMSCRO.

  2. For accelerating convergence, applying CEFT and the CCPDAG to the data pretreatment, utilizing the concept of CCPs in the initialization, and using the first initial molecule, InitS, as a super molecule in TMSCRO.

  3. Verifying the effectiveness and efficiency of the proposed TMSCRO by simulation experiments. The simulation results of this paper show that TMSCRO is able to reach a makespan similar to that of DMSCRO but finds good solutions faster than DMSCRO by 12.89% on average (by 26.29% in the best case).

2. Related Work

Most scheduling algorithms can be categorized into heuristic scheduling (including list scheduling, duplication-based scheduling, and cluster scheduling) and metaheuristic (i.e., guided-random-search-based) scheduling. These strategies generate the scheduling solution before the execution of the application. The approaches adopted by these different scheduling strategies are summarized in this section.

2.1. Heuristic Scheduling

Heuristic methods usually provide near-optimal solutions for a task scheduling problem in polynomial time. The approaches adopted by heuristic methods search only one path in the solution space, ignoring other possible ones [7]. Three typical kinds of heuristic algorithms for the DAG scheduling problem are discussed below: list scheduling [7, 8], cluster scheduling [9, 10], and duplication-based scheduling [11, 12].

List scheduling [7, 13–21] generates a schedule solution in two primary phases. In phase 1, all the tasks are ordered by their assigned priorities, which are normally based on the task execution and communication costs. Two attributes, b-level and t-level, are used to assign task priorities in most list scheduling algorithms: in a DAG, the b-level of a node (task) is the length of the longest path between the node and the end node, while the t-level of a node is the length of the longest path from the entry node to the node. In phase 2, a processor is assigned to each task in the sequence.
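As a concrete illustration of the two attributes, the sketch below computes b-level and t-level for a toy DAG by memoized longest-path recursion and derives a priority order; the graph, its costs, and all names are illustrative assumptions rather than material from the paper.

```python
from functools import lru_cache

# Toy DAG (adjacency lists), execution costs, and edge communication costs;
# all values here are illustrative assumptions.
succ = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
pred = {'A': [], 'B': ['A'], 'C': ['A'], 'D': ['B', 'C']}
cost = {'A': 2, 'B': 3, 'C': 1, 'D': 2}
comm = {('A', 'B'): 4, ('A', 'C'): 1, ('B', 'D'): 2, ('C', 'D'): 3}

@lru_cache(maxsize=None)
def b_level(v):
    # Length of the longest path from v to the end node (includes cost[v]).
    tails = [comm[(v, s)] + b_level(s) for s in succ[v]]
    return cost[v] + (max(tails) if tails else 0)

@lru_cache(maxsize=None)
def t_level(v):
    # Length of the longest path from the entry node to v (excludes cost[v]).
    heads = [t_level(p) + cost[p] + comm[(p, v)] for p in pred[v]]
    return max(heads) if heads else 0

# Phase 1 of list scheduling: order tasks by priority, e.g. descending b-level.
order = sorted(cost, key=b_level, reverse=True)
```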

The heterogeneous earliest finish time (HEFT) scheduling algorithm [16] assigns the scheduling task priorities based on the earliest start time of each task. HEFT allocates a task to the processor which minimizes the task's start time.

The modified critical path (MCP) scheduling [22] considers only one CP (critical path) of the DAG and assigns the scheduling priority to tasks based on their latest start time. The latest start times of the CP tasks are equal to their t-levels. MCP allocates a task to the processor which minimizes the task's start time.

Dynamic-level scheduling (DLS) [23] uses the concept of the dynamic level, which is the difference between the b-level and earliest start time of a task on a processor. Each time the (task, processor) pair with the largest dynamic-level value is chosen by DLS during the task scheduling.

Mapping heuristic (MH) [24] assigns the task scheduling priorities based on the static b-level of each task, which is the b-level without the communication costs between tasks. Then, a task is allocated to the processor which gives the earliest start time.

Levelized-min time (LMT) [17] assigns the task scheduling priority in two steps. Firstly, it groups the tasks into different levels based on the topology of the DAG, and then in each level, the task with the highest priority is the one with the largest execution cost. A task is allocated to the processor which minimizes the sum of the total communication costs with the tasks in the previous level and the task's execution cost.

Two heuristic algorithms for DAG scheduling on heterogeneous systems are proposed in [8]. One, named HEFT_T, uses the sum of t-level and b-level to assign the priority to each task; HEFT_T attempts to place the critical tasks on the same processor, and the other tasks are allocated to the processors that give the earliest start times. The other, named HEFT_B, applies the concept of b-level to assign the priority (i.e., scheduling order) to each task; after the priority assignment, a task is allocated to the processor that minimizes its start time. The extensive experiment results in [8] demonstrate that HEFT_B and HEFT_T outperform (in terms of makespan) other representative heuristic algorithms for heterogeneous systems, such as DLS, MH, and LMT.

In contrast to the list scheduling algorithms, the duplication-based algorithms [23, 25–29] attempt to duplicate tasks onto the same processor on heterogeneous systems, because duplication may eliminate the communication cost between these tasks and thus effectively reduce the total schedule length.

The clustering algorithms [8, 11, 30–32] treat task collections as clusters to be mapped to appropriate processors. These algorithms are mostly used in homogeneous systems with an unbounded number of processors, and they use as many processors as possible to reduce the schedule length. If the number of processors used for scheduling exceeds the number of available processors, the task collections (clusters) are processed further to fit a limited number of processors.

2.2. Metaheuristic Scheduling

In comparison with the algorithms based on heuristic scheduling, the metaheuristic (guided-random-search-based) algorithms use a combinatorial process for solution searching. In general, to obtain robust performance on many kinds of scheduling problems, metaheuristic algorithms need to sample candidate solutions in the search space sufficiently. Many metaheuristic algorithms have been applied successfully to the task scheduling problem, such as GA, chemical reaction optimization (CRO), energy-efficient stochastic scheduling [33], and so forth.

GA [15, 31, 34–36] is the most widely used metaheuristic method for DAG scheduling. In [15], a scheduling solution is encoded as a one-dimensional string representing an ordered list of tasks to be allocated to a processor. Given the strings of two parent solutions, the crossover operator selects a crossover point randomly and then merges the head portion of one parent with the tail portion of the other, while the mutation operator randomly exchanges two tasks in a solution. The makespan is used by the fitness function to evaluate the quality of a scheduling solution.

Chemical reaction optimization (CRO) was proposed very recently [20, 30, 37–39]. It mimics the interactions of molecules in chemical reactions. CRO has already shown good performance in solving many problems, such as the quadratic assignment problem (QAP), the resource-constrained project scheduling problem (RCPSP), the channel assignment problem (CAP) [39], task scheduling in grid computing (TSGC) [40], and the 0-1 knapsack problem (KP01) [41]. As far as we know, double molecular structure-based chemical reaction optimization (DMSCRO), recently proposed in [37], is the only CRO-based algorithm, with two molecular structures, for DAG scheduling on heterogeneous systems. This CRO-based algorithm mimics the chemical reaction process in a closed container and obeys energy conservation. In DMSCRO, one solution for DAG scheduling, comprising the two essential components of task execution order and task-to-processor mapping, corresponds to a double-structured molecule with two kinds of energy, potential energy (PE) and kinetic energy (KE). The PE value of a molecule is just the fitness value (objective value), the makespan, of the corresponding solution, which can be calculated by the fitness function designed in DMSCRO, while KE, with a nonnegative value, helps the molecule escape from local optimums. Four kinds of elementary reactions perform the intensification and diversification search in the solution space to find the solution with the minimal makespan; the principle of the reaction selection is presented in detail in Section 3.2. Moreover, a central buffer is applied in DMSCRO for energy interchange and conservation during the search. However, as a metaheuristic method for DAG scheduling, DMSCRO still has a very large time expenditure, and its rate of convergence needs to be improved. Compared with GA, DMSCRO is more similar in model and workload to the TMSCRO proposed in this paper.

Our work addresses the DAG scheduling problem and the flaws of the existing CRO-based method for DAG scheduling by proposing a tuple molecular structure-based chemical reaction optimization (TMSCRO). Compared with DMSCRO, TMSCRO applies CEFT [5] to the data pretreatment to take advantage of CCPs as heuristic information for accelerating convergence. Moreover, the molecular structure and elementary reaction operators designed in TMSCRO are more reasonable than those in DMSCRO regarding the intensification and diversification of the solution space search.

3. Background

3.1. CEFT

Constrained earliest finish time (CEFT), based on constrained critical paths (CCPs), was proposed for heterogeneous system scheduling in [5]. In contrast to other approaches, the CEFT strategy takes into account a broader view of the input DAG. Moreover, the CCPs can be scheduled efficiently because of their static generation.

A constrained critical path (CCP) is a collection containing only the tasks that are ready for scheduling; a task is ready when all its predecessors have been processed. In CEFT, a critical path (CP) is the longest path from the start node to the end node of the DAG. The DAG is initially traversed and a critical path is found; the nodes constituting this critical path are then pruned from the graph, and subsequent traversals of the pruned graph produce the remaining critical paths. While the nodes are being removed from the task graph, a pseudo-edge to the start or end node is added if a node is left with no predecessors or no successors, respectively. The CCPs are subsequently formed by selecting ready nodes from the critical paths in a round-robin fashion. Each CCP may then be assigned the single processor that yields the minimum finish time for processing all the tasks in the CCP. Keeping all the tasks of a CCP together not only reduces the communication cost but also benefits from a broader view of the task graph.
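The following is a minimal sketch of the repeated critical-path extraction just described, assuming node weights only (the real CEFT also accounts for communication costs [5]); all names are illustrative.

```python
# Minimal sketch: repeatedly extract the longest remaining path (a critical
# path of the pruned DAG) until every node is covered. Node weights only;
# the real CEFT uses execution plus communication costs [5].
def extract_critical_paths(succ, w):
    remaining = set(succ)
    paths = []
    while remaining:
        memo = {}
        def dp(v):
            # (length, path) of the longest path starting at v in the pruned DAG.
            if v not in memo:
                tails = [dp(s) for s in succ[v] if s in remaining]
                length, path = max(tails) if tails else (0, [])
                memo[v] = (w[v] + length, [v] + path)
            return memo[v]
        _, path = max(dp(v) for v in remaining)
        paths.append(path)       # first iteration yields the DAG's critical path
        remaining -= set(path)   # prune its nodes; repeat for the remaining CPs
    return paths
```

The CCPs are then formed by walking the returned paths round-robin and taking, at each step, only nodes whose predecessors have already been taken.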

Consider that the CEFT algorithm generates schedules for n tasks on |P| heterogeneous processors. Some specific terms and their usage are given in Table 1.

Table 1.

Specific terms and their usage for the CEFT algorithm.

Term | Usage
EC_{P_r}(w) | Execution cost of a node w using processor P_r
CM(w, P_r, v, P_x) | Communication cost from node v to node w, if P_x has been assigned to node v and P_r is assigned to node w
ST_{P_r}(w, v) | Possible start time of node w on processor P_r, with v being any predecessor of w that has already been scheduled
EFT_{P_r}(w) | Finish time of node w using processor P_r
AEFT_w | Actual finish time of node w
CEFT_{P_r}(CCP_j) | Finish time of the constrained critical path CCP_j when processor P_r is assigned to it
AT_{P_r} | Availability time of P_r
Pred(w) | Set of predecessors of node w
Succ(w) | Set of successors of node w
AEC(w) | Average execution cost of node w

The CEFT scheduling approach (Algorithm 1) works in two phases. (1) The critical paths are generated as described in the second paragraph of Section 3.1; the critical paths are traversed and the ready nodes are inserted into the constrained critical paths CCP_j, j = 1, 2, …, |CCP|. If a critical path has no more ready nodes, the constrained critical path takes nodes from the next critical path, following a round-robin traversal of the critical paths. (2) All the CCPs are traversed in order (line 12). For each node w, ST_{P_r}(w, v), the maximum of AT_{P_r} and the finish times of the already-scheduled predecessors of w, is calculated. EFT_{P_r}(w) is computed as the sum of ST_{P_r}(w, v) and EC_{P_r}(w). CEFT_{P_r}(CCP_j) is the maximum of the finish times of all the nodes of CCP_j on the same processor P_r. The processor which minimizes the CEFT_{P_r}(CCP_j) value is then assigned to the constrained critical path CCP_j (line 20). After the actual finish time AEFT_w of each task w in CCP_j is updated, the processor assignment continues iteratively.

Algorithm 1. CEFT.
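Below is a hedged sketch of phase 2 in the terms of Table 1; the concrete data structures (EC as a nested map, CM as a callable returning zero for same-processor edges, and so on) are assumptions made for illustration, not the paper's implementation.

```python
# Sketch of CEFT phase 2: try every processor for a whole CCP, compute the
# finish time CEFT_Pr(CCP_j), and commit the processor that minimizes it.
def assign_ccps(ccps, procs, EC, pred, CM):
    AT = {p: 0.0 for p in procs}      # AT_Pr: availability time of processor p
    AEFT, assigned = {}, {}           # AEFT_w and the processor of each task
    for ccp in ccps:
        best = None
        for p in procs:
            t, trial = AT[p], {}
            for w in ccp:
                # ST_Pr(w, v): ready time over already-scheduled predecessors;
                # predecessors inside the same CCP run earlier on p and are
                # already covered by the serial accumulation of t below.
                ready = max((AEFT[v] + CM(w, p, v, assigned[v])
                             for v in pred[w] if v in AEFT), default=0.0)
                t = max(t, ready) + EC[p][w]          # EFT_Pr(w)
                trial[w] = t
            if best is None or t < best[1]:           # CEFT_Pr(CCP_j)
                best = (p, t, trial)
        p, finish, trial = best
        AT[p] = finish
        AEFT.update(trial)
        assigned.update({w: p for w in ccp})
    return assigned, AEFT
```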

3.2. CRO

Chemical reaction optimization (CRO) mimics the process of a chemical reaction in which molecules undergo a series of reactions with each other or with the environment in a closed container. The molecules are the manipulated agents, each with a profile of three necessary properties. (1) The molecular structure S: S represents the positions of the atoms in a molecule; it can take the form of a number, a vector, a matrix, or even a graph, depending on the problem. (2) (Current) potential energy (PE): PE is the objective function value of the current molecular structure ω, that is, PE_ω = f(ω). (3) (Current) kinetic energy (KE): KE is a nonnegative number that helps the molecule escape from local optimums. A central energy buffer is implemented in CRO. The energy in CRO obeys energy conservation and can be exchanged between molecules and the buffer.

Four kinds of elementary reactions may happen in CRO; they are defined below.

  • (1)
    On-wall ineffective collision: an on-wall ineffective collision is a unimolecular reaction involving a single molecule. In this reaction, a molecule ω is allowed to change into another one, ω′, if their energy values satisfy the following inequality:
    PE_{\omega} + KE_{\omega} \ge PE_{\omega'}; \quad (1)
    after this reaction, KE is redistributed: the new molecule retains KE_{\omega'} = (PE_{\omega} + KE_{\omega} - PE_{\omega'}) \times t, and the remaining energy (PE_{\omega} + KE_{\omega} - PE_{\omega'}) \times (1 - t) is stored in the central energy buffer. Parameter t is a random number drawn from [KELossRate, 1], where KELossRate, a system parameter set during the CRO initialization, is the KE loss rate (less than 1).
  • (2)
    Decomposition: decomposition is the other unimolecular reaction in CRO. A molecule ω may decompose into two new molecules, ω_1' and ω_2', if the energy values satisfy inequality (2), in which buf denotes the energy in the buffer and represents the energy interaction between molecules and the central energy buffer:
    PE_{\omega} + KE_{\omega} + buf \ge PE_{\omega_1'} + PE_{\omega_2'}; \quad (2)
    after this reaction, buf is updated by (3) and the KEs of ω_1' and ω_2' are computed as (4) and (5), respectively, where E_{decomp} = (PE_{\omega} + KE_{\omega}) - (PE_{\omega_1'} + PE_{\omega_2'}) and \mu_1, \mu_2, \mu_3, \mu_4 are numbers randomly selected from the range [0, 1]. Consider
    buf \leftarrow E_{decomp} + buf - KE_{\omega_1'} - KE_{\omega_2'}, \quad (3)
    KE_{\omega_1'} = (E_{decomp} + buf) \times \mu_1 \times \mu_2, \quad (4)
    KE_{\omega_2'} = (E_{decomp} + buf - KE_{\omega_1'}) \times \mu_3 \times \mu_4. \quad (5)
  • (3)
    Intermolecular ineffective collision: an intermolecular ineffective collision is an intermolecular reaction involving two molecules. Two molecules, ω_1 and ω_2, may change into two new molecules, ω_1' and ω_2', if their energy values satisfy the following inequality:
    PE_{\omega_1} + PE_{\omega_2} + KE_{\omega_1} + KE_{\omega_2} \ge PE_{\omega_1'} + PE_{\omega_2'}; \quad (6)
    after this reaction, the KEs of ω_1' and ω_2' share the spare energy E_{intermole} calculated by (7) and are computed as (8) and (9), respectively, where \mu_1 is a number randomly selected from the range [0, 1]. Consider
    E_{intermole} = (PE_{\omega_1} + PE_{\omega_2} + KE_{\omega_1} + KE_{\omega_2}) - (PE_{\omega_1'} + PE_{\omega_2'}), \quad (7)
    KE_{\omega_1'} = E_{intermole} \times \mu_1, \quad (8)
    KE_{\omega_2'} = E_{intermole} \times (1 - \mu_1). \quad (9)
  • (4)
    Synthesis: synthesis is also an intermolecular reaction. Two molecules, ω_1 and ω_2, may be combined into a new molecule, ω', if their energy values satisfy inequality (10). The KE of ω' is computed as (11):
    PE_{\omega_1} + PE_{\omega_2} + KE_{\omega_1} + KE_{\omega_2} \ge PE_{\omega'}, \quad (10)
    KE_{\omega'} = PE_{\omega_1} + PE_{\omega_2} + KE_{\omega_1} + KE_{\omega_2} - PE_{\omega'}. \quad (11)

The canonical CRO works as follows. Firstly, the initialization of CRO sets the system parameters, such as PopSize (the number of molecules in the population), KELossRate, InitialKE (the initial energy of molecules), buf (the initial energy in the buffer), and MoleColl (a threshold value determining whether to perform a unimolecular or an intermolecular reaction). Then CRO enters a loop. In each iteration, the choice between a unimolecular and an intermolecular reaction is first decided as follows. A number ε is randomly selected from the range [0, 1]. If ε is bigger than MoleColl, a unimolecular reaction is chosen; otherwise, an intermolecular reaction occurs. If it is a unimolecular reaction, a threshold parameter θ guides the further choice between on-wall collision and decomposition: NumHit, the parameter recording the total number of collisions a molecule has undergone, is updated after every collision of the molecule, and if the NumHit of the molecule exceeds θ, decomposition is selected. Similarly, a parameter ϑ decides between an intermolecular ineffective collision and a synthesis reaction: ϑ specifies the least KE a molecule may hold, and synthesis is chosen when the KEs of both molecules ω_1 and ω_2 are less than ϑ; otherwise, an intermolecular ineffective collision takes place. When the stopping criterion is satisfied (e.g., a better solution cannot be found after a certain number of consecutive iterations), the loop stops and the best solution is the molecule that possesses the lowest PE.
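The decision logic of this loop can be sketched as follows; the molecule interface (.pe, .ke, .num_hit) and the operator callables are illustrative assumptions, not the paper's API.

```python
import random

# Sketch of the canonical CRO decision logic described above. `ops` supplies
# the four reaction operators as callables; molecules are assumed to expose
# .pe, .ke, and .num_hit.
def cro_iteration(pop, MoleColl, theta, vartheta, ops, rng=random):
    if rng.random() > MoleColl or len(pop) < 2:
        m = rng.choice(pop)                          # unimolecular reaction
        if m.num_hit > theta:
            ops['decomposition'](m, pop)             # diversification
        else:
            ops['on_wall'](m)                        # intensification
    else:
        m1, m2 = rng.sample(pop, 2)                  # intermolecular reaction
        if m1.ke < vartheta and m2.ke < vartheta:
            ops['synthesis'](m1, m2, pop)            # diversification
        else:
            ops['intermolecular'](m1, m2)            # intensification

# The loop repeats cro_iteration until the stopping criterion holds; the best
# solution is then min(pop, key=lambda m: m.pe), the molecule with lowest PE.
```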

4. Models

This section discusses the system, application, and task scheduling model assumed in this work. The definition of the notations can be found in the Notations section.

4.1. System Model

In this paper, the target system contains multiple heterogeneous processors, denoted by P = {p_i | i = 1, 2, 3, …, |P|}. They are fully interconnected with a high speed network. Each task in a DAG can only be executed on one processor of the heterogeneous system. The edges of the graph are labeled with communication costs, which must be taken into account when the start and end tasks of an edge are executed on different processors. The communication cost is zero when the same processor is assigned to two communicating modules.

We assume a static computing system model in which the constrained relations and the execution costs of tasks are known a priori and execution and communication can be performed simultaneously by the processors. In this paper, the heterogeneity is represented by EC_{P_r}(w), the execution cost of a node w using processor P_r. Following the assumption of the MHM model, the heterogeneity in the simulations is set as follows so that a processor has different speeds for different tasks. The value of each EC_{P_r}(w) is randomly chosen within the scope of [1 − g, 1 + g] using a parameter g (g ∈ (0, 1)). Therefore, the heterogeneity level can be formulated as (1 + g)/(1 − g). Unless otherwise specified, g is set to the value that makes the heterogeneity level 2 in this paper.
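For instance, an execution-cost matrix under this model can be generated as sketched below; the function name and the base_cost map are assumptions for illustration.

```python
import random

# Sketch of the heterogeneity model: each EC_Pr(w) is the task's base cost
# scaled by a factor drawn from [1 - g, 1 + g]; with g = 1/3 the
# heterogeneity level (1 + g)/(1 - g) equals 2.
def make_exec_costs(base_cost, procs, g=1/3, rng=random):
    return {p: {w: c * rng.uniform(1 - g, 1 + g) for w, c in base_cost.items()}
            for p in procs}
```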

4.2. Application Model

In DAG scheduling, finding an optimal schedule means finding the scheduling solution with the minimum schedule length. The schedule length encompasses the entire execution and communication cost of all the modules and is also termed the makespan. In this paper, the task scheduling problem is to map a set of tasks to a set of processors, aiming at minimizing the makespan. It takes as input a directed acyclic graph DAG = (V, E), with |V| nodes representing tasks and |E| edges representing constrained relations among the tasks. V = (v_1, v_2, …, v_i, …, v_{|V|}) is a node sequence in which the hypothetical entry node v_1 (with no predecessors) and end node v_{|V|} (with no successors) represent, respectively, the beginning and the end of execution. The execution cost of v_i on processor p_k is denoted as EC_{p_k}(v_i), and the average computation cost of v_i, denoted as \overline{W(v_i)}, can be calculated by (12). The parameter controlling the amount of computing power available at each node in a heterogeneous system and its heterogeneity level value are given in the fifth paragraph of Section 6 and in Table 3.

E = {E_i | i = 1, 2, 3, …, |E|} is an edge set in which E_i = (ev_s, ev_e, ew_{s,e}), with ev_s, ev_e ∈ {v_1, v_2, …, v_{|V|}} representing its start and end nodes and ew_{s,e} denoting the communication cost between ev_s and ev_e. The DAG topology of an exemplar application model and system model is shown in Figures 1 and 2, respectively.

Figure 1. Two simple DAG models with 7 and 10 tasks.

Figure 2. A fully connected parallel system with 3 heterogeneous processors.

Consider

\overline{W(v_i)} = \frac{\sum_{k=1}^{|P|} EC_{p_k}(v_i)}{|P|}. \quad (12)

The constrained-critical-path sequence of DAG = (V, E) is denoted as CCP = (CCP_1, CCP_2, \ldots, CCP_{|CCP|}), with CCP_i = (cv_{i,1}, cv_{i,2}, \ldots, cv_{i,|CCP_i|}), in which the set {cv_{i,1}, cv_{i,2}, \ldots, cv_{i,|CCP_i|}} \subseteq {v_1, v_2, \ldots, v_{|V|}}.

The start time of the task v_i on processor p_k is denoted as ST_{p_k}(v_i) and can be calculated using (13), where Pred(v_i) is the set of the predecessors of v_i and p_m denotes the processor assigned to a predecessor v_j. The earliest finish time of the task v_i on processor p_k is denoted as EFT_{p_k}(v_i) and can be calculated using (14):

ST_{p_k}(v_i) = \begin{cases} 0, & v_i = v_1 \\ \max_{v_j \in Pred(v_i)} EFT_{p_m}(v_j), & p_k = p_m \\ \max_{v_j \in Pred(v_i)} \left( EFT_{p_m}(v_j) + ew_{j,i} \right), & p_k \ne p_m \end{cases} \quad (13)

EFT_{p_k}(v_i) = ST_{p_k}(v_i) + EC_{p_k}(v_i). \quad (14)

The communication to computation ratio (CCR) can be used to indicate whether a DAG is communication intensive or computation intensive. For a given DAG, it is computed by the average communication cost divided by the average computation cost on a target computing system. The computation can be formulated as follows:

CCR = \frac{\sum_{(ev_s, ev_e, ew_{s,e}) \in E} ew_{s,e} / |E|}{\sum_{v_i \in V} \overline{W(v_i)} / |V|}. \quad (15)
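A direct transcription of (12) and (15), assuming EC is stored as a processor-to-task cost map and the edges as a dict keyed by (start, end) pairs:

```python
# (12): average computation cost of task v over all processors.
def avg_cost(EC, v):
    return sum(EC[p][v] for p in EC) / len(EC)

# (15): average communication cost over edges divided by average
# computation cost over tasks.
def ccr(EC, tasks, edges):
    avg_comm = sum(edges.values()) / len(edges)
    avg_comp = sum(avg_cost(EC, v) for v in tasks) / len(tasks)
    return avg_comm / avg_comp
```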

5. Design of TMSCRO

TMSCRO mimics the interactions of molecules in chemical reactions using the concepts of molecule, atom, molecular structure, and molecular energy. The structure of a molecule is unique and represents the atom positions in the molecule. The interactions of molecules in the four kinds of basic chemical reactions, on-wall ineffective collision, decomposition, intermolecular ineffective collision, and synthesis, aim to transform molecules into more stable states with lower energy. In DAG scheduling, a scheduling solution, including a task order and a processor allocation, corresponds to a molecule in TMSCRO. This paper also designs operators on the encoded scheduling solutions (tuple arrays); these operators correspond to the chemical reactions and change the molecular structures. Arrays with different tuples represent different scheduling solutions, for each of which the corresponding makespan can be calculated. The makespan of a scheduling solution corresponds to the energy of a molecule.

In this section, we first present the data pretreatment of the TMSCRO. After the presentation of the encoding of scheduling solutions and the fitness function used in the TMSCRO, we present the design of four elementary chemical reaction operators in each part of the TMSCRO. Finally, we outline the framework of the TMSCRO scheme and discuss a few important properties in TMSCRO.

5.1. Molecular Structure, Data Pretreatment, and Fitness Function

This subsection first presents the encoding of scheduling solutions (i.e., the molecular structure) and data pretreatment, respectively. Then we give the statement of the fitness function for optimization designed in TMSCRO.

5.1.1. Molecular Structure and Data Pretreatment

A reasonable initial population in CRO-based methods may increase the scope of searching over the fitness function [20], supporting faster convergence and resulting in a better solution. Constrained critical paths (CCPs) can be seen as classes of task sequences constructed by the constrained earliest finish time (CEFT) algorithm, which takes into account all factors in the DAG (i.e., the average execution cost of each task, the communication costs, and the graph topology). Therefore, TMSCRO utilizes the CCPs to create a reasonable initial population based on a broad view of the DAG.

The data pretreatment generates the CCPDAG from the DAG and constructs the CCPS for the initialization of TMSCRO. The CCPDAG is a directed acyclic graph with |CCP| nodes representing constrained critical paths (CCPs), two virtual nodes (start and end) representing the beginning and the exit of execution, respectively, and |CE| edges representing dependencies among the nodes. Unlike in the DAG, the edges of the CCPDAG are not labeled with communication overhead. The data pretreatment includes two steps.

  1. Executing CEFT on the DAG yields the CCPs and the processor allocation of each element of CCP, giving the first initial CCP solution, InitCCPS = ((CCP_1, sp_1), (CCP_2, sp_2), …, (CCP_{|CCP|}, sp_{|CCP|})), in which the pairs (CCP_i, sp_i) are sorted in the generation order of the CCP_i and sp_i is the processor assigned to CCP_i after executing CEFT. For the graph shown in Figure 1, the resulting CCPs are indicated in Table 2.

  2. After the execution of CEFT for DAG, the CCPDAG is generated with the input of CCP and DAG. A detailed description is given in Algorithm 2.

Table 2.

CCP corresponding to the DAG as shown in Figure 1(1).

i | CCP_i
1 | A-B-D
2 | C-G
3 | F
4 | E
5 | H
6 | I
7 | J
Algorithm 2. Gen_CCPDAG(DAG, CCP): generating the CCPDAG.

As shown in Algorithm 2, the edge E_i of the DAG is obtained in each loop (line 1), and BelongCCP maps its start and end nodes to the CCPs they belong to, CCP_s and CCP_e (lines 2 and 3). If CCP_s and CCP_e are different CCPs and there is no edge between them yet (line 4), then the edge between CCP_s and CCP_e is generated (line 5). Finally, the nodes start and end and the edges between them and the CCP nodes are added (lines 7, 8, and 9). Consider the DAG shown in Figure 1 and the CCPs indicated in Table 2; the resulting CCPDAG is shown in Figure 3.

Figure 3. CCPDAG corresponding to the DAG shown in Figure 1 and the CCPs indicated in Table 2.

In this paper, there are two kinds of molecular structures in TMSCRO: CCPS and S. The CCP molecular structure CCPS is used only in the initialization of TMSCRO and can be formulated as in (16), whereas the reaction molecular structure S, converted from a CCPS, participates in the elementary reactions of TMSCRO. In a CCPS, the pairs (CCP_i, sp_i) are sorted by the topology of the CCPDAG, where CCP_i is a constrained critical path and sp_i is the processor assigned to CCP_i; |CCP| ≤ |V| because the number of elements in each CCP_i is greater than or equal to one. A reaction molecule S can be formulated as in (17); it consists of an array of atoms (i.e., tuples) representing a solution of the DAG scheduling problem. A tuple includes three integers: v_i, f_i, and p_i. In each reaction molecular structure S, v_i represents a task in the DAG and (v_1, v_2, …, v_{|V|}) is a topological sequence of the DAG; f_i encodes the constraint relationship between a tuple and the one before it: if the task v_A of tuple A, placed before tuple B, is a predecessor of v_B in the DAG, the second integer of tuple B, f_B, is 1, and otherwise it is 0; p_i represents the processor allocated to v_i. The sequence of the tuples in a reaction molecular structure S represents the scheduling order of the tasks in the DAG:

CCPS = ((CCP_1, sp_1), (CCP_2, sp_2), \ldots, (CCP_{|CCP|}, sp_{|CCP|})), \quad (16)

S = ((v_1, f_1, p_1), (v_2, f_2, p_2), \ldots, (v_{|V|}, f_{|V|}, p_{|V|})). \quad (17)
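A small sketch of deriving the f flags of (17) from a topological task order and a processor assignment; reading f_i as referring to the immediately preceding tuple follows the description above and is an assumption where the text is ambiguous.

```python
# Encode a scheduling solution as the tuple array S of (17): f_i = 1 iff the
# task in the immediately preceding tuple is a DAG predecessor of v_i.
def encode(order, proc_of, pred):
    S = []
    for i, v in enumerate(order):
        f = 1 if i > 0 and order[i - 1] in pred[v] else 0
        S.append((v, f, proc_of[v]))
    return S
```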

5.1.2. Fitness Function

The initial molecule generator is used to generate the initial solutions for TMSCRO to manipulate. The first molecule, InitS, is converted from InitCCPS; for the other initial molecules, the third part sp_i of each tuple is generated by randomly perturbing the first InitCCPS. A detailed description is given in Algorithm 3, and Algorithm 4 presents how to convert a CCPS to an S.

Algorithm 3. InitTMolecule(InitCCPS): generating the initial population.

Algorithm 4. ConvertMole(CCPS): converting a CCPS to an S.

Potential energy (PE) is defined as the objective function (fitness function) value of the corresponding solution represented by S. The overall schedule length of the entire DAG, namely, the makespan, is the largest finish time among all tasks, which is equivalent to the actual finish time of the end node of the DAG. For the DAG scheduling problem solved by TMSCRO, the goal is to obtain a schedule that minimizes the makespan while ensuring that the precedence constraints of the tasks are not violated. Hence, the fitness function value is defined as

PE_S = \text{makespan} = Fit(S). \quad (18)

Algorithm 5 presents how to calculate the value of the optimization fitness function Fit(S).

Algorithm 5. Fit(S): calculating the fitness value of a molecule with processor allocation optimization.
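Algorithm 5 itself is not reproduced here; the following is a hedged sketch of a makespan evaluation consistent with (13) and (14), which additionally tracks processor availability, as any list-schedule evaluation must.

```python
# Evaluate PE_S = makespan: walk the tuples of S in scheduling order and
# compute each task's finish time; return the largest one, cf. (18).
def fit(S, EC, comm, pred):
    # S: [(v, f, p), ...]; EC[p][v]: execution cost; comm[(u, v)]: edge cost
    proc_free, eft, proc_of = {}, {}, {}
    for v, _, p in S:
        ready = max((eft[u] + (0 if proc_of[u] == p else comm[(u, v)])
                     for u in pred[v]), default=0.0)        # cf. (13)
        start = max(ready, proc_free.get(p, 0.0))
        eft[v] = start + EC[p][v]                           # cf. (14)
        proc_free[p] = eft[v]
        proc_of[v] = p
    return max(eft.values())
```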

5.2. Elementary Chemical Reaction Operators

This subsection presents four elementary chemical reaction operators for sequence optimization and processor allocation optimization designed in TMSCRO, including on-wall collision, decomposition, intermolecular collision, and synthesis.

5.2.1. On-Wall Ineffective Collision

In this paper, the operator OnWallT is used to generate a new molecule S′ from a given reaction molecule S. OnWallT works as follows. (1) The operator randomly chooses a tuple (v_i, f_i, p_i) with f_i = 0 in S and then exchanges the positions of (v_i, f_i, p_i) and (v_{i−1}, f_{i−1}, p_{i−1}). (2) f_{i−1}, f_i, and f_{i+1} in S are modified as defined in the last paragraph of Section 5.1.1. (3) The operator changes p_i randomly. In the end, the operator has generated a new molecule S′ from S as an intensification search. Figures 4 and 5 show an example on the molecule corresponding to the second DAG of Figure 1.

Figure 4. Illustration of molecular structure change for on-wall ineffective collision.

Figure 5. Illustration of the task-to-computing-node mapping for on-wall ineffective collision.
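A hedged sketch of OnWallT on the tuple array is given below. Note that f_i = 0 means the left neighbour is not a predecessor of v_i, so the swap cannot violate precedence; which tuple's processor is re-drawn in step (3) is an assumption where the text is ambiguous.

```python
import random

# Sketch of OnWallT: swap a random tuple having f = 0 with its left
# neighbour, repair the f flags at positions i-1, i, i+1, and re-draw the
# processor of the moved tuple (an assumption; the paper says "p_i").
def on_wall_t(S, procs, pred, rng=random):
    S = list(S)
    cands = [i for i in range(1, len(S)) if S[i][1] == 0]
    if not cands:
        return S
    i = rng.choice(cands)
    S[i - 1], S[i] = S[i], S[i - 1]          # safe: neighbour is no predecessor
    for k in (i - 1, i, i + 1):              # repair f_{i-1}, f_i, f_{i+1}
        if 0 < k < len(S):
            v, _, p = S[k]
            S[k] = (v, 1 if S[k - 1][0] in pred[v] else 0, p)
    v, f, _ = S[i - 1]                       # the chosen tuple, now moved left
    S[i - 1] = (v, f, rng.choice(procs))
    return S
```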

5.2.2. Decomposition

In this paper, the operator DecompT is used to generate two new molecules, S_1′ and S_2′, from a given reaction molecule S. DecompT works as follows. (1) The operator randomly chooses two tuples, (v_i, f_i, p_i) with f_i = 0 and (v_t, f_t, p_t) with f_t = 0, in S and then finds the tuple holding the nearest predecessor of v_i, say (v_j, f_j, p_j), scanning from the selection position back to the beginning of S. (2) A random number k ∈ [j + 1, i − 1] is generated; the tuple (v_i, f_i, p_i) is stored in a temporary variable temp, and then, starting from position i − 1, the operator shifts each tuple one place to the right, down to position k. (3) The operator moves the tuple temp to position k; the rest of the tuples in S_1′ are the same as those in S. (4) f_i, f_{i+1}, and f_k in S_1′ are modified as defined in the last paragraph of Section 5.1.1. (5) The operator generates the other new molecule S_2′ by the former steps, with the only difference being that, in step 2, (v_t, f_t, p_t) is used instead of (v_i, f_i, p_i). (6) The operator keeps the processors of the tuples of S_1′ located at odd positions of S and of the tuples of S_2′ located at even positions of S and then randomly changes the remaining p_x values in S_1′ and S_2′. In the end, the operator has generated two new molecules, S_1′ and S_2′, from S as a diversification search. Figures 6 and 7 show an example on the molecule corresponding to the second DAG of Figure 1.

Figure 6. Illustration of molecular structure change for decomposition.

Figure 7. Illustration of the task-to-computing-node mapping for decomposition.

5.2.3. Intermolecular Ineffective Collision

In this paper, the operator IntermoleT is used to generate two new molecules, S_1′ and S_2′, from given molecules S_1 and S_2. This operator first uses the steps of OnWallT to generate S_1′ from S_1 and then generates the other new molecule S_2′ from S_2 in a similar fashion. In the end, the operator has generated two new molecules, S_1′ and S_2′, from S_1 and S_2 as an intensification search. Figures 8 and 9 show an example on the molecules corresponding to the second DAG of Figure 1.

Figure 8. Illustration of molecular structure change for intermolecular ineffective collision.

Figure 9. Illustration of the task-to-computing-node mapping for intermolecular ineffective collision.

5.2.4. Synthesis

In this paper, the operator SynthT is used to generate a new molecule S′ from two given molecules, S_1 and S_2. SynthT works as follows. (1) If |V| is even, then the integer i = |V|/2; else i = (|V| + 1)/2. (2) S_1 and S_2 are cut off at position i into left and right segments. (3) The left segment of S′ is inherited from the left segment of S_1. (4) Each tuple in the right segment of S′ comes from the tuples of S_2 that do not appear in the left segment of S′, with their f_x modified as defined in the last paragraph of Section 5.1.1. (5) The operator keeps the tuples of S′ which are at the same position with the same p_x in both S_1 and S_2 and then randomly changes the remaining p_y values in S′. As a result, the operator generates S′ from S_1 and S_2 as a diversification search. Figures 10 and 11 show an example on the molecules corresponding to the second DAG of Figure 1.

Figure 10. Illustration of molecular structure change for synthesis.

Figure 11. Illustration of the task-to-computing-node mapping for synthesis.
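A hedged sketch of SynthT follows; step (5) is interpreted as keeping the processor only at positions where both parents hold the same task on the same processor, which is an assumption about the ambiguous wording above.

```python
import random

# Sketch of SynthT: take the left half of S1, append the remaining tasks in
# S2's order, repair the f flags, and re-draw processors except where the
# parents agree on both task and processor at a position.
def synth_t(S1, S2, procs, pred, rng=random):
    n = len(S1)
    i = n // 2 if n % 2 == 0 else (n + 1) // 2
    taken = {v for v, _, _ in S1[:i]}
    merged = list(S1[:i]) + [t for t in S2 if t[0] not in taken]
    out = []
    for k, (v, _, p) in enumerate(merged):
        f = 1 if k > 0 and out[k - 1][0] in pred[v] else 0
        if not (S1[k][0] == S2[k][0] == v and S1[k][2] == S2[k][2]):
            p = rng.choice(procs)            # re-draw where parents disagree
        out.append((v, f, p))
    return out
```

Because both parents are topological orders over the same task set, the prefix of S_1 followed by the remaining tasks in S_2's order is again a valid topological order, so precedence is preserved.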

5.3. The Framework and Analysis of TMSCRO

The framework of TMSCRO for scheduling a DAG job is outlined in Algorithm 6, whose output is the resulting near-optimal solution of the corresponding DAG scheduling problem. In this framework, TMSCRO first runs its initialization and then enters a loop. In each iteration, one of the elementary chemical reaction operators is performed to generate new molecules, and the PE of each newly generated molecule is calculated. The overall working of TMSCRO for DAG scheduling on heterogeneous systems is as presented in the last paragraph of Section 3.2, except that InitS is treated as a super molecule [6]: it is tracked and participates only in on-wall ineffective collisions and intermolecular ineffective collisions, so that the solution space in its neighborhood is explored as much as possible while InitS itself is prevented from changing dramatically. The iteration repeats until the stopping criteria are met. The stopping criteria may be set based on different parameters, such as the maximum amount of CPU time used, the maximum number of iterations performed, an objective function value below a predefined threshold, or a maximum number of iterations without further performance improvement. The stopping criterion of TMSCRO in the experiments of this paper is that the makespan remains unchanged for 5000 consecutive iterations. The time complexity of TMSCRO is O(iters × (|V|^2 + |E| × |P|)), where iters is the number of iterations performed by TMSCRO.

Algorithm 6. TMSCRO(DAG): the TMSCRO outline (framework).

It is very difficult to theoretically prove the optimality of the CRO scheme (as well as of DMSCRO and TMSCRO) [37]. However, by analyzing the molecular structure, the chemical reaction operators, and the operational environment of TMSCRO, it can be argued that the TMSCRO scheme has three advantages in comparison with GA, SA, and DMSCRO.

First, just like DMSCRO, TMSCRO enjoys the advantages of GA and SA to some extent, as can be seen from its chemical reaction operators and operational environment: (1) OnWallT and IntermoleT in TMSCRO exchange the partial structure of two different molecules like the crossover operator in GA; (2) the energy conservation requirement in TMSCRO is able to guide the search for the optimal solution in a similar way as the Metropolis criterion of SA guides the evolution of solutions in SA.

Second, the constrained earliest finish time (CEFT) algorithm constructs constrained critical paths (CCPs) by taking into account a broader view of the input DAG [5]. TMSCRO applies CEFT and the CCPDAG to the data pretreatment and utilizes the CCPs in its initialization to create a more reasonable initial population than DMSCRO for accelerating convergence, because a widely distributed initial population in CRO-based methods may increase the scope of searching over the fitness function [20], supporting faster convergence and a better solution. Moreover, to some degree, InitS is similar to the super molecule in super molecule-based CRO or the "elite" in GA [6]. However, the "elite" in GA is usually generated from two chromosomes, while InitS is based on the whole input DAG through the execution of CEFT.

Third, the operators and molecular structure of TMSCRO are designed more reasonably than those of DMSCRO. In CRO-based algorithms, the operators of on-wall collision and intermolecular collision serve intensification, while the operators of decomposition and synthesis serve diversification; the better these operators are designed, the better the intensification and diversification search results are. This feature of CRO is very important because it gives CRO more opportunities to jump out of local optima and explore wider areas of the solution space. In TMSCRO, the operators OnWallT and IntermoleT each time exchange only the positions of one tuple and its former neighbor, giving better intensification capability on sequence optimization than DMSCRO, whose reaction operators OnWall(ω_1) and Intermole(ω_1, ω_2) [37] (ω_1 and ω_2 being big molecules in DMSCRO) may change the task sequence(s) dramatically. Moreover, considering that the optimization includes not only sequence optimization but also processor assignment optimization, all reaction operators in TMSCRO can change the processor assignment, whereas DMSCRO has only two reactions, on-wall collision and synthesis [37], for processor assignment optimization. On the one hand, TMSCRO searches the processor assignment solution space with 100% probability through its four elementary reactions, giving it better diversification and intensification capability on processor assignment optimization than DMSCRO, whose chance to search this kind of solution space is only 50%. On the other hand, the division between the diversification and intensification roles of the four reactions in TMSCRO is very clear, which is not the case in DMSCRO: in each iteration, the diversification and intensification searches in TMSCRO have the same probability of being conducted, whereas the probability of a diversification or intensification search in DMSCRO is uncertain. This design enhances the rapidity of convergence and the quality of the search results over the whole solution space, as demonstrated by the experimental results in Section 6.3.

6. Simulation and Results

Simulations have been performed to test the TMSCRO scheduling algorithm against two heuristic algorithms for DAG scheduling, HEFT_B and HEFT_T [8], and against a metaheuristic algorithm, double molecular structure-based chemical reaction optimization (DMSCRO) [37], using two sets of graph topologies: real world applications (Gaussian elimination and a molecular dynamics code) and randomly generated applications. The task graph of Gaussian elimination for an input matrix of size 7 is shown in Figure 12, a molecular dynamics code graph is shown in Figure 13, and Figure 14 shows a random graph with 10 nodes. The baseline performance is the makespan obtained by DMSCRO.

Figure 12. Gaussian elimination for a matrix of size 7.

Figure 13. A molecular dynamics code.

Figure 14. A random graph with 10 nodes.

Considering that HEFT_B and HEFT_T have better performance than other heuristic algorithms for DAG scheduling on heterogeneous computing systems, as discussed in the eighth paragraph of Section 2.1, these two algorithms are used as the representatives of heuristics in the simulation. There are three reasons why we regard the makespan performance of DMSCRO [37] as the baseline performance. (1) So far as we know, DMSCRO is the only CRO-based algorithm for DAG scheduling that takes into account the search over both task order and processor assignment. (2) As discussed in the third paragraph of Section 2.2, DMSCRO [37] has the closest system model and workload to those of TMSCRO. (3) In [37], the CRO-based scheduling algorithm is considered to absorb the strengths of SA and GA; the underlying principles and philosophies of SA are very different from those of DMSCRO, and DMSCRO is also proved in [37] to be more effective than the genetic algorithm (GA) of [15], so we use DMSCRO alone to represent the metaheuristic algorithms. We compare TMSCRO with DMSCRO to validate the advantages of TMSCRO over DMSCRO.

The performance has been evaluated by the makespan. The makespan values plotted in the bar graphs and in the convergence trace chart are the average results of 50 and 25 independent runs, respectively, to validate the robustness of TMSCRO. The communication costs are calculated from the computation costs and the communication to computation ratio (CCR) values, as formulated in (19):

Communication Cost = CCR \times Computation Cost. \quad (19)

The suggested values for the other parameters of the TMSCRO simulation are listed in Table 3. These values are proposed in [20].

Table 3.

Configuration parameters for the simulation of TMSCRO.

Parameter Value
InitialKE 1000
θ 500
ϑ 10
Buffer 200
KELossRate 0.2
MoleColl 0.2
PopSize 10
g 0.33
Number of runs 50

6.1. Real World Application Graphs

The real world application set is used to evaluate the performance of TMSCRO, which consists of two real world problem graph topologies, Gaussian elimination [22] and molecular dynamics code [19].

6.1.1. Gaussian Elimination

Gaussian elimination is a well-known method for solving a system of linear equations; it converts a set of linear equations to upper triangular form by systematically applying elementary row operations. As shown in Figure 12, the matrix size of the task graph of the Gaussian elimination algorithm is 7, with 27 tasks in total. This DAG was used for the simulation of DMSCRO in [37], and we also apply it to the evaluation of TMSCRO in this paper. Since the graph structure is fixed, the only variable parameters are the communication to computation ratio (CCR) value and the number of heterogeneous processors. In the simulation, the CCR values were set to 0.1, 0.2, 1, 2, and 5, respectively. Considering that the identical operation is executed on each processor and the information communicated between heterogeneous processors is the same in Gaussian elimination, the execution cost of each task is assumed to be the same and all communication links are assumed to have the same communication cost.

The parameters and their values of the Gaussian elimination graphs performed in the simulation are given in Table 4.

Table 4.

Configuration parameters for the Gaussian elimination graphs.

Parameter Possible values
CCR {0.1, 0.2, 1, 2, 5}
Number of processors {4, 8, 16, 32}
Number of tasks 27

The makespan of TMSCRO, DMSCRO, HEFT_B, and HEFT_T under an increasing processor number is shown in Figure 15. As the processor number increases, the average makespan declines, and the advantage of TMSCRO and DMSCRO over HEFT_B and HEFT_T also decreases, because when more computing nodes are contributed to run the same scale of tasks, a less intelligent scheduling algorithm suffices to achieve good performance.

Figure 15. Average makespan for Gaussian elimination.

As intelligent random search algorithms, TMSCRO and DMSCRO search a wider area of the solution space than HEFT_B, HEFT_T, or other heuristic algorithms, which narrow the search down to a very small portion of the solution space. This is the reason why TMSCRO and DMSCRO are more likely to obtain better solutions and outperform HEFT_B and HEFT_T.

The simulation results show that the performance of TMSCRO and DMSCRO is very similar; the fundamental reason is that both are metaheuristic algorithms. Based on the No-Free-Lunch Theorem in the field of metaheuristics, the performances of all well-designed metaheuristic search algorithms are the same when averaged over all possible objective functions, and in theory a well-designed metaheuristic algorithm will gradually approach the optimal solution if it runs for long enough. The DMSCRO developed in [37] is well designed, and we use it in the simulations of this paper; the similar performances of TMSCRO and DMSCRO therefore indicate that the TMSCRO we developed is also well designed. The detailed experiment results are shown in Table 5.

Table 5.

The experiment results for the Gaussian elimination graph under different processors, CCR = 0.2.

The number of processors | HEFT_B (average makespan) | HEFT_T (average makespan) | DMSCRO (average makespan) | TMSCRO (average makespan) | TMSCRO (best makespan) | TMSCRO (worst makespan) | TMSCRO (variance of resultant makespans)
4 | 112.2 | 122.227 | 109.9 | 109.31 | 109.2 | 109.9 | 0.2473
8 | 112.2 | 112.648 | 108.9 | 107.83 | 107.1 | 108.9 | 0.9613
16 | 80.4 | 92.354 | 77.5 | 76.62 | 76.3 | 78.9 | 1.6696
32 | 79.64 | 85.454 | 77.5 | 76.62 | 76.1 | 78.9 | 1.7201

Figure 15 also shows that TMSCRO is slightly superior to DMSCRO. The main reason is that the stopping criterion set in this simulation is that the makespan stays unchanged for 5000 consecutive iterations of the search loop. As discussed above, all well-designed metaheuristic methods searching for optimal solutions perform alike when averaged over all possible objective functions, and this experimental stopping criterion lets both TMSCRO and DMSCRO run long enough to gradually approach the optimal solution. Moreover, the better convergence of TMSCRO makes it more efficient than DMSCRO at finding good solutions, requiring far fewer iterations. More detailed experiment results in this regard are presented in Section 6.3.

Figure 16 shows that the average makespan of the four algorithms increases rapidly as CCR increases. This is because, as CCR increases, the application becomes more communication intensive, leaving the heterogeneous processors idle for longer. As shown in Figure 16, TMSCRO and DMSCRO outperform HEFT_B and HEFT_T, with the advantage becoming more obvious as CCR grows. These experimental results suggest that, for communication-intensive applications, TMSCRO and DMSCRO deliver more consistent performance and perform more effectively than the heuristic algorithms HEFT_B and HEFT_T in a wide range of DAG scheduling scenarios. The detailed experiment results are shown in Table 6.

Figure 16. Average makespan for Gaussian elimination; the number of processors is 8.

Table 6.

The experiment results for the Gaussian elimination graph under different CCRs; the number of processors is 8.

CCR | HEFT_B (average makespan) | HEFT_T (average makespan) | DMSCRO (average makespan) | TMSCRO (average makespan) | TMSCRO (best makespan) | TMSCRO (worst makespan) | TMSCRO (variance of resultant makespans)
0.1 | 108.2 | 110.312 | 106.78 | 105.04 | 104.76 | 106.6 | 1.7271
0.2 | 112.2 | 112.648 | 108.9 | 107.83 | 107.1 | 108.9 | 0.9613
1 | 120.752 | 124.536 | 115.63 | 114.717 | 114.3 | 115.4 | 0.3787
2 | 207.055 | 197.504 | 189.4 | 188.303 | 188.1 | 188.75 | 0.1522
5 | 263.8 | 263.8 | 252.39 | 250.671 | 250.3 | 251.79 | 0.9178

6.1.2. Molecular Dynamics Code

Figure 13 shows the DAG of a molecular dynamics code as presented in [19]. As in the Gaussian elimination experiment, the structure of the graph is fixed; the varied parameters are the number of heterogeneous processors and the CCR value, for which 0.1, 0.2, 1, 2, and 5 are used in our simulation.

The parameters and their values of the molecular dynamics code graphs performed in the simulation are given in Table 7.

Table 7.

Configuration parameters for the molecular dynamics code graphs.

Parameter Possible values
CCR {0.1, 0.2, 1, 2, 5}
Number of processors {4, 8, 16, 32}
Number of tasks 41

As shown in Figures 17 and 18, under different numbers of heterogeneous processors and different CCR values, the average makespans of TMSCRO and DMSCRO are better than those of HEFT_B and HEFT_T, respectively. In Figure 17, it can be observed that the average makespan decreases as the number of heterogeneous processors increases. The average makespan with respect to different CCR values is shown in Figure 18; the average makespan increases as the CCR value increases. The detailed experiment results are shown in Tables 8 and 9, respectively.

Figure 18. Average makespan for the molecular dynamics code; the number of processors is 16.

Figure 19. Average makespan of different task numbers, CCR = 10; the number of processors is 32.

Figure 17. Average makespan for the molecular dynamics code.

Table 8.

The experiment results for the molecular dynamics code graph under different processors, CCR = 1.0.

The number of processors | HEFT_B (average makespan) | HEFT_T (average makespan) | DMSCRO (average makespan) | TMSCRO (average makespan) | TMSCRO (best makespan) | TMSCRO (worst makespan) | TMSCRO (variance of resultant makespans)
4 | 149.205 | 142.763 | 139.51 | 138.13 | 137.87 | 138.6 | 0.1749
8 | 131.031 | 122.265 | 118.8 | 116.9 | 116.2 | 117.33 | 0.2764
16 | 124.868 | 115.584 | 113.52 | 113.36 | 113.1 | 113.43 | 0.0237
32 | 120.047 | 103.784 | 102.617 | 101.29 | 101.023 | 101.47 | 0.0442
Table 9.

The experiment results for the molecular dynamics code graph under different CCRs; the number of processors is 16.

CCR | HEFT_B (average makespan) | HEFT_T (average makespan) | DMSCRO (average makespan) | TMSCRO (average makespan) | TMSCRO (best makespan) | TMSCRO (worst makespan) | TMSCRO (variance of resultant makespans)
0.1 | 82.336 | 90.136 | 80.53 | 77.781 | 77.3 | 78.9 | 0.9459
0.2 | 82.356 | 87.504 | 80.53 | 78.704 | 78.21 | 79.13 | 0.2002
1 | 124.868 | 115.584 | 113.52 | 113.36 | 113.1 | 113.43 | 0.0237
2 | 216.735 | 174.501 | 167.612 | 164.7 | 164.32 | 164.91 | 0.0742
5 | 274.7 | 274.7 | 265.8 | 262.173 | 262.022 | 262.6 | 0.1344

6.2. Random Generated Application Graphs

An effective mechanism to generate random graphs for various applications is proposed in [42]. By using a probability for an edge between any two nodes, it can generate a random graph without inclining toward a specific topology.

In the random graph generation of this mechanism, a topological order is used to guarantee the precedence constraints; that is, an edge can exist between two nodes v_1 and v_2 only if v_1 < v_2. For a probability pb, ⌊|V| · pb⌋ edges are created from every node m to the nodes (m + 1 + ⌊1/pb⌋ · i) mod |V|, where 1 ≤ i ≤ ⌊|V| · pb⌋ and |V| is the total number of task nodes in the DAG.
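A hedged sketch of this generator is given below; the index formula is one reading of the garbled source, and the exact formula in [42] may differ.

```python
# Sketch of the random-DAG mechanism of [42] as read from the text above.
# Only forward edges (m < n) are kept, preserving the precedence constraint.
def random_dag(num_tasks, pb):
    k = int(num_tasks * pb)                  # floor(|V| * pb) edges per node
    step = int(1 / pb)
    edges = set()
    for m in range(num_tasks):
        for i in range(1, k + 1):
            n = (m + 1 + step * i) % num_tasks
            if m < n:
                edges.add((m, n))
    return sorted(edges)
```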

The parameters and their values of the random graphs performed in the simulation are given in Table 10.

Table 10.

Configuration parameters for random graphs.

Parameter Possible values
CCR {0.1, 0.2, 1, 2, 5, 10}
Number of processors {4, 8, 16, 32}
Number of tasks {10, 20, 50}

Figure 19 shows that TMSCRO always outperforms HEFT_B, HEFT_T, and DMSCRO as the number of tasks in a DAG increases. The comparison of the average makespans of the four algorithms as the number of heterogeneous processors increases is shown in Figures 20 and 21. As can be seen from these figures, the performance of TMSCRO is better than that of the other three algorithms in all cases. The trends in these two figures have the same explanation as that given for Figure 15. The detailed experiment results are shown in Tables 11, 12, and 13, respectively.

Figure 20. Average makespan of four algorithms under different processor numbers and low communication costs; the number of tasks is 50.

Figure 21. Average makespan of four algorithms under different processor numbers and low communication costs; the number of tasks is 50.

Table 11.

The experiment results for the random graph under different task numbers, CCR = 10; the number of processors is 32.

The number of tasks | HEFT_B (avg makespan) | HEFT_T (avg makespan) | DMSCRO (avg makespan) | TMSCRO (avg makespan)
10 | 73 | 67 | 65.1 | 62.2
20 | 148.9 | 143.9 | 139.421 | 136.8
50 | 350.7 | 341.7 | 334.17 | 331.9

Table 12.

The experiment results for the random graph under different processors, CCR = 0.2; the number of tasks is 50.

Processors | HEFT_B (avg makespan) | HEFT_T (avg makespan) | DMSCRO (avg makespan) | TMSCRO (avg makespan) | TMSCRO (best) | TMSCRO (worst) | TMSCRO (variance)
4 | 167.12 | 178.023 | 159.234 | 157.63 | 157.12 | 158.3 | 0.3923
8 | 136.088 | 145.649 | 128.17 | 127.178 | 127.06 | 127.7 | 0.1949
16 | 119.292 | 125.986 | 115.9 | 114.33 | 114.1 | 115.2 | 0.4753
32 | 111.866 | 120.065 | 108.7 | 108.71 | 108.31 | 108.9 | 0.0733

Table 13.

The experiment results for the random graph under different processors, CCR = 1.0; the number of tasks is 50.

Processors | HEFT_B (avg makespan) | HEFT_T (avg makespan) | DMSCRO (avg makespan) | TMSCRO (avg makespan) | TMSCRO (best) | TMSCRO (worst) | TMSCRO (variance)
4 | 178.662 | 175.52 | 168.12 | 167.703 | 167.42 | 168 | 0.0857
8 | 138.572 | 136.47 | 131.8 | 131.451 | 131.1 | 131.9 | 0.178
16 | 125.772 | 124.31 | 122.91 | 122.32 | 122.1 | 122.432 | 0.0233
32 | 117.11 | 116.4 | 114.124 | 113.127 | 112.9 | 113.54 | 0.1348

As shown in Figure 22, it can be observed that the average makespan obtained by TMSCRO increases rapidly as the CCR value increases. This may be because, as CCR increases, the application becomes more communication intensive, leaving the heterogeneous processors idle for longer. The detailed experiment results are shown in Table 14.

Figure 22. Average makespan of TMSCRO under different values of CCR; the number of tasks is 50.

Table 14.

The average makespan of TMSCRO for the random graph under different CCRs; the number of tasks is 50.

CCR | 4 processors | 8 processors | 16 processors | 32 processors
0.1 | 156.97 | 115.724 | 110.3 | 101.87
0.2 | 157.63 | 127.178 | 114.33 | 108.71
1 | 167.703 | 131.451 | 122.32 | 113.127
2 | 294.042 | 289.878 | 273.375 | 269.514
5 | 473.5 | 467.61 | 429.13 | 428.13

6.3. Convergence Trace of TMSCRO

The experiments in the previous subsections report the final makespans obtained by TMSCRO and DMSCRO, showing that TMSCRO can achieve makespan performance similar to that of DMSCRO. Moreover, in some cases the final makespan achieved by TMSCRO is even better than that of DMSCRO once the stopping criteria are satisfied. In this subsection, the change of makespan as TMSCRO and DMSCRO progress during the search is demonstrated by comparing the convergence traces of the two algorithms. These experiments further reveal the better convergence performance of TMSCRO and also help explain why TMSCRO outperforms DMSCRO in some cases.

The parameters and their values of the Gaussian elimination, molecular dynamics code, and random graphs performed in the simulation are given in Tables 15, 16, and 17, respectively.

Table 15.

Configuration parameters of convergence experiment for the Gaussian elimination graph.

Parameter Value
CCR 0.2
Number of processors 8
Number of tasks 27

Table 16.

Configuration parameters of convergence experiment for the molecular dynamics graph.

Parameter Value
CCR 1
Number of processors 16
Number of tasks 41

Table 17.

Configuration parameters of convergence experiment for the random graphs.

Parameter Values
CCR {0.2, 1}
Number of processors {8, 16}
Number of tasks {10, 20, 50}

Figures 23 and 24, respectively, plot the convergence traces for processing Gaussian elimination and the molecular dynamics code. Figures 25, 26, and 27 show the convergence traces when processing the sets of randomly generated DAGs containing 10, 20, and 50 tasks, respectively. These figures demonstrate that the makespan decreases quickly as both TMSCRO and DMSCRO progress and that the decreasing trends tail off when the algorithms run for long enough. They also show that, in most cases, the convergence traces of the two algorithms are rather different even though the final makespans obtained by them are almost the same.

Figure 23. The convergence trace for Gaussian elimination; CCR = 0.2; the number of processors is 8.

Figure 24. The convergence trace for the molecular dynamics code; CCR = 1; the number of processors is 16.

Figure 25. The convergence trace for the randomly generated DAGs, each containing 10 tasks.

Figure 26. The convergence trace for the randomly generated DAGs, each containing 20 tasks.

Figure 27. The convergence trace for the randomly generated DAGs, each containing 50 tasks.

The statistical analysis results over the average convergence rate at 5000 ascending sampling points from the start time to the end time of all the experiments, obtained by the Friedman test, are shown in Table 18 (the threshold of P is set to 0.05); each experiment was carried out 25 times. We can find that the differences between the two algorithms in performance are significant from a statistical point of view. The reason is that the super molecule gives TMSCRO a stronger convergence capability, especially early in each run. Moreover, the convergence performance of TMSCRO is better than that of DMSCRO. Quantitatively, our records show that TMSCRO converges faster than DMSCRO by 12.89% on average over all the cases (and by 23.27% on average in the best case).
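For readers who wish to reproduce this kind of analysis, the sketch below shows how a Friedman test can be run with SciPy on made-up measurements (the numbers are illustrative only, not the paper's data; note also that scipy.stats.friedmanchisquare requires at least three related samples, so a third algorithm is included here).

```python
# Sketch (illustrative data only): Friedman test over repeated measurements,
# as used for the statistical comparison in this section.
from scipy.stats import friedmanchisquare

# Hypothetical average convergence measurements from 6 repeated runs.
tmscro = [0.91, 0.88, 0.93, 0.90, 0.89, 0.92]
dmscro = [0.85, 0.84, 0.88, 0.86, 0.83, 0.87]
heft_b = [0.80, 0.79, 0.82, 0.81, 0.78, 0.80]

stat, p = friedmanchisquare(tmscro, dmscro, heft_b)
print(f"Friedman statistic = {stat:.3f}, P = {p:.4g}")
if p < 0.05:  # the threshold used in this section
    print("The difference between the algorithms is statistically significant.")
```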

Table 18.

The results of the statistical analysis over the average convergence rate at different sampling times of all the experiments (the threshold of P is set to 0.05).

DAG | P value (Friedman test) | Average convergence acceleration ratio
Gaussian elimination | 7.10 × 10−8 | 4.23%
Molecular dynamics code | 2.54 × 10−8 | 7.21%
Random graph with 10 tasks | 4.26 × 10−8 | 23.27%
Random graph with 20 tasks | 3.48 × 10−8 | 16.41%
Random graph with 50 tasks | 2.58 × 10−8 | 13.32%

In these experiments, the stopping criterion is that the algorithm stops when the makespan remains unchanged for a preset number of consecutive iterations of the search loop (5000 iterations in the experiments). In practice, the algorithms can also stop when their total processing time reaches a preset value (e.g., 180 s). Moreover, TMSCRO and DMSCRO start from the same initial population. In this case, the fact that TMSCRO outperforms DMSCRO on convergence means that the makespan achieved by TMSCRO can be much better than that of DMSCRO when the stopping criteria are satisfied. The reason for this can be explained by the analysis presented in the last paragraph of Section 5.3.
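The stopping rules just described can be expressed as a small wrapper around the search loop. The sketch below is our illustration (evolve_one_iteration is a hypothetical stand-in for one CRO iteration returning the best makespan so far); it combines the no-improvement window with an optional wall-clock limit and records the convergence trace plotted in the figures above.

```python
# Sketch of the stopping criteria described above (illustrative only).
# `evolve_one_iteration` is a hypothetical stand-in for one CRO search
# step that returns the best makespan found so far.
import time

def run_search(evolve_one_iteration, patience=5000, time_limit=180.0):
    best = float("inf")
    unchanged = 0           # consecutive iterations without improvement
    trace = []              # convergence trace: best makespan per iteration
    start = time.monotonic()
    while unchanged < patience and time.monotonic() - start < time_limit:
        makespan = evolve_one_iteration()
        if makespan < best:
            best, unchanged = makespan, 0
        else:
            unchanged += 1
        trace.append(best)
    return best, trace
```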

7. Conclusion

In this paper, we developed TMSCRO, a DAG scheduling algorithm for heterogeneous systems based on the chemical reaction optimization (CRO) method. With a more reasonable reaction molecular structure and four purpose-designed elementary chemical reaction operators, TMSCRO has a better capability for intensification and diversification in the search than DMSCRO, which is, as far as we know, the only other CRO-based algorithm for DAG scheduling on heterogeneous systems. Moreover, in TMSCRO, the constrained earliest finish time (CEFT) algorithm and the constrained-critical-path directed acyclic graph (CCPDAG) are applied to the data pretreatment, and the concept of constrained critical paths (CCPs) is also utilized in the initialization. We also use the first initial molecule, InitS, as a super molecule for accelerating convergence. As a metaheuristic method, the TMSCRO algorithm can cover a much larger search space than heuristic scheduling approaches. The experiments show that TMSCRO outperforms HEFT_B and HEFT_T and can achieve a higher speedup of task executions than DMSCRO.

In future work, we plan to extend TMSCRO by applying a synchronous communication strategy to parallelize its processing. This design will divide the molecules into groups, with each group handled by a CPU or GPU. Multiple groups can thus be manipulated simultaneously in parallel, and molecules can also be exchanged among the CPUs or GPUs from time to time in order to reduce the time cost, as illustrated by the sketch below.
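The grouping design described above resembles an island model. The following sketch is our interpretation of it (evolve_group is a hypothetical stand-in that advances one molecule group by a batch of CRO iterations): each group is evolved in its own worker process, and one molecule per group is rotated to a neighboring group after every epoch.

```python
# Illustrative island-model sketch of the proposed parallel design.
# `evolve_group` is a hypothetical, picklable top-level function that
# advances one molecule group and returns the evolved group.
from concurrent.futures import ProcessPoolExecutor

def parallel_tmscro(groups, evolve_group, epochs=10):
    """groups: list of molecule lists, one per worker (CPU/GPU in the paper)."""
    with ProcessPoolExecutor(max_workers=len(groups)) as pool:
        for _ in range(epochs):
            # Evolve all groups simultaneously, one worker per group.
            groups = list(pool.map(evolve_group, groups))
            # Exchange molecules "from time to time": rotate one molecule
            # from each group into the next group.
            migrants = [g.pop() for g in groups]
            for i, g in enumerate(groups):
                g.append(migrants[i - 1])
    return groups
```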

Notations

DAG = (V, E):

Input directed acyclic graph with |V| nodes representing tasks and |E| edges representing precedence constraints among the tasks

V = (v1, v2,…, v|V|):

Node sequence in which the hypothetical entry node (with no predecessors) v1 and end node (with no successors) v|V| represent the beginning and end of execution, respectively

E = {Ei | i = 1, 2, 3,…, |E|}:

Edge set in which Ei = (evs, eve, ews,e), with evs and eve ∈ {v1, v2,…, v|V|} representing its start and end nodes and ews,e denoting the communication cost between evs and eve

P = {pi | i = 1, 2, 3,…, |P|}:

Set of multiple heterogeneous processors in target system

CCP = (CCP1, CCP2,…, CCP|CCP|):

Constrained-critical-path sequence of DAG = (V, E)

CCPi = (cvi,1, cvi,2,…, cvi,|CCPi|):

Constrained critical path in which the set {cvi,1, cvi,2,…, cvi,|CCPi|} ⊆ {v1, v2,…, v|V|}

CCPDAG:

Directed acyclic graph with |CCP| nodes representing CCPs, two virtual nodes (i.e., start and end) representing the beginning and exit of execution, respectively, and |CE| edges representing dependencies among all nodes

CCPS = ((CCP1, sp1), (CCP2, sp2),…, (CCP|CCP|, sp|CCP|)):

A CCP molecule used in the initialization of TMSCRO, in which spi is the processor assigned to the constrained-critical-path CCPi

S = ((v1, f1, p1), (v2, f2, p2),…,(v|V|, f|V|, p|V|)):

A reaction molecule (i.e., solution) in TMSCRO

(vi, fi, pi):

Atom (i.e., tuple) in S

InitCCPS:

The first CCP molecule for the initialization of TMSCRO

InitS:

The first molecule in TMSCRO

BelongCCP(w):

CCPi that node w belongs to

CCPE(CCPs, CCPe):

Edge between CCPs and CCPe

W̄(v):

Average computation cost of node v

ECPr(w):

Execution cost of a node w using processor Pr

CM(w, Pr, v, Px):

Communication cost from node v to node w, if Px has been assigned to node v and Pr is assigned to node w

STPr(w, v):

Possible start time of node w when assigned to processor Pr, with node v being any predecessor of w that has already been scheduled

EFTPr(w):

Finish time of node w using processor Pr (see the sketch following this notation list)

ATPr:

Availability time of Pr

Pred(w):

Set of predecessors of node w

Succ(w):

Set of successors of node w

CCR:

Communication to computation ratio

g:

The parameter to adjust the heterogeneity level in a heterogeneous system

PE:

Current potential energy of a molecule

KE:

Current kinetic energy of a molecule

InitialKE:

Initial kinetic energy of a molecule

θ:

Threshold value guiding the choice of on-wall collision or decomposition

ϑ:

Threshold value guiding the choice of intermolecule collision or synthesis

Buffer:

Initial energy in the central energy buffer

KELossRate:

Loss rate of kinetic energy

MoleColl:

Threshold value to determine whether to perform a unimolecule reaction or an intermolecule reaction

PopSize:

Size of the molecule population

NumHit:

Total collision number of a molecule.
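To make the scheduling notations above concrete, the sketch below (our illustration, not the paper's implementation) computes EFT_Pr(w) from Pred(w), EC_Pr(w), CM(·), and AT_Pr in the usual list-scheduling way: node w may start on processor r once r is available and the data from every already-scheduled predecessor have arrived.

```python
# Illustrative computation of EFT_Pr(w) from the notations above:
# a node starts on processor r once r is available (AT) and every
# predecessor's result has arrived (finish time + communication cost).
def eft(w, r, pred, finish, proc, ec, cm, at):
    """pred[w]: predecessors of w; finish[v]/proc[v]: schedule of done nodes;
    ec[w][r]: execution cost of w on r; cm(w, r, v, x): communication cost;
    at[r]: availability time of processor r."""
    ready = max(
        (finish[v] + cm(w, r, v, proc[v]) for v in pred[w]),
        default=0.0,                      # entry node: no predecessors
    )
    start = max(at[r], ready)             # ST_Pr(w)
    return start + ec[w][r]               # EFT_Pr(w) = ST + EC
```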

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

1. Graham RL, Lawler EL, Lenstra JK, Rinnooy Kan AHG. Optimization and approximation in deterministic sequencing and scheduling: a survey. Annals of Discrete Mathematics. 1979;5:287–326.
2. Papadimitriou C, Yannakakis M. Towards an architecture-independent analysis of parallel algorithms. Proceedings of the 20th Annual ACM Symposium on Theory of Computing (STOC '88); 1988; pp. 510–513.
3. Sarkar V. Partitioning and Scheduling Parallel Programs for Multiprocessors. Cambridge, Mass, USA: The MIT Press; 1989.
4. Chrétienne P. Task scheduling with interprocessor communication delays. European Journal of Operational Research. 1992;57(3):348–354.
5. Khan MA. Scheduling for heterogeneous systems using constrained critical paths. Parallel Computing. 2012;38(4-5):175–193.
6. Xu J, Lam AYS, Li VOK. Stock portfolio selection using chemical reaction optimization. Proceedings of the International Conference on Operations Research and Financial Engineering (ICORFE '11); 2011; pp. 458–463.
7. Kwok Y-K, Ahmad I. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys. 1999;31(4):406–471.
8. Topcuoglu H, Hariri S, Wu M-Y. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems. 2002;13(3):260–274.
9. Amini A, Wah TY, Saybani MR, Yazdi SRAS. A study of density-grid based clustering algorithms on data streams. Proceedings of the 8th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD '11); July 2011; Shanghai, China. pp. 1652–1656.
10. Cheng H. A high efficient task scheduling algorithm based on heterogeneous multi-core processor. Proceedings of the 2nd International Workshop on Database Technology and Applications (DBTA '10); November 2010; Wuhan, China. pp. 1–14.
11. Tsuchiya T, Osada T, Kikuno T. A new heuristic algorithm based on GAs for multiprocessor scheduling with task duplication. Proceedings of the 3rd International Conference on Algorithms and Architectures for Parallel Processing (ICAPP '97); December 1997; Melbourne, Australia. pp. 295–308.
12. Bajaj R, Agrawal DP. Improving scheduling of tasks in a heterogeneous environment. IEEE Transactions on Parallel and Distributed Systems. 2004;15(2):107–118.
13. Ge H-W, Sun L, Liang Y-C, Qian F. An effective PSO and AIS-based hybrid intelligent algorithm for job-shop scheduling. IEEE Transactions on Systems, Man, and Cybernetics A: Systems and Humans. 2008;38(2):358–368.
14. Ho NB, Tay JC. Solving multiple-objective flexible job shop problems by evolution and local search. IEEE Transactions on Systems, Man and Cybernetics C: Applications and Reviews. 2008;38(5):674–685.
15. Hou ESH, Ansari N, Ren H. Genetic algorithm for multiprocessor scheduling. IEEE Transactions on Parallel and Distributed Systems. 1994;5(2):113–120.
16. Hwang J-J, Chow Y-C, Anger FD, Lee C-Y. Scheduling precedence graphs in systems with interprocessor communication times. SIAM Journal on Computing. 1989;18(2):244–257.
17. Iverson M, Özgüner F, Follen G. Parallelizing existing applications in a distributed heterogeneous environment. Proceedings of the IEEE International Conference on Heterogeneous Computing Workshop (HCW '95); 1995; pp. 93–100.
18. Kashani MH, Jahanshahi M. Using simulated annealing for task scheduling in distributed systems. Proceedings of the International Conference on Computational Intelligence, Modelling, and Simulation (CSSim '09); September 2009; Brno, Czech Republic. pp. 265–269.
19. Kim S, Browne J. A general approach to mapping of parallel computation upon multiprocessor architectures. Proceedings of the International Conference on Parallel Processing; 1988; pp. 1–8.
20. Lam AYS, Li VOK. Chemical-reaction-inspired metaheuristic for optimization. IEEE Transactions on Evolutionary Computation. 2010;14(3):381–399.
21. Li H, Wang L, Liu J. Task scheduling of computational grid based on particle swarm algorithm. Proceedings of the 3rd International Joint Conference on Computational Sciences and Optimization (CSO '10); May 2010; Huangshan, China. pp. 332–336.
22. Wu M-Y, Gajski DD. Hypertool: a programming aid for message-passing systems. IEEE Transactions on Parallel and Distributed Systems. 1990;1(3):330–343.
23. Sih GC, Lee EA. Compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures. IEEE Transactions on Parallel and Distributed Systems. 1993;4(2):175–187.
24. El-Rewini H, Lewis TG. Scheduling parallel program tasks onto arbitrary target machines. Journal of Parallel and Distributed Computing. 1990;9(2):138–153.
25. Lin F-T. Fuzzy job-shop scheduling based on ranking level (λ, 1) interval-valued fuzzy numbers. IEEE Transactions on Fuzzy Systems. 2002;10(4):510–522.
26. Liu B, Wang L, Jin Y-H. An effective PSO-based memetic algorithm for flow shop scheduling. IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics. 2007;37(1):18–27. doi: 10.1109/tsmcb.2006.883272.
27. Pop F, Dobre C, Cristea V. Genetic algorithm for DAG scheduling in Grid environments. Proceedings of the IEEE 5th International Conference on Intelligent Computer Communication and Processing (ICCP '09); August 2009; Cluj-Napoca, Romania. pp. 299–305.
28. Shanmugapriya R, Padmavathi S, Shalinie SM. Contention awareness in task scheduling using tabu search. Proceedings of the IEEE International Advance Computing Conference (IACC '09); March 2009; Patiala, India. pp. 272–277.
29. Shi L, Pan Y. An efficient search method for job-shop scheduling problems. IEEE Transactions on Automation Science and Engineering. 2005;2(1):73–77.
30. Choudhury P, Kumar R, Chakrabarti PP. Hybrid scheduling of dynamic task graphs with selective duplication for multiprocessors under memory and time constraints. IEEE Transactions on Parallel and Distributed Systems. 2008;19(7):967–980.
31. Song S, Hwang K, Kwok Y-K. Risk-resilient heuristics and genetic algorithms for security-assured grid job scheduling. IEEE Transactions on Computers. 2006;55(6):703–719.
32. Spooner DP, Cao J, Jarvis SA, He L, Nudd GR. Performance-aware workflow management for grid computing. The Computer Journal. 2005;48(3):347–357.
33. Li K, Tang X, Li K. Energy-efficient stochastic task scheduling on heterogeneous computing systems. IEEE Transactions on Parallel and Distributed Systems. 2014.
34. Wang J, Duan Q, Jiang Y, Zhu X. A new algorithm for grid independent task schedule: genetic simulated annealing. Proceedings of the World Automation Congress (WAC '10); September 2010; Kobe, Japan. pp. 165–171.
35. He L, Zou D, Zhang Z, Chen C, Jin H, Jarvis S. Developing resource consolidation frameworks for moldable virtual machines in clouds. Future Generation Computer Systems. 2012;32:69–81.
36. Xu Y, Li K, Hu J, Li K. A genetic algorithm for task scheduling on heterogeneous computing systems using multiple priority queues. Information Sciences. 2014;270:255–287.
37. Xu Y, Li K, He L, Truong TK. A DAG scheduling scheme on heterogeneous computing systems using double molecular structure-based chemical reaction optimization. Journal of Parallel and Distributed Computing. 2013;73(9):1306–1322.
38. Xu J, Lam AYS, Li VOK. Chemical reaction optimization for the grid scheduling problem. Proceedings of the IEEE International Conference on Communications (ICC '10); May 2010; Cape Town, South Africa. pp. 1–5.
39. Varghese B, Mckee G, Alexandrov V. Can agent intelligence be used to achieve fault tolerant parallel computing systems? Parallel Processing Letters. 2011;21(4):379–396.
40. Xu J, Lam AYS, Li VOK. Chemical reaction optimization for task scheduling in grid computing. IEEE Transactions on Parallel and Distributed Systems. 2011;22(10):1624–1631.
41. Truong TK, Li K, Xu Y. Chemical reaction optimization with greedy strategy for the 0-1 knapsack problem. Applied Soft Computing Journal. 2013;13(4):1774–1780.
42. Almeida VAF, Vasconcelos IMM, Arabe JNC, Menasce DA. Using random task graphs to investigate the potential benefits of heterogeneity in parallel systems. Proceedings of the ACM/IEEE Conference on Supercomputing (Supercomputing '92); 1992; Los Alamitos, Calif, USA. IEEE Computer Society Press; pp. 683–691.
