Exploration of Power Domain Partitioning with Concurrent Task Mapping and Scheduling for Application-Specific Multi-core SoCs

Bo Wang; Aneek Imtiaz; Joachim Falk; Michael Glaß; Jürgen Teich

2020 Jun 12;12155:153–167. doi: 10.1007/978-3-030-52794-5_12

Editors: André Brinkmann, Wolfgang Karl, Stefan Lankes, Sven Tomforde, Thilo Pionteck, Carsten Trinitis

PMCID: PMC7343418

Abstract

This paper proposes a novel approach to explore the design space of Power Domain (PD) partitioning in the architecture definition phase of heterogeneous SoCs. By formulating an Integer Linear Program (ILP), task mapping and scheduling is determined concurrently while considering power-off dependencies among cores in the same PD and the power-gating break-even time. Compared to state-of-the-art approaches aiming at design phases where task mapping and scheduling has been frozen, our proposed approach shifts joint exploration into earlier design phases, creates more power-gating opportunities for PD partitions, and thus identifies better trade-offs in terms of energy consumption and design costs.

Keywords: Power domain partition, Task mapping and scheduling, Evolutionary algorithm, Integer linear programming

Introduction

Power gating is an effective technique to reduce the static power consumption of Systems-on-Chip (SoCs), such as 5G New Radio modems, in which dozens of heterogeneous cores are often adopted to achieve Gbit/s uplink and downlink speeds. An SoC is divided into multiple Power Domains (PDs), each of which can be switched off individually when all cores and Hardware (HW) IPs in it are idle, a so-called common idle interval. Power-gating control becomes more flexible with finer-grained power domains. However, a finer partitioning results in a large design, verification, and layout effort, and may even increase area and degrade power consumption and timing closure [13]. On the other hand, due to parallelism among tasks, merging HW resources that are active simultaneously into the same power domain may reduce design complexity without sacrificing power efficiency.

Some researchers have started investigating methodologies for exploration of PD partitioning to trade off energy consumption and the number of PDs. In [13], PD partitioning is explored by using a Multi-Objective Evolutionary Algorithm (MOEA), but it aims at the design phases in which task mapping and scheduling has been accomplished already, and determines the idle intervals of each HW resource rather than optimizing them. During subsequent PD partitioning, common idle intervals are post-processed for each PD, as well as power-gating break-even times. Power gating is exploited only for common idle intervals longer than a break-even time. After that, energy consumption is evaluated for partition candidates. Finally, trade-off fronts are obtained by the MOEA in terms of energy consumption and the number of used power domains. However, this approach does not explore the influence of task mapping and scheduling. We illustrate the lost potential through a motivating example in the following.
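
The post-processing step described above for a frozen schedule can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of [13]: the function names, the half-open interval representation, and the example numbers are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open busy interval [start, end), e.g. in ms


def common_idle_intervals(busy: List[List[Interval]], horizon: float) -> List[Interval]:
    """Return the intervals in [0, horizon) during which no resource of a PD is busy.

    `busy` holds one list of busy intervals per resource in the power domain.
    """
    merged = sorted(iv for per_resource in busy for iv in per_resource)
    idle, cursor = [], 0.0
    for start, end in merged:
        if start > cursor:
            idle.append((cursor, start))  # gap in which every resource is idle
        cursor = max(cursor, end)
    if cursor < horizon:
        idle.append((cursor, horizon))
    return idle


def gated_intervals(idle: List[Interval], break_even: float) -> List[Interval]:
    """Keep only the common idle intervals that are longer than the break-even time."""
    return [(s, e) for s, e in idle if e - s > break_even]


# Hypothetical PD with two resources over a 20 ms schedule.
busy = [[(0, 5), (10, 12)], [(3, 8)]]
idle = common_idle_intervals(busy, horizon=20)
gated = gated_intervals(idle, break_even=3)
```

In this made-up example the short gap between 8 ms and 10 ms is a common idle interval but is not power gated, because it does not exceed the assumed break-even time of 3 ms.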

Motivating Example

Figure 1 shows two periodic applications with the same period generated by TGFF [5], as well as a HW architecture consisting of three fully connected heterogeneous processors. The power consumption of each processor is modeled by three power states [14], i.e., RUN, IDLE, and OFF, where RUN denotes that the resource is actively executing a task, IDLE denotes that it is powered on with no task in execution, and OFF denotes the power-gated mode. First, task mapping and scheduling is performed to minimize energy consumption, where only two states (RUN and IDLE) and the transition energy between them are assumed for the processors. After that, PD partitions are explored using the approach in [13]. The resulting trade-off fronts are presented in Fig. 2. Take the trade-off front with 2 PDs shown in Fig. 3(a) as an example. Although one processor is idle from 0 ms to 23 ms, it cannot be powered off because another processor in the same PD is still executing. If task mapping and scheduling considered the power-off dependency between these two processors, it could re-allocate the tasks and align the execution within the same PD. Unfortunately, scheduling before PD partitioning does not have such knowledge and thus misses optimization potential.

Fig. 1. — Task graphs for the two given applications, which have the same period but different deadlines, as well as the underlying heterogeneous architecture R.

Fig. 2. — Power domain partitioning and task mapping & scheduling for the trade-off front with 2 PDs for the motivating example: (a) PD partitioning performed after task mapping and scheduling; (b) PD partitioning with concurrent task mapping and scheduling as proposed in this work.

Fig. 3. — Trade-off fronts for normalized energy (to energy of 1-PD trade-off front obtained by partitioning PDs after task mapping and scheduling) vs. number of power domains (design complexity).

Based on this observation, we propose a methodology to explore power domain partitioning with concurrent task mapping and scheduling. For each candidate explored during PD partitioning, task mapping and scheduling is performed with additional constraints for the power domain dependency and the power-gating break-even time. As a result, more and longer common idle intervals may be created in each PD by properly mapping and aligning task execution on the processors, as shown in Fig. 3. Power consumption is thus reduced due to a longer power-gated state, as shown in Fig. 3(b) and Fig. 2. More importantly, system architects may even prefer the 2-PD option identified by our approach to reduce design cost if it already meets the power target. The proposed approach thus expands the exploration space of PD partitioning.

Contribution

State-of-the-art approaches for PD partitioning exploration consider power vs. design cost for heterogeneous multi-core SoCs where task mapping and scheduling has been already frozen at design time, e.g., assuming multiple subsystems are re-used and integrated in an SoC. This paper discusses the further optimizations applicable to SoCs when task mapping and scheduling can be combined with PD partitioning and jointly optimized. Our major contributions are summarized as follows:

Tasks are mapped and scheduled specifically for each PD partition candidate, concurrently with PD partitioning exploration by a Multi-Objective Evolutionary Algorithm (MOEA). This aligns task execution and creates more common idle intervals for power gating.
Task mapping and scheduling is formulated as an Integer Linear Program (ILP), in particular integrating: 1) power-on/off dependencies introduced by PD partitioning among HW resources in the same PD; 2) constraints of the power-gating break-even time due to transition energy and latency overhead.
Experimental results show that our proposed joint exploration can identify much better trade-off fronts with significantly reduced design costs but without sacrificing the power target. For example, one experiment shows that the same power target can be achieved by 2 PDs, instead of 8 PDs when applying the approach in [13].

The target application domains of this work are time-critical or safety-critical [3] application-specific embedded systems, such as wireless communications and electric vehicles [10]. There, most application tasks and use cases are known at design time, and static scheduling is favorable due to its determinism.

Related Work

Several research works exist on how to partition power domains at the circuit level. In [2], Finite State Machine with Datapath (FSMD) circuits are decomposed into loosely coupled domains which may be power or clock gated, but the workload characteristics are not considered. In [1], an approach leveraging rule-based design is proposed to automatically partition combinational logic into multiple PDs while considering usage characteristics. However, both studies [1, 2] focus on the micro-architecture level and RTL design phases. In [13], PD partitioning is explored at the Electronic System Level (ESL), thus for SoC architecture definition phases, but after task mapping and scheduling has been accomplished. As motivated earlier, this may hinder the maximization of idle intervals to reduce power or to lower the number of power domains. In [7], a related task mapping and scheduling problem is discussed to Maximize the Common Idle Time (MCIT) among all cores, though the objective there is to reduce the active time and power consumption of a shared memory. Its ILP formulation of common idle intervals is based on a discrete time axis. In [6], the idle interval of each core is modeled on a continuous time axis, but the approach does not formulate common idle intervals. Both [7] and [6] consider homogeneous multi-core systems, which differs from the power optimization of a heterogeneous architecture: allocating tasks to more energy-efficient cores may lead to lower power than merely pursuing MCIT. Moreover, neither work investigates the PD partitioning problem; both assume that each core is in an individual power domain.

Some other works address voltage-frequency island partitioning at system level [9, 12] to reduce dynamic power, but the problem formulation is different. PD partitioning has to model power-off states, on/off dependencies within power domains, and power-off break-even times, which is difficult to model together with the problem of task mapping and scheduling. Moreover, [12] does not consider task mapping and scheduling, while [9] considers scheduling but not mapping.

Overview of the Methodology

An overview of our methodology is presented in Fig. 4.

Given

Periodic applications, each of which can be modeled as a directed acyclic task graph G = (V, E), in which a task v belongs to the set of tasks V and E denotes the data dependencies among tasks, together with an arbitrary period and deadline.
An SoC architecture consisting of a set of HW resources denoted as R, a power model of each resource in its different power states (e.g., RUN, IDLE, and OFF), the power-gating transition latency and energy, as well as the wake-up transition latency and energy from the power-off state.
Mapping constraints that specify which task can be realized on which resource, and the execution time of each task on a given resource.

Objective and Solution

The objective is to explore trade-off fronts in terms of energy consumption and the number of power domains (representing a measure of design complexity) for the problem of power domain partitioning including task mapping and scheduling.

The MOEA of [11] is used to explore the space of PD partitionings. Physical design or floorplan constraints can be added to prune the exploration space if they can be forecast from previous products. For example, it makes little sense to place two resources that are far apart in the floorplan into the same PD.
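
To give a feeling for the size of this space, the following Python sketch enumerates every way of grouping a set of resources into power domains. It is an illustration only: the paper uses an MOEA to sample the space rather than enumerating it, and the function name and resource labels are hypothetical.

```python
from typing import Iterator, List


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` into non-empty, unordered groups (PDs)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing group in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or open a new group that contains only `first`.
        yield [[first]] + partition


# Hypothetical architecture with six resources.
resources = [f"r{i}" for i in range(1, 7)]
num_candidates = sum(1 for _ in set_partitions(resources))
```

The number of candidates is the Bell number of the resource count, which grows quickly; this is one reason why a heuristic search and pruning by floorplan constraints are attractive.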

For each PD partition, an ILP is generated and solved to determine a mapping and schedule for each task and a suitable schedule of power mode transitions for each PD, with the objective to minimize the energy consumption. Power-on/off dependencies of HW resources in the same PD, power-gating transition energy, and break-even time are all considered here. The energy consumption value derived by the ILP solver is fed back to the MOEA as one evaluated objective of each PD partition. The state-based power modeling approach of [14] is chosen because it achieves sufficient accuracy at system level and in early design phases. The power models can be refined along the design cycle, e.g., by considering different active power values for different types of tasks running on a resource.

This work considers only static scheduling at design time. In principle, it may inspire solutions that consider the impact of run-time task migration, for example by adding online scheduling algorithms after the ILP solver and then evaluating the power consumption of each PD partition. However, this would take significantly longer exploration time, because simulation is required to evaluate the power consumption, and it is not the target application domain of this work.

ILP Formulation

Time is assumed to be discrete, divided into unit time intervals [t, t+1) for integer t ≥ 0, which we call time slots [7]. We refer to [t, t+1) as time slot t, or even as time t. Tasks are assigned to whole time slots, so task start times and execution times are integers. The continuous-time version of the same problem can be approximated by a discrete-time version.

In this work, multiple independent periodic applications with arbitrary deadlines and periods can be considered together in a single ILP. This applies to architectures which support multiple applications simultaneously. A hyper-period of all applications, denoted as M, is chosen to map and schedule tasks from all applications within this hyper-period. Moreover, when the deadline of an application is longer than its period, a pipelined schedule is performed, i.e., a task graph is divided into several pipeline stages so that the current iteration of the task graph can overlap in execution with previous iterations [15]. However, our methodology is not limited to any specific pipelining approach, which is also not the focus of this paper. For ease of explanation, the following formulations are elaborated using only one application with multiple periods, but the experiments in this work were done for problems containing multiple applications.
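
As a small illustration of the hyper-period, the sketch below assumes that M is the least common multiple of the application periods, which is the usual definition and is consistent with the use cases in Table 3. The function names are hypothetical, and periods are given in microseconds so that values such as 7.5 ms stay integral.

```python
from functools import reduce
from math import gcd
from typing import List


def hyper_period(periods_us: List[int]) -> int:
    """Least common multiple of all application periods (in microseconds)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods_us)


def iterations_per_hyper_period(periods_us: List[int]) -> List[int]:
    """How many times each application is released within one hyper-period."""
    m = hyper_period(periods_us)
    return [m // p for p in periods_us]


# Periods of use case 2 in Table 3: 7.5 ms, 15 ms, and 7.5 ms.
periods = [7500, 15000, 7500]
m = hyper_period(periods)                    # 15000 us, i.e., 15 ms
iters = iterations_per_hyper_period(periods)  # [2, 1, 2]
```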

Table 1 defines ILP constants which are determined for each PD partitioning candidate by the MOEA. Table 2 explains the introduced ILP binary variables prior to introducing the ILP mapping and scheduling model.

Table 1.

Constants in ILP formulation related to power gating and PD partitioning, and determined by the MOEA.

Symbols	Description
pd	A power domain from power domain set PD
pd(r)	The power domain containing resource r
R(pd)	Set of all resources in power domain pd
	Break-even time of power domain pd
	Off-on transition time of power domain pd
	On-off transition time of power domain pd
	Off-on transition energy of power domain pd
	On-off transition energy of power domain pd

Table 2.

Binary variables in ILP formulation related to power gating and PD partitioning.

Symbols	Description
	1 iff task v is mapped to resource r in period k
	1 iff task v in period k is starting at time t
	1 iff resource r is busy at time t
	1 iff all resources in power domain pd are mutually idle at time t
	1 iff all resources in power domain pd are mutually idle from time t to time
	1 iff power domain pd is in off state at time t
	1 iff power domain pd has either on-off or off-on transition at time t

Objective Function

An application with N periods is to be scheduled on a heterogeneous architecture. The time slots to be scheduled span the hyper-period M, which in the case of only one application covers its N periods. The objective function of the ILP is to minimize the total energy consumption according to Eqs. (1)–(8), which comprises the energy consumption of each resource in the power states RUN, IDLE, and OFF, as well as the total on-off and off-on transition energies of each power domain.

The transition energy of a power domain is calculated by Eqs. (5)–(8), based on the total number of transitions (both on-off and off-on) in power domain pd. The PD transition latency is determined by the resource with the longest latency in this PD. During a power-off transition, the other resources are assumed to be in the OFF state after their own transition; during a power-on transition, they are assumed to be in the IDLE state after their own transition. The related energies are modeled as part of the PD transition energy.
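
The structure of this objective can be illustrated by evaluating the energy of an already given schedule in Python. The sketch is a simplification under stated assumptions and does not reproduce Eqs. (1)–(8): all names are hypothetical, every PD is assumed to be on before slot 0, and the latency-dependent details of the transition energy are folded into one energy value per on-off and per off-on transition.

```python
from typing import Dict, List, Tuple


def total_energy(
    busy: Dict[str, List[int]],                     # resource -> busy bit per slot
    off: Dict[str, List[int]],                      # power domain -> off bit per slot
    pd_of: Dict[str, str],                          # resource -> its power domain
    power: Dict[str, Tuple[float, float, float]],   # resource -> (P_RUN, P_IDLE, P_OFF)
    e_on_off: Dict[str, float],                     # power domain -> on-off energy
    e_off_on: Dict[str, float],                     # power domain -> off-on energy
    slot: float = 1.0,                              # duration of one time slot
) -> float:
    energy = 0.0
    for r, busy_r in busy.items():
        p_run, p_idle, p_off = power[r]
        for b, o in zip(busy_r, off[pd_of[r]]):
            # A resource is in exactly one state per slot: RUN, OFF, or IDLE.
            p = p_run if b else (p_off if o else p_idle)
            energy += p * slot
    for pd, off_pd in off.items():
        prev = [0] + off_pd[:-1]  # assumption: the PD is on before slot 0
        n_on_off = sum(1 for o, p in zip(off_pd, prev) if o and not p)
        n_off_on = sum(1 for o, p in zip(off_pd, prev) if p and not o)
        energy += n_on_off * e_on_off[pd] + n_off_on * e_off_on[pd]
    return energy


# Hypothetical single-resource example with made-up power numbers.
e = total_energy(
    busy={"r1": [1, 0, 0, 0]},
    off={"A": [0, 0, 1, 1]},
    pd_of={"r1": "A"},
    power={"r1": (10.0, 2.0, 0.0)},
    e_on_off={"A": 1.0},
    e_off_on={"A": 3.0},
)
```

In the ILP, the busy and off bits are decision variables rather than inputs, so the same sum becomes a linear objective over those variables.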

Constraints

Here, we focus on explaining the parts of the ILP formulation related to power gating and PD partitioning. Other ILP constraints for basic task mapping and scheduling, e.g., task mapping constraints, task dependency constraints, and deadline constraints, are well known and are not elaborated here [8].

Unique Start Time Constraint: Each task must start exactly once, thus in one time slot.

Resource Busy Time Constraint: The number of busy slots of a resource must be equal to the total execution time of all tasks mapped to it. Moreover, starting from the start time slot of a task, consecutive 1's must be assigned to the busy vector of the resource to which the task is mapped, for the duration of the task's execution time.

Common Idle Time Constraint: A power domain is idle only when all resources in that domain are idle. This can be modeled by performing a logical NOR operation among the busy vectors of all resources in that domain, as given in Eq. (12).

The NOR operation is nonlinear, but such a Boolean logic operation can be transformed into linear constraints. Let |R(pd)| denote the number of resources in power domain pd. Equation (12) is then replaced by linear inequalities over the busy variables of these resources and the common idle variable of pd.
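
One standard way to linearize a NOR over binary variables uses two inequalities, n·idle ≤ n − Σ busy and idle ≥ 1 − Σ busy, where n is the number of resources in the PD. Whether the paper uses exactly this form is an assumption. The Python sketch below, with hypothetical function names, checks by brute force that these inequalities leave exactly one feasible value for the idle bit, namely the NOR of the busy bits.

```python
from itertools import product
from typing import Sequence


def satisfies_linear_nor(busy: Sequence[int], idle: int) -> bool:
    """Check the two linear inequalities for one time slot of one power domain."""
    n = len(busy)
    upper = n * idle <= n - sum(busy)   # any busy resource forces idle = 0
    lower = idle >= 1 - sum(busy)       # no busy resource forces idle = 1
    return upper and lower


def linearization_matches_nor(n: int) -> bool:
    """Enumerate all busy patterns of n resources and compare against NOR."""
    for busy in product((0, 1), repeat=n):
        feasible = [idle for idle in (0, 1) if satisfies_linear_nor(busy, idle)]
        if feasible != [int(not any(busy))]:
            return False
    return True
```

Because the feasible idle value is unique for every busy pattern, an ILP solver cannot mark a domain as idle while one of its resources is still executing.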

Off State Time Constraint: A power domain should be switched off only when its common idle interval is longer than its power-gating break-even time, which is modeled by Eqs. (15)–(17) and rounded up to the next integer number of time slots.
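
For intuition, a textbook break-even model can be sketched in Python as follows. It is not a reconstruction of Eqs. (15)–(17): it assumes that the break-even time is the larger of the total transition latency and the idle duration at which gating becomes energy neutral, and all parameter names are hypothetical. For a PD, the idle and off power would be the sums over its resources.

```python
import math


def break_even_slots(
    e_on_off: float,   # on-off transition energy of the PD
    e_off_on: float,   # off-on transition energy of the PD
    t_on_off: float,   # on-off transition latency
    t_off_on: float,   # off-on transition latency
    p_idle: float,     # power of the PD when idle but powered on
    p_off: float,      # residual power of the PD when gated (must be < p_idle)
    slot: float,       # duration of one time slot, same unit as the latencies
) -> int:
    t_trans = t_on_off + t_off_on
    e_trans = e_on_off + e_off_on
    # Energy-neutral idle length T solves: p_idle * T = e_trans + p_off * (T - t_trans)
    t_energy = (e_trans - p_off * t_trans) / (p_idle - p_off)
    # Round up to whole time slots.
    return math.ceil(max(t_trans, t_energy) / slot)
```

Under this model, zero transition energies make the energy term non-positive, so the break-even time is determined by the transition latency alone.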

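To derive the off-state slot vectors, an auxiliary variable is introduced for each slot t to represent that all adjacent slots of a power domain pd, from slot t to the end of a window of one break-even time, are idle. This can be done by a logical AND operation over the common idle variables of these slots, as given in Eq. (18).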

In addition, zeros have to be padded at the beginning and the end of this auxiliary vector using Eq. (19).

Now, the final off-state time slot vector can be derived from Eq. (20). The off-state variable of slot t is 1 if the auxiliary variable is 1 for any window start slot whose window covers slot t, which can be expressed by a logical OR operation. Equations (18) and (20) are nonlinear, but they can be transformed into linear inequalities in a similar way as shown for Eq. (12). The details are not shown here.
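
The AND and OR steps above can be mimicked on concrete bit vectors. The following Python sketch assumes a break-even time of at least one slot, treats windows that would run past the end of the schedule as not idle (which plays the role of the zero padding), and does not wrap around the hyper-period; the function name is hypothetical.

```python
from typing import List


def off_state_vector(idle: List[int], break_even: int) -> List[int]:
    """Derive the off bits of a PD from its common idle bits."""
    m, b = len(idle), break_even
    # window[t] = 1 iff slots t .. t+b-1 are all idle (logical AND);
    # windows reaching beyond the last slot are set to 0.
    window = [int(t + b <= m and all(idle[t:t + b])) for t in range(m)]
    # off[t] = 1 iff some window starting in t-b+1 .. t is fully idle (logical OR).
    return [int(any(window[max(0, t - b + 1):t + 1])) for t in range(m)]


# Idle run of 2 slots (too short) followed by an idle run of 4 slots.
off = off_state_vector([1, 1, 0, 1, 1, 1, 1, 0], break_even=3)
```

In this example only the second idle run is turned into an off interval, since the first one is shorter than the assumed break-even time of three slots.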

Transition State Time Constraint: On-off and off-on transition states are formulated by taking the logical XOR of the current slot and the previous slot in the off-state vector, as given in Eq. (21). Similarly, this can be transformed into linear inequalities as well. The number of power domain transitions includes both off-on and on-off transitions.
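
On a concrete off-state vector, the XOR step looks as follows in Python. The function name and the `initial_off` parameter, which stands for the state of the PD before the first slot, are assumptions made for this sketch.

```python
from typing import List


def transition_vector(off: List[int], initial_off: int = 0) -> List[int]:
    """Mark every slot in which the off state differs from the previous slot."""
    previous = [initial_off] + off[:-1]
    return [cur ^ prev for cur, prev in zip(off, previous)]


off = [0, 0, 0, 1, 1, 1, 1, 0]
trans = transition_vector(off)       # 1 at slot 3 (on-off) and slot 7 (off-on)
num_transitions = sum(trans)
```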

Experimental Results

The proposed approach has been evaluated on different benchmarks. The first set of benchmarks is from the public benchmark suite E3S [4], while the second one consists of synthetic benchmarks generated using the tool TGFF [5]. The main program of the flow was implemented in Python, while the MOEA was implemented in Java [11]. All programs were executed on a laptop with an i5-5300U CPU @ 2.3 GHz (2 cores, 4 threads) and 12 GB of DDR memory.

For comparison, the same experiments were performed by applying the approach of [13], which performs PD partitioning after task mapping and scheduling. We call it the reference approach in the following. Here, various task mapping and scheduling algorithms can be applied before PD partitioning with the desired optimization objectives, such as execution time or power. They lead to different energy consumption after PD partitioning and power gating. Since our work focuses on energy optimization, for a fair comparison, we performed an energy-aware task mapping and scheduling that also uses an ILP. In this ILP formulation, however, processors are assumed to be only in the RUN or IDLE state, without an OFF state, and PD partitioning and power-gating related constraints are not applied during this step. Therefore, in the objective function, the OFF-state and transition energy terms in Eq. (1) become zero, and the off-state variables in Eq. (3) are zero too.

Benchmark Applications from E3S

Three benchmarks are selected from E3S [4], i.e., Networking, Telecom, and Consumer. They are scheduled onto a heterogeneous architecture consisting of a 2-D mesh of processors whose power consumption is also specified in E3S. The transition latencies in Table 1 varied in the range of 10–50 µs, and the task execution times ranged from 0.5 ms to 1 ms. The on-off and off-on transition energies in Eqs. (5)–(6) were assumed to be zero in the following. Therefore, the power-gating break-even time was determined by the transition latency according to Eqs. (16)–(17).

The MOEA has been configured to use 20 generations with 10 individuals per generation. For each number of power domains, the solutions with the lowest normalized energy according to Eqs. (1)–(8) are shown in Fig. 5.
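
The selection of the plotted points can be sketched in Python as below: for every number of power domains, keep the evaluated candidate with the lowest energy and normalize it to a reference energy. The function name is hypothetical, and the numbers in the usage example are made up for illustration; they are not results from the paper.

```python
from typing import Dict, Iterable, Tuple


def tradeoff_front(
    evaluated: Iterable[Tuple[int, float]],  # (number of PDs, energy) per candidate
    reference_energy: float,                 # e.g., energy of the 1-PD reference point
) -> Dict[int, float]:
    """Lowest normalized energy found for each number of power domains."""
    best: Dict[int, float] = {}
    for num_pds, energy in evaluated:
        if num_pds not in best or energy < best[num_pds]:
            best[num_pds] = energy
    return {n: best[n] / reference_energy for n in sorted(best)}


# Made-up candidate evaluations.
front = tradeoff_front([(1, 2.0), (2, 1.5), (2, 1.0), (3, 1.5)], reference_energy=2.0)
```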

Fig. 5. — Trade-off fronts for E3S benchmarks with normalized energy (to energy of 1-PD partition obtained by the reference approach [13], i.e., PD partitioning performed after mapping and scheduling) vs. hardware complexity (number of power domains).

It can be noticed that for each number of power domains, the trade-off point using our approach has a lower energy. This is because our concurrent mapping and scheduling of tasks with PD partitioning is able to create more common idle intervals specific for each PD partition to allow more power-off opportunities. Therefore, better power savings can be achieved even with fewer power domains. For example, in the benchmark of Telecom, a lower power consumption can be achieved even with 2-PD partition, in comparison to a trade-off point for an 8-PD partition in the reference approach [13]. Much better PD partitioning trade-off points can be identified to meet the power target at significantly reduced design cost.

The exploration took 8–9 h with the time limit of the ILP solver set to 3 min. Notably, this is longer than the approach of [13], which took about 1–2 h. This is expected, because our approach has to perform task mapping and scheduling in addition. Limiting the ILP solver to 3 min has two impacts: 1) the best solution found by the ILP solver within the limit may not be optimal in terms of energy consumption, but has a relative optimality gap of 10–20%, as reported by the solver; 2) the ILP solver may not find any feasible solution as the problem size increases, though this never happened in our benchmarks. Nevertheless, our approach was always able to find lower energy consumption points for each number of PDs. If more exploration time is acceptable, our approach would probably find even better results. This is a trade-off that system architects can decide during system-level exploration.

Benchmarks Generated by TGFF

Three benchmarks have been generated by TGFF [5] with different tightness of deadline, i.e., the deadline is equal to, shorter than, or longer than the period, as shown in Table 3. Each use case has multiple applications to be scheduled over their hyper-period. When the deadline is longer than the period, e.g., in use case 3 or in a multimedia streaming application, different iterations of an application can overlap. Therefore, we partitioned the task set and performed scheduling for the steady state in one hyper-period [15]. All three use cases have a size in the range of 40–50 tasks, whereas the heterogeneous architecture consists of a 2-D mesh of processors. The power data, transition times, and transition energies for these processors were obtained from an in-house design. The EA parameters for the exploration are the same as for the E3S benchmarks. As shown in Fig. 6, our proposed approach also identifies better trade-off fronts than the reference approach [13].

Table 3.

Three use cases generated by TGFF.

Use case	Period	Deadline	Hyper-period	Iterations
1	9 ms	9 ms	18 ms	2
1	18 ms	18 ms	18 ms	1
2	7.5 ms	6 ms	15 ms	2
2	15 ms	12 ms	15 ms	1
2	7.5 ms	6 ms	15 ms	2
3	5 ms	10 ms	10 ms	2
3	10 ms	15 ms	10 ms	1
3	10 ms	12 ms	10 ms	1

Fig. 6. — Trade-off fronts for TGFF benchmarks with normalized energy (to energy of 1-PD partition obtained by the reference approach [13], i.e., PD partitioning performed after mapping and scheduling) vs. hardware complexity (number of power domains).

Scalability Analysis

The total exploration time depends on two parts: 1) the EA parameters, mainly the number of generations and the number of individuals per generation (PD partition options), which typically increase with the complexity of the hardware architecture; 2) the execution time of the ILP solver for each PD partition option, which scales non-linearly with the size of the hardware architecture, the size of the task graph, and, most importantly, the time scale of the schedule. Therefore, our approach does not scale easily to bigger problems. We measured the execution time of the ILP solver for an architecture of a mesh network with 6 processors. When the number of tasks was increased to 80 and the relative optimality gap of the ILP solver was set to 20%, a feasible schedule could not be found within 2 h, although a run time in the range of minutes is preferred as a part of the whole flow. Alternative models for scheduling might be the key to reducing the number of binary variables, and thus the search space of the ILP formulation, to improve scalability.

Although our experiments on real-world benchmarks were solvable in an acceptable amount of time, we plan to investigate scalability in future work.

Conclusion

In this paper, an approach is proposed to systematically explore PD partitioning for heterogeneous multi-core SoCs jointly with task mapping and scheduling. An ILP-based task mapping and scheduling is performed for each PD partition candidate while PDs are partitioned by a Multi-Objective Evolutionary Algorithm. The ILP formulation considers the constraints of power-off dependencies among hardware resources belonging to the same PD and the power-gating break-even time. For a given PD partition, it creates more and longer common idle intervals of PDs, which can thus be switched off more often to save power. Compared to state-of-the-art approaches that are applied after task mapping and scheduling have been frozen, our approach offers significantly larger optimization opportunities for system architects. It has been shown that better trade-off fronts in terms of energy consumption and number of PDs, and thus hardware costs, may be found by shifting exploration to earlier design phases.

Acknowledgments

This research work was funded by Intel Deutschland GmbH, and finished before Bo Wang and Aneek Imtiaz left Intel. We would like to acknowledge Dr. Yang Xu and Dr. Ralph Hasholzner at Intel, and also Dr. Thomas Wild and Prof. Andreas Herkersdorf at Technische Universität München, for valuable discussions.

Contributor Information

André Brinkmann, Email: brinkman@uni-mainz.de.

Wolfgang Karl, Email: wolfgang.karl@kit.edu.

Stefan Lankes, Email: slankes@eonerc.rwth-aachen.de.

Sven Tomforde, Email: st@informatik.uni-kiel.de.

Thilo Pionteck, Email: thilo.pionteck@ovgu.de.

Carsten Trinitis, Email: carsten.trinitis@tum.de.

Bo Wang, Email: bo.wang1102@gmail.com.

Aneek Imtiaz, Email: aneekimtiaz@gmail.com.

Joachim Falk, Email: joachim.falk@fau.de.

Michael Glaß, Email: michael.glass@uni-ulm.de.

Jürgen Teich, Email: juergen.teich@fau.de.

References

1. Agarwal, A., Arvind, A.: Leveraging rule-based designs for automatic power domain partitioning. In: ICCAD, November 2013
2. Agarwal, N., et al.: FSMD partitioning for low power using simulated annealing. In: ISCAS, May 2008
3. Baruah, S., Fohler, G.: Certification-cognizant time-triggered scheduling of mixed-criticality systems. In: RTSS, November 2011
4. Dick, R.: Embedded system synthesis benchmarks suites (E3S) (2017). http://ziyang.eecs.umich.edu/~dickrp/e3s/
5. Dick, R., Rhodes, D., Wolf, W.: TGFF: task graphs for free. In: CODES/CASHE, March 1998
6. Esmaili, A., Nazemi, M., Pedram, M.: Modeling processor idle times in MPSoC platforms to enable integrated DPM, DVFS, and task scheduling subject to a hard deadline. In: ASPDAC, January 2019
7. Fu, C., Zhao, Y., Li, M., Xue, C.J.: Maximizing common idle time on multicore processors with shared memory. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems (2017)
8. Glaß, M., Teich, J., Lukasiewycz, M., Reimann, F.: Hybrid optimization techniques for system-level design space exploration. In: Ha, S., Teich, J. (eds.) Handbook of Hardware/Software Codesign. Springer, Dordrecht (2017)
9. Liu, Y., Yang, Y., Hu, J.: Clustering-based simultaneous task and voltage scheduling for NoC systems. In: ICCAD, November 2010
10. Lukasiewycz, M., et al.: Cyber-physical systems design for electric vehicles. In: DSD, September 2012
11. Lukasiewycz, M., Glaß, M., Reimann, F., Teich, J.: Opt4J-a modular framework for meta-heuristic optimization. In: GECCO, July 2011
12. Ogras, U.Y., Marculescu, R., Choudhary, P., Marculescu, D.: Voltage-frequency island partitioning for GALS-based networks-on-chip. In: DAC, June 2007
13. Wang, B., et al.: Exploration of power domain partitioning for application-specific SoCs in system-level design. In: GI/ITG/GMM Workshop Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen, MBMV, March 2016
14. Xu, Y., et al.: A very fast and quasi-accurate power-state-based system-level power modeling methodology. In: Herkersdorf, A., Römer, K., Brinkschulte, U., et al. (eds.) Architecture of Computing Systems – ARCS 2012, pp. 37–49. Springer, Heidelberg (2012)
15. Yang, H., Ha, S.: Pipelined data parallel task mapping/scheduling technique for MPSoC. In: DATE, April 2009
