## Compile-time Dynamic Voltage Scaling Settings: Opportunities And Limits (2003)

Venue: In Proc. of 2003 PLDI

Citations: 57 (7 self)

### Citations

1319 | Wattch: a framework for architectural-level power analysis and optimizations.
- Brooks, Tiwari, et al.
- 2000
Citation Context ... performance counters to profile both performance and energy data for real, not simulated, application runs [16]. The data shown here have been gathered using the Wattch power/performance simulator [3], which is based on SimpleScalar [5]. Our simulations are run to completion for the provided inputs, so we get a full view of program execution. (Sampling methods might be accurate enough to give good...

966 | Mediabench: a tool for evaluating and synthesizing multimedia and communications systems.
- Lee
- 1997
Citation Context ..., our edge filtering method greatly prunes the search space for the MILP solver, and brings optimization times down from hours to seconds. (We gather these data for six of the MediaBench applications [17], with a transition time of 12 µs, and transition energy of 1.2 µJ.) Table 3 shows that for the benchmarks considered the minimum energy determined by the solver remains essentially unchanged from the cas...

527 | AMPL: A Modeling Language for Mathematical Programming
- Fourer, Gay, et al.
- 1993
Citation Context ...les have been collected and filtering strategies have been applied, the transition counts and the program graph structure are used to construct the equations that express DVS constraints. We use AMPL [8] to express the mathematical constraints and to enable pruning and optimizations before feeding the MILP problem into the CPLEX solver [13]. As shown in Figure 14, our edge filtering method greatly pr...

475 | Evaluating future microprocessors: the simplescalar tool set.
- Burger, Austin, et al.
- 1996
Citation Context ...h performance and energy data for real, not simulated, application runs [16]. The data shown here have been gathered using the Wattch power/performance simulator [3], which is based on SimpleScalar [5]. Our simulations are run to completion for the provided inputs, so we get a full view of program execution. (Sampling methods might be accurate enough to give good profiles while reducing profile tim...

403 | Voltage scheduling problem for dynamically variable voltage processors.
- Ishihara, Yasuura
- 1998
Citation Context ...osal [19] and a task-based algorithm like Luo and Jha's work [20]. Integer Linear Programming (ILP) based scheduling has also been used in algorithms at the OS level. For example, Ishihara and Yasuura [15] give an ILP formulation that does not take into account the transition costs. Swaminathan and Chakrabarty [28] incorporate the transition costs into the ILP formulation but make some simplifications ...

287 | Efficient path profiling.
- Ball, Larus
- 1996
Citation Context ... to hoist or coalesce mode-set instructions to avoid extra branches can potentially improve performance. More generally, we hope to broaden our MILP formulation to target larger code regions or paths [2]. Moving from edges to paths would allow us to build more program context into our analysis of mode-set positioning. Furthermore, it would also allow us to more accurately profile the time/energy cost...

137 | Design issues for dynamic voltage scaling.
- Burd, Brodersen
- 2000
Citation Context ... (Figure 13: Flow Diagram of the Technique) ... from $v_i$ to $v_j$: $S_E = (1 - u) \cdot c \cdot |v_i^2 - v_j^2|$ and $S_T = \frac{2c}{I_{MAX}} \cdot |v_i - v_j|$. Equations for $S_E$, $S_T$ have been taken from [4], and are considered to be an accurate modeling of these transition costs. The variable $c$ is the voltage regulator capacitance and $u$ is the energy-efficiency of the voltage regulator. $I_{MAX}$ is the maxi...
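The transition-cost equations quoted in this context can be evaluated directly. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the parameter values (a 10 µF regulator capacitance, 90% regulator efficiency, 1 A maximum current) are assumptions chosen only for illustration.

```python
def transition_energy(v_i, v_j, c, u):
    """Energy cost S_E = (1 - u) * c * |v_i^2 - v_j^2| of switching from v_i to v_j.

    c is the regulator capacitance in farads, u the regulator efficiency (0..1),
    voltages are in volts; the result is in joules.
    """
    return (1.0 - u) * c * abs(v_i ** 2 - v_j ** 2)


def transition_time(v_i, v_j, c, i_max):
    """Time cost S_T = (2 * c / I_MAX) * |v_i - v_j| of switching from v_i to v_j.

    i_max is the maximum regulator current in amperes; the result is in seconds.
    """
    return 2.0 * c / i_max * abs(v_i - v_j)


if __name__ == "__main__":
    # Assumed, illustrative regulator parameters (not taken from the excerpt).
    c = 10e-6      # 10 uF
    u = 0.9        # 90% efficient
    i_max = 1.0    # 1 A

    energy = transition_energy(1.3, 0.7, c, u)    # about 1.2e-6 J
    delay = transition_time(1.3, 0.7, c, i_max)   # about 1.2e-5 s
    print(f"S_E = {energy:.3e} J, S_T = {delay:.3e} s")
```

Both costs are symmetric in the two voltages, so under this model raising the voltage costs the same as lowering it, and a transition between equal voltages costs nothing.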

125 | The Design, Implementation, and Evaluation of a Compiler Algorithm for CPU Energy Reduction.
- Hsu, Kremer
- 2003
Citation Context ...termining optimal modes for each loop nest, but do not consider the energy overhead of switching modes. The efficiency of scheduling policies has also been discussed in the literature. Hsu and Kremer [11] have introduced a simple model to estimate theoretical bounds of energy reduction any DVS algorithm can produce. In [15], a simple ideal model which is solely based on the dynamic power dissipation o...

116 | Intra-Task Voltage Scheduling for Low-Energy Hard Real-Time Applications.
- Shin, Kim
- 2001
Citation Context ...t compile time. Mode-set instructions are inserted either evenly at regular intervals in the program like Lee and Sakurai's work [18], or on a limited number of control flow edges as proposed by Shin [27]. In the latter, the mode value is set using worst-case execution time analysis for each basic block. Hsu and Kremer [10] suggest lowering voltage/frequency in memory-bound regions using power-down in...

111 | Run-time power estimation in high performance microprocessors.
- Joseph, Martonosi
- 2001
Citation Context ...ver, that other means of profiling would also work well. One could for example, use hardware performance counters to profile both performance and energy data for real, not simulated, application runs [16]. The data shown here have been gathered using the Wattch power/performance simulator [3], which is based on SimpleScalar [5]. Our simulations are run to completion for the provided inputs, so we ge...

110 | Run-Time Voltage Hopping for Low-Power Real-Time Systems.
- Lee, Sakurai
- 2000
Citation Context ... Some research efforts have targeted the use of mode-set instructions at compile time. Mode-set instructions are inserted either evenly at regular intervals in the program like Lee and Sakurai's work [18], or on a limited number of control flow edges as proposed by Shin [27]. In the latter, the mode value is set using worst-case execution time analysis for each basic block. Hsu and Kremer [10] suggest...

98 | Saving energy with architectural and frequency adaptations for multimedia applications. MICRO.
- Hughes, Srinivasan, et al.
- 2001
Citation Context ...tion in clock frequency (f). Thus, the voltage and frequency must be varied together. Proposals have been made for purely-hardware DVS [21] as well as for schemes that allow DVS with software control [7, 14, 12]. DVS accomplishes energy reduction through scheduling different parts of the computation to different (V,f) pairs so as to minimize energy while still meeting execution time deadlines. Over the past ...

84 | Automatic performance setting for dynamic voltage scaling.
- Flautner, Reinhardt, et al.
- 2001
Citation Context ...tion in clock frequency (f). Thus, the voltage and frequency must be varied together. Proposals have been made for purely-hardware DVS [21] as well as for schemes that allow DVS with software control [7, 14, 12]. DVS accomplishes energy reduction through scheduling different parts of the computation to different (V,f) pairs so as to minimize energy while still meeting execution time deadlines. Over the past ...

70 | Using IPC variation in workloads with externally specified rates to reduce power consumption.
- Ghiasi, Casmira, et al.
- 2000
Citation Context ...rabarty [28] incorporate the transition costs into the ILP formulation but make some simplifications and approximations in order to make the formulation linear. At the microarchitecture level, Ghiasi [9] suggests the use of IPC (instructions per cycle) to direct DVS, and Marculescu [21] proposes the use of cache misses to direct DVS. Both are done through hardware support at run time. Some research e...

53 | On the use of microarchitecture-driven dynamic voltage scaling.
- Marculescu
- 2000
Citation Context ... increases the device delay and so must be accompanied by a reduction in clock frequency (f). Thus, the voltage and frequency must be varied together. Proposals have been made for purely-hardware DVS [21] as well as for schemes that allow DVS with software control [7, 14, 12]. DVS accomplishes energy reduction through scheduling different parts of the computation to different (V,f) pairs so as to mini...

41 | Energy-conscious compilation based on voltage scaling.
- Saputra, Kandemir, et al.
- 2002
Citation Context ...ntially significant savings in energy consumption. Using this technique, they have been able to demonstrate modest energy savings. Subsequent work on using mathematical optimization by Saputra et al. [25] provides an exact mixed-integer linear programming (MILP) technique that can determine the appropriate (V,f) setting for each loop nest. This optimization seems to result in better energy saving...

35 | What is the limit of energy saving by dynamic voltage scaling?
- Qu
- 2001
Citation Context ... model and ILP. Some other work focuses only on the limits of energy savings for DVS systems without taking into consideration actual policies. Qu provides models for feasible DVS systems in his work [23]. However, evaluating the potential energy savings of compile-time DVS policies for real programs has not received much attention thus far. We feel it is important as it gives us deep insight into opp...

23 | Power-profile driven variable voltage scaling for heterogeneous distributed real-time embedded systems
- Luo, Jha
- 2003
Citation Context ...re and compiler levels. Algorithms at the OS level using heuristic scheduling include an interval-based algorithm like Lorch and Smith's proposal [19] and a task-based algorithm like Luo and Jha's work [20]. Integer Linear Programming (ILP) based scheduling has also been used in algorithms at the OS level. For example, Ishihara and Yasuura [15] give an ILP formulation that does not take into account the...

21 | Single region vs. multiple regions: A comparison of different compiler-directed dynamic voltage scheduling approaches.
- Hsu, Kremer
- 2002
Citation Context ...ogram would require careful analysis to determine when the mode-switch advantages outweigh the overhead. Hsu and Kremer provide a heuristic technique that lowers the voltage for memory-bound sections [10]. The intuition is that the execution time here is bound by memory access time, and thus the compute time can be slowed down with little impact on the total execution time, but with potentially signif...
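The memory-bound intuition in this context can be made concrete with a toy timing model. This is a sketch under stated assumptions, not the heuristic of [10]: it assumes total time splits into CPU cycles that scale with clock frequency plus a memory-stall time that does not, and the function names and workload numbers are hypothetical.

```python
def exec_time(cpu_cycles, mem_time, f_hz):
    """Total time = frequency-dependent compute time + frequency-independent memory time."""
    return cpu_cycles / f_hz + mem_time


def slowdown(cpu_cycles, mem_time, f_slow, f_fast):
    """Ratio of execution time at the slower clock to execution time at the faster clock."""
    return exec_time(cpu_cycles, mem_time, f_slow) / exec_time(cpu_cycles, mem_time, f_fast)


if __name__ == "__main__":
    f_fast, f_slow = 800e6, 600e6  # two clock settings, in Hz

    # Hypothetical regions that both take about 0.1 s at 800 MHz.
    memory_bound = slowdown(cpu_cycles=8e6, mem_time=0.09, f_slow=f_slow, f_fast=f_fast)
    compute_bound = slowdown(cpu_cycles=72e6, mem_time=0.01, f_slow=f_slow, f_fast=f_fast)

    print(f"memory-bound slowdown:  {memory_bound:.3f}")
    print(f"compute-bound slowdown: {compute_bound:.3f}")
```

Under this model the memory-bound region slows by only a few percent when the clock drops from 800 MHz to 600 MHz, while the compute-bound region slows by roughly 30%, which is why memory-bound sections are attractive candidates for a lower voltage setting.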

16 | Improving dynamic voltage algorithms with PACE
- Lorch, Smith
- 2001
Citation Context ... exhaustively at the operating system, micro-architecture and compiler levels. Algorithms at the OS level using heuristic scheduling include an interval-based algorithm like Lorch and Smith's proposal [19] and a task-based algorithm like Luo and Jha's work [20]. Integer Linear Programming (ILP) based scheduling has also been used in algorithms at the OS level. For example, Ishihara and Yasuura [15] give...

12 | Alpha-power model, and its application to CMOS inverter delay and other formulas
- Sakurai, Newton
- 1990
Citation Context ...hen the processor is idle. 4. The relationship between frequency and voltage is: $f = k (v - v_t)^{\alpha} / v$, where $v_t$ is the threshold voltage, and $\alpha$ is a technology-dependent factor (currently around 1.5) [24]. 5. Computation can be assigned to different frequencies at an arbitrarily fine grain, i.e. a continuous partitioning of the computation and its assignment to different voltages is possible. 6. There...
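The frequency-voltage relationship quoted in this context, $f = k (v - v_t)^{\alpha} / v$, can be explored numerically. The sketch below is illustrative only: the function names are hypothetical, and the threshold voltage (0.35 V) and the calibration point (800 MHz at 1.65 V) are assumptions, not values stated in the excerpt.

```python
def max_frequency(v, v_t, k, alpha=1.5):
    """Alpha-power model: f = k * (v - v_t)**alpha / v, or 0 at or below threshold."""
    if v <= v_t:
        return 0.0
    return k * (v - v_t) ** alpha / v


def calibrate_k(f_ref, v_ref, v_t, alpha=1.5):
    """Solve the model for k so that max_frequency(v_ref) equals f_ref."""
    return f_ref * v_ref / (v_ref - v_t) ** alpha


if __name__ == "__main__":
    v_t = 0.35                               # assumed threshold voltage, in volts
    k = calibrate_k(800e6, 1.65, v_t)        # assumed calibration: 800 MHz at 1.65 V

    for v in (0.7, 1.0, 1.3, 1.65):
        print(f"v = {v:.2f} V -> f = {max_frequency(v, v_t, k) / 1e6:.0f} MHz")
```

Because the model is increasing in $v$ above the threshold, lowering the supply voltage always lowers the attainable clock frequency, which is why voltage and frequency are scaled together.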

9 | Circuit Design of XScale™ Microprocessors
- Clark
- 2001
Citation Context ...ed with a supply voltage of 0.7V, 600MHz at 1.3V, and a maximum performance setting of 800MHz at 1.65V. This is similar to some of the voltage-frequency pairings available in Intel's XScale processors [6].

| Parameter | Value |
| --- | --- |
| RUU size | 64 instructions |
| LSQ size | 32 instructions |
| Fetch Queue size | 8 instructions |
| Fetch width | 4 instructions/cycle |
| Decode width | 4 instructions/cycle |
| Issue width | 4 instructions/cycle |

...

6 | Investigating the effect of voltage switching on low-energy task scheduling in hard real-time systems
- Swaminathan, Chakrabarty
- 2001
Citation Context ...ling has also been used in algorithms at the OS level. For example, Ishihara and Yasuura [15] give an ILP formulation that does not take into account the transition costs. Swaminathan and Chakrabarty [28] incorporate the transition costs into the ILP formulation but make some simplifications and approximations in order to make the formulation linear. At the microarchitecture level, Ghiasi [9] suggests...

3 | Intel XScale (tm
- Corp
- 1999
Citation Context ...tion in clock frequency (f). Thus, the voltage and frequency must be varied together. Proposals have been made for purely-hardware DVS [21] as well as for schemes that allow DVS with software control [7, 14, 12]. DVS accomplishes energy reduction through scheduling different parts of the computation to different (V,f) pairs so as to minimize energy while still meeting execution time deadlines. Over the past ...

1 | Web page for ILOG CPLEX mathematical programming software
- CPLEX
- 2002
Citation Context ...truct the equations that express DVS constraints. We use AMPL [8] to express the mathematical constraints and to enable pruning and optimizations before feeding the MILP problem into the CPLEX solver [13]. As shown in Figure 14, our edge filtering method greatly prunes the search space for the MILP solver, and brings optimization times down from hours to seconds. (We gather these data for six of the M...

1 | Mpeg video test bitstreams. http://www.mpeg.org/MPEG/video.html
- MpegTv
- 1998
Citation Context ...category uses no 'B' frames; it includes 100b.m2v and bbc.m2v. The second category uses 2 'B' frames between I and P frames; it includes flwr.m2v and cact.m2v. All mpeg files are Test Bitstreams from [22]. Figure 19 shows program execution times for different input data and profiling runs for the mpeg benchmark. In particular, the x-axis shows four different input files for the benchmark. For each benc...