Results 1 - 10 of 24
An analysis of efficient multi-core global power management policies: Maximizing performance for a given power budget
- in Proc. Intl. Symp. Microarch. (MICRO), 2006
- Cited by 48 (1 self)
Chip-level power and thermal implications will continue to rule as one of the primary design constraints and performance limiters. The gap between average and peak power actually widens with increased levels of core integration. As such, if per-core control of power levels (modes) is possible, a global power manager should be able to dynamically set the modes suitably. This would be done in tune with the workload characteristics, in order to always maintain a chip-level power that is below the specified budget. Furthermore, this should be possible without significant degradation of chip-level throughput performance. We analyze and validate this concept in detail in this paper. We assume a per-core DVFS (dynamic voltage and frequency scaling) knob to be available to such a conceptual global power manager. We evaluate several different policies for global multi-core power management. In this analysis, we consider various objectives such as prioritization and optimized throughput. Overall, our results show that in the context of a workload comprised of SPEC benchmark threads, our best architected policies can come within 1% of the performance of an ideal oracle, while meeting a given chip-level power budget. Furthermore, we show that these global dynamic management policies perform significantly better than static management, even if static scheduling is given oracular knowledge.
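The idea of a global manager that sets per-core modes under a chip-level budget can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the mode table, its scaling factors, and the function name `pick_modes` are hypothetical, and exhaustive search stands in for a real policy rather than reproducing any policy evaluated in the paper.

```python
from itertools import product

# Hypothetical per-core DVFS modes: (name, power scale, throughput scale).
# The numbers are illustrative assumptions, not values from the paper.
MODES = [("turbo", 1.00, 1.00), ("eff1", 0.85, 0.95), ("eff2", 0.60, 0.85)]


def pick_modes(core_power, core_bips, budget):
    """Pick one mode per core to maximize total throughput within the budget.

    core_power[i] and core_bips[i] are core i's power and throughput at full
    speed. Returns (mode names, total power, total throughput), or None if no
    combination of modes fits the budget.
    """
    best = None
    # Try every assignment of modes to cores (fine for a handful of cores).
    for combo in product(range(len(MODES)), repeat=len(core_power)):
        power = sum(core_power[i] * MODES[m][1] for i, m in enumerate(combo))
        if power > budget:
            continue  # violates the chip-level power budget
        bips = sum(core_bips[i] * MODES[m][2] for i, m in enumerate(combo))
        if best is None or bips > best[2]:
            best = (tuple(MODES[m][0] for m in combo), power, bips)
    return best


if __name__ == "__main__":
    print(pick_modes([10.0, 20.0], [2.0, 1.0], budget=25.0))
```

The search grows exponentially with the number of cores, so this form is only practical for small core counts; it is meant to show the constraint and the objective, not an efficient implementation.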
Analysis of dynamic voltage/frequency scaling in chip-multiprocessors
- in International Symposium on Low Power Electronics and Design, 2007
- Cited by 21 (2 self)
Fine-grained dynamic voltage/frequency scaling (DVFS) demonstrates great promise for improving the energy-efficiency of chip-multiprocessors (CMPs), which have emerged as a popular way for designers to exploit growing transistor budgets. We examine the tradeoffs involved in the choice of both DVFS control scheme and method by which the processor is partitioned into voltage/frequency islands (VFIs). We simulate real multithreaded commercial and scientific workloads, demonstrating the large real-world potential of DVFS for CMPs. Contrary to the conventional wisdom, we find that the benefits of per-core DVFS are not necessarily large enough to overcome the complexity of having many independent VFIs per chip.
Voltage and Frequency Control With Adaptive Reaction Time in Multiple-Clock Domain Processors
- in Proc. of 11th Symposium on HPCA, 2005
- Cited by 11 (0 self)
Dynamic voltage and frequency scaling (DVFS) is a widely-used method for energy-efficient computing. In this paper, we present a new intra-task online DVFS scheme for multiple clock domain (MCD) processors. Most existing online DVFS schemes for MCD processors use a fixed time interval between possible voltage/frequency changes. The downside to this approach is that the interval boundaries are predetermined and independent of workload changes. Thus, they can be late in responding to large, severe activity swings. In this work, we propose an alternative online DVFS scheme in which the reaction time is self-tuned and adaptive to application and workload changes. In addition to designing such a scheme, we model the proposed DVFS control and use the derived model in a formal stability analysis. The obtained analytical insight is then used to guide and improve the design in terms of stability margin and control effectiveness. We evaluate our DVFS scheme through cycle-accurate simulation over a wide set of MediaBench and SPEC2000 benchmarks. Compared to the best-known prior fixed-interval DVFS schemes for MCD processors, the proposed DVFS scheme has a simpler decision process, which leads to smaller and cheaper hardware. Our scheme has achieved significant energy savings over all studied benchmarks (19% energy savings with 3% performance degradation on average, which is close to the best results from existing fixed-interval DVFS schemes). For a group of applications with fast workload variations, our scheme outperforms existing fixed-interval DVFS schemes significantly due to its adaptive nature. Overall, we feel the proposed adaptive online DVFS scheme is an effective and promising alternative to existing fixed-interval DVFS schemes. Designers may choose the new scheme for processors with limited hardware budget, or if the anticipated workload behavior is variable. In addition, the modeling and analysis techniques in this work serve as examples of using stability analysis in other aspects of high-performance CPU design and control.
Coordinated, distributed, formal energy management of chip multiprocessors
- In ISLPED ’05: Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005
- Cited by 9 (0 self)
Designers are moving toward chip-multiprocessors (CMPs) to leverage application parallelism for higher performance while keeping ...
A High Performance, Energy Efficient, GALS Processor Microarchitecture with Reduced Implementation Complexity
- In International Symposium on Performance Analysis of Systems and Software, 2005
- Cited by 8 (3 self)
As the costs and challenges of global clock distribution grow with each new microprocessor generation, a Globally Asynchronous, Locally Synchronous (GALS) approach becomes an attractive alternative. One proposed GALS approach, called a Multiple Clock Domain (MCD) processor, achieves impressive energy savings for a relatively low performance cost. However, the approach requires separating the processor into four domains, including separating the integer and memory domains, which complicates load scheduling, and the implementation of 32 voltage and frequency levels in each domain. In addition, the hardware-based control algorithm, though effective overall, produces a significant performance degradation for some applications. In this paper, we devise modifications to the MCD design that retain many of its benefits while greatly reducing the implementation complexity. We first determine that the synchronization channels that are most responsible for the MCD performance degradation are those involving cache access, and propose merging the integer and memory domains to virtually eliminate this overhead. We further propose significantly reducing the number of voltage levels, separating the Reorder Buffer into its own domain to permit front-end frequency scaling, separating the L2 cache to permit standard power optimizations to be used, and a new online algorithm that provides consistent results across our benchmark suite. The overall result is a significant reduction in the performance degradation of the original MCD approach and greater energy savings, with a greatly simplified microarchitecture that is much easier to implement.
Independent front-end and back-end dynamic voltage scaling for a GALS microarchitecture
- In ISLPED ’06: Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006
- Cited by 6 (1 self)
In recent years, Globally Asynchronous Locally Synchronous (GALS) designs and dynamic voltage scaling (DVS) have emerged as some of the most popular approaches to address the ever-increasing microprocessor energy consumption. In this work, we propose two on-line algorithms for adjusting dynamically, and independently, the voltage and frequency of the front-end and back-end domains of a novel two-domain microprocessor. We evaluate our mechanisms for both internal and external voltage regulators, and we present optimal dynamic voltage scaling results for the proposed microarchitecture. Our schemes achieve an average improvement of 12% in the energy-delay² metric when using internal voltage regulators.
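For readers unfamiliar with the metric, energy-delay² is the product of the energy consumed and the square of the execution time, so it penalizes slowdowns more heavily than plain energy does. The sketch below shows how an improvement in that metric could be computed; the function names and the sample numbers are hypothetical and do not come from the paper.

```python
def energy_delay_squared(energy_joules, delay_seconds):
    """Return the energy-delay^2 product, E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2


def ed2_improvement(baseline, scaled):
    """Fractional ED^2 improvement of `scaled` over `baseline`.

    Both arguments are (energy, delay) pairs; a positive result means the
    scaled configuration has a lower ED^2 than the baseline.
    """
    base = energy_delay_squared(*baseline)
    return (base - energy_delay_squared(*scaled)) / base


if __name__ == "__main__":
    # Hypothetical run: 20% less energy at the cost of a 5% longer runtime.
    print(ed2_improvement((10.0, 1.0), (8.0, 1.05)))
```

In this made-up case a 20% energy saving shrinks to roughly an 11.8% gain in energy-delay², because the 5% slowdown is squared.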
Thermal Modeling and Management of DRAM Memory Systems
- In Proceedings of ISCA, 2007
- Cited by 5 (1 self)
With increasing speed and power density, high-performance memories, including FB-DIMM (Fully Buffered DIMM) and DDR2 DRAM, now begin to require dynamic thermal management (DTM) as processors and hard drives did. The DTM of memories, nevertheless, is different in that it should take the processor performance and power consumption into consideration. Existing schemes have ignored that. In this study, we investigate a new approach that controls the memory thermal issues from the source generating memory activities: the processor. It will smooth the program execution when compared with shutting down memory abruptly, and therefore improve the overall system performance and power efficiency. For multicore systems, we propose two schemes called adaptive core gating and coordinated DVFS. The first scheme activates clock gating on selected processor cores and the second one scales down the frequency and voltage levels of processor cores when the memory is to be overheated. They can successfully control the memory activities and handle thermal emergency. More importantly, they improve performance significantly under the given thermal envelope. Our simulation results show that adaptive core gating improves performance by up to 23.3% (16.3% on average) on a four-core system with FB-DIMM when compared with DRAM thermal shutdown; and coordinated DVFS with control-theoretic methods improves the performance by up to 18.5% (8.3% on average).
DVS for buffer-constrained architectures with predictable QoS-energy tradeoffs
- In CODES+ISSS ’05: Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, 2005
- Cited by 4 (0 self)
We present a new scheme for dynamic voltage and frequency scaling (DVS) for processing multimedia streams on architectures with restricted buffer sizes. The main advantage of our scheme over previously published DVS schemes is its ability to provide hard QoS guarantees while still achieving considerable energy savings. Our scheme can handle workloads characterized by both the data-dependent variability in the execution time of multimedia tasks and the burstiness in the on-chip traffic arising out of multimedia processing. Many previous DVS algorithms capable of handling such workloads rely on control-theoretic feedback mechanisms or prediction schemes based on probabilistic techniques. Usually it is difficult to provide QoS guarantees with such schemes. In contrast, our scheme relies on worst-case interval-based characterization of the workload. The main novelty of our scheme is a combination of offline analysis and runtime monitoring to obtain worst-case bounds on the workload and then improving these bounds at runtime. Our scheme is fully scalable and has a bounded application-independent runtime overhead.
Categories and Subject Descriptors: C.3 [Computer Systems Organization]: Special-purpose and application-based systems (Real-time and embedded systems)
Hardware based frequency/voltage control of voltage frequency island systems
- In CODES+ISSS ’06: Proceedings of the 4th international conference on Hardware/software codesign and system synthesis, 2006
- Cited by 3 (1 self)
The ability to do fine-grain power management via local voltage selection has shown much promise via the use of Voltage/Frequency Islands (VFIs). VFI-based designs combine the advantages of using fine-grain speed and voltage control for reducing energy requirements, while allowing for maintaining performance constraints. We propose a hardware-based technique to dynamically change the clock frequencies and potentially voltages of a VFI system driven by the dynamic workload. This technique tries to change the frequency of a synchronous island such that it will have efficient power utilization while satisfying performance constraints. We propose a hardware design that can be used to change the frequencies of various synchronous islands interconnected together by mixed-clock/mixed-voltage FIFO interfaces. Results show up to 65% power savings for the set of benchmarks considered with no loss in throughput.
Compiler control power saving scheme for multi core processors
- Comm. Math. Univ. Carolinae, 2005
- Cited by 2 (2 self)
With the increase of transistors integrated onto a chip, multi core processor architectures have attracted much attention to achieve high effective performance, shorten development period and reduce the power consumption. To this end, the compiler for a multi core processor is expected not only to parallelize programs effectively, but also to control the voltage and clock frequency of processors and storages carefully inside an application program. This paper proposes a compilation scheme for reduction of power consumption under the multigrain parallel processing environment that controls Voltage/Frequency and power supply of each processor core on a chip. In the evaluation, the OSCAR compiler with the proposed scheme achieves 60.7 percent energy savings for SPEC CFP95 applu without performance degradation on 4 processors, 45.4 percent energy savings for SPEC CFP95 tomcatv with real-time deadline constraint on 4 processors, and 46.5 percent energy savings for SPEC CFP95 swim with the deadline constraint on 4 processors.

