Results 1-10 of 85
Wattch: A Framework for Architectural-Level Power Analysis and Optimizations
In Proceedings of the 27th Annual International Symposium on Computer Architecture, 2000
Cited by 843 (34 self)
Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high accuracy by calculating power estimates for designs only after layout or floorplanning are complete. In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them for a large space of design possibilities.
Dynamic Thermal Management for High-Performance Microprocessors
In Proceedings of the 7th IEEE Symposium on High-Performance Computer Architecture, 2001
Cited by 189 (3 self)
With the increasing clock rate and transistor count of today’s microprocessors, power dissipation is becoming a critical component of system design complexity. Thermal and power-delivery issues are becoming especially critical for high-performance computing systems. In this work, we investigate dynamic thermal management as a technique to control CPU power dissipation. With the increasing usage of clock gating techniques, the average power dissipation typically seen by common applications is becoming much less than the chip’s rated maximum power dissipation. However, system designers still must design thermal heat sinks to withstand the worst-case scenario. We define and investigate the major components of any dynamic thermal management scheme. Specifically, we explore the tradeoffs between several mechanisms for responding to periods of thermal trauma, and we consider the effects of hardware and software implementations. With appropriate dynamic thermal management, the CPU can be designed for a much lower maximum power rating, with minimal performance impact for typical applications.
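The abstract does not spell out the trigger or response mechanisms, so the following is only a minimal sketch of the trigger/response structure, assuming a power-based trigger (a moving average of recent power used as a crude stand-in for temperature) and a throttling response that scales an interval's power by a fixed factor. The function name `simulate_dtm` and all of its parameters are hypothetical.

```python
from collections import deque


def simulate_dtm(power_trace, trigger_watts, window, throttle_factor):
    """Apply a hypothetical trigger/response DTM policy to a per-interval power trace.

    Trigger: the moving average of the last `window` delivered-power samples
    exceeds `trigger_watts` (a crude proxy for rising temperature).
    Response: throttle the current interval, scaling its power by `throttle_factor`.
    Returns (delivered_power, throttled_flags), one entry per interval.
    """
    recent = deque(maxlen=window)
    delivered, throttled = [], []
    for demand in power_trace:
        avg = sum(recent) / len(recent) if recent else 0.0
        throttle = avg > trigger_watts
        power = demand * throttle_factor if throttle else demand
        recent.append(power)
        delivered.append(power)
        throttled.append(throttle)
    return delivered, throttled


power, flags = simulate_dtm([10, 10, 30, 30, 30, 10], trigger_watts=20, window=2, throttle_factor=0.5)
print(power)  # [10, 10, 30, 30, 15.0, 5.0]
print(flags)  # [False, False, False, False, True, True]
```

Because the trigger only looks at past intervals, the first two high-power intervals get through before throttling starts; this lag between a power burst and the response is one reason the choice of trigger and response mechanism matters.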
A static power model for architects
In Proceedings of the 33rd International Symposium on Microarchitecture (MICRO-33), 2000
Cited by 112 (1 self)
Static power dissipation due to transistor leakage constitutes an increasing fraction of the total power in modern semiconductor technologies. Current technology trends indicate that the contribution will increase rapidly, reaching one half of total power dissipation within three process generations. Developing power-efficient products will require consideration of static power in the earliest phases of design, including architecture and microarchitecture definition. We propose a simple equation for estimating static power consumption at the architectural level: $P_{static} = V_{CC} \cdot N \cdot k_{design} \cdot \hat{I}_{leak}$, where $V_{CC}$ is the supply voltage, $N$ is the number of transistors, $k_{design}$ is a design-dependent parameter, and $\hat{I}_{leak}$ is a technology-dependent parameter. This model enables high-level reasoning about the likely static power demands of alternative microarchitectures. Reasonably accurate values for the factors within the equation may be obtained directly from the high-level designs or by straightforward scaling arguments. The factors within the equation also suggest opportunities for static power optimization, including reducing the total number of devices, partitioning the design to allow for lower supply voltages or slower, less leaky transistors, turning off unused devices, favoring certain design styles, and favoring high bandwidth over low latency. Speculation is also examined as a means to employ slower transistors without a significant performance penalty.
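Because the model is a plain product of four factors, it is easy to evaluate when comparing design alternatives. A minimal sketch follows; the function name and every numeric input are hypothetical, and it assumes the leakage factor (I_leak) is expressed in amperes per device so that the result comes out in watts.

```python
def static_power(vcc, n_transistors, k_design, i_leak):
    """Architectural static power estimate: P_static = V_CC * N * k_design * I_leak.

    vcc: supply voltage (V); n_transistors: device count N;
    k_design: design-dependent factor; i_leak: technology-dependent
    leakage per device (assumed here to be in amperes).
    """
    return vcc * n_transistors * k_design * i_leak


# Hypothetical inputs, for illustration only (not from the paper).
baseline = static_power(vcc=1.0, n_transistors=20_000_000, k_design=2.0, i_leak=1e-9)
half_devices = static_power(vcc=1.0, n_transistors=10_000_000, k_design=2.0, i_leak=1e-9)

print(f"baseline: {baseline:.3f} W")             # baseline: 0.040 W
print(f"ratio:    {half_devices / baseline:.2f}")  # ratio:    0.50
```

Since each factor enters linearly, halving the device count halves the estimate with everything else held fixed. The same reasoning applies to the other levers the abstract lists, such as a lower supply voltage or less leaky transistors.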
An Integrated Circuit/Architecture Approach to Reducing Leakage in Deep-Submicron High-Performance I-Caches
2001
Cited by 103 (6 self)
Deep-submicron CMOS designs maintain high transistor switching speeds by scaling down the supply voltage and proportionately reducing the transistor threshold voltage. Lowering the threshold voltage increases leakage energy dissipation due to subthreshold leakage current even when the transistor is not switching. Estimates suggest a five-fold increase in leakage energy in every future generation. In modern microarchitectures, much of the leakage energy is dissipated in large on-chip cache memory structures with high transistor densities. While cache utilization varies both within and across applications, modern cache designs are fixed in size, resulting in transistor leakage inefficiencies. This paper ...
Bitwidth Analysis with Application to Silicon Compilation
2000
Cited by 80 (0 self)
This paper introduces Bitwise, a compiler that minimizes the bitwidth (the number of bits used to represent each operand) for both integers and pointers in a program. By propagating static information both forward and backward in the program dataflow graph, Bitwise frees the programmer from declaring bitwidth invariants in cases where the compiler can determine bitwidths automatically. We find a rich opportunity for bitwidth reduction in modern multimedia and streaming application workloads. For new architectures that support sub-word quantities, we expect that our bitwidth reductions will save power and increase processor performance. This paper ...
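To illustrate propagating information in both directions, the hypothetical `infer_bitwidths` below works on a straight-line program over unsigned integers: a forward pass bounds each value's range, and a backward pass records how many low-order bits each consumer actually uses. This is a sketch of the general idea, not Bitwise's algorithm; a real compiler pass must also handle control flow and signed values, and the instruction format and declared input bounds here are assumptions for illustration.

```python
def infer_bitwidths(program, live_out):
    """Sketch of forward (range) and backward (demanded-bit) bitwidth inference.

    program: straight-line list of (dst, op, args) over unsigned integers, with
      op "input" (args = (max_value,)), "const" (args = (value,)),
      "add" (args = (a, b)), or "and" (args = (a, mask_literal)).
    live_out: names whose full value is needed after the program.
    Returns the number of bits each value needs (0 means the value is dead).
    """
    # Forward pass: propagate a conservative upper bound for each value.
    hi = {}
    for dst, op, args in program:
        if op in ("input", "const"):
            hi[dst] = args[0]
        elif op == "add":
            hi[dst] = hi[args[0]] + hi[args[1]]
        elif op == "and":
            hi[dst] = min(hi[args[0]], args[1])  # x & m <= min(x, m) for unsigned values
        else:
            raise ValueError(f"unknown op {op!r}")
    fwd = {v: max(1, h.bit_length()) for v, h in hi.items()}

    # Backward pass: propagate how many low-order bits each consumer actually uses.
    need = {v: 0 for v in hi}
    for v in live_out:
        need[v] = fwd[v]
    for dst, op, args in reversed(program):
        d = need[dst]
        if op == "add":  # the low d bits of a sum depend only on the low d bits of its inputs
            for src in args:
                need[src] = max(need[src], d)
        elif op == "and":  # bits above the mask's top bit never reach the result
            src, mask = args
            need[src] = max(need[src], min(d, mask.bit_length()))
    return {v: min(fwd[v], need[v]) for v in hi}


prog = [
    ("x", "input", (1000,)),   # hypothetical declared bound: x <= 1000
    ("y", "input", (65535,)),  # hypothetical declared bound: y <= 65535
    ("s", "add", ("x", "y")),
    ("m", "and", ("s", 0xFFF)),
]
print(infer_bitwidths(prog, live_out=["m"]))
# {'x': 10, 'y': 12, 's': 12, 'm': 12}
```

The forward pass alone narrows `x` to 10 bits; the backward pass is what narrows `y` and `s`, since only their low 12 bits can reach `m`.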
Very low power pipelines using significance compression
2000
Cited by 54 (2 self)
Data, addresses, and instructions are compressed by maintaining only significant bytes, with two or three extension bits appended to indicate the significant byte positions. This significance compression method is integrated into a 5-stage pipeline, with the extension bits flowing down the pipeline to enable pipeline operations only for the significant bytes. Consequently, register, logic, and cache activity (and dynamic power) are substantially reduced. An initial trace-driven study shows a reduction in activity of approximately 30-40% for each pipeline stage. Several pipeline organizations are studied. A byte-serial pipeline is the simplest implementation, but suffers a CPI (cycles per instruction) increase of 79% compared with a conventional 32-bit pipeline. Widening certain pipeline stages in order to balance processing bandwidth leads to an implementation with a CPI 24% higher than the baseline 32-bit design. Finally, full-width pipeline stages with operand gating achieve a CPI within 2-6% of the baseline 32-bit pipeline.
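A minimal sketch of the compression step, assuming 32-bit two's-complement words, treating a high byte as non-significant when it is pure sign extension of the byte below it, and using a 2-bit extension field that holds the number of significant bytes minus one. The abstract mentions two or three extension bits and the paper's exact encoding may differ; `compress` and `decompress` are hypothetical names.

```python
def compress(word):
    """Keep only the significant low-order bytes of a 32-bit two's-complement word.

    A high byte is dropped while it equals the sign extension of the byte below it.
    Returns (significant_bytes, ext), where the 2-bit ext field holds the number
    of significant bytes minus one.
    """
    data = (word & 0xFFFFFFFF).to_bytes(4, "little")
    n = 4
    while n > 1:
        sign_fill = 0xFF if data[n - 2] & 0x80 else 0x00
        if data[n - 1] != sign_fill:
            break
        n -= 1
    return data[:n], n - 1


def decompress(sig, ext):
    """Rebuild the 32-bit word by sign-extending the top significant byte."""
    n = ext + 1
    fill = 0xFF if sig[n - 1] & 0x80 else 0x00
    return int.from_bytes(bytes(sig[:n]) + bytes([fill]) * (4 - n), "little")


for w in (5, -2, 128, 0x12345678):
    sig, ext = compress(w)
    assert decompress(sig, ext) == w & 0xFFFFFFFF
    print(hex(w & 0xFFFFFFFF), len(sig), "byte(s)")
```

Small values such as 5 or -2 shrink to a single byte, while 128 needs two because its top bit would otherwise be read as a sign bit. In a byte-serial pipeline, the number of significant bytes roughly corresponds to how many byte-wide operations a value needs, which is where both the activity savings and the CPI cost come from.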
Joint Local and Global Hardware Adaptations for Energy
In Proc. of the 10th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, 2002
Cited by 44 (8 self)
This work concerns algorithms to control energy-driven architecture adaptations for multimedia applications, without and with dynamic voltage scaling (DVS). We identify a broad design space for adaptation control algorithms based on two attributes: (1) when to adapt or temporal granularity and (2) what structures to adapt or spatial granularity. For each attribute, adaptation may be global or local. Our previous work developed a temporally and spatially global algorithm. It invokes adaptation at the granularity of a full frame of a multimedia application (temporally global) and considers the entire hardware configuration at a time (spatially global). It exploits inter-frame execution time variability, slowing computation just enough to eliminate idle time before the real-time deadline. This paper explores temporally and spatially local algorithms and their integration with the previous global algorithm. The local algorithms invoke architectural adaptation within an application frame to exploit intra-frame execution variability, and attempt to save energy without affecting execution time. We consider local algorithms previously studied for non-real-time applications as well as propose new algorithms. We find that, for systems without and with DVS, the local algorithms are effective in saving energy for multimedia applications, but the new integrated global and local algorithm is best for the systems and applications studied.
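The global algorithm's core idea, slowing computation just enough to eliminate idle time before the deadline, can be sketched as choosing the lowest operating frequency at which a frame's predicted cycle count still fits within the frame period. This sketch assumes a cycle-count prediction is already available (how it is obtained is not modeled) and a discrete set of DVS frequency levels; `pick_frequency` and the numbers are hypothetical, and it ignores the architectural adaptations the paper combines with DVS.

```python
def pick_frequency(predicted_cycles, deadline_s, freqs_hz):
    """Return the lowest available frequency that finishes the frame by its deadline.

    Running slower than this would miss the deadline; running faster would only
    create idle time before the deadline. If no level is fast enough, run at the
    highest one.
    """
    for f in sorted(freqs_hz):
        if predicted_cycles / f <= deadline_s:
            return f
    return max(freqs_hz)


levels = [200e6, 400e6, 600e6, 800e6]  # hypothetical DVS operating points
frame_deadline = 1 / 30                # 30 frames per second
print(pick_frequency(10_000_000, frame_deadline, levels))  # 400000000.0
```

At 400 MHz the frame takes 25 ms of its roughly 33 ms budget, while the next level down (200 MHz) would need 50 ms and miss the deadline. With discrete levels some idle time remains; a continuous frequency range would let it shrink toward zero.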
DRAM Energy Management Using Software and Hardware Directed Power Mode Control
In Proc. of the 7th International Conference on High Performance Computer Architecture, 2001
Cited by 44 (10 self)
While there have been several studies and proposals for energy conservation for CPUs and peripherals, energy optimization techniques for selective operating mode control of DRAMs have not been fully explored. It has been shown that as much as 90% of overall system energy (excluding I/O) is consumed by the DRAM modules, making them a good candidate for energy optimizations. Further, DRAM technology has matured to provide several low-energy operating modes (power modes), making this an opportune moment to conduct studies exploring the potential benefits of mode control techniques. This paper conducts an in-depth investigation of software and hardware techniques that exploit DRAM mode control capabilities at module granularity for energy savings.
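The abstract does not describe the hardware technique itself. As an illustration only, assume a simple threshold policy: after a module has been idle for a given number of cycles, the controller drops it into the next lower-power mode, and the next access pays that mode's resynchronization delay. The mode names, thresholds, power values, and delays below are hypothetical, as are `Mode` and `idle_gap_cost`.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    name: str
    enter_after: int  # idle cycles before the controller drops into this mode
    power: float      # energy per cycle while resident (arbitrary units)
    resync: int       # cycles to return to active on the next access


# Hypothetical, illustrative values; not taken from the paper.
MODES = [
    Mode("active", 0, 1.0, 0),
    Mode("standby", 10, 0.5, 2),
    Mode("nap", 100, 0.1, 30),
    Mode("powerdown", 1000, 0.01, 9000),
]


def idle_gap_cost(gap, modes=MODES):
    """Energy spent during an idle gap, the deepest mode reached, and its resync delay,
    under a threshold policy that steps one mode lower at each threshold."""
    energy = 0.0
    deepest = modes[0]
    for i, mode in enumerate(modes):
        if gap <= mode.enter_after:
            break
        end = modes[i + 1].enter_after if i + 1 < len(modes) else gap
        energy += (min(end, gap) - mode.enter_after) * mode.power
        deepest = mode
    return energy, deepest.name, deepest.resync


print(idle_gap_cost(50))   # (30.0, 'standby', 2)
print(idle_gap_cost(200))  # (65.0, 'nap', 30)
```

Longer gaps reach deeper modes and save more energy per cycle, but the deeper the mode, the larger the resynchronization delay paid on the next access, so the thresholds trade energy against performance.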
Reducing Power with Dynamic Critical Path Information
2001
Cited by 42 (2 self)
Recent research has shown that dynamic information regarding instruction criticality can be used to increase microprocessor performance. Critical path information can also be used in processors to achieve a better balance of power and performance. This paper uses the output of a dynamic critical path predictor to decrease the power consumption of key portions of the processor without incurring a corresponding decrease in performance. The optimizations include effective use of functional units with different power and latency characteristics and decreased issue logic power.
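The abstract does not describe how the predictor works internally. As an illustration only, assume a PC-indexed table of 2-bit saturating counters, trained by some mechanism that reports whether each retired instruction was on the critical path (not modeled here), and a steering rule that sends predicted non-critical instructions to a slower, lower-power functional unit. `CriticalityPredictor`, `steer`, and the unit names are hypothetical.

```python
class CriticalityPredictor:
    """Hypothetical PC-indexed table of saturating counters predicting criticality."""

    def __init__(self, entries=1024, counter_bits=2, threshold=2):
        self.entries = entries
        self.max_count = (1 << counter_bits) - 1
        self.threshold = threshold
        self.table = [0] * entries

    def _index(self, pc):
        # Drop the low 2 bits (assumes 4-byte aligned instructions).
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)] >= self.threshold

    def train(self, pc, was_critical):
        # Count up on critical outcomes, down otherwise, saturating at both ends.
        i = self._index(pc)
        if was_critical:
            self.table[i] = min(self.max_count, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def steer(predictor, pc):
    """Send predicted-critical instructions to the fast unit, the rest to a low-power one."""
    return "fast_alu" if predictor.predict(pc) else "low_power_alu"


pred = CriticalityPredictor()
print(steer(pred, 0x400))  # low_power_alu
pred.train(0x400, True)
pred.train(0x400, True)
print(steer(pred, 0x400))  # fast_alu
```

The threshold trades power for performance: raising it steers more instructions to the low-power unit but risks slowing genuinely critical ones when the prediction is wrong.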
BitValue Inference: Detecting and Exploiting Narrow Bitwidth Computations
In Proceedings of the Euro-Par 2000 European Conference on Parallel Computing, 2000
Cited by 41 (7 self)
We present a compiler algorithm called BitValue, which can discover unused and constant bits in dusty-deck C programs. BitValue uses forward and backward dataflow analyses, generalizing constant folding and dead-code detection to the bit level. This algorithm enables compiler optimizations targeting special processor architectures for computing on non-standard bitwidths. Using this algorithm, we show that up to 36% of the computed bytes are thrown away; we also show that on average 26.8% of the values computed require 16 bits or less (for programs from SpecINT95 and Mediabench). A compiler for reconfigurable hardware uses this algorithm to achieve substantial reductions (up to 20-fold) in the size of the synthesized circuits.
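A minimal sketch of bit-level forward and backward analysis on a straight-line, single-assignment program, assuming a 32-bit word and a tiny hypothetical instruction set (`input`, `const`, `and`, `or`, `shl`). The forward pass tracks which bits are known constants (generalizing constant folding), and the backward pass tracks which bits can influence a live output (generalizing dead-code detection). The names and representation are illustrative, not BitValue's actual implementation, which must also handle control flow and a full C operator set.

```python
W = 32
FULL = (1 << W) - 1


def inv(mask):
    """Bitwise NOT within a W-bit word."""
    return ~mask & FULL


def bitvalue(program, live_out):
    """Bit-level forward/backward analysis of a straight-line, single-assignment program.

    program: list of tuples (dst, op, *args) with op in
      "input" (), "const" (c), "and" (a, b), "or" (a, b), "shl" (a, k).
    Returns (known, value, used): for every name, the mask of bits with a
    compile-time-known value, those bit values, and the mask of bits that can
    affect some live-out bit.
    """
    known, value = {}, {}

    def zeros(v):  # bits known to be 0
        return known[v] & inv(value[v])

    def ones(v):  # bits known to be 1
        return known[v] & value[v]

    # Forward pass: generalizes constant folding to individual bits.
    for dst, op, *args in program:
        if op == "input":
            known[dst], value[dst] = 0, 0
        elif op == "const":
            known[dst], value[dst] = FULL, args[0] & FULL
        elif op in ("and", "or"):
            a, b = args
            if op == "and":
                z, o = zeros(a) | zeros(b), ones(a) & ones(b)
            else:
                z, o = zeros(a) & zeros(b), ones(a) | ones(b)
            known[dst], value[dst] = z | o, o
        elif op == "shl":
            a, k = args
            known[dst] = ((known[a] << k) | ((1 << k) - 1)) & FULL  # vacated low bits are 0
            value[dst] = (value[a] << k) & FULL
        else:
            raise ValueError(f"unknown op {op!r}")

    # Backward pass: generalizes dead-code detection to individual bits.
    used = {v: 0 for v in known}
    for v in live_out:
        used[v] = FULL
    for dst, op, *args in reversed(program):
        d = used[dst]
        if op == "and":  # an operand bit is irrelevant where the other operand is known 0
            a, b = args
            used[a] |= d & inv(zeros(b))
            used[b] |= d & inv(zeros(a))
        elif op == "or":  # an operand bit is irrelevant where the other operand is known 1
            a, b = args
            used[a] |= d & inv(ones(b))
            used[b] |= d & inv(ones(a))
        elif op == "shl":
            a, k = args
            used[a] |= d >> k
    return known, value, used


prog = [
    ("x", "input"),
    ("m", "const", 0xFF),
    ("y", "and", "x", "m"),
    ("z", "shl", "y", 8),
]
known, value, used = bitvalue(prog, live_out=["z"])
print(hex(known["z"]), hex(used["x"]))  # 0xffff00ff 0xff
```

Here 24 of `x`'s 32 bits never influence the output, and 24 bits of `z` are known to be zero at compile time; a hardware compiler could use this kind of information to narrow the synthesized datapath.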

