Results 1 - 10
of
16
Wattch: A Framework for Architectural-Level Power Analysis and Optimizations
- In Proceedings of the 27th Annual International Symposium on Computer Architecture
, 2000
"... Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high ..."
Abstract
-
Cited by 843 (34 self)
- Add to MetaCart
Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high accuracy by calculating power estimates for designs only after layout or floorplanning are complete In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them for a large space of design possibilities.
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance
, 1999
"... In general-purpose microprocessors, recent trends have pushed towards 64-bit word widths, primarily to accommodate the large addressing needs of some programs. Many integer problems, however, rarely need the full 64-bit dynamic range these CPUs provide. In fact, another recent instruction set trend ..."
Abstract
-
Cited by 124 (7 self)
- Add to MetaCart
In general-purpose microprocessors, recent trends have pushed towards 64-bit word widths, primarily to accommodate the large addressing needs of some programs. Many integer problems, however, rarely need the full 64-bit dynamic range these CPUs provide. In fact, another recent instruction set trend has been increased support for sub-word operations (that is, manipulating data in quantities less than the full word size). In particular, most major processor families have introduced "multimedia" instruction set extensions that operate in parallel on several sub-word quantities in the same ALU. This paper notes that across the SPECint95 benchmarks, over half of the integer operation executions require 16 bits or less. With this as motivation, our work proposes hardware mechanisms that dynamically recognize and capitalize on these "narrow-bitwidth" instances. Both optimizations require little additional hardware, and neither requires compiler support. The first, power-oriented, optimizati...
Value-Based Clock Gating and Operation Packing: Dynamic Strategies for Improving Processor Power and Performance
- ACM Transactions on Computer Systems
, 2000
"... This article presents our observations demonstrating that operations on "narrow-width" quantities are common not only in multimedia codes, but also in more general workloads. In fact, across the SPECint95 benchmarks, over half the integer operation executions require 16 bits or less. Based on this d ..."
Abstract
-
Cited by 24 (0 self)
- Add to MetaCart
This article presents our observations demonstrating that operations on "narrow-width" quantities are common not only in multimedia codes, but also in more general workloads. In fact, across the SPECint95 benchmarks, over half the integer operation executions require 16 bits or less. Based on this data, we propose two hardware mechanisms that dynamically recognize and capitalize on these narrow-width operations. The first, power-oriented optimization reduces processor power consumption by using operand-value-based clock gating to turn off portions of arithmetic units that will be unused by narrow-width operations. This optimization results in a 45%-60% reduction in the integer unit's power consumption for the SPECint95 and MediaBench benchmark suites. Applying this optimization to SPECfp95 benchmarks results in slightly smaller power reductions, but still seems warranted. These reductions in integer unit power consumption equate to a 5%--10% full-chip power savings. Our second, performance-oriented optimization improves processor performance by packing together narrow-width operations so that they share a single arithmetic unit. Conceptually similar to a dynamic form of MMX, this optimization offers speedups of 4.3%--6.2% for SPECint95 and 8.0%--10.4% for MediaBench. Overall, these optimizations highlight an increasing opportunity for value-based optimizations to improve both power and performance in current microprocessors
Interleaving buffer insertion and transistor sizing into a single optimization
- IEEE Transactions on VLSI
, 1998
"... Buffer insertion is a technique that is used either to increase the driving power of a path in a circuit, or to isolate large capacitive loads that lie on noncritical or less critical paths. Gate sizing sets the sizes of gates within a circuit to achieve a given timing specification. Traditional des ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
Buffer insertion is a technique that is used either to increase the driving power of a path in a circuit, or to isolate large capacitive loads that lie on noncritical or less critical paths. Gate sizing sets the sizes of gates within a circuit to achieve a given timing specification. Traditional design techniques perform gate sizing and buffer insertion as two separate and independent steps during synthesis. However, until sizing is performed, any information on capacitive loads is incomplete and therefore a buffer insertion algorithm must operate with incomplete information, leading to suboptimal results. Moreover, the insertion of buffers can change the structure of the circuit sufficiently so that it may lead to a different sizing solution from the unbuffered circuit. Therefore, these techniques of buffer insertion and sizing are intimately linked and it makes a lot of sense to integrate them into a single optimization. This work presents strategies to insert buffers in a circuit, combined with gate sizing, to achieve better power-delay and area-delay tradeoffs. The purpose of this work is to examine how combining sizing algorithm with buffer insertion will help us achieve better area-delay or power-delay tradeoffs, and to determine where and when to insert buffers in a circuit. The delay model incorporates placement-based information and the effect of input slew rates on gate delays. The results obtained by using the new method are significantly better than the results
Power-Delay Optimizations in Gate Sizing
, 2000
"... The problem of power-delay tradeoffs in transistor sizing is examined using a nonlinear optimization formulation. Both the dynamic and the short-circuit power are considered, and a new modeling technique is used to calculate the short-circuit power. The notion of transition density is used, with an ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
The problem of power-delay tradeoffs in transistor sizing is examined using a nonlinear optimization formulation. Both the dynamic and the short-circuit power are considered, and a new modeling technique is used to calculate the short-circuit power. The notion of transition density is used, with an enhancement that considers the effect of gate delays on the transition density. When the short-circuit power is neglected, the minimum power circuit is identical to the minimum area circuit. However, under our more realistic models, our experimental results on several circuits show that the minimum power circuit is not necessarily the same as the minimum area circuit.
On Short Circuit Power Estimation of CMOS Inverters
- IEEE ICCD
, 1998
"... Traditional power optimization and estimation techniques for digital CMOS circuits have focused on the dynamic power dissipation, caused by charging and discharging the load capacitances at the gate outputs. However, as the device size and threshold voltage continue to decrease, the short circuit po ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Traditional power optimization and estimation techniques for digital CMOS circuits have focused on the dynamic power dissipation, caused by charging and discharging the load capacitances at the gate outputs. However, as the device size and threshold voltage continue to decrease, the short circuit power dissipation is no longer a negligible factor. We show that previously published models for the short circuit power can not provide the accuracies required for current technologies. To improve the accuracy, we propose a new semi-empirical short circuit power model. Comparison of the porposed model with HSPICE simulation results on CMOS inverters using the Rockwell 0.25 ¯m CMOS process parameters show that proposed model is significantly more accurate for estimating the short circuit power than the models reported in the literature. 1 Introduction Reduction of power dissipation in CMOS digital circuits has become an increasingly important design optimization goal. Although there are seve...
Nassek, “Minimizing Gate Capacitances with Transistor Sizing
- in Proc. of IEEE International Symp. Circuits and Systems
, 2001
"... In this paper a method for choosing appropriate transistor topology for use with transistor sizing is presented. In combinatorial blocks of static CMOS circuits transistor sizing can be applied for delay balancing in order to guarantee synchronously arriving signal slopes at the input of logic gates ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
In this paper a method for choosing appropriate transistor topology for use with transistor sizing is presented. In combinatorial blocks of static CMOS circuits transistor sizing can be applied for delay balancing in order to guarantee synchronously arriving signal slopes at the input of logic gates. Since the delay of a logic gate depends directly on transistor sizes, the variation of channel-widths and-lengths (W and L) allows to equalize different path delays without influencing the total propagation delay of the circuit. Thus, glitching can be avoided. To achieve optimal results, transistor lengths have to be increased, which results in both increased gate capacitances and area. Splitting the long transistors counteracts this negative influence and reduces the power dissipated. A program GliMATS for automated circuit optimization has been implemented. Experimental results show that significant power savings can be achieved with this method. 1.
Asynchronous Techniques for Power-Adaptive Processing
, 2002
"... Declaration 10 Copyright 11 The author 12 Acknowledgements 13 1 ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Declaration 10 Copyright 11 The author 12 Acknowledgements 13 1
Microarchitectural Level Power Analysis and Optimization in Single Chip Parallel Computers
, 2004
"... As device technologies migrate into Deep Submicron (DSM) feature sizes, highperformance power-efficient computer architectures that keep pace with improving technologies need to be explored. Technology scaling increases the effects of wire latencies, inductive effects, noise and crosstalk in on-chip ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
As device technologies migrate into Deep Submicron (DSM) feature sizes, highperformance power-efficient computer architectures that keep pace with improving technologies need to be explored. Technology scaling increases the effects of wire latencies, inductive effects, noise and crosstalk in on-chip communication, limiting the performance of DSM designs. Power efficient performance gains from Instruction Level Parallelism (ILP) are reaching a limit. Single-Chip Parallel Computers are promising solutions to the DSM design challenges and the performance limitations of ILP. These systems are explicitly modular architectures that efficiently support Thread Level Parallelism (TLP) while avoiding global signals and shared resources. Microarchitectural level power analysis is required for evaluating the feasibility of newly conceived architectures in terms of power dissipation and energy efficiency. Accounting for power in the early stages of design shortens the time-to-market due to reduced design iteration times. Power optimizations at the architectural level can yield large power savings. This thesis proposes a microarchitectural level power estimation and
Burn-in temperature projections for deep sub-micron technologies
- in Proc. Int. Test Conf
"... Burn-in faces significant challenges in recent CMOS technologies. The self-generated heat of each IC in a burn-in environment contributes to larger currents that can lead to further increase in junction temperatures, possible thermal run away, and yield-loss of good parts. Calculations show that the ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Burn-in faces significant challenges in recent CMOS technologies. The self-generated heat of each IC in a burn-in environment contributes to larger currents that can lead to further increase in junction temperatures, possible thermal run away, and yield-loss of good parts. Calculations show that the junction temperature is increasing by 1.45X/generation. This paper estimates the increase in junction temperature with scaling and discusses the optimal burn-in temperature with scaling. Our research indicates that the burn-in temperature must also be reduced with technology scaling. The impact on commercial burn-in ovens is also described. 1.

