Orion: A Power-Performance Simulator for Interconnection Networks
, 2002
Cited by 19 (0 self)
With the prevalence of server blades and systems-on-a-chip (SoCs), interconnection networks are becoming an important part of the microprocessor landscape. However, there is limited tool support available for their design. While performance simulators have been built that enable performance estimation while varying network parameters, these cover only one metric of interest in modern designs. System power consumption is increasingly becoming as important as, if not more important than, performance. It is now critical to get detailed power-performance tradeoff information early in the microarchitectural design cycle. This is especially so as interconnection networks consume a significant fraction of total system power. It is exactly this gap that the work presented in this paper aims to fill. We present
Managing Static Leakage Energy in Microprocessor Functional Units
- In Proceedings of the 35th Annual IEEE/ACM International Symposium on Microarchitecture
, 2002
Cited by 18 (1 self)
Static energy due to subthreshold leakage current is projected to become a major component of the total energy in high performance microprocessors. Many studies so far have examined and proposed techniques to reduce leakage in on-chip storage structures. In this study, static energy is reduced in the integer functional units by leveraging the unique qualities of dual threshold voltage domino logic.
Width-adaptive and non-uniform access asynchronous register files
, 2003
Cited by 7 (2 self)
Register files of microprocessors have often been cited as performance bottlenecks and significant consumers of energy. The robust and modular nature of quasi-delay insensitive (QDI) design offers a toolchest of techniques for improving average-case performance and reducing energy consumption of register files, which cannot be leveraged as easily in synchronous designs. In this paper, we focus on the design of an asynchronous register core, the heart of a register file. We describe the vertical pipelining transformation and the locking mechanism that maintains pipelined mutual exclusion among reads and writes to the same register. The primary contributions of this paper are 1) a detailed evaluation of the width-adaptive datapath (WAD) representation in register files, which leads to significant energy reduction by conditionally communicating higher significant bits of integers with little performance degradation, and 2) ‘nesting’ the register core to create non-uniform banks to facilitate faster and lower energy accesses to more frequently used registers and slower accesses to less frequently used registers without increasing the interconnect requirement or control complexity. We present SPICE-simulated results for a wide variety of register files laid out in TSMC 0.18 µm technology.
Power-Aware Operating Systems using ACPI
, 2001
Cited by 2 (0 self)
The idea behind today's power management is to reduce a system's power usage when someone walks away from their PC or stops using it for a period of time, by sending the hardware into a "sleep mode" or "power-saving mode". Hardware components such as processors, memory chips and disks have advanced power management techniques (besides "power-saving" modes) to consume less power. However, operating systems are being designed with little regard for power consumption. Even though operating systems control hardware resources and provide a virtualized interface for applications running on top of them, they are largely unaware of the power-saving optimizations that the hardware offers. With the prolific growth in the usage of laptops, PDAs, servers and workstations, operating systems and applications need to be aware of how much power they consume. In this project, we propose a model that uses the ACPI specifications to make operating systems and applications power-aware. The model uses a user-level power manager to make most of the policy decisions for conserving power without significantly compromising performance. We use this model to implement hard-disk spin-down and a temperature-sensitive scheduler.
A Power Reduction Scheme for Data Buses by Dynamic Detection of Active Bits
- IEICE Trans. Electron
, 2004
Cited by 2 (2 self)
To transfer a small number, we inherently need only a small number of bits. However, all bit lines on a data bus change their status, and redundant power is consumed. To reduce this redundant power consumption, we introduce a concept named active bits. In this paper, we propose a power reduction scheme for data buses using the active bits. By suppressing the switching activity of inactive bits, we can reduce redundant power consumption. We propose various power reduction techniques using active bits, along with their implementation methods. Experimental results show a switching activity reduction of 20% to 35% on average and up to 54.2%.
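The idea in this abstract can be illustrated with a small Python sketch that counts bus-line toggles with and without holding the inactive lines steady. Everything here is an assumption made for illustration: the function names, the 32-bit bus, and in particular the definition of "active bits" as the fewest low-order two's complement bits that represent the value. The paper's actual detection logic and the cost of signalling the active width to the receiver are not modelled.

```python
def active_width(value: int, bus_width: int = 32) -> int:
    """Fewest low-order bits (sign bit included) that hold `value` in two's complement.

    Hypothetical definition of "active bits", chosen only for this sketch.
    """
    if not -(1 << (bus_width - 1)) <= value < (1 << (bus_width - 1)):
        raise ValueError("value does not fit on the bus")
    magnitude = value if value >= 0 else ~value
    return min(bus_width, magnitude.bit_length() + 1)


def transitions(values, bus_width: int = 32, suppress_inactive: bool = False) -> int:
    """Count bit-line toggles while sending `values` over a bus that starts at all zeros."""
    mask = (1 << bus_width) - 1
    lines = 0      # current state of the bus lines
    toggles = 0
    for v in values:
        word = v & mask
        if suppress_inactive:
            keep = (1 << active_width(v, bus_width)) - 1
            # Inactive (upper) lines keep their previous state instead of switching.
            word = (word & keep) | (lines & ~keep & mask)
        toggles += bin(lines ^ word).count("1")
        lines = word
    return toggles


if __name__ == "__main__":
    stream = [1, -1, 2, -2]
    print(transitions(stream), transitions(stream, suppress_inactive=True))
```

Small values of alternating sign are the worst case for a plain bus, because sign extension flips almost every upper line on each transfer; holding those lines is where the saving in this toy model comes from.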
A Small, Fast and Low-Power Register File by Bit-Partitioning
Cited by 2 (0 self)
A large multi-ported register file is indispensable for exploiting instruction-level parallelism (ILP) in today’s dynamically scheduled superscalar processors. The number of ports and the size of the register file must be enlarged as the issue width and instruction window size increase. However, a larger register file causes longer access delays and more power consumption. To tackle these problems, we propose the Bit-Partitioned Register File, which reduces the area, access time, and energy consumption of the register file. The proposed method relies on the fact that many operands do not need the full bit width (typically 32 or 64 bits) of a register entry. Because the effective bit width of most register operands is narrower than the full bit width of a register entry, the upper bits of the register entries assigned to such narrow-width operands go unused. Thus, we propose using these otherwise unused upper bits for other operands by partitioning the register entries. In this paper, we show the mechanism of the proposed register file and evaluate its performance and power consumption. The evaluation results reveal that the proposed register file achieves higher instructions per cycle (IPC) in a smaller physical area, and consequently with shorter access time and less power consumption.
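As a rough illustration of sharing one register entry between two narrow operands, the sketch below packs two values into the halves of a 64-bit entry and recovers them by sign extension. The fixed half-and-half split, the 64-bit entry size, and all names are assumptions for this example; the abstract does not say how the proposed register file partitions its entries.

```python
ENTRY_BITS = 64                      # assumed register entry width
HALF_BITS = ENTRY_BITS // 2
HALF_MASK = (1 << HALF_BITS) - 1


def fits_in_half(value: int) -> bool:
    """True if `value` is representable as a signed HALF_BITS-wide integer."""
    return -(1 << (HALF_BITS - 1)) <= value < (1 << (HALF_BITS - 1))


def pack(low: int, high: int) -> int:
    """Store two narrow operands in the lower and upper halves of one entry."""
    if not (fits_in_half(low) and fits_in_half(high)):
        raise ValueError("operand is too wide to share an entry")
    return ((high & HALF_MASK) << HALF_BITS) | (low & HALF_MASK)


def unpack(entry: int) -> tuple[int, int]:
    """Recover both operands, sign-extending each half back to a Python int."""
    def sign_extend(x: int) -> int:
        return x - (1 << HALF_BITS) if x >> (HALF_BITS - 1) else x

    return sign_extend(entry & HALF_MASK), sign_extend(entry >> HALF_BITS)


if __name__ == "__main__":
    entry = pack(-5, 1234)
    print(hex(entry), unpack(entry))
```

An operand that fails `fits_in_half` would need a whole entry to itself, which is the wide-operand case in this simplified model.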
Asynchronous Techniques for Power-Adaptive Processing
, 2002
Cited by 1 (0 self)
This research was sponsored in part by DARPA PAC/C, by Intel Corporation, and by Semiconductor Research Corporation.
- In Proc. of 9th Int’l Symp. on High Performance Computer Architecture (HPCA)
, 2003
With the scaling of technology and the need for higher performance and more functionality, power dissipation is becoming a major bottleneck for microprocessor designs. Pipeline balancing (PLB), a previous technique, is essentially a methodology to clock-gate unused components whenever a program's instruction-level parallelism is predicted to be low. However, no non-predictive methodologies are available in the literature for efficient clock gating. This paper introduces deterministic clock gating (DCG) based on the key observation that for many of the stages in a modern pipeline, a circuit block's usage in a specific cycle in the near future is deterministically known a few cycles ahead of time. Our experiments show an average of 19.9% reduction in processor power with virtually no performance loss for an 8-issue, out-of-order superscalar processor by applying DCG to execution units, pipeline latches, D-Cache wordline decoders, and result bus drivers. In contrast, PLB achieves 9.9% average power savings at 2.9% performance loss.
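The key observation above can be mimicked with a toy accounting model: if the units needed in each cycle are known in advance, only those units have to be clocked. The sketch below counts clocked unit-cycles with and without such gating. The schedule format, unit names, and function name are hypothetical, and the count is not a power estimate; it does not reproduce the paper's experiments or its 19.9% figure.

```python
from collections import Counter


def clocked_unit_cycles(schedule, units, gated: bool) -> int:
    """Count unit-cycles that receive a clock.

    schedule: one list per cycle naming the unit kind each issued operation uses
              (assumed to be known a few cycles ahead, as in the abstract).
    units:    mapping from unit kind to the number of such units in the core.
    gated:    if True, only units actually used in a cycle are clocked.
    """
    total = 0
    for used in schedule:
        need = Counter(used)
        for kind, count in units.items():
            if need[kind] > count:
                raise ValueError(f"cycle needs more {kind} units than exist")
            total += need[kind] if gated else count
    return total


if __name__ == "__main__":
    units = {"alu": 4, "mul": 2}
    schedule = [["alu", "alu"], ["mul"], [], ["alu", "mul", "mul"]]
    print(clocked_unit_cycles(schedule, units, gated=False),
          clocked_unit_cycles(schedule, units, gated=True))
```

Because the schedule is taken as known rather than predicted, no cycle is ever short of a unit it needs, which matches the abstract's point that deterministic gating costs virtually no performance.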
A Detailed Study of Hardware Techniques That Dynamically Exploit Frequent Operands to Reduce Power Consumption in Integer Function Units
, 2003
The use of multiple pipelines in superscalar processors and increasing hardware support for intelligent compilers have resulted in processors becoming computationally intensive. There has been an increase in the number of function units, which are wider and operate at higher speeds. A high percentage of integer instructions executed in standard benchmarks reveal a high usage of integer function units.
WIDTH-ADAPTIVE AND NON-UNIFORM ACCESS
, 2004
At the heart of practically every modern microprocessor core sits some form of register file, whose purpose is to hold and supply intermediate results of computations to other computation units. As register files grow in size and in the number of ports to support increasing instruction-level parallelism (ILP), it becomes extremely difficult to meet timing requirements in clocked designs, and the energy consumed by accesses increases significantly. Asynchronous microprocessors share many of the same design issues; however, we have at our disposal a different family of techniques due to the robust and modular nature of self-timed design. Starting with a sequential specification of a typical asynchronous register file, we decompose the specification into fine-grain parallel processes for the core, bypass and control that implement the specified register file. To improve the throughput of the core, we vertically pipeline the read and write ports into smaller blocks of data, and we describe the locking mechanism that maintains pipelined mutual exclusion among reads and writes. Using standard handshaking expansion templates, we synthesize quasi-delay insensitive production rules that describe the circuits for

