Results 1 - 10
of
12
Low-Power FPGA Using Pre-defined Dual-Vdd/Dual-Vt Fabrics
- FPGA'04
, 2004
"... Traditional FPGAs use uniform supply voltage Vdd and uniform threshold voltage Vt. We propose to use pre-defined dual-Vdd and dual-Vt fabrics to reduce FPGA power. We design FPGA circuits with dual-Vdd/dual-Vt to e#ectively reduce both dynamic power and leakage power, and define dual-Vdd/dual-Vt ..."
Abstract
-
Cited by 31 (10 self)
- Add to MetaCart
Traditional FPGAs use uniform supply voltage Vdd and uniform threshold voltage Vt. We propose to use pre-defined dual-Vdd and dual-Vt fabrics to reduce FPGA power. We design FPGA circuits with dual-Vdd/dual-Vt to e#ectively reduce both dynamic power and leakage power, and define dual-Vdd/dual-Vt FPGA fabrics based on the profiling of benchmark circuits. We further develop CAD algorithms including power-sensitivity based voltage assignment and simulated-annealing based placement to leverage such fabrics. Compared to the conventional fabric using uniform Vdd/Vt at the same target clock frequency, our new fabric using dual Vt achieves 9% to 20% power reduction. However, the pre-defined FPGA fabric using both dual Vdd and dual Vt only achieves on average 2% extra power reduction. It is because that the pre-designed dual-Vdd layout pattern introduces non-negligible performance penalty. Therefore, programmability of supply voltage is needed to achieve significant power saving for dual-Vdd FPGAs. To our best knowledge, it is the first in-depth study on applying both dual-Vdd and dual-Vt to FPGA considering circuits, fabrics and CAD algorithms.
FPGA Power Reduction Using Configurable Dual-Vdd
, 2004
"... Power optimization is of growing importance for FPGAs in nanometer technologies. Considering dual-Vdd technique, we show that configurable power supply is required to obtain a satisfactory performance and power tradeo#. We design FPGA circuits and logic fabrics using configurable dualVdd and develop ..."
Abstract
-
Cited by 28 (12 self)
- Add to MetaCart
Power optimization is of growing importance for FPGAs in nanometer technologies. Considering dual-Vdd technique, we show that configurable power supply is required to obtain a satisfactory performance and power tradeo#. We design FPGA circuits and logic fabrics using configurable dualVdd and develop the corresponding CAD flow to leverage such circuits and logic fabrics. We then carry out a highly quantitative study using area, delay and power models obtained from detailed circuit design and SPICE simulation in 100nm technology. Compared to single-Vdd FPGAs with optimized Vdd level for the same target clock frequency, configurable dual-Vdd FPGAs with full and partial supply programmability for logic blocks reduce logic power by 35.46% and 28.62% respectively and reduce total FPGA power by 14.29% and 9.04% respectively. To the best of our knowledge, it is the first in-depth study on FPGAs with configurable dual-Vdd for power reduction.
Methods for True Energy-Performance Optimization
, 2004
"... This paper presents methods for efficient energyperformance optimization at the circuit and micro-architectural levels. The optimal balance between energy and performance is achieved when the sensitivity of energy to a change in performance is equal for all the design variables. The sensitivity-base ..."
Abstract
-
Cited by 24 (9 self)
- Add to MetaCart
This paper presents methods for efficient energyperformance optimization at the circuit and micro-architectural levels. The optimal balance between energy and performance is achieved when the sensitivity of energy to a change in performance is equal for all the design variables. The sensitivity-based optimizations minimize energy subject to a delay constraint. Energy savings of about 65% can be achieved without delay penalty with equalization of sensitivities to sizing, supply, and threshold voltage in a 64-bit adder, compared to the reference design sized for minimum delay. Circuit optimization is effective only in the region of about 30% around the reference delay; outside of this region the optimization becomes too costly either in terms of energy or delay. Using optimal energy--delay tradeoffs from the circuit level and introducing more degrees of freedom, the optimization is hierarchically extended to higher abstraction layers. We focus on the micro-architectural optimization and demonstrate that the scope of energy-efficient optimization can be extended by the choice of circuit topology or the level of parallelism. In a 64-bit ALU example, parallelism of five provides a three-fold performance increase, while requiring the same energy as the reference design. Parallel or time-multiplexed solutions significantly affect the area of their respective designs, so the overall design cost is minimized when optimal energy--area tradeoff is achieved.
Level Conversion for Dual-Supply Systems
- IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS
, 2004
"... Dual-supply voltage design using a clustered voltage scaling (CVS) scheme is an effective approach to reduce chip power. The optimal CVS design relies on a level converter implemented in a flip-flop to minimize energy, delay, and area penalties due to level conversion. Additionally, circuit robustne ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
Dual-supply voltage design using a clustered voltage scaling (CVS) scheme is an effective approach to reduce chip power. The optimal CVS design relies on a level converter implemented in a flip-flop to minimize energy, delay, and area penalties due to level conversion. Additionally, circuit robustness against supply bounce is a key property that differentiates good level converter design. Novel flip-flops presented in this paper incorporate a half-latch level converter and a precharged level converter. These flip-flops are optimized in the energy-delay design space to achieve over 30% reduction of energy-delay product and about 10% savings of total power in a CVS design as compared to the conventional flip-flop. These benefits are accompanied by 24% flip-flop robustness improvement leading to 13% delay spread reduction in a CVS critical path. The proposed flip-flops also show 18% layout area reduction. Advantages of level conversion in a flip-flop over asynchronous level conversion in combinational logic are also discussed in terms of delay penalty and its sensitivity to supply bounce.
A shared-well dual-supply-voltage 64-bit ALU
- in IEEE Int. Solid-State Circuits Conf. (ISSCC’03) Dig. Tech. Papers
, 2003
"... Abstract—A shared n-well layout technique is developed for the design of dual-supply-voltage logic blocks. It is demonstrated on a design of a 64-bit arithmetic logic unit (ALU) module in domino logic. The second supply voltage is used to lower the power of noncritical paths in the sparse, radix-4 6 ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
Abstract—A shared n-well layout technique is developed for the design of dual-supply-voltage logic blocks. It is demonstrated on a design of a 64-bit arithmetic logic unit (ALU) module in domino logic. The second supply voltage is used to lower the power of noncritical paths in the sparse, radix-4 64-bit carry-lookahead adder and in the loopback bus. A 3 mm2 test chip in 0.18- m 1.8-V five-metal with local interconnect CMOS technology that contains six ALUs and test circuitry operates at 1.16 GHz at the nominal supply. For target delay increase of 2.8 % energy savings are 25.3 % using dual supplies, while for 8.3 % increase in delay, 33.3 % can be saved. Index Terms—Adder, ALU, CMOS, dual-supply design, low power.
Total Power Optimization through Simultaneously Multiple-VDD Multiple-VTH Assignment and Device Sizing with Stack Forcing
- In Proceedings of International Symposium on Low Power Electronics and Design (ISLPED
, 2004
"... In this paper, we present an algorithm for the minimization of total power consumption via multiple V DD assignment, multiple V TH assignment, device sizing and stack forcing, while maintaining performance requirements. These four power reduction techniques are properly encoded in genetic algorithm ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
In this paper, we present an algorithm for the minimization of total power consumption via multiple V DD assignment, multiple V TH assignment, device sizing and stack forcing, while maintaining performance requirements. These four power reduction techniques are properly encoded in genetic algorithm and evaluated simultaneously. The overhead imposed by the insertion of level converters is also taken into account. The effectiveness of each power reduction mechanism is verified, as are the combinations of different approaches. Experimental results are given for a number of 65 nm benchmark circuits that span typical circuit topologies, including inverter chains, SRAM decoders, multiplier and a 32bit carry adders. From the experimental results, we show that the combination of four low power techniques is the effective way to achieve low power budget.
Maximizing GFLOPS-per-Watt: High-bandwidth, low power photonic on-chip networks
- In P=ac 2 Conference
, 2006
"... As high-performance processors move towards multicore architectures, packet-switched on-chip networks are gaining wide acceptance as interconnect solutions that can directly address the bandwidth and latency requirements as well as provide partial relief to the broader challenge of power dissipation ..."
Abstract
-
Cited by 5 (5 self)
- Add to MetaCart
As high-performance processors move towards multicore architectures, packet-switched on-chip networks are gaining wide acceptance as interconnect solutions that can directly address the bandwidth and latency requirements as well as provide partial relief to the broader challenge of power dissipation. Still, studies show that the power consumed by on-chip networks will remain a major issue that has to be addressed to enable a true leap in future multi-core processors performance. Based on recent and expected technological advances in the integration of silicon photonic elements with CMOS electronics, we consider the usage of photonics to construct an on-chip network, offering unique advantages in terms of energy, bandwidth, and latency. We propose a novel architecture for a photonic on-chip network based on a hybrid approach: a network of wideband photonic switches combined with a parallel electronic control network. A high-level power analysis and comparison with electronic on-chip networks show that some of the advantages that have made photonics ubiquitous in long-haul transmission systems can be leveraged to construct photonic on-chip networks, delivering unprecedented computational capabilities, while operating at a fraction of the power of their electronic counterparts. 1.
On the Design of a Photonic Network-on-Chip
, 2007
"... Recent remarkable advances in nanoscale silicon-photonic integrated circuitry specifically compatible with CMOS fabrication have generated new opportunities for leveraging the unique capabilities of optical technologies in the on-chip communications infrastructure. Based on these nano-photonic build ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Recent remarkable advances in nanoscale silicon-photonic integrated circuitry specifically compatible with CMOS fabrication have generated new opportunities for leveraging the unique capabilities of optical technologies in the on-chip communications infrastructure. Based on these nano-photonic building blocks, we consider a photonic network-on-chip architecture designed to exploit the enormous transmission bandwidths, low latencies, and low power dissipation enabled by data exchange in the optical domain. The novel architectural approach employs a broadband photonic circuit-switched network driven in a distributed fashion by an electronic overlay control network which is also used for independent exchange of short messages. We address the critical network design issues for insertion in chip multiprocessors (CMP) applications, including topology, routing algorithms, path-setup and teardown procedures, and deadlock avoidance. Simulations show that this class of photonic networks-on-chip offers a significant leap in the performance for CMP intrachip communication systems delivering low-latencies and ultra-high throughputs per core while consuming minimal power.
Mixed-Clock Issue Queue Design for Energy Aware, High-Performance Cores
- in Proc. of the Asia and South Pacific Design Automation Conference
, 2004
"... Globally-Asynchronous, Locally-Synchronous (GALS) design style has started to gain interest recently as a possible solution to the increased design complexity, power and thermal costs, as well as an enabler for allowing fine grain speed and voltage management. Due to its inherent complexity, a possi ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
Globally-Asynchronous, Locally-Synchronous (GALS) design style has started to gain interest recently as a possible solution to the increased design complexity, power and thermal costs, as well as an enabler for allowing fine grain speed and voltage management. Due to its inherent complexity, a possible driver application for such a design style is the case of superscalar, out-of-order processors. This paper proposes a novel mixed-clock issue queue design, and compares and contrasts this new implementation with existing synchronous or mixed-clock versions of issue queues, used in standalone mode or in conjunction with mixed-clock FIFO (First-In, FirstOut) buffers for inter-domain synchronization. Both transistor level, SPICE simulation, as well as cycle-accurate, microarchitectural analysis, show that cores using mixed-clock issue queues are able to provide better energy-performance operating points when compared to their synchronous or asynchronous FIFO-based counterparts.
Dual-vdd interconnect with chip-level time slack allocation for FPGA power reduction
, 2006
"... To reduce FPGA power, Vdd programmability has been proposed recently to select Vdd-level for interconnects and to powergate unused interconnects. However, Vdd-level converters used in the existing Vdd-programmable method consume a large amount of leakage. In this paper, we propose two ways to avoid ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
To reduce FPGA power, Vdd programmability has been proposed recently to select Vdd-level for interconnects and to powergate unused interconnects. However, Vdd-level converters used in the existing Vdd-programmable method consume a large amount of leakage. In this paper, we propose two ways to avoid using level converters in interconnects, tree based level converter insertion (TLC) and dual-Vdd tree based level converter insertion (dTLC). TLC enforces that there is only one Vdd-level within each routing tree while dTLC can have different Vdd-levels within a routing tree, but no VddL switch drives VddH switches. We develop dual-Vdd assignment algorithms considering chip-level time slack allocation for maximum power reduction. Our algorithms include TLC-S and dTLC-S, power sensitivity based algorithms with implicit time slack allocation and dTLC-LP, a linear programming (LP) based algorithm with explicit time slack allocation. All allocate time slack first to interconnects with higher power sensitivity and assign low-Vdd to them for more power reduction. Experiments show that dTLC-LP obtains the lowest power consumption. Compared to dTLC-LP, dTLC-S obtains slightly higher power consumption but runs 3X faster. Compared to the existing segmentbased level converter insertion (SLC) for dual-Vdd, dTLC-LP reduces interconnect power by 52.90 % without performance loss for the MCNC benchmark circuits.

