Circuits and Architectures for Field Programmable Gate Array with Configurable Supply Voltage
- IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS
, 2005
Cited by 5 (3 self)
Field programmable gate arrays (FPGAs) with supply voltage (Vdd) programmability have been proposed recently to reduce FPGA power, where the Vdd level can be customized for FPGA circuit elements and unused circuit elements can be power-gated. In this paper, we first design novel Vdd-programmable and Vdd-gateable interconnect switches with a minimal number of configuration SRAM cells. We then evaluate Vdd-programmable FPGA architectures using the new switches. The best architecture in our study uses Vdd-programmable logic blocks and Vdd-gateable interconnects. Compared to a baseline architecture similar to the leading commercial architecture, our best architecture reduces the minimal energy-delay product by 54.39% with 17% more area and 3% more configuration SRAM cells. Our evaluation results also show that, for all evaluated architectures, LUT size 4 gives the lowest energy consumption and LUT size 7 leads to the highest performance.
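As a reading aid for the energy-delay product figure quoted in this abstract, the sketch below shows how such a metric and its relative reduction are typically computed. The function names and all numeric values are hypothetical and are not taken from the paper; it assumes the usual definition of energy-delay product as energy multiplied by delay.

```python
def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """Energy-delay product: energy (joules) multiplied by delay (seconds)."""
    return energy_j * delay_s


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Hypothetical numbers for illustration only (not results from the paper).
baseline_edp = energy_delay_product(energy_j=2.0e-9, delay_s=10.0e-9)
improved_edp = energy_delay_product(energy_j=1.2e-9, delay_s=8.0e-9)
reduction = relative_reduction(baseline_edp, improved_edp)
print(f"EDP reduction: {reduction:.1%}")
```

Because the metric is a product, a design can lower it by saving energy, by running faster, or by both, which is why the abstract reports it alongside area and SRAM overheads.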
Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates
, 2007
Cited by 5 (2 self)
... with lookup tables (LUTs) inside the programmable logic block (PLB) to reduce area and power and increase performance in FPGAs. However, it is unclear whether incorporating macro-gates with wide inputs inside PLBs is beneficial. In this paper, we first propose a methodology to extract a small set of logic functions that are able to implement a large portion of functions for given FPGA applications. Assuming that the extracted logic functions are implemented by macro-gates in PLBs, we then develop a complete synthesis flow for such heterogeneous PLBs with mixed LUTs and macro-gates. The flow includes a cut-based delay and area optimized technology mapping, a mixed binary integer and linear programming based area recovery algorithm to balance the resource utilization of macro-gates and LUTs for area-efficient packing, and a SAT-based packing. We finally evaluate the proposed heterogeneous FPGA using the newly developed flow and show that mixing LUTs and macro-gates, both with 6 inputs, improves performance by 16.5% and reduces logic area by 30% compared to using only 6-input LUTs.
Area and Delay Trade-offs in the Circuit and Architecture . . .
, 2008
Cited by 4 (1 self)
Field-programmable gate arrays (FPGAs) are used in a wide range of markets that have differing cost, performance and power consumption requirements. It would be advantageous if a single device family could serve these varied needs, but the economics of catering to this wide distribution of market demands suggest more than one family is appropriate. Consequently, FPGA vendors have moved to provide a more diverse set of families that sit at different points in the area-speed-power design space. In this work, our goal is to understand the circuit and architectural design attributes of an FPGA that enable trade-offs between area and speed, and to determine the magnitude of the possible trade-offs. This will be useful for architects seeking to determine the number of device families in a suite of offerings, as well as the changes to make between families. We have found that varying both architecture and transistor sizing of an FPGA allows the effective area to change by a factor of 3.6 from largest to smallest and the speed to change by a factor of 2.6 from fastest to slowest. It is interesting to observe that the range of area and delay trade-offs possible by varying only the transistor sizing of a single architecture is larger than the ranges observed in past architectural experiments. In addition to transistor size, we note that LUT size is one of the most useful parameters for trading off area and delay.
Device and architecture co-optimization for FPGA power reduction
- in Proc. Design Automation Conf
, 2005
Cited by 3 (3 self)
Device optimization considering supply voltage Vdd and threshold voltage Vt incurs little chip area increase but has a great impact on power and performance in nanometer technologies. This paper studies simultaneous evaluation of device and architecture optimization for FPGAs. We first develop an efficient yet accurate timing and power evaluation method, called the trace-based model. By collecting trace information from cycle-accurate simulation of placed and routed FPGA benchmark circuits and re-using the trace for different Vdd and Vt, we enable device and architecture co-optimization considering hundreds of device and architecture combinations. Compared to the baseline FPGA architecture, which uses the VPR architecture model and the same LUT and cluster sizes as those used by the Xilinx Virtex-II, Vdd suggested by ITRS, and Vt optimized with respect to the above architecture and Vdd, architecture and device co-optimization can reduce energy-delay product by 20.5% and chip area by 23.3%. Furthermore, considering power-gating of unused logic blocks and interconnect switches (in this case sleep transistor size is a parameter of device tuning), our co-optimization reduces energy-delay product by 55.0% and chip area by 8.2% compared to the baseline FPGA architecture. To the best of our knowledge, this is the first in-depth study in the literature on architecture and device co-optimization for FPGAs. Index Terms: FPGA, Architecture, Delay estimation.
Performance-Energy Tradeoffs for Matrix Multiplication on FPGA-Based Mixed-Mode Chip Multiprocessors
- in Proceedings of the 8th International Symposium on Quality Electronic Design, 2007
Cited by 3 (0 self)
Chip multiprocessing has been demonstrated to be a promising approach in microprocessor design. With ever increasing concerns for energy consumption, performance-energy trade-offs are often necessary, especially in the design of real-time embedded systems. This paper presents our performance and energy study on an in-house developed FPGA-based mixed-mode chip multiprocessor, where the SIMD (Single-Instruction, Multiple-Data), MIMD (Multiple-Instruction, Multiple-Data) and M-SIMD (Multiple-SIMD) computing modes can exist simultaneously in one system. We propose performance-energy trade-off techniques based on the observation that SIMD and MIMD task executions involve substantially different amounts of computation and communication, which result in different time and energy behavior and provide us with opportunities to realize various performance-energy objectives. Generalized matrix-matrix multiplication (MMM) is employed as an example to illustrate our analysis. Experimental results on a Xilinx Virtex II XC2V6000-5 FPGA demonstrate the effectiveness of the proposed approach.
Leakage Power Reduction of Embedded Memories on FPGAs Through Location Assignment
Cited by 1 (0 self)
Transistor leakage is poised to become the dominant source of power dissipation in digital systems, and reconfigurable devices are not immune to this problem. Modern FPGAs already have a significant amount of memory on the die, and with each generation the proportion of embedded memory to logic cells is growing. While assigning high Vth can limit the leakage power, embedded memory timing is critical to performance and will draw an increasingly significant amount of leakage current. However, unlike in many processor-based systems, on-chip memory accesses are often fully deterministic and completely under the control of the scheduler. In this paper we explore a variety of techniques, ranging in complexity and effectiveness, to battle the problem of leakage in FPGA embedded memories. Through the addition of sleep and drowsy modes, controlled by the scheduler, the amount of leakage power can be reduced by several orders of magnitude. We show how even very simple schemes offer large amounts of benefit, and that further reductions are possible through careful leakage-aware data placement.
3D nFPGA: A Reconfigurable Architecture for 3D CMOS/Nanomaterial Hybrid Digital Circuits
Cited by 1 (0 self)
In this paper, we introduce a novel reconfigurable architecture, named 3D nFPGA, which utilizes 3D integration techniques and new nanoscale materials synergistically. The proposed architecture is based on CMOS-nano hybrid techniques that incorporate nanomaterials such as carbon nanotube bundles and nanowire crossbars into the CMOS fabrication process. This architecture also has built-in features for fault tolerance and heat alleviation. Using unique features of FPGAs and a novel 3D stacking method enabled by the application of nanomaterials, 3D nFPGA obtains a 4X footprint reduction compared to traditional CMOS-based 2D FPGAs. With a customized design automation flow, we evaluate the performance and power of 3D nFPGA driven by the 20 largest MCNC benchmarks. Results demonstrate that 3D nFPGA is able to provide a performance gain of 2.6X with a small power overhead compared to the traditional 2D FPGA architecture. Index Terms: Nanoelectronics; nanowire; nanotube; reconfigurable logic; 3D integration; performance.
A Low-Power Field-Programmable Gate Array Routing Fabric
Cited by 1 (0 self)
This paper describes a new programmable routing fabric for field-programmable gate arrays (FPGAs). Our results show that an FPGA using this fabric can achieve 1.57 times lower dynamic power consumption and 1.35 times lower average net delays with only 9% reduction in logic density over a baseline island-style FPGA implemented in the same 65-nm CMOS technology. These improvements in power and delay are achieved by 1) using only short interconnect segments to reduce routed net lengths, and 2) reducing interconnect segment loading due to programming overhead relative to the baseline FPGA without compromising routability. The new routing fabric is also well-suited to monolithically stacked 3-D-IC implementation. It is shown that a 3-D-FPGA using this fabric can achieve a 3.3 times improvement in logic density, a 2.51 times improvement in delay, and a 2.93 times improvement in dynamic power consumption over the same baseline 2-D-FPGA. Index Terms: Field-programmable gate arrays (FPGAs), low-power, performance analysis, routing architecture/fabric.
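The "N times lower" figures in this abstract can be compared with the percentage reductions quoted elsewhere in this listing by converting the ratio. The sketch below is only an arithmetic aid: the helper name is hypothetical, and it assumes "N times lower" means the new value equals the baseline divided by N.

```python
def times_lower_to_percent_reduction(factor: float) -> float:
    """Convert an 'N times lower' ratio into a percentage reduction.

    Assumes new_value = baseline / factor, so the reduction is 1 - 1/factor.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return (1.0 - 1.0 / factor) * 100.0


# Ratios quoted in the abstract for the 2-D fabric.
dynamic_power_pct = times_lower_to_percent_reduction(1.57)
net_delay_pct = times_lower_to_percent_reduction(1.35)
print(f"dynamic power: about {dynamic_power_pct:.1f}% lower")
print(f"average net delay: about {net_delay_pct:.1f}% lower")
```

Under that assumption, 1.57 times lower dynamic power corresponds to roughly a 36% reduction and 1.35 times lower net delay to roughly a 26% reduction.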
LOPASS: A low-power architectural synthesis system for FPGAs with interconnect estimation and optimization
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Cited by 1 (0 self)
In this paper, we present a low-power architectural synthesis system (LOPASS) for field-programmable gate-array (FPGA) designs with interconnect power estimation and optimization. LOPASS includes three major components: 1) a flexible high-level power estimator for FPGAs considering the power consumption of various FPGA logic components and interconnects; 2) a simulated-annealing optimization engine that carries out resource selection and allocation, scheduling, functional unit binding, register binding, and interconnection estimation simultaneously to reduce power effectively; and 3) a k-cofamily-based register binding algorithm and an efficient port assignment algorithm that reduce interconnections in the data path through multiplexer optimization. The experimental results show that LOPASS produces promising results on latency optimization compared to an academic high-level synthesis tool, SPARK. Compared to an early commercial high-level synthesis tool, namely, Synopsys Behavioral Compiler, LOPASS is 61.6% better on power consumption and 10.6% better on clock period on average. Compared to a current commercial tool, namely, Impulse C, LOPASS is 31.1% better on power reduction with an 11.8% penalty on clock period. Index Terms: Behavioral synthesis, field-programmable gate array (FPGA), interconnect, power optimization.
Technology Mapping and Clustering for FPGA Architectures with Dual Supply Voltages
This paper presents a technology mapping algorithm for field-programmable gate array architectures with dual supply voltages (Vdds) for power optimization. This is done with the guarantee that the mapping depth of the circuit will not increase compared to the circuit with a single Vdd. This paper also presents an enhanced clustering algorithm that considers dual supply voltages, honoring the dual-Vdd mapping solution. To carry out various comparisons, we first design a single-Vdd mapping algorithm, named SVmap-2, which achieves a 3.8% total power reduction (15.6% dynamic power reduction) over a previously published low-power mapping algorithm, Emap [11]. We then show that our dual-Vdd mapping algorithm, named DVmap-2, can further improve total power savings by 12.8% over SVmap-2, with a 52.7% dynamic power reduction. Compared to the early single-Vdd version SVmap [14], DVmap-2 is 14.3% better for total power reduction. This is achieved through an ideal selection of the low-Vdd/high-Vdd ratio and the consideration of various voltage changing scenarios during the mapping process. Index Terms: Dual-supply voltages, field-programmable gate array (FPGA), power optimization, technology mapping.

