Reconfigurable Computing: A Survey of Systems and Software
, 2000
Cited by 141 (5 self)
Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. Its key feature is the ability to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solution. In this survey we explore the hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling. We also focus on the software that targets these machines, such as compilation tools that map high-level algorithms directly to the reconfigurable substrate. Finally, we consider the issues involved in run-time reconfigurable systems, which re-use the configurable hardware during program execution.
Bitwidth Analysis with Application to Silicon Compilation
, 2000
Cited by 80 (0 self)
This paper introduces Bitwise, a compiler that minimizes the bitwidth --- the number of bits used to represent each operand --- for both integers and pointers in a program. By propagating static information both forward and backward in the program dataflow graph, Bitwise frees the programmer from declaring bitwidth invariants in cases where the compiler can determine bitwidths automatically. We find a rich opportunity for bitwidth reduction in modern multimedia and streaming application workloads. For new architectures that support sub-word quantities, we expect that our bitwidth reductions will save power and increase processor performance. This paper
Reconfigurable Computing for Digital Signal Processing: A Survey
- Journal of VLSI Signal Processing
, 2000
Cited by 45 (2 self)
Steady advances in VLSI technology and design tools have extensively expanded the application domain of digital signal processing over the past decade. While application-specific integrated circuits (ASICs) and programmable digital signal processors (PDSPs) remain the implementation mechanisms of choice for many DSP applications, increasingly new system implementations based on reconfigurable computing are being considered. These flexible platforms, which offer the functional efficiency of hardware and the programmability of software, are quickly maturing as the logic capacity of programmable devices follows Moore's Law and advanced automated design techniques become available. As initial reconfigurable technologies have emerged, new academic and commercial efforts have been initiated to support power optimization, cost reduction, and enhanced run-time performance. This paper presents a survey of academic research and commercial development in reconfigurable computing for DSP systems o...
BitValue Inference: Detecting and Exploiting Narrow Bitwidth Computations
- IN PROCEEDINGS OF THE EUROPAR 2000 EUROPEAN CONFERENCE ON PARALLEL COMPUTING
, 2000
Cited by 41 (7 self)
We present a compiler algorithm called BitValue, which can discover unused and constant bits in dusty-deck C programs. BitValue uses forward and backward dataflow analyses, generalizing constant-folding and dead-code detection at the bit-level. This algorithm enables compiler optimizations targeting special processor architectures for computing on non-standard bitwidths. Using this algorithm we show that up to 36% of the computed bytes are thrown away; also, we show that on average 26.8% of the values computed require 16 bits or less (for programs from SpecINT95 and Mediabench). A compiler for reconfigurable hardware uses this algorithm to achieve substantial reductions (up to 20-fold) in the size of the synthesized circuits.
ASOC: A Scalable, Single-Chip Communications Architecture
, 2000
Cited by 38 (3 self)
Draft - submitted to PACT'00. Do not distribute. Contact authors for final version. Over the past decade the number of transistors available to VLSI chip designers has grown exponentially. While the physical capacity to integrate large systems on a single chip will soon be available, there is currently little agreement regarding the types of architectures and compilation environments that will be appropriate for these new systems. This paper examines systems-on-a-chip with an eye towards system-level adaptability and scalability. We believe that the performance-limiting bottleneck for many future systems-on-a-chip will be the same as the one found in many of today's board-level systems: system-wide interconnect. In this paper, a new single-chip interconnection architecture is described that not only provides scalable data transfer but also can be easily reconfigured as communication patterns change. An important aspect of the architecture is its support for compile-time, scheduled communi...
An Architecture and Compiler for Scalable On-Chip Communication
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems
, 2004
Cited by 25 (1 self)
A dramatic increase in single chip capacity has led to a revolution in on-chip integration. Design reuse and ease of implementation have become important aspects of the design process. This paper describes a new scalable single-chip communication architecture for heterogeneous resources, adaptive system-on-a-chip (aSOC), and supporting software for application mapping. This architecture exhibits hardware simplicity and optimized support for compile-time scheduled communication. To illustrate the benefits of the architecture, four high-bandwidth signal processing applications including an MPEG-2 video encoder and a Doppler radar processor have been mapped to a prototype aSOC device using our design mapping technology. Through experimentation it is shown that aSOC communication outperforms a hierarchical bus-based system-on-chip (SoC) approach by up to a factor of five. A VLSI implementation of the communication architecture indicates clock rates of 400 MHz in 0.18-μm technology for sustained on-chip communication. In comparison to previously-published results for an MPEG-2 decoder, our on-chip interconnect shows a runtime improvement of over a factor of four. Index Terms—Communications architecture, on-chip interconnect, system-on-chip (SoC).
A Compiler Approach to Fast Hardware Design Space Exploration in FPGA-based Systems
- In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation
, 2002
Cited by 24 (3 self)
This paper describes an automated approach to hardware design space exploration, through a collaboration between parallelizing compiler technology and high-level synthesis tools. We present a compiler algorithm that automatically explores the large design spaces resulting from the application of several program transformations commonly used in application-specific hardware designs. Our approach uses synthesis estimation techniques to quantitatively evaluate alternate designs for a loop nest computation. We have implemented this design space exploration algorithm in the context of a compilation and synthesis system called DEFACTO, and present results of this implementation on five multimedia kernels. Our algorithm derives an implementation that closely matches the performance of the fastest design in the design space, and among implementations with comparable performance, selects the smallest design. We search on average only 0.3% of the design space. This technology thus significantly raises the level of abstraction for hardware design and explores a design space much larger than is feasible for a human designer.
The Multiple Wordlength Paradigm
- IEEE SYMPOSIUM ON FPGAS FOR CUSTOM COMPUTING MACHINES
, 2001
Cited by 22 (7 self)
This paper presents a paradigm for the design of multiple wordlength parallel processing systems for DSP applications, based on varying the wordlength and scaling of each signal in a DSP block diagram. A technique for estimating the observable effects of truncation and roundoff error is illustrated, and used to form the basis of an optimization algorithm to automate the design of such multiple wordlength systems. Results from implementation on a reconfigurable computing platform show that significant logic usage savings and increased clock rates can be obtained by customizing the datapath precision to the algorithm according to the techniques described in this paper. On selected DSP benchmarks, we obtain up to 45% area reduction and up to 39% speed increase over standard design techniques.
Molecular Electronics: Devices, Systems and Tools for Gigagate, Gigabit Chips
- In ICCAD-2002
, 2002
Cited by 19 (4 self)
New electronics technologies are emerging which may carry us beyond the limits of lithographic processing down to molecular-scale feature sizes. Devices and interconnects can be made from a variety of molecules and materials including bistable and switchable organic molecules, carbon nanotubes, and single-crystal semiconductor nanowires. They can be self-assembled into organized structures and attached onto lithographic substrates. This tutorial reviews emerging molecular-scale electronics technology for CAD and system designers and highlights where ICCAD research can help support this technology.
Hardware Synthesis from Term Rewriting Systems
, 1999
Cited by 18 (0 self)
Term Rewriting System (TRS) is a good formalism for describing concurrent systems that embody asynchronous and nondeterministic behavior in their specifications. Elsewhere, we have used TRS's to describe speculative micro-architectures and complex cache-coherence protocols, and proven the correctness of these systems. In this paper, we describe the compilation of TRS's into a subset of Verilog that can be simulated and synthesized using commercial tools. TRAC, Term Rewriting Architecture Compiler, enables a new hardware development framework that can match the ease of today's software programming environment. TRAC reduces the time and effort in developing and debugging hardware. For several examples, we compare TRAC-generated RTL's with hand-coded RTL's after they are both compiled for Field Programmable Gate Arrays by Xilinx tools. The circuits generated from TRS are competitive with those described using Verilog RTL, especially for larger designs.

