Results 1 - 10
of
98
Reconfigurable Computing: A Survey of Systems and Software
, 2000
"... Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. Its key feature is the ability to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solu ..."
Abstract
-
Cited by 141 (5 self)
- Add to MetaCart
Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. Its key feature is the ability to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solution. In this survey we explore the hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling. We also focus on the software that targets these machines, such as compilation tools that map high-level algorithms directly to the reconfigurable substrate. Finally, we consider the issues involved in run-time reconfigurable systems, which re-use the configurable hardware during program execution.
PipeRench: A Coprocessor for Streaming Multimedia Acceleration
, 1999
"... Future computing workloads will emphasize an architecture 's ability to perform relatively simple calculations on massive quantities of mixed-width data. This paper describes a novel reconfigurable fabric architecture, PipeRench, optimized to accelerate these types of computations. PipeRench enables ..."
Abstract
-
Cited by 109 (11 self)
- Add to MetaCart
Future computing workloads will emphasize an architecture 's ability to perform relatively simple calculations on massive quantities of mixed-width data. This paper describes a novel reconfigurable fabric architecture, PipeRench, optimized to accelerate these types of computations. PipeRench enables fast, robust compilers, supports forward compatibility, and virtualizes configurations, thus removing the fixed size constraint present in other fabrics. For the first time we explore how the bit-width of processing elements affects performance and show how the PipeRench architecture has been optimized to balance the needs of the compiler against the realities of silicon. Finally, we demonstrate extreme performance speedup on certain computingkernels (up to 190x versus a modernRISC processor), and analyze how this acceleration translates to application speedup. 1.
CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit
- IN PROCEEDINGS OF THE 27TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE
, 2000
"... Reconfigurable hardware has the potential for significant performance improvements by providing support for application−specific operations. We report our experience with Chimaera, a prototype system that integrates a small and fast reconfigurable functional unit (RFU) into the pipeline of an aggres ..."
Abstract
-
Cited by 76 (1 self)
- Add to MetaCart
Reconfigurable hardware has the potential for significant performance improvements by providing support for application−specific operations. We report our experience with Chimaera, a prototype system that integrates a small and fast reconfigurable functional unit (RFU) into the pipeline of an aggressive, dynamically−scheduled superscalar processor. Chimaera is capable of performing 9−input/1−output operations on integer data. We discuss the Chimaera C compiler that automatically maps computations for execution in the RFU. Chimaera is capable of: (1) collapsing a set of instructions into RFU operations, (2) converting control−flow into RFU operations, and (3) supporting a more powerful fine−grain data−parallel model than that supported by current multimedia extension instruction sets (for integer operations). Using a set of multimedia and communication applications we show that even with simple optimizations, the Chimaera C compiler is able to map 22 % of all instructions to the RFU on the average. A variety of computations are mapped into RFU operations ranging from as simple as add/sub−shift pairs to operations of more than 10 instructions including several branches. Timing experiments demonstrate that for a 4−way out−of−order superscalar processor Chimaera results in average performance improvements of 21%, assuming a very aggressive core processor design (most pessimistic RFU latency model) and communication overheads from and to the RFU.
Morphosys: an integrated reconfigurable system for data-parallel and computation-intensive applications
- IEEE Transactions on Computers
, 2000
"... Abstract: This paper introduces MorphoSys, a reconfigurable computing system developed to investigate the effectiveness of combining reconfigurable hardware with general-purpose processors for word-level, computation-intensive applications. MorphoSys is a coarse-grain, integrated reconfigurable syst ..."
Abstract
-
Cited by 74 (3 self)
- Add to MetaCart
Abstract: This paper introduces MorphoSys, a reconfigurable computing system developed to investigate the effectiveness of combining reconfigurable hardware with general-purpose processors for word-level, computation-intensive applications. MorphoSys is a coarse-grain, integrated reconfigurable system-on-chip targeted at high-throughput and data-parallel applications. It comprises of a reconfigurable array of processing cells, a modified RISC processor core and an efficient memory interface unit. This paper describes the MorphoSys architecture, including the reconfigurable processing array, the control processor and data and configuration memories. The suitability of MorphoSys for the target application domain is then illustrated with examples such as video compression, data encryption and target recognition. Performance evaluation of these applications indicates improvements of up to an order of magnitude on MorphoSys in comparison with other systems.
BThe Molen Polymorphic Processor
- IEEE Trans. Comput
"... Abstract—In this paper, we present a polymorphic processor paradigm incorporating both general purpose and custom computing processing. The proposal incorporates an arbitrary number of programmable units, exposes the hardware to the programmers/ designers, and allows them to modify and extend the pr ..."
Abstract
-
Cited by 67 (44 self)
- Add to MetaCart
Abstract—In this paper, we present a polymorphic processor paradigm incorporating both general purpose and custom computing processing. The proposal incorporates an arbitrary number of programmable units, exposes the hardware to the programmers/ designers, and allows them to modify and extend the processor functionality at will. To achieve the previously stated attributes, we present a new programming paradigm, a new instruction set architecture, a microcode-based microarchitecture, and a compiler methodology. The programming paradigm, in contrast with the conventional programming paradigms, allows general-purpose conventional code and hardware descriptions to coexist in a program. In our proposal, for a given instruction set architecture, a onetime instruction set extension of eight instructions is sufficient to implement the reconfigurable functionality of the processor. We propose a microarchitecture based on reconfigurable hardware emulation to allow high-speed reconfiguration and execution. To prove the viability of the proposal, we experimented with the MPEG-2 encoder and decoder and a Xilinx Virtex II Pro FPGA. We have implemented three operations, SAD, DCT, and IDCT. The overall attainable application speedup for the MPEG-2 encoder and decoder is between 2.64-3.18 and between 1.56-1.94, respectively, representing between 93 percent and 98 percent of the theoretically obtainable speedups.
Instruction Generation for Hybrid Reconfigurable Systems
- ACM Transactions on Design Automation of Electronic Systems
, 2001
"... Building Blocks (ABBs), or instructions available from a given hardware library. The customized data path generated from many ABBs was referred to as an application specific unit (ASU). Cathedral's synthesis targeted ASUs, which could be executed in very few clock cycles. This goal was achieved via ..."
Abstract
-
Cited by 53 (5 self)
- Add to MetaCart
Building Blocks (ABBs), or instructions available from a given hardware library. The customized data path generated from many ABBs was referred to as an application specific unit (ASU). Cathedral's synthesis targeted ASUs, which could be executed in very few clock cycles. This goal was achieved via manual clustering of necessary operations into more compact operations, essentially a form of template construction. Whereas our template generation and matching algorithms are automated, the definition of clusters in Cathedral was a manual operation, mainly clustering loop and function bodies. Their results demonstrated an expected reduction of critical path length as well as interconnect as a result of clustering.
Configuration Prefetch for Single Context Reconfigurable Coprocessors
, 1998
"... Current reconfigurable systems suffer from a significant overhead due to the time it takes to reconfigure their hardware. In order to deal with this overhead, and increase the power of reconfigurable systems, it is important to develop hardware and software systems to reduce or eliminate this delay. ..."
Abstract
-
Cited by 48 (8 self)
- Add to MetaCart
Current reconfigurable systems suffer from a significant overhead due to the time it takes to reconfigure their hardware. In order to deal with this overhead, and increase the power of reconfigurable systems, it is important to develop hardware and software systems to reduce or eliminate this delay. In this paper we propose one technique for significantly reducing the reconfiguration latency: the prefetching of configurations. By loading a configuration into the reconfigurable logic in advance of when it is needed, we can overlap the reconfiguration with useful computation. We demonstrate the power of this technique, and propose an algorithm for automatically adding prefetch operations into reconfigurable applications. This results in a significant decrease in the reconfiguration overhead for these applications. 1 Introduction When FPGAs were first introduced in the mid 1980s they were viewed as a technology for replacing standard gate arrays for some applications. In these first gener...
Application-specific instruction generation for configurable processor architectures
- in Proc. ACM International Symposium on Field-Programmable Gate Arrays
, 2004
"... Designing an application-specific embedded system in nanometer technologies has become more difficult than ever due to the rapid increase in design complexity and manufacturing cost. Efficiency and flexibility must be carefully balanced to meet different application requirements. The recently emerge ..."
Abstract
-
Cited by 42 (6 self)
- Add to MetaCart
Designing an application-specific embedded system in nanometer technologies has become more difficult than ever due to the rapid increase in design complexity and manufacturing cost. Efficiency and flexibility must be carefully balanced to meet different application requirements. The recently emerged configurable and extensible processor architectures offer a favorable tradeoff between efficiency and flexibility, and a promising way to minimize certain important metrics (e.g., execution time, code size, etc.) of the embedded processors. This paper addresses the problem of generating the application-specific instructions to improve the execution speed for configurable processors. A set of algorithms, including pattern generation, pattern selection, and application mapping, are proposed to efficiently utilize the instruction set extensibility of the target configurable processor. Applications of our approach to several real-life benchmarks on the Altera Nios processor show encouraging performance speedup (2.75X on average and up to 3.73X in some cases).
A Quantitative Analysis of Reconfigurable Coprocessors for Multimedia Applications
, 1998
"... Recently, computer architectures that combine a reconfigurable (or retargetable) coprocessor with a general-purpose microprocessor have been proposed. These architectures are designed to exploit large amounts of fine grain parallelism in applications. In this paper, we study the performance of the r ..."
Abstract
-
Cited by 39 (0 self)
- Add to MetaCart
Recently, computer architectures that combine a reconfigurable (or retargetable) coprocessor with a general-purpose microprocessor have been proposed. These architectures are designed to exploit large amounts of fine grain parallelism in applications. In this paper, we study the performance of the reconfigurable coprocessors on multimedia applications. We compare a Field Programmable Gate Array (FPGA) based reconfigurable coprocessor with the array processor called REMARC (Reconfigurable Multimedia Array Coprocessor). REMARC uses a 16-bit simple processor that is much larger than a Configurable Logic Block (CLB) of an FPGA. We have developed a simulator, a programming environment, and multimedia application programs to evaluate the performance of the two coprocessor architectures. The simulation results show that REMARC achieves speedups ranging from a factor of 2.3 to 7.3 on these applications. The FPGA coprocessor achieves similar performance improvements. However, the FPGA coprocess...
HW/SW partitioning and code generation of embedded control applications on a reconfigurable architecture platform
- In Proceedings of the 10th International Workshop on Hardware/Software Codesign
, 2002
"... This paper studies the usage of a reconfigurable architecture platform for embedded control applications aimed at improving real time performance. The hw/sw codesign methodology from POLIS is used. It starts from high-level specifications, optimizes an intermediate model of computation (Extended Fin ..."
Abstract
-
Cited by 30 (2 self)
- Add to MetaCart
This paper studies the usage of a reconfigurable architecture platform for embedded control applications aimed at improving real time performance. The hw/sw codesign methodology from POLIS is used. It starts from high-level specifications, optimizes an intermediate model of computation (Extended Finite State Machines) and derives both hardware and software, based on performance constraints. We study a particular architecture platform, which consists of a general purpose processor core, augmented with a reconfigurable function unit and data-path to improve run time performance. A new mapping flow and algorithms to partition hardware and software are proposed to generate implementation that best utilizes this architecture. Encouraging preliminary results are shown for automotive electronic control examples.

