Reconfigurable Computing: A Survey of Systems and Software
2000
Cited by 141 (5 self)
Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. Its key feature is the ability to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solution. In this survey we explore the hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling. We also focus on the software that targets these machines, such as compilation tools that map high-level algorithms directly to the reconfigurable substrate. Finally, we consider the issues involved in run-time reconfigurable systems, which re-use the configurable hardware during program execution.
Instruction Generation for Hybrid Reconfigurable Systems
ACM Transactions on Design Automation of Electronic Systems, 2001
Cited by 53 (5 self)
Building Blocks (ABBs), or instructions available from a given hardware library. The customized data path generated from many ABBs was referred to as an application specific unit (ASU). Cathedral's synthesis targeted ASUs, which could be executed in very few clock cycles. This goal was achieved via manual clustering of necessary operations into more compact operations, essentially a form of template construction. Whereas our template generation and matching algorithms are automated, the definition of clusters in Cathedral was a manual operation, mainly clustering loop and function bodies. Their results demonstrated an expected reduction of critical path length as well as interconnect as a result of clustering.
Instruction Generation and Regularity Extraction for Reconfigurable Processors
In CASES, 2002
Cited by 32 (3 self)
The increasing demand for complex and specialized embedded hardware must be met by processors which are optimized for performance, yet are also extremely flexible. In our work, we explore the tradeoff between flexibility and performance in the domain of reconfigurable processor design. Specifically, we seek to identify regularly occurring, computation-heavy patterns in an application or set of applications. These patterns become candidates for hard-logic implementation, potentially embedded in the flexible reconfigurable fabric as special optimized instructions. In this work we present an extension to previous work in instruction generation: an algorithm that identifies parallel templates. We discuss the advantages of parallel templates, and prove the correctness of our algorithm. We introduce an All-Pairs Common Slack Graph (APCSG) as an effective tool for parallel template generation. Finally, we demonstrate the effectiveness of our algorithm on several applications' dataflow graphs, reducing latency on average by 51.98%, without unreasonably increasing chip area.
Towards Nanocomputer Architecture
2002
Cited by 28 (0 self)
At the nanometer scale, the focus of micro-architecture will move from processing to communication. Most general computer architectures to date have been based on a "stored program" paradigm that differentiates between memory and processing and relies on communication over busses and other (relatively) long distance mechanisms. Nanometer-scale electronics (nanoelectronics) promises to fundamentally change the ground-rules. Processing will be cheap and plentiful, interconnection expensive but pervasive. This will tend to move computer architecture in the direction of locally connected, reconfigurable hardware meshes that merge processing and memory. If the overheads associated with reconfigurability can be reduced or even eliminated, architectures based on non-volatile, reconfigurable, fine-grained meshes with rich, local interconnect offer a better match to the expected characteristics of future nanoelectronic devices.
The Effect of Reconfigurable Units in Superscalar Processors
Proc. Ninth International Symposium on Field Programmable Gate Arrays, FPGA 2001, 2001
Cited by 22 (0 self)
This paper describes OneChip, a third generation reconfigurable processor architecture that integrates a Reconfigurable Functional Unit (RFU) into a superscalar Reduced Instruction Set Computer (RISC) processor's pipeline. The architecture allows dynamic scheduling and dynamic reconfiguration. It also provides support for pre-loading configurations and for Least Recently Used (LRU) configuration management.
The Reconfigurable Streaming Vector Processor (RSVP™)
Cited by 20 (1 self)
The need to process multimedia data places large computational demands on portable/embedded devices. These multimedia functions share common characteristics: they are computationally intensive and data-streaming, performing the same operation(s) on many data elements. The Reconfigurable Streaming Vector Processor (RSVP™) is a vector coprocessor architecture that accelerates streaming data operations. Programming the RSVP architecture involves describing the shape and location of vector streams in memory and describing computations as data-flow graphs. These descriptions are intuitive and independent of each other, making the RSVP architecture easy to program. They are also machine independent, allowing binary-compatible implementations with varying cost-performance tradeoffs. This paper presents the RSVP architecture and programming model, a programming case study, and our first implementation. Our results show significant speedups on streaming data functions. Speedups for kernels and applications range from 2 to over 20 times that of an ARM9 host processor alone.
Reconfigurable Instruction Set Processors: A survey
IEEE Transactions on Software Engineering, 2000
Cited by 19 (3 self)
Future interactive multimedia applications are characterized by a large variety of compression algorithms with highly parallel nested loops. It will not be efficient to design custom processors suitable for this wide range of applications, given the uncertainty about what is going to be executed. Instead, we must find ways to cope with such dynamic and compute-intensive tasks. Reconfigurable instruction set processors can cope with this dynamism by specializing the hardware to the algorithm at hand at runtime. They achieve this thanks to a flexible fabric of coarse-grained processing elements that can be reconfigured to perform different complex algorithms. This paper analyzes the performance improvements obtained by such programmable structures and discusses some of the critical issues, such as reconfiguration times.
A compiler framework for mapping applications to a coarse-grained reconfigurable computer architecture
Computer Architecture. Conf. on Compiler, Architecture and Synthesis for Embedded Systems (CASES), 2001
Cited by 19 (2 self)
The rapid growth of silicon densities has made it feasible to deploy reconfigurable hardware as a highly parallel computing platform. However, in most cases, the application needs to be programmed in hardware description or assembly languages, whereas most application programmers are familiar with the algorithmic programming paradigm. SA-C has been proposed as an expression-oriented language designed to implicitly express data parallel operations. Morphosys is a reconfigurable system-on-chip architecture that supports a data-parallel, SIMD computational model. This paper describes a compiler framework to analyze SA-C programs, perform optimizations, and map the application onto the Morphosys architecture. The mapping process involves operation scheduling, resource allocation and binding, and register allocation in the context of the Morphosys architecture. Some compiled image-processing kernels achieve up to a 42x speed-up over an 800 MHz Pentium III machine.
Reconfigurable computing: architectures and design methods
IEE Proceedings - Computers and Digital Techniques, 2005
Cited by 17 (1 self)
Reconfigurable computing is becoming increasingly attractive for many applications. This survey covers two aspects of reconfigurable computing: architectures and design methods. The paper includes recent advances in reconfigurable architectures, such as the Altera Stratix II and Xilinx Virtex 4 FPGA devices. The authors identify major trends in general-purpose and special-purpose
Rescue: A Microarchitecture for Testability and Defect Tolerance
In ISCA ’05: Proceedings of the 32nd Annual International Symposium on Computer Architecture, 2005
Cited by 14 (1 self)
Scaling feature size improves processor performance but increases each device’s susceptibility to defects (i.e., hard errors). As a result, fabrication technology must improve significantly to maintain yields. Redundancy techniques in memory have been successful at improving yield in the presence of defects. Apart from core sparing, which disables faulty cores in a chip multiprocessor, little has been done to target the core logic. While previous work has proposed that either inherent or added redundancy in the core logic can be used to tolerate defects, the key issues of realistic testing and fault isolation have been ignored. This paper is the first to consider testability and fault isolation in designing modern high-performance, defect-tolerant microarchitectures. We define intra-cycle logic independence (ICI) as the condition needed for conventional scan test to isolate faults quickly to the microarchitectural-block granularity. We propose logic transformations to redesign conventional superscalar microarchitecture to comply with ICI. We call our novel, testable, and defect-tolerant microarchitecture Rescue. We build a Verilog model of Rescue and verify that faults can be isolated to the required precision using only conventional scan test. Using performance simulations, we show that ICI transformations reduce IPC by only 4% on average for SPEC2000 programs. Taking yield improvement into account, Rescue improves average yield-adjusted instruction throughput over core sparing by 12% and 22% at 32nm and 18nm technology nodes, respectively.

