Garp: A MIPS Processor with a Reconfigurable Coprocessor
1997
Cited by 321 (6 self)
Typical reconfigurable machines exhibit shortcomings that make them less than ideal for general-purpose computing. The Garp Architecture combines reconfigurable hardware with a standard MIPS processor on the same die to retain the better features of both. Novel aspects of the architecture are presented, as well as a prototype software environment and preliminary performance results. Compared to an UltraSPARC, a Garp of similar technology could achieve speedups ranging from a factor of 2 to as high as a factor of 24 for some useful applications.
Processor Reconfiguration Through Instruction Set Metamorphosis: Compiler and Architecture
IEEE Computer, 1993
Cited by 156 (5 self)
Many computationally-intensive tasks spend nearly all of their execution time within a small fraction of the executable code. A new hardware/software system, called PRISM, is presented which improves the performance of many of these computationally intensive tasks by utilizing information extracted at compile-time to synthesize new operations which augment the functionality of a core processor. By integrating adaptation into a general-purpose computer, one can not only reap the performance benefits of application-specific processors, but also retain the general-purpose nature by accommodating a wide variety of tasks. Newly synthesized operations are targeted to RAM-based logic devices which provide a mechanism for fast processor reconfiguration. A proof-of-concept system called PRISM-I, consisting of a specialized C configuration compiler and a reconfigurable hardware platform, is presented. Compilation and performance results are provided which confirm the concept viability, and demonstrate significant speed-up over conventional general-purpose architectures. Keywords: Adaptive Architectures, Reconfigurable Instruction Sets, Performance Improvements, General-Purpose Computers, Logic Synthesis
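The PRISM abstract rests on the observation that most execution time is concentrated in a small fraction of the code. A minimal sketch of why that matters, using Amdahl's law (a standard result, not something the abstract itself states): the function name `overall_speedup` and all numeric values are hypothetical and are not taken from the paper.

```python
def overall_speedup(hot_fraction: float, hot_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the hot code is accelerated.

    hot_fraction -- share of run time spent in the accelerated code (0..1)
    hot_speedup  -- speedup of that code on the synthesized hardware (> 0)
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must lie in [0, 1]")
    if hot_speedup <= 0.0:
        raise ValueError("hot_speedup must be positive")
    # The untouched code still runs at the original speed.
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


# Hypothetical inputs, chosen only to illustrate the trend.
for fraction in (0.5, 0.9, 0.99):
    print(fraction, round(overall_speedup(fraction, 10.0), 2))
```

The larger the share of time spent in the accelerated region, the closer the overall gain gets to the raw hardware speedup, which is the case the abstract targets.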
Virtual Wires: Overcoming Pin Limitations in FPGA-based Logic Emulators
Cited by 75 (11 self)
Existing FPGA-based logic emulators suffer from limited inter-chip communication bandwidth, resulting in low gate utilization (10 to 20 percent). This resource imbalance increases the number of chips needed to emulate a particular logic design and thereby decreases emulation speed, since signals must cross more chip boundaries. Current emulators only use a fraction of potential communication bandwidth because they dedicate each FPGA pin (physical wire) to a single emulated signal (logical wire). These logical wires are not active simultaneously and are only switched at emulation clock speeds. Virtual wires overcome pin limitations by intelligently multiplexing each physical wire among multiple logical wires and pipelining these connections at the maximum clocking frequency of the FPGA. A virtual wire represents a connection from a logical output on one FPGA to a logical input on another FPGA. Virtual wires not only increase usable bandwidth, but also relax the absolute limits imposed on gate utilization. The resulting improvement in bandwidth reduces the need for global interconnect, allowing effective use of low-dimension inter-chip connections (such as nearest-neighbor). Nearest-neighbor topologies, coupled with the ability of virtual wires to overlap communication with computation, can even improve emulation speeds. We present the concept of virtual wires and describe our first implementation, a “softwire” compiler which utilizes static routing and relies on minimal hardware support. Results from compiling netlists for the 18K gate Sparcle microprocessor and the 86K gate Alewife Communications and Cache Controller indicate that virtual wires can increase FPGA gate utilization beyond 80 percent without a significant slowdown in emulation speed.
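The multiplexing idea in the virtual wires abstract can be sketched as a static time-slot assignment. This is a toy round-robin schedule under stated assumptions, not the softwire compiler's algorithm (which the abstract says relies on static routing); the names `schedule_virtual_wires` and `slots_per_emulation_cycle` and the signal names are hypothetical.

```python
from math import ceil


def schedule_virtual_wires(logical_wires, physical_pins):
    """Statically map each logical wire to a (pin, time_slot) pair, round-robin.

    Assumption: wires have no ordering dependencies, which a real
    compiler would have to respect when choosing slots.
    """
    if physical_pins <= 0:
        raise ValueError("need at least one physical pin")
    schedule = {}
    for index, wire in enumerate(logical_wires):
        pin = index % physical_pins    # which physical wire carries the signal
        slot = index // physical_pins  # which FPGA clock phase it is sent in
        schedule[wire] = (pin, slot)
    return schedule


def slots_per_emulation_cycle(num_logical, physical_pins):
    """FPGA clock phases needed to move every logical wire once."""
    return ceil(num_logical / physical_pins)


wires = [f"sig{i}" for i in range(10)]  # hypothetical signal names
plan = schedule_virtual_wires(wires, physical_pins=4)
print(plan["sig5"])                              # (1, 1)
print(slots_per_emulation_cycle(len(wires), 4))  # 3
```

In this sketch, ten logical wires share four pins at the cost of three FPGA clock phases per emulation cycle, whereas a dedicated-pin scheme would need ten pins.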
MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications
IEEE Transactions on Computers, 2000
Cited by 74 (3 self)
This paper introduces MorphoSys, a reconfigurable computing system developed to investigate the effectiveness of combining reconfigurable hardware with general-purpose processors for word-level, computation-intensive applications. MorphoSys is a coarse-grain, integrated reconfigurable system-on-chip targeted at high-throughput and data-parallel applications. It comprises a reconfigurable array of processing cells, a modified RISC processor core and an efficient memory interface unit. This paper describes the MorphoSys architecture, including the reconfigurable processing array, the control processor and data and configuration memories. The suitability of MorphoSys for the target application domain is then illustrated with examples such as video compression, data encryption and target recognition. Performance evaluation of these applications indicates improvements of up to an order of magnitude on MorphoSys in comparison with other systems.
A Real-time Matching System for Large Fingerprint Databases
1996
Cited by 68 (12 self)
With the current rapid growth in multimedia technology, there is an imminent need for efficient techniques to search and query large image databases. Because of their unique and peculiar needs, image databases cannot be treated in a similar fashion to other types of digital libraries. The contextual dependencies present in images and the complex nature of two-dimensional image data make the representation issues more difficult for image databases. An invariant representation of an image is still an open research issue. For these reasons, it is difficult to find a universal content-based retrieval technique. Current approaches based on shape, texture, and color for indexing image databases have met with limited success. Further, these techniques have not been adequately tested in the presence of noise and distortions. A given application domain offers stronger constraints for improving the retrieval performance. Fingerprint databases are characterized by their large size as well as nois...
Instruction Generation for Hybrid Reconfigurable Systems
ACM Transactions on Design Automation of Electronic Systems, 2001
Cited by 53 (5 self)
Building Blocks (ABBs), or instructions available from a given hardware library. The customized data path generated from many ABBs was referred to as an application specific unit (ASU). Cathedral's synthesis targeted ASUs, which could be executed in very few clock cycles. This goal was achieved via manual clustering of necessary operations into more compact operations, essentially a form of template construction. Whereas our template generation and matching algorithms are automated, the definition of clusters in Cathedral was a manual operation, mainly clustering loop and function bodies. Their results demonstrated an expected reduction of critical path length as well as interconnect as a result of clustering.
Reconfigurable Computing for Digital Signal Processing: A Survey
Journal of VLSI Signal Processing, 2000
Cited by 45 (2 self)
Steady advances in VLSI technology and design tools have extensively expanded the application domain of digital signal processing over the past decade. While application-specific integrated circuits (ASICs) and programmable digital signal processors (PDSPs) remain the implementation mechanisms of choice for many DSP applications, increasingly new system implementations based on reconfigurable computing are being considered. These flexible platforms, which offer the functional efficiency of hardware and the programmability of software, are quickly maturing as the logic capacity of programmable devices follows Moore's Law and advanced automated design techniques become available. As initial reconfigurable technologies have emerged, new academic and commercial efforts have been initiated to support power optimization, cost reduction, and enhanced run-time performance. This paper presents a survey of academic research and commercial development in reconfigurable computing for DSP systems o...
Improving Functional Density Through Run-Time Circuit Reconfiguration
1997
Cited by 42 (2 self)
orting a C compiler to the DISC processor. Justin Diether assisted in the design, hand-layout, and testing of many partially reconfigured circuits. I would also like to thank Paul Graham for his generous assistance and support of our many mutual activities, classes, and projects at BYU. Other graduate students assisting me with this work include Russel Peterson, Mike Rencher, Richard Ross, and Peter Bellows. My advisor, Brad Hutchings, provided essential assistance and encouragement in all of the projects, ideas, and results presented within this work. My decision to complete this degree and write this dissertation was influenced largely by his advice and positive encouragement. Brent Nelson and other faculty members within the Electrical and Computer Engineering department at BYU have provided critical feedback on a wide variety of topics relating to this work. I would also like to acknowledge the insight and assistance of many collaborators researching closely related subjects. For
Parallelizing Applications into Silicon
IEEE Symposium on Field-Programmable Custom Computing Machines, 1999
Cited by 40 (4 self)
The next decade of computing will be dominated by embedded systems, information appliances and application-specific computers. In order to build these systems, designers will need high-level compilation and CAD tools that generate architectures that effectively meet the needs of each application. In this paper we present a novel compilation system that allows sequential programs, written in C or FORTRAN, to be compiled directly into custom silicon or reconfigurable architectures. This capability is also interesting because trends in computer architecture are moving towards more reconfigurable hardware-like substrates, such as FPGA-based systems. Our system works by successfully combining two resource-efficient computing disciplines: Small Memories and Virtual Wires. For a given application, the compiler first analyzes the memory access patterns of pointers and arrays in the program and constructs a partitioned memory system made up of many small memories. The computation is implemented by active computing elements that are spatially distributed within the memory array. A space-time scheduler assigns instructions to the computing elements in a way that maximizes locality and minimizes physical communication distance. It also generates an efficient static schedule for the interconnect. Finally, specialized hardware for the resulting schedule of memory accesses, wires, and computation is generated as a multi-process state machine in synthesizable Verilog. With this system, implemented as a set of SUIF compiler passes, we have successfully compiled programs into hardware and achieve specialization performance enhancements by up to an order of magnitude versus a single general-purpose processor. We also achieve additional parallelization speedups similar to those obtainable using a tightly-interconnected multiprocessor.
The RAW Benchmark Suite: Computation Structures for General Purpose Computing
IEEE Symposium on Field-Programmable Custom Computing Machines, 1997
Cited by 37 (7 self)
The RAW benchmark suite consists of twelve programs designed to facilitate comparing, validating, and improving reconfigurable computing systems. These benchmarks run the gamut of algorithms found in general purpose computing, including sorting, matrix operations, and graph algorithms. The suite includes an architecture-independent compilation framework, Raw Computation Structures (RawCS), to express each algorithm's dependencies and to support automatic synthesis, partitioning, and mapping to a reconfigurable computer. Within this framework, each benchmark is portably designed in both C and Behavioral Verilog and scalably parameterized to consume a range of hardware resource capacities. To establish initial benchmark ratings, we have targeted a commercial logic emulation system based on virtual wires technology to automatically generate designs up to millions of gates (14 to 379 FPGAs). Because the virtual wires techniques abstract away machine-level details like FPGA capacity and interconnect, our hardware target for this system is an abstract reconfigurable logic fabric with memory-mapped host I/O. We report initial speeds in the range of 2X to 1800X faster than a 2.82 SPECint95 SparcStation 20 and encourage others in the field to run these benchmarks on other systems to provide a standard comparison.

