Programmable Active Memories: Reconfigurable Systems Come of Age
IEEE Transactions on VLSI Systems, 1996
Cited by 123 (5 self)
Programmable Active Memories (PAM) are a novel form of universal reconfigurable hardware co-processor. Based on Field-Programmable Gate Array (FPGA) technology, a PAM is a virtual machine, controlled by a standard microprocessor, which can be dynamically and indefinitely reconfigured into a large number of application-specific circuits. PAMs offer a new mixture of hardware performance and software versatility. We review the important architectural features of PAMs, through the example of DECPeRLe-1, an experimental device built in 1992. PAM programming is presented, in contrast to classical gate-array and full custom circuit design. Our emphasis is on large, code-generated synchronous systems descriptions
Improving Functional Density Through Run-Time Circuit Reconfiguration
1997
Cited by 42 (2 self)
orting a C compiler to the DISC processor. Justin Diether assisted in the design, hand-layout, and testing of many partially reconfigured circuits. I would also like to thank Paul Graham for his generous assistance and support of our many mutual activities, classes, and projects at BYU. Other graduate students assisting me with this work include Russel Peterson, Mike Rencher, Richard Ross, and Peter Bellows. My advisor, Brad Hutchings, provided essential assistance and encouragement in all of the projects, ideas, and results presented within this work. My decision to complete this degree and write this dissertation was influenced largely by his advice and positive encouragement. Brent Nelson and other faculty members within the Electrical and Computer Engineering department at BYU have provided critical feedback on a wide variety of topics relating to this work. I would also like to acknowledge the insight and assistance of many collaborators researching closely related subjects. For
Configurable computing solutions for automatic target recognition
Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, 1996
Cited by 40 (0 self)
FPGAs can be used to build systems for automatic target recognition (ATR) that achieve an order of magnitude increase in performance over systems built using general purpose processors. This improvement is possible because the bit-level operations that comprise much of the ATR computational burden map extremely efficiently into FPGAs, and because the specificity of ATR target templates can be leveraged via fast reconfiguration. We describe here algorithms, design tools, and implementation strategies that are being used in a configurable computing
Mesh Routing Topologies for Multi-FPGA Systems
1999
Cited by 17 (2 self)
There is currently great interest in using fixed arrays of FPGAs for logic emulators, custom computing devices, and software accelerators. An important part of designing such a system is determining the proper routing topology to use to interconnect the FPGAs. This topology can have a great effect on the area and delay of the resulting system. Crossbar, Hierarchical Crossbar, and Mesh interconnection schemes have all been proposed for use in FPGA-based systems. In this paper we examine Mesh interconnection schemes, and propose several constructs for more efficient topologies. These reduce inter-chip delays by more than 60% over the basic 4-way Mesh.
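As a rough illustration of why the interconnect topology affects delay, the sketch below compares the average number of chip-to-chip hops on a small square mesh with and without diagonal links. This is a minimal sketch under stated assumptions: hop count stands in for inter-chip delay, the diagonal-link variant is an illustrative choice and not necessarily one of the constructs proposed in the paper, and `mean_hops` and `diagonal_links` are hypothetical names.

```python
from itertools import product


def mean_hops(n, diagonal_links=False):
    """Mean shortest-path hop count over all ordered pairs of distinct chips
    on an n x n mesh (assumption: every link costs exactly one hop)."""
    chips = list(product(range(n), repeat=2))
    total = 0
    pairs = 0
    for (r1, c1), (r2, c2) in product(chips, repeat=2):
        if (r1, c1) == (r2, c2):
            continue
        dr, dc = abs(r1 - r2), abs(c1 - c2)
        # 4-way mesh: Manhattan distance. With added diagonal links a route
        # can cover one row and one column per hop: Chebyshev distance.
        total += max(dr, dc) if diagonal_links else dr + dc
        pairs += 1
    return total / pairs


four_way = mean_hops(4)                          # basic 4-way mesh
with_diagonals = mean_hops(4, diagonal_links=True)
print(four_way, with_diagonals)
```

Fewer hops per route is only one factor; the paper's own results concern measured inter-chip delay, which this toy model does not reproduce.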
Adaptive Explicitly Parallel Instruction Computing
2000
Cited by 12 (2 self)
Current processors are programmed through a fixed interface called the Instruction Set Architecture (ISA). Consequently, a compiler targeting such a processor is forced to choose instructions from the provided instruction set while generating code for a given application. Often this instruction set is not a suitable match for the computational requirements of the application program. Within this context, we ask ourselves the following questions:
1. Can application performance be improved if the compiler has the freedom to pick the instruction set on a per-application basis?
2. Can we build cost-effective processors that provide the ability to efficiently emulate compiler-determined instruction sets and yet are not application specific?
3. Given that the desired processor capabilities are feasible, can the compiler determine an optimal set of instructions for a given application and generate code that can effectively exploit the processor capabilities?
In this thesis, we provide sufficient evidence to answer these questions in the affirmative. Through a combination of architectural innovations and novel compilation techniques, this dissertation demonstrates that it is possible to attain significant improvement in performance, up to an order of magnitude in some cases, on general-purpose and multimedia applications over comparable fixed-ISA processors. We propose classes of microprocessors that allow application programs to add and subtract functional units, yielding a dynamically varying instruction set interface to the running application without compromising the current compatibility model. The first half of this dissertation describes this novel class of architectures, focusing on a specific subclass called Adaptive Explicitly Parallel Instruction Computing (AEPIC) architectures...
FPGA-Based Sonar Processing
In Proceedings of the Sixth ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’98), 1998
Cited by 7 (3 self)
This paper presents the application of time-delay sonar beamforming and discusses a multi-board FPGA system for performing several variations of this beamforming method in real-time for realistic sonar arrays. Additionally, we show that our proposed FPGA system has a six to twelve times performance advantage over an equivalent system created using currently available, high-performance DSPs designed for multiprocessing systems. This performance advantage is due to the simplicity of the core calculation, the limitations of the DSP's address calculation hardware, and the ability to customize the I/O of the FPGA to the application.
1 Introduction
Field-programmable gate arrays (FPGAs) have been used for many computational tasks since their invention [2, 1, 6, 9, 11]. In much of the work to date, FPGAs have been found to be reasonable alternatives to custom hardware (ASICs) or software implementations of applications: they provide speed-ups over software through hardware specializat...
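The "simplicity of the core calculation" in time-delay beamforming can be seen in a generic delay-and-sum sketch: each sensor channel is shifted by a per-channel delay and the shifted samples are added. This is a minimal sketch of the textbook technique, not the paper's implementation; it assumes non-negative integer sample delays, and `delay_and_sum`, `channels`, and `delays` are hypothetical names.

```python
def delay_and_sum(channels, delays):
    """Generic delay-and-sum beamformer (illustrative assumption).

    channels: list of equal-length sample lists, one per sensor.
    delays:   non-negative integer delay, in samples, for each sensor.
    Computes out[t] = sum over k of channels[k][t - delays[k]],
    treating samples before the start of a channel as zero.
    """
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        # The inner work is one index subtraction and one addition per sample.
        for t in range(d, n):
            out[t] += samples[t - d]
    return out


# A pulse reaches sensor 2 first, then sensor 1, then sensor 0.
channels = [
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
steered = delay_and_sum(channels, [0, 1, 2])    # delays match the arrival pattern
unsteered = delay_and_sum(channels, [0, 0, 0])  # no steering
print(steered)    # pulses line up at one sample
print(unsteered)  # pulses stay spread out
```

With matching delays the three pulses add coherently at a single sample; with zero delays they remain spread across three samples. The per-sample work is dominated by the index arithmetic `t - d`, which relates to the abstract's remark about address calculation.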
A Comparison Of FPGA Platforms Through SAR/ATR Algorithm Implementation
1996
Cited by 2 (0 self)
As computing platforms gain greater and greater computational power, new applications that previously were unthinkable are being developed. One such application is Automatic Target Recognition (ATR), the ability to automatically identify objects in radar images. This thesis specifically deals with ATR algorithms developed to search for objects in Synthetic Aperture Radar (SAR) images. The algorithms require more computational power than is currently available on any platform. Because of these high computational requirements, the algorithms were used as a tool to compare two reconfigurable hardware platforms. Two implementations of ATR for SAR have been developed to compare the Teramac and Splash-2 Field Programmable Gate Array (FPGA) based platforms. This comparison shows Teramac's strength as an exploratory platform and Splash-2's strength as an implementation platform for linear systolic array designs. COMMITTEE APPROVAL: Brad L. Hutchings, Committee Chairman James K. Archibald, ...
A description, analysis, and comparison of a hardware and a software implementation of the SPLASH genetic algorithm for optimizing symmetric traveling salesman problems
1996
Cited by 1 (0 self)
This thesis presents Splash 2 hardware and HP PA-RISC software implementations of a genetic algorithm for symmetric traveling salesman problems, providing an analysis and comparison of the implementations. Despite an 11 times clock rate disadvantage, the hardware outperformed the software executing on a state-of-the-art workstation by factors of 2.25 to 4.33 times in execution time. Through detailed cycle-level and instruction-level analysis, the thesis shows that the factors which contributed the most to the more than 20 times execution cycle advantage of the hardware over the software are hardware pipelining, hard-wired control, coarse-grained parallelism, memory hierarchy efficiency, and random number generation, in order of decreasing contribution. Also presented in the thesis are the design and performance results of a parallel genetic algorithm implemented on the Splash 2 platform for the traveling salesman problem.
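The quoted figures can be cross-checked with simple arithmetic. Assuming execution time equals cycle count divided by clock rate, and that the 11 times clock ratio applies to the same runs as the 2.25 to 4.33 times speedups (both are assumptions, not statements from the thesis), the implied advantage in execution cycles is the time speedup multiplied by the clock ratio. The function name below is hypothetical.

```python
CLOCK_RATIO = 11.0  # software clock rate / hardware clock rate, from the abstract


def implied_cycle_advantage(time_speedup, clock_ratio=CLOCK_RATIO):
    """Cycle-count advantage implied by a wall-clock speedup.

    Assumes time = cycles / clock_rate on both platforms, so
    cycles_sw / cycles_hw = (time_sw / time_hw) * (clock_sw / clock_hw).
    """
    return time_speedup * clock_ratio


low = implied_cycle_advantage(2.25)   # about 24.75
high = implied_cycle_advantage(4.33)  # about 47.63
print(low, high)
```

Both values exceed 20, which is consistent with the abstract's "more than 20 times execution cycle advantage".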
Analysis, Field Programmable Gate Arrays (FPGA), Reconfigurable Computing, High Performance Computing,
The integration of methodologies and techniques from parallel processing or High Performance Computing (HPC) with those of Reconfigurable Computing (RC) systems offers great potential for increased performance and flexibility for a wide range of computing problems. High Performance Computing architectures and Reconfigurable Computing systems have independently demonstrated performance advantages for applications such as digital signal processing, circuit simulation, and pattern recognition. By exploiting the near “hardware specific” speed of Reconfigurable Computing systems in a Beowulf cluster, there is potential for significant performance advantages over other software-only or uniprocessor solutions. In this paper we present our initial results for an analytical modeling framework for High Performance Reconfigurable Computing systems.

