Results 1 -
6 of
6
Field-Programmable Custom Computing Machines - A Taxonomy
, 2002
"... The ability for providing a hardware platform which can be customized on a per-application basis under software control has established Reconfigurable Computing (RC) as a new computing paradigm. A machine employing the RC paradigm is referred to as a Field-Programmable Custom Computing Machine ( ..."
Abstract
-
Cited by 18 (11 self)
- Add to MetaCart
The ability for providing a hardware platform which can be customized on a per-application basis under software control has established Reconfigurable Computing (RC) as a new computing paradigm. A machine employing the RC paradigm is referred to as a Field-Programmable Custom Computing Machine (FCCM). So far, the FCCMs have been classified according to implementation criteria. For the previous classifications do not reveal the entire meaning of the RC paradigm, we propose to classify the FCCMs according to architectural criteria.
Augmenting modern superscalar architectures with configurable extended instructions
- In Proceedings of the Reconfigurable Architectures Workshop RAW
, 2000
"... Abstract. The instruction sets of general-purpose microprocessors are designed to offer good performance across a wide range of programs. The size and complexity of the instruction sets, however, are limited by a need for generality and for streamlined implementation. The particular needs of one app ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Abstract. The instruction sets of general-purpose microprocessors are designed to offer good performance across a wide range of programs. The size and complexity of the instruction sets, however, are limited by a need for generality and for streamlined implementation. The particular needs of one application are balanced against the needs of the full range of applications considered. For this reason, one can “design ” a better instruction set when considering only a single application than when considering a general collection of applications. Configurable hardware gives us the opportunity to explore this option. This paper examines the potential for automatically identifying application-specific extended instructions and implementing them in programmable functional units based on configurable hardware. Adding fine-grained reconfigurable hardware to the datapath of an out-of-order issue superscalar processor allows 4-44% speedups on the MediaBench benchmarks [1]. As a key contribution of our work, we present a selective algorithm for choosing extended instructions to minimize reconfiguration costs within loops. Our selective algorithm constrains instruction choices so that significant speedups are achieved with as few as 4 moderately sized programmable functional units, typically containing less than 150 look-up tables each. 1
an overview of the architecture
- In Proceedings 2nd International Conference on Databases: Improving demic
, 1981
"... Garp’s on-chip, reconfigurable coprocessor was tailored specifically for accelerating loops of general-purpose software applications. Its novel features inspired a unique approach to automatic compilation from C. Various projects and products have been built using off-the-shelf field-programmable ga ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Garp’s on-chip, reconfigurable coprocessor was tailored specifically for accelerating loops of general-purpose software applications. Its novel features inspired a unique approach to automatic compilation from C. Various projects and products have been built using off-the-shelf field-programmable gate arrays (FPGAs) as compute accelerators for specific tasks. Such systems typically connect one or more FPGAs to the host computer via an I/O bus. Some have shown remarkable speedups, albeit limited to specific application domains. Many factors limit the general usefulness of such systems. Long reconfiguration times prevent acceleration of applications that spread their time over many different tasks. Low-bandwidth paths for data transfer limit
Experience with a Hybrid Processor: K-Means Clustering
- Journal of Supercomputing, Vol
"... We discuss hardware/software co-processing on a hybrid processor for a compute- and data-intensive multispectral imaging algorithm, k-means clustering. The experiments are performed on two models of the Altera Excalibur board, the first using the soft IP core 32-bit NIOS 1.1 RISC processor, and the ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
We discuss hardware/software co-processing on a hybrid processor for a compute- and data-intensive multispectral imaging algorithm, k-means clustering. The experiments are performed on two models of the Altera Excalibur board, the first using the soft IP core 32-bit NIOS 1.1 RISC processor, and the second with the hard IP core ARM processor. In our experiments, we compare performance of the sequential k-means algorithm with three different accelerated versions. We consider granularity and synchronization issues when mapping an algorithm to a hybrid processor. Our results show that speed-up of 11.8X is achieved by migrating computation to the Excalibur ARM hardware/software as compared to software only on a Gigahertz Pentium III. Speedup on the Excalibur NIOS is limited by the communication cost of transferring data from external memory through the processor to the customized circuits. This limitation is overcome on the Excalibur ARM, in which dual port memories, accessible to both the processor and configurable logic, have the biggest performance impact of all the techniques studied.
Early Experience with a Hybrid Processor: K-Means Clustering
- in ERSA '01: First International Conference on Engineering of Reconfigurable Systems and Algorithms (Las Vegas, NV
, 2000
"... We discuss hardware/software coprocessing on a hybrid processor for a compute- and data-intensive hyper-spectral imaging algorithm, KMeans Clustering. The experiments are performed on the Altera Excalibur board using the soft IP core 32-bit NIOS RISC processor. In our experiments, we compare perform ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
We discuss hardware/software coprocessing on a hybrid processor for a compute- and data-intensive hyper-spectral imaging algorithm, KMeans Clustering. The experiments are performed on the Altera Excalibur board using the soft IP core 32-bit NIOS RISC processor. In our experiments, we compare performance of the sequential algorithm with two dierent accelerated versions. We consider granularity and synchronization issues when mapping an algorithm to a hybrid processor. Our results show that on the Excalibur NIOS, a 15% speedup can be achieved over the sequential algorithm on images with 8 spectral bands where the pixels are divided into 8 categories. Speedup is limitd by the communication cost of transferring data from external memory through the NIOS processor to the customized circuits. Our results indicate that future hybrid processors must either (1) have a clock rate 10X the speed of the congurable logic circuits or (2) include dual port memories that both the processor and congurable logic can access. If either of these conditions is met, the hybrid processor will show a factor of 10 speedup over the sequential algorithm. Such systems will combine the convenience of conventional processors with the speed of congurable logic. Keywords: congurable system on a chip, CSOC, Excalibur, K-means Clustering, image processing 1

