Special Purpose Parallel Computing
- Lectures on Parallel Computation, 1993
- Cited by 77 (5 self)
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given.
1. Introduction
Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
The Warp Computer: Architecture, Implementation, and Performance
- IEEE Transactions on Computers, 1987
- Cited by 42 (2 self)
The Warp machine is a systolic array computer of linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A typical Warp array includes 10 cells, thus having a peak computation rate of 100 MFLOPS. The Warp array can be extended to include more cells to accommodate applications capable of using the increased computational bandwidth. Warp is integrated as an attached processor into a Unix host system. Programs for Warp are written in a high-level language supported by an optimizing compiler.
Parallel algorithms for image enhancement and segmentation by region growing with an experimental study
- The Journal of Supercomputing, 1996
- Cited by 14 (4 self)
This paper presents efficient and portable implementations of a useful image enhancement process, the Symmetric Neighborhood Filter (SNF), and an image segmentation technique which makes use of the SNF and a variant of the conventional connected components algorithm which we call delta-Connected Components. Our general framework is a single-address-space, distributed-memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. The image segmentation algorithm makes use of an efficient connected components algorithm which uses a novel approach for parallel merging. The algorithms have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, Intel Paragon, and workstation clusters. Our experimental results are consistent with the theoretical analysis and provide the best known execution times for segmentation, even when compared with machine-specific implementations. Our test data include difficult images from the Landsat Thematic Mapper (TM) satellite data. More efficient implementations of Split-C will likely result in even faster execution times.
Computational Architectures for Responsive Vision: the Vision Engine
- In Proceedings of Computer Architectures for Machine Perception, 1991
- Cited by 10 (8 self)
To respond actively to a dynamic environment, a vision system must process perceptual data in real time, and in multiple modalities. The structure of the computational load varies across the levels of vision, requiring multiple architectures. We describe the Vision Engine, a system with a pipelined early vision architecture (Datacube image processors) connected to a MIMD intermediate vision system (a set of Transputers). The system uses a controllable eye/head for tasks involving motion, stereo and tracking. A simple pipeline model describes image transformation through multiple functional stages in early vision. Later processing (e.g., segmentation, edge linking, perceptual organization) cannot easily proceed on a pipeline architecture. A MIMD architecture is more appropriate for the irregular data and functional parallelism of later visual processing. The Vision Engine is designed for general vision tasks. Early vision processing, both optical flow and stereo, is implemented in near r...
Object Recognition on a Systolic Array
- 1987
Computer vision systems for recognition include both the extraction of features and the matching of those features with a known model. Traditionally, the most time-consuming step has been feature extraction, but new parallel architectures are removing the bottleneck at this level. Once features have been extracted from an image, considerable geometric search is still necessary to form relationships between the extracted features and to match those features and feature aggregates with a model. One can take advantage of certain constraints about the appearance of an object, but with complex images or multiple models, intensive processing is still required. We have developed some algorithms for doing these geometric search operations in parallel on iWarp, a long linear array of VLSI processing elements currently being designed by Carnegie Mellon and Intel Corporation. We have simulated a system which uses these algorithms to do an object recognition task (after low-level vision) almost completely on a 72-processor iWarp array. An analysis of this system indicates a speedup by a factor of roughly 100 to 250 over a sequential version running on a VAX 8650.
A common paradigm for computer vision recognition systems, which was originally used by Roberts [10], is:
1. Extract features from an image.

