Computer Vision Algorithms on Reconfigurable Logic Arrays
- IEEE Trans. on Parallel and Distributed Systems, 1999
Cited by 11 (1 self)
Computer vision algorithms are natural candidates for high performance computing due to their inherent parallelism and intense computational demands. For example, a simple 3 x 3 convolution on a 512 x 512 gray scale image at 30 frames per second requires 67.5 million multiplications and 60 million additions to be performed in one second. Computer vision tasks can be classified into three categories based on their computational complexity and communication complexity: low-level, intermediate-level and high-level. Special-purpose hardware provides better performance than general-purpose hardware for all three levels of vision tasks. With recent advances in very large scale integration (VLSI) technology, an application specific integrated circuit (ASIC) can provide the best performance in terms of total execution time. However, the long design cycle time, high development cost and inflexibility of dedicated hardware deter the design of ASICs. In contrast, field programmable gate arrays (FPGAs) support lower design verification time and easier design adaptability at a lower cost. Hence, FPGAs with an array of reconfigurable logic blocks can be very useful compute elements. FPGA-based custom computing machines are
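The operation counts quoted in this abstract can be reproduced with a short calculation. The sketch below assumes one multiplication per kernel tap and k*k - 1 additions per output pixel, ignores image borders, and reads "million" as 2^20; these assumptions are not stated in the abstract, and the function name is illustrative.

```python
def conv_ops_per_second(width, height, k, fps):
    """Return (multiplications, additions) per second for a k x k convolution.

    Assumes every pixel produces one output (borders ignored).
    """
    pixels = width * height
    mults = pixels * k * k * fps        # one multiply per kernel tap
    adds = pixels * (k * k - 1) * fps   # summing k*k products takes k*k - 1 adds
    return mults, adds


mults, adds = conv_ops_per_second(512, 512, 3, 30)
MI = 2 ** 20  # assumption: "million" in the abstract means 2^20
print(mults / MI, adds / MI)  # 67.5 60.0
```

Under these assumptions the counts match the 67.5 million multiplications and 60 million additions given in the abstract.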
Portable and scalable algorithms for irregular all-to-all communication
- In 16th ICDCS, 1996
Cited by 9 (3 self)
In irregular all-to-all communication, messages are exchanged between every pair of processors. The message sizes vary from processor to processor and are known only at run time. This is a fundamental communication primitive in parallelizing irregularly structured scientific computations. Our algorithm reduces the total number of message start-ups. It also reduces node contention by smoothing out the lengths of the messages communicated. Compared to earlier approaches, our algorithm provides deterministic performance and also reduces the buffer space needed at the nodes during message passing. The performance of the algorithm is characterised using a simple communication model of high-performance computing (HPC) platforms. We show the implementation on T3D and SP2 using C and the message passing interface standard. These can be easily ported to other HPC platforms. The results show the effectiveness of the proposed technique as well as the interplay among the machine size, the variance in message length, and the network
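To make the communication primitive concrete, here is a minimal single-process simulation of an irregular all-to-all exchange. It uses a plain round-based schedule in which processor i sends to processor (i + r) mod P in round r. This is a generic baseline, not the algorithm proposed in the paper, and the names `all_to_all`, `outboxes` and `inboxes` are illustrative.

```python
def all_to_all(outboxes):
    """Simulate an all-to-all exchange among P processors.

    outboxes[i][j] is the (variable-length) message processor i sends to j.
    Returns inboxes, where inboxes[j][i] is the message j received from i.
    """
    P = len(outboxes)
    inboxes = [[None] * P for _ in range(P)]
    for i in range(P):
        inboxes[i][i] = outboxes[i][i]  # local copy, no communication
    for r in range(1, P):               # P - 1 communication rounds
        for i in range(P):
            dest = (i + r) % P          # each processor has one partner per round
            inboxes[dest][i] = outboxes[i][dest]
    return inboxes


# Message lengths differ per pair, as in the irregular case.
out = [[[i] * (i + j) for j in range(3)] for i in range(3)]
inb = all_to_all(out)
print(inb[2][1])  # message sent from processor 1 to processor 2
```

With this schedule every processor pays P - 1 start-ups and the longest message in a round determines that round's duration, which is the kind of cost the abstract says the proposed algorithm reduces.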
Parallel Object Recognition on an FPGA-based Configurable Computing Platform
- In International Workshop on Computer Architectures for Machine Perception, 1997
Cited by 8 (4 self)
Object recognition involves identifying known objects in a given scene. It plays a key role in image understanding. Geometric hashing has been proposed as a technique for model-based object recognition in occluded scenes. However, parallel techniques are needed to realize real-time vision systems employing geometric hashing. In this paper, we develop a design technique for parallelizing geometric hashing on an FPGA-based platform. We first transform the hash table which contains symbolic data into a bit-level representation. By regularizing the data flow and exploiting bit-level parallelism in hardware, our design achieves high performance. Using our approach, given a scene consisting of 256 feature points, a probe can be performed in 1.65 milliseconds on an FPGA-based platform having 32 Xilinx 4062s. In earlier implementations, the same probe operation was performed in 240 milliseconds on a 32K-node CM2 and in 382 milliseconds on a 32-node CM5. Also, the same operation takes 40 millis...
A Fast Asynchronous Algorithm for Linear Feature Extraction on IBM SP-2
- Proc. of the Computer Architectures for Machine Perception, 1995
In this paper, we present a fast parallel implementation of linear feature extraction on IBM SP-2. We first analyze the machine features and the problem characteristics to understand the overheads in parallel solutions to the problem. Based on these, we propose an asynchronous algorithm which enhances processor utilization and overlaps communication with computation by maintaining algorithmic threads in each processing node. Our implementation shows that, given a 512 × 512 image, the linear feature extraction task can be performed in 0.065 seconds on an SP-2 having 64 processing nodes. A serial implementation takes 3.45 seconds on a single processing node of SP-2. A previous implementation on CM-5 takes 0.1 second on a partition of 512 processing nodes. Experimental results on various sizes of images using 4, 8, 16, 32, and 64 processing nodes are also reported. 1 Introduction In distributed memory machines, the processing nodes are interconnected by an interconnection network an...
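The SP-2 timings quoted in this abstract imply a speedup and parallel efficiency that are easy to compute. The sketch below uses the standard definitions (speedup = serial time / parallel time, efficiency = speedup / node count); the function name is illustrative and the derived figures are not reported in the abstract itself.

```python
def speedup_and_efficiency(serial_s, parallel_s, nodes):
    """Return (speedup, efficiency) from serial and parallel run times in seconds."""
    speedup = serial_s / parallel_s
    return speedup, speedup / nodes


# 3.45 s on one SP-2 node versus 0.065 s on 64 nodes, as quoted in the abstract.
s, e = speedup_and_efficiency(3.45, 0.065, 64)
print(f"speedup = {s:.1f}x, efficiency = {e:.0%}")  # speedup = 53.1x, efficiency = 83%
```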
Parallel Implementations of Perceptual Grouping Tasks on Distributed Memory Machines
- International Conference on Pattern Recognition, 1994
In this paper, we propose parallel implementations for solving Perceptual Grouping tasks on distributed memory machines. Our implementations show that, given 7K line segments extracted from a 1K × 1K image, the Line Grouping task can be performed in 0.486 seconds using a partition of CM-5 having 256 processing nodes and in 0.382 seconds using a 16-node Cray T3D. The serial implementation written in C takes 20.368 seconds and 4.181 seconds using 1-node CM-5 and 1-node T3D respectively. Our code is written in C with the MPI message passing standard and can be easily ported to other high performance computing platforms. 1 Introduction Many distributed memory machines are commercially available. These include IBM SP-2, TMC CM-5, Intel Paragon, Cray T3D, among others. The scalability of these machines as the machine size is varied and the flexibility of parallel program development using message passing make them suitable for solving computer vision problems efficiently [Wang, 1995]. Per...
Parallelization of Perceptual Grouping on Distributed Memory Machines
- Proc. of Computer Architectures for Machine Perception, 1995
In this paper, we propose architecture-independent parallel algorithms for solving Perceptual Grouping tasks on distributed memory machines. Given an $n \times n$ image, using $P$ processors, we show that these tasks can be performed in $O(n^2/P)$ computation time and $20\sqrt{P}\,T_d + 8(\log P)\,T_d + (40n/\sqrt{P} + 20P)\,\tau_d$ communication time, where $T_d$ is the communication startup time and $\tau_d$ is the transmission rate. Our implementations show that, given 7K line segments extracted from a 1K × 1K image, the Line Grouping task can be performed in 1.115 seconds using a partition of CM-5 having 256 processing nodes and in 0.382 seconds using a 16-node Cray T3D. Our code is written in C with the MPI message passing standard and can be easily ported to other high performance computing platforms. 1 Introduction Many distributed memory machines are commercially available. These include IBM SP-2, TMC CM-5, Intel Paragon, Cray T3D, among others. The scalability of these machines as the machi...
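The communication-time bound in this abstract can be evaluated numerically. The sketch below assumes the expression reads 20·sqrt(P)·T_d + 8·log(P)·T_d + (40n/sqrt(P) + 20P)·τ_d, with the logarithm taken base 2. Both the reading of the garbled formula and the base are assumptions, and the parameter values in the example are hypothetical rather than measured.

```python
import math


def comm_time(n, P, T_d, tau_d):
    """Evaluate the assumed communication-time model.

    n     : image side length (n x n image)
    P     : number of processors
    T_d   : communication startup time
    tau_d : transmission rate (time per unit of data)
    """
    startup = (20 * math.sqrt(P) + 8 * math.log2(P)) * T_d  # log base 2 is an assumption
    transfer = (40 * n / math.sqrt(P) + 20 * P) * tau_d
    return startup + transfer


# Hypothetical parameters: 1K x 1K image, 256 processors,
# 100 microsecond startup, 0.1 microsecond per data unit.
print(comm_time(1024, 256, 100e-6, 0.1e-6))
```

Separating the startup and transfer terms shows how the startup cost grows with sqrt(P) while the per-processor data volume shrinks with it.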
Scalable Data Parallel Object Recognition using Geometric Hashing on CM-5
- Scalable High Performance Computing Conference (SHPCC), 1994
In this paper, we present scalable parallel algorithms for object recognition using geometric hashing. We define an abstract model of CM-5. We develop a load-balancing technique that results in scalable processor-time optimal algorithms for performing a probe on the CM-5 model. Given a model of CM-5 with $P$ PNs and a set $S$ of feature points in a scene, a probe of the recognition phase can be performed in $O(|V(S)|/P)$ time, where $V(S)$ is the set of votes cast by feature points in $S$. This algorithm is scalable in the range $1 \le P \le \sqrt{|V(S)| / \log |V(S)|}$. These results do not assume any distributions of hash bin lengths or scene points. The implementations developed in this paper require a number of processors independent of the size of the model database and are scalable with the machine size. 1 Introduction Object recognition is a key step in an integrated vision system. Most model-based recognition systems work by hypothesizing matches between scene features and model features, pred...
Parallel Algorithms for Linear Approximation on Distributed Memory Machines
In this paper, we summarize our results in parallelizing the linear approximation step on current distributed memory machines. We first analyze the features of current distributed memory machines and the problem characteristics to understand the overheads in parallel solutions to the problem. Based on these, we propose an asynchronous algorithm which enhances processor utilization and overlaps communication with computation by maintaining algorithmic threads in each processing node. Our implementation shows that, given a 512 × 512 image, the linear approximation task can be performed in 0.015 seconds on an SP-2 having 64 processing nodes and in 0.032 seconds on a T3D having 32 processing nodes. A serial implementation takes 0.445 seconds on a single processing node of SP-2 and 0.779 seconds on a single processing node of T3D. Experimental results on various sizes of images using 4, 8, 16, 32, and 64 processing nodes are also reported. 1 Introduction In distributed memory machines...
A Robust Neural Network Based Object Recognition System and its SIMD Implementation
Recognition of objects is a particularly demanding problem if one considers that each image must be interpreted in milliseconds (usually 30 or 40 frames/second). The problem becomes more difficult if the objects are distorted and/or partially occluded. In this case a sequence of local features must be extracted, combined into a global shape description and classified as belonging to pre-defined sets of known shapes (reference shapes). In this paper we propose a massively parallel object recognition system, which makes use of the multi polygonal approximation scheme for the extraction of rotation and translation invariant shape features, in connection with artificial neural networks for the parallel classification of the extracted features. The system has been successfully applied to recognizing aircraft shapes in different sizes and orientations, with the addition of noise distortion and occlusion. Timings on the Connection Machine 200 are also reported.

