The Warp Computer: Architecture, Implementation, and Performance
- IEEE Transactions on Computers, 1987
Cited by 42 (2 self)
The Warp machine is a systolic array computer of linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A typical Warp array includes 10 cells, thus having a peak computation rate of 100 MFLOPS. The Warp array can be extended to include more cells to accommodate applications capable of using the increased computational bandwidth. Warp is integrated as an attached processor into a UNIX host system. Programs for Warp are written in a high-level language supported by an optimizing compiler.
Parallel Algorithms for Image Histogramming and Connected Components with an Experimental Study
- Journal of Parallel and Distributed Computing, 1994
Cited by 21 (8 self)
This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. Our connected components algorithm uses a novel approach for parallel merging which performs drastically limited updating during iterative steps, and concludes with a total consistency update at the final step. The algorithms have been coded in Split-C and run on a variety of platforms. Our experimental results are consistent with the theoretical analysis and provide the best known execution times for these two primitives, even when compared with machine-specific implementations. More efficient implementations of Split-C will likely result in even faster execution times.
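As a rough illustration of the histogramming primitive named in this abstract, the sketch below computes one partial histogram per image strip and then merges them. It is not the authors' Split-C code: the row-strip decomposition and the names `split_rows`, `local_histogram`, `merge_histograms`, and `histogram` are assumptions made for this example, and the strips are processed sequentially where a real implementation would assign one per processor.

```python
def split_rows(image, parts):
    """Split the image rows into `parts` contiguous strips (assumed decomposition)."""
    base, extra = divmod(len(image), parts)
    strips, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        strips.append(image[start:stop])
        start = stop
    return strips


def local_histogram(strip, levels):
    """Count grey levels in one strip; this is the work a single processor would do."""
    hist = [0] * levels
    for row in strip:
        for pixel in row:
            hist[pixel] += 1
    return hist


def merge_histograms(partials):
    """Combine the per-strip histograms by summing each grey-level bin."""
    return [sum(bins) for bins in zip(*partials)]


def histogram(image, levels, parts):
    """Histogram of `image` built from `parts` independent partial histograms."""
    partials = [local_histogram(s, levels) for s in split_rows(image, parts)]
    return merge_histograms(partials)


if __name__ == "__main__":
    image = [[0, 1, 1], [2, 2, 2], [3, 0, 1]]
    print(histogram(image, levels=4, parts=2))
```

The merged result does not depend on how many strips are used, which is what makes the per-strip step safe to run in parallel.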
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems
- The Kluwer International Series in Engineering and Computer Science, 1990
Parallel algorithms for image enhancement and segmentation by region growing with an experimental study
- The Journal of Supercomputing, 1996
Cited by 14 (4 self)
This paper presents efficient and portable implementations of a useful image enhancement process, the Symmetric Neighborhood Filter (SNF), and an image segmentation technique which makes use of the SNF and a variant of the conventional connected components algorithm which we call delta-Connected Components. Our general framework is a single-address space, distributed memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. The image segmentation algorithm makes use of an efficient connected components algorithm which uses a novel approach for parallel merging. The algorithms have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, Intel Paragon, and workstation clusters. Our experimental results are consistent with the theoretical analysis and provide the best known execution times for segmentation, even when compared with machine-specific implementations. Our test data include difficult images from the Landsat Thematic Mapper (TM) satellite data. More efficient implementations of Split-C will likely result in even faster execution times.
The Evaluation of Massively Parallel Array Architectures
- 1994
Cited by 13 (7 self)
Computer Science to the memory of my mother Acknowledgments This dissertation would not have been possible without the help of many people. First, I would like to thank my committee for their many helpful comments and suggestions. Specifically, Al Hanson who taught me about computer vision, Wayne Burleson who taught me about VLSI, and Don Towsley who taught me about performance evaluation. Most especially, I’d like to thank my committee chair and my advisor and mentor for my entire graduate career, Chip Weems. Besides teaching me about architecture and writing, he suggested the final form of the topic, pulled me out of many blind alleys, and his vast store of knowledge was a constant help. Many other professors at UMass also contributed to my knowledge of computer science and so helped me with this dissertation. I would especially like to thank Arny Rosenberg who not only taught me theory but more importantly how and where to apply it, and Ed Riseman whose boundless energy and optimism serve as a model for all of us. The first level of discussion and comments is always with the fellow graduate students in one’s
An Environment for Evaluating Architectures for Spatially Mapped Computation: System Architecture and Preliminary Results
- 1993
Cited by 5 (3 self)
An environment which addresses several problems in evaluating massively parallel array architectures is described. A realistic workload including a series of applications currently being used as building blocks in vision research has been constructed. Both flexibility in architectural parameter selection and simulation efficiency are maintained by combining virtual machine emulation with trace driven simulation. The trade-off between fairness to diverse target architectures and programmability of the test programs is addressed through the use of operator and application libraries. Initial results are presented indicating the appropriate balance between register file and cache to optimize performance under varying levels of processor element virtualization. This paper also appears in the Proceedings of Computer Architectures for Machine Perception '93. Authors' address: Department of Computer Science; University of Massachusetts; Amherst, MA 01003; NetAd: {herbordt,weems}@cs.uma...
An Empirical Study of Datapath, Memory Hierarchy, and Network in SIMD Array Architectures
- In Proc. of the 1995 Int. Conf. on Computer Design, 1995
Cited by 5 (3 self)
Although SIMD arrays have been built for 30 years, they have as a class been the subject of few empirical design studies. Using ENPASSANT, a simulation environment developed for that purpose, we analyze several aspects of SIMD array architecture with respect to a test suite of spatially mapped applications. Several surprising results are obtained. With respect to memory hierarchy, we find that adding a level of cache to current PE designs is likely to be advantageous, but that such a cache will look quite different than expected. In particular, we find that associativity has unusual significance and that performance varies inversely with block size. Router network results indicate the importance of support for local transfers, broadcast, and reduction even at the expense of arbitrary permutations. Other communication results point to the appropriate dimensionality of k-ary n-cube networks (2 or 3), and the criticality of supporting bidirectional transfers, even if the overall bandw...
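For readers unfamiliar with the k-ary n-cube networks mentioned in this abstract, the sketch below uses the standard textbook definitions (k nodes per dimension, n dimensions, wraparound links) to compare hop counts with and without bidirectional links. It is not ENPASSANT's network model, and the function names `kary_ncube_nodes`, `kary_ncube_diameter`, and `torus_distance` are assumptions made for this example.

```python
def kary_ncube_nodes(k, n):
    """Number of nodes in a k-ary n-cube: k positions in each of n dimensions."""
    return k ** n


def kary_ncube_diameter(k, n, bidirectional=True):
    """Worst-case hop count between two nodes.

    With bidirectional links a message goes the short way around each ring
    (at most k // 2 hops); with unidirectional links it may need k - 1 hops.
    """
    per_dimension = k // 2 if bidirectional else k - 1
    return n * per_dimension


def torus_distance(a, b, k, bidirectional=True):
    """Hop count from node `a` to node `b`, given as coordinate tuples."""
    total = 0
    for x, y in zip(a, b):
        forward = (y - x) % k  # hops when travelling in the positive direction only
        total += min(forward, k - forward) if bidirectional else forward
    return total


if __name__ == "__main__":
    k, n = 4, 3
    print(kary_ncube_nodes(k, n))
    print(kary_ncube_diameter(k, n), kary_ncube_diameter(k, n, bidirectional=False))
    print(torus_distance((0, 0), (3, 1), k), torus_distance((0, 0), (3, 1), k, False))
```

Under these definitions, a neighbor one step "behind" a node costs one hop with bidirectional links but k - 1 hops without them, which is one way to see why local transfers are sensitive to link direction.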
Making a Dataparallel Language Portable for Massively Parallel Array Computers
- In Proc. of Computer Architectures for Machine Perception, 1997
Cited by 4 (3 self)
A key goal in language design is to simultaneously achieve portability and efficiency. Achieving a general solution to this problem is quite difficult: virtually all attempts have emphasized one or the other requirement by restricting either the architecture domain, the application domain, or both. In this study we present i) a framework that explains why meeting these requirements simultaneously is so difficult, and ii) our approach, which, though it may not be the final word on this subject, implements a new set of trade-offs that may come closer to a balanced solution than has been previously achieved. Our solution includes an easy to use language based on the dataparallel programmer's model, a compiler that hides as many machine variations as possible, a library with emulations of constructs that map directly to hardware on some but not all machines, and a library with different versions of those critical application functions for which a single algorithm is not optimal across al...
Processor/Memory/Array Size Tradeoffs in the Design of SIMD Arrays for a Spatially Mapped Workload
- In Proc. of Computer Architectures for Machine Perception, 1997
Cited by 4 (2 self)
Though massively parallel SIMD arrays continue to be promising for many computer vision applications, they have undergone few systematic empirical studies. The problems include the size of the architecture space, the lack of portability of the test programs, and the inherent complexity of simulating up to hundreds of thousands of processing elements. The latter two issues have been addressed previously; here we describe how spreadsheets and tk/tcl are used to endow our simulator with the flexibility to model a large variety of designs. The utility of this approach is shown in the second half of the paper where results are presented as to the performance of a large number of array size, datapath, register file, and application code combinations. The conclusions derived include the utility of multiplier and floating point support, the cost of virtual PE emulation, likely datapath/memory combinations, and overall designs with the most promising performance/chip area ratios. 1 Introduc...
The DARPA Image Understanding Motion Benchmark
Benchmarks and test suites are an essential element of the architectural evaluation process. At the conclusion of the last DARPA workshop on vision benchmarks to test the performance of parallel architectures, it was recommended that the DARPA Image Understanding Benchmark [Weems, 1991] be extended with a second level task to add motion and tracking to the original task. We have now developed this new benchmark and a sample solution. This paper describes the benchmark, and presents some timing results for various common workstations. 1. History of the DARPA Benchmark Effort One of the first parallel processor benchmarks to address vision-related processing was the Abingdon Cross benchmark, defined at the 1982 Multicomputer Workshop in Abingdon, England [Preston, 1986]. In that benchmark, an input image was specified that consisted of a dark background with a pair of brighter rectangular bars, equal in size, that cross at their midpoints and are centered in the image, and with Gaussia...

