Exploiting Superword Level Parallelism with Multimedia Instruction Sets
- in Proceedings of the SIGPLAN ’00 Conference on Programming Language Design and Implementation
, 2000
Cited by 69 (8 self)
Increasing focus on multimedia applications has prompted the addition of multimedia extensions to most existing general-purpose microprocessors. This added functionality comes primarily in the form of short SIMD instructions. Unfortunately, access to these instructions is limited to in-line assembly and library calls. Some researchers have proposed using vector compilers as a means of exploiting multimedia instructions. Although vectorization technology is well understood, it is inherently complex and fragile. In addition, it is incapable of locating SIMD-style parallelism within a basic block. In this paper we introduce the concept of Superword Level Parallelism (SLP), a novel way of viewing parallelism in multimedia applications. We believe SLP is fundamentally different from the loop-level parallelism exploited by traditional vector processing, and therefore warrants a different method for extracting it. We have developed a simple and robust compiler technique for detecting SLP that targets basic blocks rather than loop nests. As with techniques designed to extract ILP, ours is able to exploit parallelism both across loop iterations and within basic blocks. The result is an algorithm that provides excellent performance in several application domains. Experiments on scientific and multimedia benchmarks have yielded average performance improvements of 84%, with gains as high as 253%.
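The kind of parallelism this abstract describes can be illustrated with a small sketch. The abstract does not give the detection algorithm, so the code below only shows the before and after of packing: four isomorphic scalar statements in one basic block, and an equivalent single "superword" operation. The names `scalar_block`, `packed_block`, and `simd_add` are hypothetical, and the 4-wide width is an assumption made for the example.

```python
def scalar_block(a, b, c, i):
    """Four isomorphic statements in a single basic block (no loop)."""
    a[i + 0] = b[i + 0] + c[i + 0]
    a[i + 1] = b[i + 1] + c[i + 1]
    a[i + 2] = b[i + 2] + c[i + 2]
    a[i + 3] = b[i + 3] + c[i + 3]


def simd_add(x, y):
    """Stand-in for one 4-wide SIMD add (hypothetical, not a real intrinsic)."""
    return [p + q for p, q in zip(x, y)]


def packed_block(a, b, c, i):
    """The same block after packing: one wide load pair, one add, one store."""
    a[i:i + 4] = simd_add(b[i:i + 4], c[i:i + 4])
```

Both functions write the same four elements of `a`; the packed form simply groups adjacent, identically shaped operations into one wide operation.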
The Concert System -- Compiler and Runtime Support for Efficient, Fine-Grained Concurrent Object-Oriented Programs
, 1993
Cited by 47 (12 self)
The introduction of concurrency complicates the already difficult task of large-scale programming. Concurrent object-oriented languages provide a mechanism, encapsulation, for managing the increased complexity of large-scale concurrent programs, thereby reducing the difficulty of large-scale concurrent programming. In particular, fine-grained object-oriented approaches provide modularity through encapsulation while exposing large degrees of concurrency. Though fine-grained concurrent object-oriented languages are attractive from a programming perspective, they have historically suffered from poor efficiency. The goal of the Concert project is to develop portable, efficient implementations of fine-grained concurrent object-oriented languages. Our approach incorporates careful program analysis and information management at every stage from the compiler to the runtime system. In this document, we outline the basic elements of the Concert approach. In particular, we discuss progr...
Permutation Warping for Data Parallel Volume Rendering
- In Proceedings of the Parallel Rendering Symposium
, 1993
Cited by 19 (5 self)
Volume rendering algorithms visualize sampled three-dimensional data. A variety of applications create sampled data, including medical imaging, simulations, animation, and remote sensing. Researchers have sought to speed up volume rendering because of its high run time and wide application. Our algorithm uses permutation warping to achieve linear speedup on data parallel machines. This new algorithm calculates higher quality images than previous distributed approaches, and also provides more view angle freedom. We present permutation warping results on the SIMD MasPar MP-1. The efficiency results from nonconflicting communication. The communication remains efficient with arbitrary view directions, larger data sets, larger parallel machines, and high order filters. We show constant run time versus view angle, tunable filter quality, and efficient memory implementation.
1 Introduction
Volume rendering [4] is memory and compute bound. Researchers have used parallelism to speed up transpa...
Supporting the hypercube programming model on mesh architectures (A fast sorter for iWarp tori)
, 1992
Control structures for data-parallel SIMD languages: semantics and implementation
- FUTURE GENERATION COMPUTER SYSTEMS
, 1992
Cited by 15 (6 self)
We define a simple language which encapsulates the main concepts of SIMD data-parallel programming, and we give its operational semantics. This language includes a unique data-parallel control structure called multitype conditioning and escape. We show that it suffices to express all data-parallel extensions of the usual scalar control structures of C, as found in C*, MPL, POMPC, etc. Moreover, we give a formal correctness proof for two different implementations of this new statement: one using a single context stack, the other using a set of counters. Thus, this simple language appears to be an interesting basis for studying data-parallel SIMD programming methodology.
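The two implementation strategies named in this abstract can be sketched for the simpler case of a plain nested data-parallel conditional. This is an assumption-laden illustration, not the paper's construct: it does not model the "escape" part of the statement, and the class names `StackContext` and `CounterContext` are hypothetical. In the stack version the whole activity mask is saved on entry; in the counter version each processing element keeps a count of the nesting levels at which it has been switched off.

```python
class StackContext:
    """Activity mask saved on a single stack at each nested conditional."""

    def __init__(self, n):
        self.active = [True] * n
        self.stack = []

    def enter(self, cond):
        self.stack.append(self.active)
        self.active = [a and c for a, c in zip(self.active, cond)]

    def leave(self):
        self.active = self.stack.pop()


class CounterContext:
    """One counter per element; an element is active iff its counter is 0."""

    def __init__(self, n):
        self.depth = [0] * n

    @property
    def active(self):
        return [d == 0 for d in self.depth]

    def enter(self, cond):
        # Already inactive: go one level deeper. Active but failing: switch off.
        self.depth = [d + 1 if d > 0 else (0 if c else 1)
                      for d, c in zip(self.depth, cond)]

    def leave(self):
        self.depth = [d - 1 if d > 0 else 0 for d in self.depth]


def trace(ctx, conds):
    """Enter every condition in order, then leave them all; record the masks."""
    masks = []
    for cond in conds:
        ctx.enter(cond)
        masks.append(list(ctx.active))
    for _ in conds:
        ctx.leave()
        masks.append(list(ctx.active))
    return masks
```

Running `trace` with the same nested conditions on both classes yields the same sequence of activity masks, which is the informal counterpart of the equivalence the abstract says is proved formally.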
The Evaluation of Massively Parallel Array Architectures
, 1994
Cited by 13 (7 self)
Computer Science. To the memory of my mother. Acknowledgments: This dissertation would not have been possible without the help of many people. First, I would like to thank my committee for their many helpful comments and suggestions: specifically, Al Hanson, who taught me about computer vision, Wayne Burleson, who taught me about VLSI, and Don Towsley, who taught me about performance evaluation. Most especially, I’d like to thank my committee chair, who was my advisor and mentor for my entire graduate career, Chip Weems. Besides teaching me about architecture and writing, he suggested the final form of the topic and pulled me out of many blind alleys, and his vast store of knowledge was a constant help. Many other professors at UMass also contributed to my knowledge of computer science and so helped me with this dissertation. I would especially like to thank Arny Rosenberg, who not only taught me theory but, more importantly, how and where to apply it, and Ed Riseman, whose boundless energy and optimism serve as a model for all of us. The first level of discussion and comments is always with the fellow graduate students in one’s
Vessel extraction in medical images by wave-propagation and traceback
- IEEE Transactions on Medical Imaging
, 2001
Cited by 12 (1 self)
This paper presents an approach for the extraction of vasculature from angiography images by using a wave propagation and traceback mechanism. We discuss both the theory and the implementation of the approach. Using a dual-sigmoidal filter, we label each pixel in an angiogram with the likelihood that it is within a vessel. Representing the reciprocal of this likelihood image as an array of refractive indices, we propagate a digital wave through the image from the base of the vascular tree. This wave ‘washes’ over the vasculature, ignoring local noise perturbations. The extraction of the vasculature becomes that of tracing the wave along the local normals to the waveform. While the approach is inherently SIMD, we present an efficient sequential algorithm for the wave propagation, and discuss the traceback algorithm. We demonstrate the effectiveness of our integer image neighborhood-based algorithm and its robustness to image noise.
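A rough sketch of the propagate-then-trace idea follows. It is not the paper's algorithm: it swaps in Dijkstra's shortest-path search over a 4-connected pixel grid for the wave propagation, and stored predecessor pointers for tracing along wavefront normals. The `cost` grid plays the role of the refractive-index array (low cost inside vessels); the function names and the tiny grid in the usage note are hypothetical.

```python
import heapq


def propagate(cost, source):
    """Compute arrival times from `source`; entering a pixel costs cost[r][c]."""
    rows, cols = len(cost), len(cost[0])
    arrival = [[float("inf")] * cols for _ in range(rows)]
    parent = {}
    arrival[source[0]][source[1]] = 0.0
    heap = [(0.0, source)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if t > arrival[r][c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nt = t + cost[nr][nc]
                if nt < arrival[nr][nc]:
                    arrival[nr][nc] = nt
                    parent[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nt, (nr, nc)))
    return arrival, parent


def traceback(parent, start, source):
    """Follow predecessor pointers from `start` back to `source`."""
    path = [start]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path
```

For example, with `cost = [[1, 9, 9], [1, 9, 9], [1, 1, 1]]` and source `(0, 0)`, tracing back from `(2, 2)` follows the low-cost pixels down the left column and along the bottom row rather than cutting across the high-cost region.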
Reconfiguring Arrays with Faults Part I: Worst-Case Faults
- SIAM Journal on Computing
, 1997
Cited by 11 (1 self)
In this paper we study the ability of array-based networks to tolerate worst-case faults. We show that an $N \times N$ two-dimensional array can sustain $N^{1-\epsilon}$ worst-case faults, for any fixed $\epsilon > 0$, and still emulate $T$ steps of a fully functioning $N \times N$ array in $O(T + N)$ steps, i.e., with only constant slowdown. Previously it was known only that an array could tolerate a constant number of faults with constant slowdown. We also show that if faulty nodes are allowed to communicate, but not compute, then an $N$-node one-dimensional array can tolerate $\log^k N$ worst-case faults, for any constant $k > 0$, and still emulate a fault-free array with constant slowdown, and this bound is tight.
Key words. fault tolerance, array-based network, mesh network, network emulation
AMS subject classifications. 68M07, 68M10, 68M15, 68Q68
1. Introduction. In a truly large parallel computer, some components are bound to fail. Knowing this, a programmer can write software that explicitly cope...
The Data-Parallel Programming Model: a Semantic Perspective
, 1992
Cited by 10 (0 self)
We propose a short introduction to the Data-Parallel programming model. We show that parallel computing often makes little distinction between the execution model and the programming model. This results in poor programming and low portability. Using the seminal "GOTO considered harmful" analogy, we show that data-parallelism can be seen as a way out of this collapse of the two models into one. We show that this model was already present in several works on parallel programming methodology, and that it can be characterized by a small number of concepts with simple semantics.

