Scans as Primitive Parallel Operations
IEEE Transactions on Computers, 1987
Cited by 143 (12 self)
In most parallel random-access machine (P-RAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can be executed in no more time than these parallel memory references. This paper outlines an extensive study of the effect of including such scan operations as unit-time primitives in the P-RAM models. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(lg n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. We therefore argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimum-spanning-tree algorithm, a line-drawing algorithm and a mergi...
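As a rough illustration of how a scan can drive one of the algorithms the abstract lists, here is a minimal sequential sketch of an exclusive prefix sum and a radix sort built on it. The names `exclusive_scan`, `split_by_bit`, and `radix_sort` are hypothetical, the keys are assumed to be non-negative integers, and the code is not taken from the paper; it only shows the data movement, not the parallel cost model.

```python
from itertools import accumulate
from operator import add


def exclusive_scan(values, op=add, identity=0):
    """Return [identity, v0, v0 op v1, ...], the same length as values."""
    values = list(values)
    if not values:
        return []
    # Shift the input right by one, seed with the identity, then accumulate.
    return list(accumulate([identity] + values[:-1], op))


def split_by_bit(keys, bit):
    """Stable partition: keys with a 0 in `bit` first, then keys with a 1."""
    flags = [(k >> bit) & 1 for k in keys]
    zeros = [1 - f for f in flags]
    zeros_before = exclusive_scan(zeros)  # destination index for 0-keys
    ones_before = exclusive_scan(flags)   # offset among the 1-keys
    total_zeros = sum(zeros)
    out = [None] * len(keys)
    for i, k in enumerate(keys):
        dest = total_zeros + ones_before[i] if flags[i] else zeros_before[i]
        out[dest] = k
    return out


def radix_sort(keys, bits):
    """Least-significant-bit-first radix sort on non-negative integers."""
    for b in range(bits):
        keys = split_by_bit(keys, b)
    return keys
```

Each pass needs only two scans and one permutation, which is why treating the scan as a unit-time primitive changes the running-time accounting for algorithms of this shape.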
Compiling Collection-Oriented Languages onto Massively Parallel Computers
Journal of Parallel and Distributed Computing, 1990
Cited by 78 (10 self)
This paper introduces techniques for compiling the nested parallelism of collection-oriented languages onto existing parallel hardware. Programmers of parallel machines encounter nested parallelism whenever they write a routine that performs parallel operations, and then want to call that routine itself in parallel. This occurs naturally in many applications. Most parallel systems, however, do not permit the expression of nested parallelism. This forces the programmer to exploit only one level of parallelism or to implement nested parallelism themselves. Both of these alternatives tend to produce code that is harder to maintain and less modular than code described at a higher level with nested parallel constructs. Not permitting the expression of nested parallelism is analogous to not permitting nested loops in serial languages. This paper describes issues and techniques for taking high-level descriptions of parallelism in the form of operations on nested collections and automaticall...
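One common way to run an operation over a nested collection on flat parallel hardware is to store the data as a single flat vector plus a vector of segment lengths. The sketch below assumes that representation; the abstract is truncated and does not state the paper's actual scheme, and `flatten` and `segmented_sum` are hypothetical names used only for illustration.

```python
from itertools import accumulate


def flatten(nested):
    """Turn a list of lists into (flat data, per-segment lengths)."""
    data = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return data, lengths


def segmented_sum(data, lengths):
    """Sum every segment at once using prefix sums over the flat vector."""
    prefix = [0] + list(accumulate(data))   # prefix[i] = sum of data[:i]
    ends = list(accumulate(lengths))        # one-past-the-end index per segment
    starts = [e - n for e, n in zip(ends, lengths)]
    return [prefix[e] - prefix[s] for s, e in zip(starts, ends)]
```

For example, `flatten([[1, 2, 3], [], [4, 5]])` yields the flat vector `[1, 2, 3, 4, 5]` with lengths `[3, 0, 2]`, and the inner sums of all segments are then computed by flat vector operations rather than by a parallel call nested inside another parallel call.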
Parallel Implementation of Algorithms for Finding Connected Components in Graphs
1997
Cited by 22 (1 self)
In this paper, we describe our implementation of several parallel graph algorithms for finding connected components. Our implementation, with virtual processing, is on a 16,384-processor MasPar MP-1 using the language MPL. We present extensive test data on our code. In our previous projects [21, 22, 23], we reported the implementation of an extensible parallel graph algorithms library. We developed general implementation and fine-tuning techniques without expending too much effort on optimizing each individual routine. We also handled the issue of implementing virtual processing. In this paper, we describe several algorithms and fine-tuning techniques that we developed for the problem of finding connected components in parallel; many of the fine-tuning techniques are of general interest, and should be applicable to code for other problems. We present data on the execution time and memory usage of our various implementations.
Parallel techniques for computational geometry
Proc. IEEE, 1992
Cited by 11 (0 self)
A survey of techniques for solving geometric problems in parallel is given, both for shared memory parallel machines and for networks of processors. Open problems are also discussed, as well as directions for future research. This work was supported by the Office of Naval Research under Contracts N00014-84-K-0502 and
Implementation of Parallel Graph Algorithms on a Massively Parallel SIMD Computer with Virtual Processing
1995
Cited by 10 (3 self)
We describe our implementation of several PRAM graph algorithms on the massively parallel computer MasPar MP-1 with 16,384 processors. Our implementation incorporated virtual processing and we present extensive test data. In a previous project [13], we reported the implementation of a set of parallel graph algorithms with the constraint that the maximum input size was restricted to be no more than the physical number of processors on the MasPar. The MasPar language MPL that we used for our code does not support virtual processing. In this paper, we describe a method of simulating virtual processors on the MasPar. We re-coded and fine-tuned our earlier parallel graph algorithms to incorporate the usage of virtual processors. Under the current implementation scheme, there is no limit on the number of virtual processors that one can use in the program as long as there is enough main memory to store all the data required during the computation. We also give two general optimization techniq...
PERFSIM: A Tool for Automatic Performance Analysis of Data-Parallel Fortran Programs
1995
Cited by 9 (1 self)
This paper presents PERFSIM, a tool for automatic performance analysis of CM Fortran programs running on the Connection Machine CM-5. PERFSIM executes the scalar part of the program, including all of its control structure, but estimates the running time of all vector operations, including both communication and computation. Our empirical studies show that the overall estimates are accurate to within a relative error of ±13%, the estimates for vector computations are accurate to within ±21%, and the estimates for vector communication are accurate to within ±13%. These relative errors are comparable to deviations in the running time of programs running on the CM-5. By executing the cheap-to-execute but hard-to-analyze sequential control thread, and analyzing the easy-to-analyze but expensive-to-execute vector operations, PERFSIM produces accurate running time estimates efficiently. In particular, PERFSIM can execute on a workstation and generate in a few seconds performanc...
SPMD Programming in Java
In [14, 1996
Cited by 4 (0 self)
We consider the suitability of the Java concurrent constructs for writing high-performance SPMD code for parallel machines. More specifically, we investigate implementing a financial application in Java on the IBM POWERparallel system SP. Despite the fact that Java was not specifically targeted to such applications and architectures per se, we conclude that efficient implementations are feasible. Finally, we propose a library of Java methods to facilitate SPMD programming.
1 Motivation
Although Java was not specifically designed as a high-performance parallel-computing language, it does include concurrent objects (threads), and its widespread acceptance makes it an attractive candidate for writing portable, computationally intensive parallel applications. In particular, Java has become a popular choice for numerical financial codes, an example of which is arbitrage: detecting when the buying and selling of securities is temporarily profitable. These applications involve sophisticated modeling techniques such as successive over-relaxation (SOR) and Monte Carlo methods [19].
(Author footnote: Also with Polytechnic University and Cornell Theory Center.)
Implementing Data-Parallel Software on Dataflow Hardware
1993
Cited by 2 (0 self)
The data-parallel programming model has become the de facto, baseline standard programming model for a variety of parallel computers, both SIMD and MIMD. The data-parallel programming model is easy to reason about, and has proven to be effective for a variety of applications. For the most part, MIMD parallel computer designs have not been driven by the requirements and characteristics of data-parallel programming, despite the fact that those computers will likely be programmed in the data-parallel programming model. Designs that have considered data-parallel programming have included special hardware for explicitly implementing operations such as global barrier synchronization, scan and reduction. To examine the hardware support useful for implementing the data-parallel programming model, we have written a compiler and run-time system for a small data-parallel language targeted for EM-4, a hybrid dataflow/von Neumann computer. EM-4 provides an interesting alternative MIMD parallel archi...
Activity Counter: a New Optimization for SIMD Control Flow (extended version)
1993
SIMD computers and collection-oriented languages, like C*, are designed to perform the same computation on each data item or on just a subset of the data. Subsets of processors or data items are implemented via an activity bit, and via a stack of activity bits when subsets of subsets are supported. We present an implementation of activity stacks based on counters. At a given stack depth n, the number of memory bits required is log_2 n, whereas previous implementations require n bits. The local controller is of equivalent complexity in both cases. This algorithm is useful for SIMD machines and for compilers of collection-oriented languages on MIMD computers.
1 Introduction
The data-parallel programming model is seen as an acceptable solution to efficiently program many parallel applications on massively parallel machines. In this model, a single program is applied on different instances of data, spread across different processors, to exploit parallelism on SIMD or MIMD machines. Da...
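The abstract does not spell out the counter scheme, so the following is only one plausible reading, sketched for a single processing element: a counter of 0 means the element is active, and a positive value records how many nested conditional levels must be left before it becomes active again. The class `ActivityCounter` and its methods `enter`, `otherwise`, and `leave` are hypothetical names, not the paper's.

```python
class ActivityCounter:
    """Per-element activity state kept as one counter instead of a bit stack."""

    def __init__(self):
        self.count = 0  # 0 = active; k > 0 = inactive for k more nesting levels

    @property
    def active(self):
        return self.count == 0

    def enter(self, condition):
        """Enter a nested conditional block guarded by `condition`."""
        if self.count > 0:
            self.count += 1   # already inactive: just remember one more level
        elif not condition:
            self.count = 1    # switched off at this level

    def otherwise(self):
        """Switch to the else-branch of the innermost block."""
        if self.count == 0:
            self.count = 1    # ran the then-branch, so skip the else-branch
        elif self.count == 1:
            self.count = 0    # was switched off at this level, so run it
        # count > 1: switched off at an outer level, nothing changes

    def leave(self):
        """Leave the innermost conditional block."""
        if self.count > 0:
            self.count -= 1
```

Under this reading, an element nested n levels deep stores a value no larger than n, which fits in about log_2 n bits, whereas a stack of activity bits would need one bit per level.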

