Results 1–10 of 13
Scans as Primitive Parallel Operations
IEEE Transactions on Computers, 1987
Abstract

Cited by 157 (12 self)
In most parallel random-access machine (PRAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can be executed in no more time than these parallel memory references. This paper outlines an extensive study of the effect of including such scan operations as unit-time primitives in the PRAM models. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(lg n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. We therefore argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimum-spanning-tree algorithm, a line-drawing algorithm and a mergi...
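A sequential sketch (not the paper's code) of the exclusive plus-scan primitive and of the radix-sort "split" step the abstract mentions, in which two scans compute the destination of every key:

```python
# Illustrative only: a sequential model of the exclusive plus-scan
# primitive and the scan-based radix-sort split step.
def exclusive_scan(xs):
    """Exclusive prefix sum: out[i] = xs[0] + ... + xs[i-1]."""
    total, out = 0, []
    for x in xs:
        out.append(total)
        total += x
    return out

def split_by_bit(keys, bit):
    """Stable partition on one bit, expressed with two scans."""
    flags = [(k >> bit) & 1 for k in keys]
    zeros = [1 - f for f in flags]
    zero_pos = exclusive_scan(zeros)   # destinations of the 0-keys
    one_pos = exclusive_scan(flags)    # offsets among the 1-keys
    num_zeros = sum(zeros)
    out = [0] * len(keys)
    for i, k in enumerate(keys):
        out[zero_pos[i] if flags[i] == 0 else num_zeros + one_pos[i]] = k
    return out

print(exclusive_scan([3, 1, 4, 1]))    # [0, 3, 4, 8]
print(split_by_bit([5, 2, 7, 4], 0))   # even keys first: [2, 4, 5, 7]
```

On a PRAM with unit-time scans, each `exclusive_scan` call above would be a single primitive rather than a loop.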
Compiling Collection-Oriented Languages onto Massively Parallel Computers
Journal of Parallel and Distributed Computing, 1990
Abstract

Cited by 87 (10 self)
This paper introduces techniques for compiling the nested parallelism of collection-oriented languages onto existing parallel hardware. Programmers of parallel machines encounter nested parallelism whenever they write a routine that performs parallel operations, and then want to call that routine itself in parallel. This occurs naturally in many applications. Most parallel systems, however, do not permit the expression of nested parallelism. This forces the programmer either to exploit only one level of parallelism or to implement nested parallelism themselves. Both of these alternatives tend to produce code that is harder to maintain and less modular than code described at a higher level with nested parallel constructs. Not permitting the expression of nested parallelism is analogous to not permitting nested loops in serial languages. This paper describes issues and techniques for taking high-level descriptions of parallelism in the form of operations on nested collections and automaticall...
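A sketch of the flattening idea under stated assumptions (this is not the paper's compiler): a nested sequence is stored as one flat value array plus per-segment lengths, so a "parallel call of a parallel routine" becomes a single flat segmented operation.

```python
# Flattened representation of a nested sequence: values + segment lengths.
def segmented_scan(values, lengths):
    """Exclusive plus-scan restarted at each segment boundary."""
    out, i = [], 0
    for n in lengths:
        total = 0
        for x in values[i:i + n]:
            out.append(total)
            total += x
        i += n
    return out

# Nested input [[1, 2, 3], [4], [5, 6]] in flattened form:
values = [1, 2, 3, 4, 5, 6]
lengths = [3, 1, 2]
print(segmented_scan(values, lengths))  # [0, 1, 3, 0, 0, 5]
```

One flat pass does the work of three independent "inner" parallel scans, which is why flattening maps nested parallelism onto hardware that supports only one level.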
Parallel Implementation of Algorithms for Finding Connected Components in Graphs
1997
Abstract

Cited by 25 (1 self)
In this paper, we describe our implementation of several parallel graph algorithms for finding connected components. Our implementation, with virtual processing, is on a 16,384-processor MasPar MP-1 using the language MPL. We present extensive test data on our code. In our previous projects [21, 22, 23], we reported the implementation of an extensible parallel graph algorithms library. We developed general implementation and fine-tuning techniques without expending too much effort on optimizing each individual routine. We also handled the issue of implementing virtual processing. In this paper, we describe several algorithms and fine-tuning techniques that we developed for the problem of finding connected components in parallel; many of the fine-tuning techniques are of general interest and should be applicable to code for other problems. We present data on the execution time and memory usage of our various implementations.
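A sequential model of the label-propagation step at the heart of many PRAM connected-components algorithms; this is an illustrative sketch, not the paper's MPL code. Each vertex repeatedly adopts the smallest label seen across its incident edges until no label changes.

```python
# Label propagation for connected components (illustrative sketch).
def connected_components(n, edges):
    label = list(range(n))         # every vertex starts in its own component
    changed = True
    while changed:                 # iterate to a fixed point
        changed = False
        for u, v in edges:         # conceptually, all edges in parallel
            lo = min(label[u], label[v])
            for w in (u, v):
                if label[w] != lo:
                    label[w] = lo
                    changed = True
    return label

# Two components: {0, 1, 2} and {3, 4}
print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3]
```

The production algorithms the paper tunes converge in far fewer rounds (e.g. via pointer jumping), but the fixed-point structure is the same.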
Implementation of Parallel Graph Algorithms on a Massively Parallel SIMD Computer with Virtual Processing
1995
Abstract

Cited by 14 (3 self)
We describe our implementation of several PRAM graph algorithms on the massively parallel computer MasPar MP-1 with 16,384 processors. Our implementation incorporated virtual processing, and we present extensive test data. In a previous project [13], we reported the implementation of a set of parallel graph algorithms with the constraint that the maximum input size was restricted to be no more than the physical number of processors on the MasPar. The MasPar language MPL that we used for our code does not support virtual processing. In this paper, we describe a method of simulating virtual processors on the MasPar. We recoded and fine-tuned our earlier parallel graph algorithms to incorporate the usage of virtual processors. Under the current implementation scheme, there is no limit on the number of virtual processors that one can use in the program as long as there is enough main memory to store all the data required during the computation. We also give two general optimization techniq...
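The virtual-processing idea can be sketched as follows (an assumption-laden toy, not the MPL simulation scheme): V virtual processors are simulated on P physical ones, with each physical processor looping over the ceil(V/P) virtual indices assigned to it.

```python
# Simulating V virtual processors on P physical ones (illustrative sketch).
def run_virtual(V, P, step):
    """Apply one data-parallel step for all V virtual processors
    using only P physical 'processors' (modeled here as an inner loop)."""
    rounds = (V + P - 1) // P        # virtualization ratio, rounded up
    for r in range(rounds):
        for p in range(P):           # conceptually parallel across PEs
            vp = r * P + p           # virtual processor id
            if vp < V:               # last round may be partial
                step(vp)

data = [0] * 10
run_virtual(V=10, P=4, step=lambda vp: data.__setitem__(vp, vp * vp))
print(data)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The memory bound the abstract mentions falls out directly: each virtual processor needs its own slice of state, so V is limited only by main memory, not by P.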
Parallel techniques for computational geometry
Proc. IEEE, 1992
Abstract

Cited by 13 (0 self)
A survey of techniques for solving geometric problems in parallel is given, both for shared-memory parallel machines and for networks of processors. Open problems are also discussed, as well as directions for future research.
PERFSIM: A Tool for Automatic Performance Analysis of Data-Parallel Fortran Programs
1995
Abstract

Cited by 9 (1 self)
This paper presents PERFSIM, a tool for automatic performance analysis of CM Fortran programs running on the Connection Machine CM-5. PERFSIM executes the scalar part of the program, including all of its control structure, but estimates the running time of all vector operations, including both communication and computation. Our empirical studies show that the overall estimates are accurate to within a relative error of ±13%, the estimates for vector computations are accurate to within ±21%, and the estimates for vector communication are accurate to within ±13%. These relative errors are comparable to deviations in the running time of programs running on the CM-5. By executing the cheap-to-execute but hard-to-analyze sequential control thread, and analyzing the easy-to-analyze but expensive-to-execute vector operations, PERFSIM produces accurate running-time estimates efficiently. In particular, PERFSIM can execute on a workstation and generate in a few seconds performanc...
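A toy analogue of this approach (not PERFSIM itself, and the costs below are invented for illustration): the scalar control flow executes for real, while each vector operation is replaced by a cost-model lookup instead of actual computation.

```python
# Toy cost-model simulator in the spirit of the approach described above.
class CostSim:
    # Hypothetical per-element costs in microseconds (illustrative only).
    COSTS = {"elementwise": 0.01, "reduce": 0.02, "comm": 0.05}

    def __init__(self):
        self.estimated_us = 0.0

    def vector_op(self, kind, n):
        """Charge the modeled cost instead of executing the operation."""
        self.estimated_us += self.COSTS[kind] * n

sim = CostSim()
for step in range(100):                   # scalar control runs for real
    sim.vector_op("elementwise", 1_000_000)
    if step % 10 == 0:                    # occasional communication phase
        sim.vector_op("comm", 1_000_000)
print(f"{sim.estimated_us / 1e6:.2f} s estimated")  # 1.50 s estimated
```

Because only the cheap scalar thread actually runs, the "simulation" finishes in a fraction of the program's real running time, which is what lets such a tool run on a workstation.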
Data-parallel primitives for spatial operations
In Proceedings of the 1995 International Conference on Parallel Processing, III:184–191, 1995
Abstract

Cited by 5 (0 self)
Data-parallel primitives for performing operations on the PM1 quadtree and the bucket PMR quadtree are presented using the scan model. Algorithms are described for building these two data structures that make use of these primitives. The data-parallel algorithms are assumed to be main-memory resident. They were implemented on a Thinking Machines CM-5 with 32 processors containing 1 GB of main memory.
SPMD programming in Java
 In [14
, 1996
Abstract

Cited by 4 (0 self)
We consider the suitability of the Java concurrency constructs for writing high-performance SPMD code for parallel machines. More specifically, we investigate implementing a financial application in Java on the IBM POWERparallel system SP. Despite the fact that Java was not specifically targeted to such applications and architectures per se, we conclude that efficient implementations are feasible. Finally, we propose a library of Java methods to facilitate SPMD programming.

1 Motivation
Although Java was not specifically designed as a high-performance parallel-computing language, it does include concurrent objects (threads), and its widespread acceptance makes it an attractive candidate for writing portable, computationally-intensive parallel applications. In particular, Java has become a popular choice for numerical financial codes, an example of which is arbitrage: detecting when the buying and selling of securities is temporarily profitable. These applications involve sophisticated modeling techniques such as successive over-relaxation (SOR) and Monte Carlo methods [19].
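The SPMD pattern the abstract describes can be sketched as follows (in Python rather than Java, and with illustrative names, not the paper's proposed library): every worker runs the same program on its own rank, synchronizing at a barrier between phases.

```python
# Minimal SPMD sketch: same program on every rank, barrier between phases.
import threading

P = 4
data = [1, 2, 3, 4]
partial = [0] * P
result = {}
barrier = threading.Barrier(P)

def worker(rank):
    partial[rank] = data[rank] * data[rank]   # phase 1: local work
    barrier.wait()                            # all ranks synchronize here
    if rank == 0:                             # phase 2: one rank reduces
        result["sum_sq"] = sum(partial)

threads = [threading.Thread(target=worker, args=(r,)) for r in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result["sum_sq"])  # 30
```

A real SPMD library would add collective operations (broadcast, reduce, scan) on top of the same rank-plus-barrier skeleton.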
Implementing Data-Parallel Software on Dataflow Hardware
1993
Abstract

Cited by 2 (0 self)
The data-parallel programming model has become the de facto baseline standard programming model for a variety of parallel computers, both SIMD and MIMD. The data-parallel programming model is easy to reason about, and it has proven to be effective for a variety of applications. For the most part, MIMD parallel computer designs have not been driven by the requirements and characteristics of data-parallel programming, despite the fact that those computers will likely be programmed in the data-parallel programming model. Designs that have considered data-parallel programming have included special hardware for explicitly implementing operations such as global barrier synchronization, scan, and reduction. To examine the hardware support useful for implementing the data-parallel programming model, we have written a compiler and runtime system for a small data-parallel language targeted for EM-4, a hybrid dataflow/von Neumann computer. EM-4 provides an interesting alternative MIMD parallel archi...