Scans as Primitive Parallel Operations
IEEE Transactions on Computers, 1987
Abstract

Cited by 157 (12 self)
In most parallel random-access machine (PRAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can be executed in no more time than these parallel memory references. This paper outlines an extensive study of the effect of including such scan operations in the PRAM models as unit-time primitives. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(lg n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. We therefore argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimum-spanning-tree algorithm, a line-drawing algorithm and a mergi...
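The scan (prefix computation) primitive that the abstract treats as unit-time can be sketched sequentially as follows; the function name and the use of Python are illustrative, not from the paper:

```python
from operator import add

def scan(op, xs):
    """Inclusive scan: out[k] = xs[0] op xs[1] op ... op xs[k],
    for any associative operation op."""
    out, acc = [], None
    for x in xs:
        acc = x if acc is None else op(acc, x)
        out.append(acc)
    return out

print(scan(add, [3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

On a PRAM extended with scan primitives, this entire loop counts as a single unit-time step, which is the source of the O(lg n) savings the abstract describes.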
Planar Separators and Parallel Polygon Triangulation
1992
Abstract

Cited by 51 (7 self)
We show how to construct an O(√n)-separator decomposition of a planar graph G in O(n) time. Such a decomposition defines a binary tree where each node corresponds to a subgraph of G and stores an O(√n)-separator of that subgraph. We also show how to construct an O(n^ε)-way decomposition tree in parallel in O(log n) time so that each node corresponds to a subgraph of G and stores an O(n^(1/2+ε))-separator of that subgraph. We demonstrate the utility of such a separator decomposition by showing how it can be used in the design of a parallel algorithm for triangulating a simple polygon deterministically in O(log n) time using O(n / log n) processors on a CRCW PRAM. Keywords: Computational geometry, algorithmic graph theory, planar graphs, planar separators, polygon triangulation, parallel algorithms, PRAM model. 1 Introduction. Let G = (V, E) be an n-node graph. An f(n)-separator is an f(n)-sized subset of V whose removal disconnects G into two subgraphs G1 and G2 each...
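The binary decomposition tree described above can be sketched around a black-box separator routine. The planar O(√n)-separator itself, which the paper constructs in O(n) total time, is replaced here by a toy splitter for path graphs; all names are illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    vertices: frozenset        # vertex set of the subgraph at this node
    separator: frozenset       # separator stored at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_tree(vertices, split, leaf_size=1):
    """Recursively build a binary separator-decomposition tree.
    `split` is an assumed black box returning (separator, part1, part2)."""
    vertices = frozenset(vertices)
    if len(vertices) <= leaf_size:
        return Node(vertices, frozenset())
    sep, a, b = split(vertices)
    return Node(vertices, sep,
                build_tree(a, split, leaf_size),
                build_tree(b, split, leaf_size))

def path_split(vs):
    """Toy splitter for a path graph 0..n-1: the middle vertex is a
    1-separator (a stand-in for the planar O(sqrt(n)) separator)."""
    order = sorted(vs)
    mid = len(order) // 2
    return (frozenset({order[mid]}),
            frozenset(order[:mid]),
            frozenset(order[mid + 1:]))

tree = build_tree(range(7), path_split)
print(tree.separator)  # frozenset({3})
```

Each tree node records the subgraph it corresponds to together with its separator, mirroring the structure the abstract describes.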
Parallel Algorithmic Techniques for Combinatorial Computation
Ann. Rev. Comput. Sci., 1988
Abstract

Cited by 29 (3 self)
this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR-8511713, CCR-8605353, and CCR-8814977, and by DARPA contract N00039-84-C-0165.
Permutation Warping for Data Parallel Volume Rendering
In Proceedings of the Parallel Rendering Symposium, 1993
Abstract

Cited by 19 (5 self)
Volume rendering algorithms visualize sampled three-dimensional data. A variety of applications create sampled data, including medical imaging, simulations, animation, and remote sensing. Researchers have sought to speed up volume rendering because of its high run time and wide application. Our algorithm uses permutation warping to achieve linear speedup on data-parallel machines. This new algorithm calculates higher-quality images than previous distributed approaches, and also provides more view-angle freedom. We present permutation warping results on the SIMD MasPar MP-1. The efficiency results from non-conflicting communication. The communication remains efficient with arbitrary view directions, larger data sets, larger parallel machines, and high-order filters. We show constant run time versus view angle, tunable filter quality, and efficient memory implementation. 1 Introduction. Volume rendering [4] is memory- and compute-bound. Researchers have used parallelism to speed up transpa...
The Owner Concept for PRAMs
1991
Abstract

Cited by 17 (5 self)
We analyze the owner concept for PRAMs. In OROW-PRAMs each memory cell has one distinct processor that is the only one allowed to write into this memory cell and one distinct processor that is the only one allowed to read from it. By symmetric pointer doubling, a new proof technique for OROW-PRAMs, it is shown that list ranking can be done in O(log n) time by an OROW-PRAM and that LOGSPACE ⊆ OROW-TIME(log n). Then we prove that OROW-PRAMs are a fairly robust model and recognize the same class of languages when the model is modified in several ways, and that all kinds of PRAMs intertwine with the NC hierarchy without time loss. Finally it is shown that EREW-PRAMs can be simulated by OREW-PRAMs and ERCW-PRAMs by ORCW-PRAMs. This research was partially supported by the Deutsche Forschungsgemeinschaft, SFB 342, Teilprojekt A4 "Klassifikation und Parallelisierung durch Reduktionsanalyse". Email: rossmani@lan.informatik.tu-muenchen.dbp.de. Introduction: Fortune and Wyllie introduced in...
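Pointer doubling, the core of the list-ranking result above, can be simulated sequentially. This sketch shows the classic form; the paper's symmetric variant for OROW-PRAMs differs in detail, and the array-based list encoding here is an assumption of the sketch:

```python
def list_ranking(next_ptr):
    """Pointer doubling (pointer jumping): sequential simulation of the
    O(log n)-round PRAM list-ranking algorithm.  next_ptr[i] is the
    successor of node i in the linked list; the tail points to itself.
    Returns rank[i], the distance from node i to the tail."""
    n = len(next_ptr)
    rank = [0 if next_ptr[i] == i else 1 for i in range(n)]
    nxt = list(next_ptr)
    for _ in range(n.bit_length()):  # about ceil(log2 n) rounds suffice
        # Every "processor" i reads old values, then all write at once,
        # so both new arrays are built from the previous round's state.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank

print(list_ranking([1, 2, 3, 3]))  # [3, 2, 1, 0]
```

In each round every node's pointer jumps over its successor, so distances to the tail are accumulated in logarithmically many steps.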
The Strict Time Lower Bound and Optimal Schedules for Parallel Prefix with Resource Constraints
IEEE Trans. Comput., 1996
Abstract

Cited by 12 (0 self)
Parallel prefix is a fundamental common operation at the core of many important applications, e.g., the Grand Challenge problems, circuit design, digital signal processing, graph optimizations, and computational geometry. Given x_1, ..., x_N, parallel prefix computes x_1 ∘ x_2 ∘ ... ∘ x_k, for 1 ≤ k ≤ N, with associative operation ∘. For prefix of N elements on p processors with N > p(p+1)/2, we derive Harmonic Schedules and show that the Harmonic Schedules achieve the strict optimal time (steps), ⌈2(N−1)/(p+1)⌉. We also derive Pipelined Schedules, optimal schedules with ⌈2(N−1)/(p+1)⌉ + ⌈(p−1)/2⌉ − 1 time, which take a constant overhead of ⌈(p−1)/2⌉ time steps more than the strict optimal time but have the smallest loop body. Both the Harmonic Schedules and the Pipelined Schedules are simple, concise, with nice patterns of computation organization, and easy to program. For prefix of N elements on p processors with N ≤ p(p+1)/2, we use an algorithm to constru...
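The two bounds quoted above are straightforward to evaluate; this sketch merely restates the formulas from the abstract (the function names are illustrative):

```python
from math import ceil

def strict_lower_bound(n, p):
    """Strict time lower bound (in steps) for an n-element prefix on
    p processors, valid for n > p*(p+1)/2 per the abstract."""
    return ceil(2 * (n - 1) / (p + 1))

def pipelined_time(n, p):
    """Running time of the Pipelined Schedules: the strict bound plus a
    constant ceil((p-1)/2) - 1 overhead, traded for the smallest loop body."""
    return strict_lower_bound(n, p) + ceil((p - 1) / 2) - 1

print(strict_lower_bound(100, 4), pipelined_time(100, 4))  # 40 41
```

For N = 100 and p = 4 (which satisfies N > p(p+1)/2 = 10), the Harmonic Schedule matches the strict bound of 40 steps, and the Pipelined Schedule pays a one-step overhead.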
Solving the All-Pair Shortest Path Query Problem on Interval and Circular-Arc Graphs
Networks, 1998
Abstract

Cited by 12 (1 self)
In this paper, we study the following all-pair shortest path query problem: Given the interval model of an unweighted interval graph of n vertices, build a data structure such that each query on the shortest path (or its length) between any pair of vertices of the graph can be processed efficiently (both sequentially and in parallel). We show that, after sorting the input intervals by their endpoints, a data structure can be constructed sequentially in O(n) time and O(n) space; using this data structure, each query on the length of the shortest path between any two intervals can be answered in O(1) time, and each query on the actual shortest path can be answered in O(k) time, where k is the number of intervals on that path. Furthermore, this data structure can be constructed optimally in parallel, in O(log n) time using O(n / log n) CREW PRAM processors; each query on the actual shortest path can be answered in O(1) time using k processors. Our techniques can be extended to solving the ...
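The greedy structure underlying such interval-graph shortest paths can be sketched as follows. This is a simplified O(n²)-preprocessing, O(k)-query version for illustration only; the paper achieves O(n) construction after sorting and O(1) length queries, and all names here are hypothetical:

```python
def overlaps(a, b):
    """Two intervals (l, r) are adjacent iff neither lies strictly aside."""
    return a[0] <= b[1] and b[0] <= a[1]

def build_successor(intervals):
    """Naive successor table: succ[i] is the neighbour of interval i
    with the maximum right endpoint (its farthest-reaching neighbour)."""
    n = len(intervals)
    succ = list(range(n))
    for i in range(n):
        for j in range(n):
            if i != j and overlaps(intervals[i], intervals[j]) \
                    and intervals[j][1] > intervals[succ[i]][1]:
                succ[i] = j
    return succ

def query_length(intervals, succ, u, v):
    """Greedy O(k) shortest-path-length query: repeatedly jump to the
    farthest-reaching neighbour until an interval adjacent to v is hit."""
    if u == v:
        return 0
    if intervals[u][1] > intervals[v][1]:
        u, v = v, u                      # walk rightward toward v
    d, cur = 0, u
    while not overlaps(intervals[cur], intervals[v]):
        if succ[cur] == cur:             # no progress: disconnected
            return float("inf")
        cur, d = succ[cur], d + 1
    return d + 1

ivs = [(0, 2), (1, 4), (3, 6), (5, 8)]
succ = build_successor(ivs)
print(query_length(ivs, succ, 0, 3))  # 3
```

Greedily jumping to the overlapping interval with the farthest right endpoint yields a shortest path in an interval graph; the paper's contribution is making these answers available in O(1) per length query.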
A Sweep Algorithm for Massively Parallel Simulation of CircuitSwitched Networks
Journal of Parallel and Distributed Computing, 1993
Optimal Schedules for Parallel Prefix Computation with Bounded Resources
Proceedings of the Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1991
Abstract

Cited by 11 (6 self)
Given x_1, ..., x_N, parallel prefix computes x_1 ∘ x_2 ∘ ... ∘ x_k, for 1 ≤ k ≤ N, with associative operation ∘. We show optimal schedules for parallel prefix computation with a fixed number of resources p ≥ 2 for a prefix of size N ≥ p(p+1)/2. The time of the optimal schedules with p resources is ⌈2N/(p+1)⌉ for N ≥ p(p+1)/2, which we prove to be the strict lower bound (i.e., what can be achieved maximally). We then present a pipelined form of optimal schedules with ⌈2N/(p+1)⌉ + ⌈(p−1)/2⌉ − 1 time, which takes a constant overhead of ⌈(p−1)/2⌉ time more than the optimal schedules. Parallel prefix is an important common operation in many algorithms, including the evaluation of polynomials, general Horner expressions, carry-lookahead circuits, and ranking and packing problems. A most important application of parallel prefix is loop-parallelizing transformation. 1 Introduction. Given x_1, ..., x_N, parallel prefix computes x_1 ∘ x_2 ∘ ... ∘ x...
Using Distributed-Event Parallel Simulation To Study Departures From Many Queues In Series
1991
Abstract

Cited by 8 (3 self)
Exciting new opportunities for efficient simulation of complex stochastic systems are emerging with the development of parallel computers with many processors. In this paper we describe an application of a new distributed-event approach for speeding up a single long simulation run to study the transient behavior of a large non-Markovian network of queues. In particular, we implemented the parallel-prefix-based algorithm of Greenberg, Lubachevsky and Mitrani (1991) on the 8192-processor CM-2 Connection Machine to simulate the departure times D(k, n) of the kth customer from the nth queue in a long series of single-server queues. Each queue has unlimited waiting space and the first-in first-out discipline; the service times of all the customers at all the queues are i.i.d. with a general distribution; the system starts out with k customers in the first queue and all other queues empty. Glynn and Whitt (1991) established limit theorems for this model, but unfortunately very little c...
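The departure times D(k, n) above satisfy the standard Lindley-type recurrence for tandem FIFO queues; this sequential sketch assumes that recurrence (the paper's contribution is evaluating it in parallel via parallel prefix), and all names are illustrative:

```python
def departure_times(S):
    """D[k][n]: departure time of customer k from queue n, for K customers
    fed through N single-server FIFO queues in series, all starting in
    queue 1.  Uses the standard tandem-queue recurrence
        D[k][n] = S[k][n] + max(D[k-1][n], D[k][n-1]):
    customer k starts service at queue n once it has left queue n-1 AND
    customer k-1 has left queue n.  S[k-1][n-1] holds the service time of
    customer k at queue n.  Plain O(K*N) sequential version."""
    K, N = len(S), len(S[0])
    D = [[0.0] * (N + 1) for _ in range(K + 1)]   # row 0 / col 0: time 0
    for k in range(1, K + 1):
        for n in range(1, N + 1):
            D[k][n] = S[k - 1][n - 1] + max(D[k - 1][n], D[k][n - 1])
    return D

D = departure_times([[1.0, 1.0], [1.0, 1.0]])
print(D[2][2])  # 3.0 -- second customer leaves the second queue at t = 3
```

With unit service times, customer 2 must wait for customer 1 at each queue, so it leaves queue 2 at time 3 rather than time 2.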