Results 11-20 of 329
Powerlist: a structure for parallel recursion
ACM Transactions on Programming Languages and Systems, 1994
Cited by 65 (2 self)
Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefix sum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic properties of this data structure can be exploited to derive properties of these algorithms and establish equivalence of different algorithms that solve the same problem.
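As a rough illustration of the recursive structure this abstract refers to, the sketch below computes a prefix sum over a list whose length is a power of two by splitting it into even-indexed and odd-indexed halves, recursing on the pairwise sums, and interleaving the results. The function names (`unzip`, `interleave`, `prefix_sum`) are assumptions made for this example and are not notation from the paper; plain Python lists stand in for powerlists.

```python
from operator import add


def unzip(xs):
    """Split a list into its even-indexed and odd-indexed elements."""
    return xs[0::2], xs[1::2]


def interleave(p, q):
    """Inverse of unzip: alternate the elements of p and q."""
    out = [None] * (2 * len(p))
    out[0::2] = p
    out[1::2] = q
    return out


def prefix_sum(xs, op=add, identity=0):
    """Inclusive prefix sum of a list whose length is a power of two.

    op must be associative; identity must be its left identity.
    """
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return list(xs)
    p, q = unzip(xs)
    # Prefix sums of adjacent pairs give the results at odd positions.
    t = prefix_sum([op(a, b) for a, b in zip(p, q)], op, identity)
    # Shift t right by one, then combine with p for the even positions.
    shifted = [identity] + t[:-1]
    evens = [op(s, a) for s, a in zip(shifted, p)]
    return interleave(evens, t)
```

Each level of the recursion applies `op` elementwise, so the elementwise steps could in principle run in parallel; this sequential version only shows the shape of the recursion.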
A new notation for arrows
In International Conference on Functional Programming (ICFP ’01), 2001
Cited by 64 (1 self)
The categorical notion of monad, used by Moggi to structure denotational descriptions, has proved to be a powerful tool for structuring combinator libraries. Moreover, the monadic programming style provides a convenient syntax for many kinds of computation, so that each library defines a new sublanguage. Recently, several workers have proposed a generalization of monads, called variously “arrows” or Freyd-categories. The extra generality promises to increase the power, expressiveness and efficiency of the embedded approach, but does not mesh as well with the native abstraction and application. Definitions are typically given in a point-free style, which is useful for proving general properties, but can be awkward for programming specific instances. In this paper we define a simple extension to the functional language Haskell that makes these new notions of computation more convenient to use. Our language is similar to the monadic style, and has similar reasoning properties. Moreover, it is extensible, in the sense that new combining forms can be defined as expressions in the host language.
Removing Randomness in Parallel Computation Without a Processor Penalty
Journal of Computer and System Sciences, 1988
Cited by 61 (1 self)
We develop some general techniques for converting randomized parallel algorithms into deterministic parallel algorithms without a blowup in the number of processors. One of the requirements for the application of these techniques is that the analysis of the randomized algorithm uses only pairwise independence. Our main new result is a parallel algorithm for coloring the vertices of an undirected graph using at most $\Delta + 1$ distinct colors in such a way that no two adjacent vertices receive the same color, where $\Delta$ is the maximum degree of any vertex in the graph. The running time of the algorithm is $O(\log^3 n \log\log n)$ using a linear number of processors on a concurrent read, exclusive write (CREW) parallel random access machine (PRAM). Our techniques also apply to several other problems, including the maximal independent set problem and the maximal matching problem. The application of the general technique to these last two problems is mostly of academic interest because...
Collection-Oriented Languages
Proceedings of the IEEE, 1991
Cited by 60 (5 self)
Several programming languages arising from widely diverse practical and theoretical considerations share a common high-level feature: their basic data type is an aggregate of other more primitive data types and their primitive functions operate on these aggregates. Examples of such languages (and the collections they support) are FORTRAN 90 (arrays), APL (arrays), Connection Machine LISP (xectors), PARALATION LISP (paralations), and SETL (sets). Acting on large collections of data with a single operation is the hallmark of data-parallel programming and massively parallel computers. These languages, which we call collection-oriented, are thus ideal for use with massively parallel machines, even though many of them were developed before parallelism and associated considerations became important. This paper examines collections and the operations that can be performed on them in a language-independent manner. It also critically reviews and compares a variety of collection-oriented languages...
A Family of Adders
In Proceedings of 14th IEEE Symposium on Computer Arithmetic, 1999
Cited by 58 (0 self)
Binary carry-propagating addition can be efficiently expressed as a prefix computation. Several examples of adders based on such a formulation have been published, and efficient implementations are numerous. Chief among the known constructions are those of Kogge & Stone and Ladner & Fischer. In this work we show that these are end cases of a large family of addition structures, all of which share the attractive property of minimum logical depth. The intermediate structures allow tradeoffs between the amount of internal wiring and the fanout of intermediate nodes, and can thus usually achieve a more attractive combination of speed and area/power cost than either of the known end cases. Rules for the construction of such adders are given, as are examples of realistic 32b designs implemented in an industrial 0.25 µm CMOS process. 1. Introduction There are many ways of formulating the process of binary addition. Each different way provides different insight and thus suggests different impl...
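To make "addition as a prefix computation" concrete, here is a minimal software sketch of one of the two end cases named in the abstract, a Kogge-Stone style prefix network, simulated bit by bit. It uses the textbook generate/propagate formulation and is not taken from the paper; the function name `kogge_stone_add` and the default `width=32` are assumptions for illustration.

```python
def kogge_stone_add(a, b, width=32):
    """Add two unsigned integers with a simulated Kogge-Stone prefix network.

    Returns (sum modulo 2**width, carry_out).
    """
    # Bitwise generate (both inputs 1) and propagate (exactly one input 1).
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    G, P = g[:], p[:]
    d = 1
    while d < width:
        # One prefix level: every position i >= d combines with position i - d.
        # New lists are built so that all updates read the previous level.
        G_next, P_next = G[:], P[:]
        for i in range(d, width):
            G_next[i] = G[i] | (P[i] & G[i - d])
            P_next[i] = P[i] & P[i - d]
        G, P = G_next, P_next
        d *= 2

    # After the last level, G[i] is the carry out of bit i (carry-in is 0).
    carries = [0] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, G[-1]
```

The `while` loop runs once per doubling of `d`, which corresponds to the logarithmic logical depth the abstract mentions; the wiring and fanout tradeoffs the paper studies are not modeled here.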
Planar Separators and Parallel Polygon Triangulation
Cited by 58 (8 self)
We show how to construct an $O(\sqrt{n})$-separator decomposition of a planar graph $G$ in $O(n)$ time. Such a decomposition defines a binary tree where each node corresponds to a subgraph of $G$ and stores an $O(\sqrt{n})$-separator of that subgraph. We also show how to construct an $O(n^{\epsilon})$-way decomposition tree in parallel in $O(\log n)$ time so that each node corresponds to a subgraph of $G$ and stores an $O(n^{1/2+\epsilon})$-separator of that subgraph. We demonstrate the utility of such a separator decomposition by showing how it can be used in the design of a parallel algorithm for triangulating a simple polygon deterministically in $O(\log n)$ time using $O(n / \log n)$ processors on a CRCW PRAM.
Randomized Routing on Fat-Trees
Advances in Computing Research, 1996
Cited by 55 (11 self)
Fat-trees are a class of routing networks for hardware-efficient parallel computation. This paper presents a randomized algorithm for routing messages on a fat-tree. The quality of the algorithm is measured in terms of the load factor of a set of messages to be routed, which is a lower bound on the time required to deliver the messages. We show that if a set of messages has load factor $\lambda$ on a fat-tree with $n$ processors, the number of delivery cycles (routing attempts) that the algorithm requires is $O(\lambda + \lg n \lg\lg n)$ with probability $1 - O(1/n)$. The best previous bound was $O(\lambda \lg n)$ for the off-line problem in which the set of messages is known in advance. In the context of a VLSI model that equates hardware cost with physical volume, the routing algorithm can be used to demonstrate that fat-trees are universal routing networks. Specifically, we prove that any routing network can be efficiently simulated by a fat-tree of comparable hardware cost. 1 Introduction Fat-trees constitute...
Parallel Ear Decomposition Search (EDS) And ST-Numbering In Graphs
1986
Cited by 52 (2 self)
The [LEC67] linear time serial algorithm for testing planarity of graphs uses the linear time serial algorithm of [ET76] for st-numbering. This st-numbering algorithm is based on depth-first search (DFS). A known conjecture states that DFS, which is a key technique in designing serial algorithms, is not amenable to polylog time parallelism using "around linearly" (or even polynomially) many processors. The first contribution of this paper is a general method for searching efficiently in parallel undirected graphs, called ear-decomposition search (EDS). The second contribution demonstrates the applicability of this search method. We present an efficient parallel algorithm for st-numbering in a biconnected graph. The algorithm runs in logarithmic time using a linear number of processors on a concurrent-read concurrent-write (CRCW) PRAM. An efficient parallel algorithm for the problem did not exist before. The problem was not even known to be in NC. 1. Introduction We define the problems ...
Radix Sort For Vector Multiprocessors
In Proceedings Supercomputing '91, 1991
Cited by 50 (6 self)
We have designed a radix sort algorithm for vector multiprocessors and have implemented the algorithm on the CRAY Y-MP. On one processor of the Y-MP, our sort is over 5 times faster on large sorting problems than the optimized library sort provided by CRAY Research. On eight processors we achieve an additional speedup of almost 5, yielding a routine over 25 times faster than the library sort. Using this multiprocessor version, we can sort at a rate of 15 million 64-bit keys per second. Our sorting algorithm is adapted from a data-parallel algorithm previously designed for a highly parallel Single Instruction Multiple Data (SIMD) computer, the Connection Machine CM-2. To develop our version we introduce three general techniques for mapping data-parallel algorithms onto vector multiprocessors. These techniques allow us to fully vectorize and parallelize the algorithm. The paper also derives equations that model the performance of our algorithm on the Y-MP. These equations are then used t...
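For readers unfamiliar with the underlying technique, the following is a plain sequential sketch of least-significant-digit radix sort built from a histogram, an exclusive prefix sum, and a stable scatter. It is a generic textbook formulation, not the vectorized Y-MP implementation the abstract describes; the name `radix_sort` and the parameters `key_bits` and `digit_bits` are assumptions for this example, and keys are assumed to be non-negative integers below `2**key_bits`.

```python
def radix_sort(keys, key_bits=64, digit_bits=8):
    """Sort non-negative integers below 2**key_bits, one digit per pass."""
    radix = 1 << digit_bits
    mask = radix - 1
    keys = list(keys)
    for shift in range(0, key_bits, digit_bits):
        # Histogram: count how many keys carry each digit value.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum turns counts into starting offsets per digit.
        offsets = [0] * radix
        running = 0
        for d in range(radix):
            offsets[d] = running
            running += counts[d]
        # Stable scatter: keys with equal digits keep their relative order.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

The histogram and prefix-sum steps are the parts that data-parallel versions typically distribute across processors or vector lanes; here they run in a single loop for clarity.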
Planar Orientations with Low Out-Degree and Compaction of Adjacency Matrices
Theoretical Computer Science, 1991
Cited by 48 (4 self)
We consider the problem of orienting the edges of a planar graph in such a way that the out-degree of each vertex is minimized. If, for each vertex v, the out-degree is at most d, then we say that such an orientation is d-bounded. We prove the following results:
• Each planar graph has a 5-bounded acyclic orientation, which can be constructed in linear time.
• Each planar graph has a 3-bounded orientation, which can be constructed in linear time.
• A 6-bounded acyclic orientation, and a 3-bounded orientation, of each planar graph can each be constructed in parallel time $O(\log n \log^* n)$ on an EREW PRAM, using $O(n / (\log n \log^* n))$ processors.
As an application of these results, we present a data structure such that each entry in the adjacency matrix of a planar graph can be looked up in constant time. The data structure uses linear storage, and can be constructed in linear time. Department of Mathematics and Computer Science, University of California, Riverside, CA 92521. On...
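One standard way to obtain a 5-bounded acyclic orientation relies on the fact that every planar graph has a vertex of degree at most 5: repeatedly remove such a vertex and direct its remaining edges away from it. The sketch below follows that idea; it is an assumption that this matches the construction in the paper, and the name `bounded_orientation` and the adjacency-dictionary input format are hypothetical.

```python
from collections import deque


def bounded_orientation(adj, bound=5):
    """Orient an undirected graph so every out-degree is at most bound.

    adj maps each vertex to an iterable of its neighbours.
    Returns a dict mapping each vertex to its list of out-neighbours.
    Raises ValueError if no low-degree vertex is available at some step,
    which cannot happen for a planar graph with bound=5.
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    queue = deque(v for v, ns in remaining.items() if len(ns) <= bound)
    removed = set()
    out = {v: [] for v in adj}
    while queue:
        v = queue.popleft()
        if v in removed:
            continue  # a vertex may have been queued more than once
        removed.add(v)
        for w in remaining[v]:
            out[v].append(w)          # edge points from v to a later vertex
            remaining[w].discard(v)
            if len(remaining[w]) <= bound:
                queue.append(w)
        remaining[v] = set()
    if len(removed) != len(adj):
        raise ValueError("no vertex of degree <= bound; input may not be planar")
    return out
```

Because every edge points from an earlier-removed vertex to a later-removed one, the removal order is a topological order and the orientation is acyclic.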