Results 1  10
of
78
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract

Cited by 191 (9 self)
 Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
A new approach to the minimum cut problem
 Journal of the ACM
, 1996
"... Abstract. This paper presents a new approach to finding minimum cuts in undirected graphs. The fundamental principle is simple: the edges in a graph’s minimum cut form an extremely small fraction of the graph’s edges. Using this idea, we give a randomized, strongly polynomial algorithm that finds th ..."
Abstract

Cited by 94 (8 self)
 Add to MetaCart
Abstract. This paper presents a new approach to finding minimum cuts in undirected graphs. The fundamental principle is simple: the edges in a graph’s minimum cut form an extremely small fraction of the graph’s edges. Using this idea, we give a randomized, strongly polynomial algorithm that finds the minimum cut in an arbitrarily weighted undirected graph with high probability. The algorithm runs in O(n 2 log 3 n) time, a significant improvement over the previous Õ(mn) time bounds based on maximum flows. It is simple and intuitive and uses no complex data structures. Our algorithm can be parallelized to run in �� � with n 2 processors; this gives the first proof that the minimum cut problem can be solved in ���. The algorithm does more than find a single minimum cut; it finds all of them. With minor modifications, our algorithm solves two other problems of interest. Our algorithm finds all cuts with value within a multiplicative factor of � of the minimum cut’s in expected Õ(n 2 � ) time, or in �� � with n 2 � processors. The problem of finding a minimum multiway cut of a graph into r pieces is solved in expected Õ(n 2(r�1) ) time, or in �� � with n 2(r�1) processors. The “trace ” of the algorithm’s execution on these two problems forms a new compact data structure for representing all small cuts and all multiway cuts in a graph. This data structure can be efficiently transformed into the
A provable time and space efficient implementation of nesl
 In International Conference on Functional Programming
, 1996
"... In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed Jcalculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementa ..."
Abstract

Cited by 70 (7 self)
 Add to MetaCart
In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed Jcalculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementation bounds for functional languages by considering space and by including arrays. For modeling the cost of NESL we augment a standard callbyvalue operational semantics to return two cost measures: a DAG representing the sequential dependence in the computation, and a measure of the space taken by a sequential implementation. We show that a NESL program with w work (nodes in the DAG), d depth (levels in the DAG), and s sequential space can be implemented on a p processor butterfly network, hypercube, or CRCW PRAM usin O(w/p + d log p) time and 0(s + dp logp) reachable space. For programs with sufficient parallelism these bounds are optimal in that they give linew speedup and use space within a constant factor of the sequential space. 1
CommunicationEfficient Parallel Sorting
, 1996
"... We study the problem of sorting n numbers on a pprocessor bulksynchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processortoprocessor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sort ..."
Abstract

Cited by 64 (2 self)
 Add to MetaCart
We study the problem of sorting n numbers on a pprocessor bulksynchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processortoprocessor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sorting methods that use internal computation time that is O( n log n p ) and a number of communication rounds that is O( log n log(h+1) ) for h = \Theta(n=p). The internal computation bound is optimal for any comparisonbased sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when p n 1\Gamma1=c for a constant c 1. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for p, for we show that just computing the "or" of n bits distributed evenly to the first O(n=h) of an arbitrary number of processors in a BSP computer requires\Omega\Gammaqui n= log(h...
Powerlist: a structure for parallel recursion
 ACM Transactions on Programming Languages and Systems
, 1994
"... Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefixsum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic pro ..."
Abstract

Cited by 59 (2 self)
 Add to MetaCart
Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefixsum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic properties of this data structure can be exploited to derive properties of these algorithms and establish equivalence of different algorithms that solve the same problem.
A parallel algorithmic version of the local lemma
, 1991
"... The Lovász Local Lemma is a tool that enables one to show that certain events hold with positive, though very small probability. It often yields existence proofs of results without supplying any efficient way of solving the corresponding algorithmic problems. J. Beck has recently found a method for ..."
Abstract

Cited by 58 (10 self)
 Add to MetaCart
The Lovász Local Lemma is a tool that enables one to show that certain events hold with positive, though very small probability. It often yields existence proofs of results without supplying any efficient way of solving the corresponding algorithmic problems. J. Beck has recently found a method for converting some of these existence proofs into efficient algorithmic procedures, at the cost of loosing a little in the estimates. His method does not seem to be parallelizable. Here we modify his technique and achieve an algorithmic version that can be parallelized, thus obtaining deterministic NC 1 algorithms for several interesting algorithmic problems.
Static Frequency Assignment in Cellular Networks
 Algorithmica
, 1997
"... A cellular network is generally modeled as a subgraph of the triangular lattice. In the static frequency assignment problem, each vertex of the graph is a base station in the network, and has associated with it an integer weight that represents the number of calls that must be served at the vertex b ..."
Abstract

Cited by 38 (4 self)
 Add to MetaCart
A cellular network is generally modeled as a subgraph of the triangular lattice. In the static frequency assignment problem, each vertex of the graph is a base station in the network, and has associated with it an integer weight that represents the number of calls that must be served at the vertex by assigning distinct frequencies per call. The edges of the graph model interference constraints for frequencies assigned to neighboring stations. The static frequency assignment problem can be abstracted as a graph multicoloring problem. We describe an efficient algorithm to optimally multicolor any weighted even or odd length cycle representing a cellular network. This result is further extended to any outerplanar graph. For the problem of multicoloring an arbitrary connected subgraph of the triangular lattice, we demonstrate an approximation algorithm which guarantees that no more than 4=3 times the minimum number of required colors are used. Further, we show that this algorithm can be im...
Parallel Open Ear Decomposition with Applications to Graph Biconnectivity and Triconnectivity
 Synthesis of Parallel Algorithms
, 1992
"... This report deals with a parallel algorithmic technique that has proved to be very useful in the design of efficient parallel algorithms for several problems on undirected graphs. We describe this method for searching undirected graphs, called "open ear decomposition", and we relate this decompos ..."
Abstract

Cited by 25 (9 self)
 Add to MetaCart
This report deals with a parallel algorithmic technique that has proved to be very useful in the design of efficient parallel algorithms for several problems on undirected graphs. We describe this method for searching undirected graphs, called "open ear decomposition", and we relate this decomposition to graph biconnectivity. We present an efficient parallel algorithm for finding this decomposition and we relate it to a sequential algorithm based on depthfirst search. We then apply open ear decomposition to obtain an efficient parallel algorithm for testing graph triconnectivity and for finding the triconnnected components of a graph.
A new graph triconnectivity algorithm and its parallelization
 Combinatorica
, 1987
"... We present a new algorithm for finding the triconnected components of an undirected graph. The algorithm is based on a method of searching graphs called ‘open ear decomposition’. A parallel implementation of the algorithm on a CRCW PRAM runs in O(log 2 n) parallel time using O(n + m) processors, whe ..."
Abstract

Cited by 25 (3 self)
 Add to MetaCart
We present a new algorithm for finding the triconnected components of an undirected graph. The algorithm is based on a method of searching graphs called ‘open ear decomposition’. A parallel implementation of the algorithm on a CRCW PRAM runs in O(log 2 n) parallel time using O(n + m) processors, where n is the number of vertices and m is the number of edges in the graph.
Possible and Impossible SelfStabilizing Digital Clock Synchronization in General Graphs
 Journal of RealTime Systems
, 1997
"... We study digital clock synchronization for multiprocessor systems, where processors are triggered by a common clock pulse and communicate with others via shared memory. A selfstabilizing digital clock synchronization protocol for systems with a general communication graph is presented. The protocol ..."
Abstract

Cited by 23 (2 self)
 Add to MetaCart
We study digital clock synchronization for multiprocessor systems, where processors are triggered by a common clock pulse and communicate with others via shared memory. A selfstabilizing digital clock synchronization protocol for systems with a general communication graph is presented. The protocol can commence in an arbitrary nonconsistent system state and converges to a legitimate state in which the clocks are synchronized and incremented by one in every subsequent pulse. To enhance the faulttolerance of our protocol, we allow that during and following convergence processors may stop operating. Crash failures may partition the communication graph into several connected components. Our protocol synchronizes the clocks of the processors in every such connected component. For the case in which faulty processors can exhibit Byzantine behavior, we prove that there is no digital clock synchronization protocol that tolerates even one single faulty processor. Keywords: Clock synchronizat...