Results 1  10
of
78
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract

Cited by 193 (9 self)
 Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
A new approach to the minimum cut problem
 Journal of the ACM
, 1996
"... Abstract. This paper presents a new approach to finding minimum cuts in undirected graphs. The fundamental principle is simple: the edges in a graph’s minimum cut form an extremely small fraction of the graph’s edges. Using this idea, we give a randomized, strongly polynomial algorithm that finds th ..."
Abstract

Cited by 95 (8 self)
 Add to MetaCart
Abstract. This paper presents a new approach to finding minimum cuts in undirected graphs. The fundamental principle is simple: the edges in a graph’s minimum cut form an extremely small fraction of the graph’s edges. Using this idea, we give a randomized, strongly polynomial algorithm that finds the minimum cut in an arbitrarily weighted undirected graph with high probability. The algorithm runs in O(n 2 log 3 n) time, a significant improvement over the previous Õ(mn) time bounds based on maximum flows. It is simple and intuitive and uses no complex data structures. Our algorithm can be parallelized to run in �� � with n 2 processors; this gives the first proof that the minimum cut problem can be solved in ���. The algorithm does more than find a single minimum cut; it finds all of them. With minor modifications, our algorithm solves two other problems of interest. Our algorithm finds all cuts with value within a multiplicative factor of � of the minimum cut’s in expected Õ(n 2 � ) time, or in �� � with n 2 � processors. The problem of finding a minimum multiway cut of a graph into r pieces is solved in expected Õ(n 2(r�1) ) time, or in �� � with n 2(r�1) processors. The “trace ” of the algorithm’s execution on these two problems forms a new compact data structure for representing all small cuts and all multiway cuts in a graph. This data structure can be efficiently transformed into the
A provable time and space efficient implementation of nesl
 In International Conference on Functional Programming
, 1996
"... In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed Jcalculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementa ..."
Abstract

Cited by 70 (7 self)
 Add to MetaCart
In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed Jcalculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementation bounds for functional languages by considering space and by including arrays. For modeling the cost of NESL we augment a standard callbyvalue operational semantics to return two cost measures: a DAG representing the sequential dependence in the computation, and a measure of the space taken by a sequential implementation. We show that a NESL program with w work (nodes in the DAG), d depth (levels in the DAG), and s sequential space can be implemented on a p processor butterfly network, hypercube, or CRCW PRAM usin O(w/p + d log p) time and 0(s + dp logp) reachable space. For programs with sufficient parallelism these bounds are optimal in that they give linew speedup and use space within a constant factor of the sequential space. 1
CommunicationEfficient Parallel Sorting
, 1996
"... We study the problem of sorting n numbers on a pprocessor bulksynchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processortoprocessor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sort ..."
Abstract

Cited by 64 (2 self)
 Add to MetaCart
We study the problem of sorting n numbers on a pprocessor bulksynchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processortoprocessor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sorting methods that use internal computation time that is O( n log n p ) and a number of communication rounds that is O( log n log(h+1) ) for h = \Theta(n=p). The internal computation bound is optimal for any comparisonbased sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when p n 1\Gamma1=c for a constant c 1. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for p, for we show that just computing the "or" of n bits distributed evenly to the first O(n=h) of an arbitrary number of processors in a BSP computer requires\Omega\Gammaqui n= log(h...
A parallel algorithmic version of the local lemma
, 1991
"... The Lovász Local Lemma is a tool that enables one to show that certain events hold with positive, though very small probability. It often yields existence proofs of results without supplying any efficient way of solving the corresponding algorithmic problems. J. Beck has recently found a method for ..."
Abstract

Cited by 60 (10 self)
 Add to MetaCart
The Lovász Local Lemma is a tool that enables one to show that certain events hold with positive, though very small probability. It often yields existence proofs of results without supplying any efficient way of solving the corresponding algorithmic problems. J. Beck has recently found a method for converting some of these existence proofs into efficient algorithmic procedures, at the cost of loosing a little in the estimates. His method does not seem to be parallelizable. Here we modify his technique and achieve an algorithmic version that can be parallelized, thus obtaining deterministic NC 1 algorithms for several interesting algorithmic problems.
Powerlist: a structure for parallel recursion
 ACM Transactions on Programming Languages and Systems
, 1994
"... Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefixsum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic pro ..."
Abstract

Cited by 59 (2 self)
 Add to MetaCart
Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefixsum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic properties of this data structure can be exploited to derive properties of these algorithms and establish equivalence of different algorithms that solve the same problem.
Static Frequency Assignment in Cellular Networks
 Algorithmica
, 1997
"... A cellular network is generally modeled as a subgraph of the triangular lattice. In the static frequency assignment problem, each vertex of the graph is a base station in the network, and has associated with it an integer weight that represents the number of calls that must be served at the vertex b ..."
Abstract

Cited by 38 (4 self)
 Add to MetaCart
A cellular network is generally modeled as a subgraph of the triangular lattice. In the static frequency assignment problem, each vertex of the graph is a base station in the network, and has associated with it an integer weight that represents the number of calls that must be served at the vertex by assigning distinct frequencies per call. The edges of the graph model interference constraints for frequencies assigned to neighboring stations. The static frequency assignment problem can be abstracted as a graph multicoloring problem. We describe an efficient algorithm to optimally multicolor any weighted even or odd length cycle representing a cellular network. This result is further extended to any outerplanar graph. For the problem of multicoloring an arbitrary connected subgraph of the triangular lattice, we demonstrate an approximation algorithm which guarantees that no more than 4=3 times the minimum number of required colors are used. Further, we show that this algorithm can be im...
A new graph triconnectivity algorithm and its parallelization
 Combinatorica
, 1987
"... We present a new algorithm for finding the triconnected components of an undirected graph. The algorithm is based on a method of searching graphs called ‘open ear decomposition’. A parallel implementation of the algorithm on a CRCW PRAM runs in O(log 2 n) parallel time using O(n + m) processors, whe ..."
Abstract

Cited by 25 (3 self)
 Add to MetaCart
We present a new algorithm for finding the triconnected components of an undirected graph. The algorithm is based on a method of searching graphs called ‘open ear decomposition’. A parallel implementation of the algorithm on a CRCW PRAM runs in O(log 2 n) parallel time using O(n + m) processors, where n is the number of vertices and m is the number of edges in the graph.
Parallel Open Ear Decomposition with Applications to Graph Biconnectivity and Triconnectivity
 Synthesis of Parallel Algorithms
, 1992
"... This report deals with a parallel algorithmic technique that has proved to be very useful in the design of efficient parallel algorithms for several problems on undirected graphs. We describe this method for searching undirected graphs, called "open ear decomposition", and we relate this decompos ..."
Abstract

Cited by 25 (9 self)
 Add to MetaCart
This report deals with a parallel algorithmic technique that has proved to be very useful in the design of efficient parallel algorithms for several problems on undirected graphs. We describe this method for searching undirected graphs, called "open ear decomposition", and we relate this decomposition to graph biconnectivity. We present an efficient parallel algorithm for finding this decomposition and we relate it to a sequential algorithm based on depthfirst search. We then apply open ear decomposition to obtain an efficient parallel algorithm for testing graph triconnectivity and for finding the triconnnected components of a graph.
Cacheefficient Dynamic Programming Algorithms for Multicores
, 2008
"... We present cacheefficient chip multiprocessor (CMP) algorithms with good speedup for some widely used dynamic programming algorithms. We consider three types of caching systems for CMPs: DCMP with a private cache for each core, SCMP with a single cache shared by all cores, and Multicore, which h ..."
Abstract

Cited by 23 (5 self)
 Add to MetaCart
We present cacheefficient chip multiprocessor (CMP) algorithms with good speedup for some widely used dynamic programming algorithms. We consider three types of caching systems for CMPs: DCMP with a private cache for each core, SCMP with a single cache shared by all cores, and Multicore, which has private L1 caches and a shared L2 cache. We derive results for three classes of problems: local dependency dynamic programming (LDDP), Gaussian Elimination Paradigm (GEP), and parenthesis problem. For each class of problems, we develop a generic CMP algorithm with an associated tiling sequence. We then tailor this tiling sequence to each caching model and provide a parallel schedule that results in a cacheefficient parallel execution up to the critical path length of the underlying dynamic programming algorithm. We present experimental results on an 8core Opteron for two sequence alignment problems that are important examples of LDDP. Our experimental results show good speedups for simple versions of our algorithms.