Results 1–10 of 27
Optimal and Sublogarithmic Time Randomized Parallel Sorting Algorithms
 SIAM JOURNAL ON COMPUTING
, 1989
Abstract

Cited by 61 (12 self)
We assume a parallel RAM model which allows both concurrent reads and concurrent writes of a global memory. Our main result is an optimal randomized parallel algorithm for INTEGER SORT (i.e., for sorting n integers in the range [1, n]). Our algorithm runs in logarithmic time and is the first known algorithm that is optimal: the product of its time and processor bounds is bounded above by a linear function of the input size. We also give a deterministic sublogarithmic-time algorithm for prefix sum. In addition, we present a sublogarithmic-time algorithm for obtaining a random permutation of n elements in parallel. Finally, we present sublogarithmic-time algorithms for GENERAL SORT and INTEGER SORT. Our sublogarithmic GENERAL SORT algorithm is also optimal.
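As a point of reference for the INTEGER SORT problem above (sorting n integers drawn from [1, n]), here is a hedged sequential sketch of stable counting sort; its O(n) total work is the time-processor product the optimal parallel algorithm matches. The function name and structure are illustrative, not taken from the paper, and this is not the PRAM algorithm itself.

```python
def integer_sort(a):
    """Stable counting sort for keys in [1, n], n = len(a).
    Sequential sketch of the O(n)-work target; the paper's
    contribution is achieving this work bound in parallel."""
    n = len(a)
    count = [0] * (n + 2)
    for x in a:                      # histogram the keys
        count[x] += 1
    offset = [0] * (n + 2)
    for k in range(1, n + 1):        # prefix sums give each key its start
        offset[k + 1] = offset[k] + count[k]
    out = [0] * n
    pos = offset[:]                  # next free slot per key
    for x in a:                      # stable scatter
        out[pos[x]] = x
        pos[x] += 1
    return out
```

The prefix-sum step is the same primitive the abstract's sublogarithmic-time prefix-sum algorithm parallelizes.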
Parallel Ear Decomposition Search (EDS) and ST-Numbering in Graphs
, 1986
Abstract

Cited by 44 (2 self)
The [LEC67] linear-time serial algorithm for testing planarity of graphs uses the linear-time serial algorithm of [ET76] for st-numbering. This st-numbering algorithm is based on depth-first search (DFS). A known conjecture states that DFS, which is a key technique in designing serial algorithms, is not amenable to polylog-time parallelism using "around linearly" (or even polynomially) many processors. The first contribution of this paper is a general method for efficiently searching undirected graphs in parallel, called ear-decomposition search (EDS). The second contribution demonstrates the applicability of this search method. We present an efficient parallel algorithm for st-numbering in a biconnected graph. The algorithm runs in logarithmic time using a linear number of processors on a concurrent-read concurrent-write (CRCW) PRAM. No efficient parallel algorithm for the problem existed before; the problem was not even known to be in NC.
The Accelerated Centroid Decomposition Technique For Optimal Parallel Tree Evaluation In Logarithmic Time
, 1986
Abstract

Cited by 41 (3 self)
A new general parallel algorithmic technique for computations on trees is presented. The technique reduces the tree expression evaluation problem to list ranking; the list ranking then provides a schedule for evaluating the tree operations. The technique needs logarithmic time using an optimal number of processors and has applications to other tree problems. It also enables us to systematically order four basic ideas and techniques for parallel algorithms on trees: (1) the list ranking problem; (2) the Euler tour technique on trees; (3) the centroid decomposition technique; (4) the new accelerated centroid decomposition (ACD) technique. The model of parallel computation used in this paper is the concurrent-read exclusive-write (CREW) parallel random access machine (PRAM). A PRAM employs p synchronous processors all having access to a common memory. A CREW PRAM allows concurrent access by several processors to the same common memory ...
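The reduction above hinges on list ranking. As a minimal sequential simulation of the classical pointer-jumping approach (not the paper's ACD scheduling itself; names are illustrative), each synchronous round halves the remaining pointer chains, giving O(log n) rounds:

```python
def list_rank(nxt):
    """Pointer-jumping list ranking. nxt[i] is the successor of node i,
    with nxt[i] == i marking the list tail. Returns each node's
    distance from the tail. Updates use whole-array copies to mimic
    the synchronous PRAM reads of the old values."""
    n = len(nxt)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    nxt = list(nxt)
    # invariant: rank[i] is the distance from i to nxt[i]
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return [rank[i] + rank[nxt[i]] for i in range(n)] if False else rank
```

Each round doubles the span of every pointer, so after O(log n) rounds every node points at the tail and rank[i] is its distance from it.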
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs) (Extended Abstract)
, 2004
Abstract

Cited by 34 (13 self)
Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms, and also because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. In this paper we present a new randomized algorithm and implementation with superior performance that for the first time achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs (n > p^2). As the spanning tree problem is notoriously hard for any parallel implementation to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers. The main results of this paper are: (1) a new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and (2) an experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms. The source code for these algorithms is freely available from our web site hpc.ece.unm.edu.
Efficient Parallel Evaluation of Straight-line Code and Arithmetic Circuits
 SIAM J. Comput
, 1988
Abstract

Cited by 32 (5 self)
A new parallel algorithm is given to evaluate a straight-line program. The algorithm evaluates a program over a commutative semiring R of degree d and size n in time O(log n (log nd)) using M(n) processors, where M(n) is the number of processors required for multiplying n × n matrices over the semiring R in O(log n) time. Appears in SIAM J. Comput., 17/4, pp. 687-695 (1988). A preliminary version of this paper appeared in [6]. 1. Introduction In this paper we consider the problem of dynamic evaluation of a straight-line program in parallel. This is a generalization of the result of Valiant et al [10]. They consider the problem of ta...
Parallel Algorithmic Techniques for Combinatorial Computation
 Ann. Rev. Comput. Sci
, 1988
Abstract

Cited by 30 (3 self)
 Add to MetaCart
... this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR-8511713, CCR-8605353, and CCR-8814977, and by DARPA contract N00039-84-C-0165.
A Comparison of Data-Parallel Algorithms for Connected Components
 In Proc. 6th Ann. Symp. Parallel Algorithms and Architectures (SPAA '94)
, 1994
Abstract

Cited by 29 (1 self)
This paper presents a pragmatic comparison of three parallel algorithms for finding connected components, together with optimizations on these algorithms. Those being compared are two similar algorithms, by Awerbuch and Shiloach [2] and by Shiloach and Vishkin [19], and a randomized contraction algorithm by Blelloch [7], based on algorithms by Reif [18] and Phillips [17]. Major improvements are given for the first two which significantly reduce the superlinear component of their work complexity. An improvement is also given for the randomized algorithm, and this algorithm is shown to be the fastest of those tested. These comparisons are presented with NESL data-parallel code as executed on a Connection Machine 2. This research was sponsored in part by the Defense Advanced Research Projects Agency, CSTO, under the title "The Fox Project: Advanced Development of Systems Software", ARPA Order No. 8313, issued by ESD/AVS under Contract No. F19628-91-C-0168, and in part by the ONR Graduate Fell...
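For orientation, here is a hedged sequential simulation of the hook-and-shortcut pattern shared by the Awerbuch-Shiloach and Shiloach-Vishkin algorithms; the CRCW tie-breaking, star detection, and processor scheduling of the actual algorithms are simplified away, and the names are illustrative:

```python
def connected_components(n, edges):
    """Hook-and-shortcut connected components on n vertices.
    parent[] encodes a forest; each round hooks the root of the
    larger label onto the smaller one over every edge, then
    pointer-jumps so every node points at its root. Converges with
    each component labelled by its minimum vertex."""
    parent = list(range(n))
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if pu != pv:
                hi, lo = max(pu, pv), min(pu, pv)
                if parent[hi] == hi:       # hook only roots
                    parent[hi] = lo
                    changed = True
        for i in range(n):                 # shortcut (pointer jumping)
            while parent[i] != parent[parent[i]]:
                parent[i] = parent[parent[i]]
    return parent
```

Root labels strictly decrease on every hook, which is what guarantees termination; the parallel versions bound the number of rounds to O(log n) with more careful hooking rules.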
Constructing a Maximal Independent Set in Parallel
 SIAM J. Disc. Math
, 1989
Abstract

Cited by 22 (1 self)
The problem of constructing in parallel a maximal independent set of a given graph is considered. A new deterministic NC algorithm implemented in the EREW PRAM model is presented. On graphs with n vertices and m edges, it uses O((n+m)/log n) processors and runs in O(log^3 n) time. This reduces by a factor of log n both the running time and the processor count of the previously fastest deterministic algorithm which solves the problem using a linear number of processors. Key words: parallel computation, NC, graph, maximal independent set, deterministic. 1. Introduction The problem of constructing in parallel a maximal independent set of a given graph, MIS, has been investigated in several recent papers. Karp and Wigderson proved in [KW] that the problem is in NC. Their algorithm finds a maximal independent set of an n-vertex graph in O(log^4 n) time and uses O(n^3/log^3 n) processors. In successive papers, the authors proposed algorithms which either ...
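To pin down the object being computed (the parallel algorithm in the paper is far more involved), a maximal independent set is one that is independent and cannot be extended; the trivial sequential greedy construction, with illustrative names, is:

```python
def greedy_mis(adj):
    """Sequential greedy maximal independent set.
    adj[v] lists the neighbours of vertex v (0..len(adj)-1).
    Scans vertices in order, taking any vertex not yet blocked
    by a chosen neighbour. O(n + m) time."""
    chosen = set()
    blocked = set()
    for v in range(len(adj)):
        if v not in blocked:
            chosen.add(v)
            blocked.update(adj[v])   # neighbours can no longer join
    return chosen
```

The difficulty the abstract addresses is doing this deterministically in polylog time on an EREW PRAM, where the inherently sequential scan order is unavailable.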
A Comparison of Parallel Algorithms for Connected Components
 in the Symposium on Parallel Algorithms and Architectures
, 1994
Abstract

Cited by 18 (1 self)
This paper presents a comparison of the pragmatic aspects of some parallel algorithms for finding connected components, together with optimizations on these algorithms. The algorithms being compared are two similar algorithms by Shiloach-Vishkin [22] and Awerbuch-Shiloach [2], a randomized contraction algorithm based on algorithms by Reif [21] and Phillips [20], and a hybrid algorithm [11]. Improvements are given for the first two that raise performance significantly, although without improving their asymptotic complexity. The hybrid combines features of the others and is generally the fastest of those tested. Timings were made using NESL [4] code as executed on a Connection Machine 2 and a Cray Y-MP/C90. 1 Introduction The complexity of various PRAM algorithms has received much attention, but there has been relatively little work on the implementation and pragmatic efficiency of many of these algorithms. Moreover, much of this work has been for algorithms having regular communication ...
On the architectural requirements for efficient execution of graph algorithms
 In Proc. 34th Int’l Conf. on Parallel Processing (ICPP)
, 2005
Abstract

Cited by 17 (9 self)
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures ...