Results 1 - 10
of
208
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract
-
Cited by 163 (7 self)
- Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
Scans as Primitive Parallel Operations
- IEEE Transactions on Computers
, 1987
"... In most parallel random-access machine (P-RAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can executed in no more time than these parallel memory references. This paper outline an extensive study of ..."
Abstract
-
Cited by 143 (12 self)
- Add to MetaCart
In most parallel random-access machine (P-RAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can executed in no more time than these parallel memory references. This paper outline an extensive study of the effect of including in the P-RAM models, such scan operations as unit-time primitives. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(lg n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. We therefore argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimumspanning -tree algorithm, a line-drawing algorithm and a mergi...
Simple linear work suffix array construction
, 2003
"... Abstract. Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency while theoreticians use suffix trees due to linear-time construction algorithms and more exp ..."
Abstract
-
Cited by 119 (6 self)
- Add to MetaCart
Abstract. Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency while theoreticians use suffix trees due to linear-time construction algorithms and more explicit structure. We narrow this gap between theory and practice with a simple linear-time construction algorithm for suffix arrays. The simplicity is demonstrated with a C++ implementation of 50 effective lines of code. The algorithm is called DC3, which stems from the central underlying concept of difference cover. This view leads to a generalized algorithm, DC, that allows a space-efficient implementation and, moreover, supports the choice of a space–time tradeoff. For any v ∈ [1, √ n], it runs in O(vn) time using O(n / √ v) space in addition to the input string and the suffix array. We also present variants of the algorithm for several parallel and hierarchical memory models of computation. The algorithms for BSP and EREW-PRAM models are asymptotically faster than all previous suffix tree or array construction algorithms.
Sorting in Linear Time?
, 1995
"... We show that a unit-cost RAM with a word length of w bits can sort n integers in the range 0 : : 2 w \Gamma1 in O(n log log n) time, for arbitrary w log n, a significant improvement over the bound of O(n p log n) achieved by the fusion trees of Fredman and Willard. Provided that w (log n) 2+f ..."
Abstract
-
Cited by 73 (15 self)
- Add to MetaCart
We show that a unit-cost RAM with a word length of w bits can sort n integers in the range 0 : : 2 w \Gamma1 in O(n log log n) time, for arbitrary w log n, a significant improvement over the bound of O(n p log n) achieved by the fusion trees of Fredman and Willard. Provided that w (log n) 2+ffl for some fixed ffl ? 0, the sorting can even be accomplished in linear expected time with a randomized algorithm. Both of our algorithms parallelize without loss on a unit-cost PRAM with a word length of w bits. The first one yields an algorithm that uses O(logn) time and O(n log log n) operations on a deterministic CRCW PRAM. The second one yields an algorithm that uses O(log n) expected time and O(n) expected operations on a randomized EREW PRAM, provided that w (log n) 2+ffl for some fixed ffl ? 0. Our deterministic and randomized sequential and parallel algorithms generalize to the lexicographic sorting problem of sorting multiple-precision integers represented in several words. ...
Geometric Pattern Matching under Euclidean Motion
, 1993
"... Given two planar sets A and B, we examine the problem of determining the smallest " such that there is a Euclidean motion (rotation and translation) of A that brings each member of A within distance " of some member of B. We establish upper bounds on the combinatorial complexity of this subproblem i ..."
Abstract
-
Cited by 65 (2 self)
- Add to MetaCart
Given two planar sets A and B, we examine the problem of determining the smallest " such that there is a Euclidean motion (rotation and translation) of A that brings each member of A within distance " of some member of B. We establish upper bounds on the combinatorial complexity of this subproblem in model-based computer vision, when the sets A and B contain points, line segments, or (filled-in) polygons. We also show how to use our methods to substantially improve on existing algorithms for finding the minimum Hausdorff distance under Euclidean motion. 1 Author's address: Department of Computer Science, Cornell University, Ithaca, NY 14853. This work was supported by the Advanced Research Projects Agency of the Department of Defense under ONR Contract N00014-92-J-1989, and by ONR Contract N00014-92-J-1839, NSF Contract IRI-9006137, and AFOSR Contract AFOSR-91-0328. 2 Author's address: Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218. This work was suppo...
Optimal and Sublogarithmic Time Randomized Parallel Sorting Algorithms
- SIAM Journal on Computing
, 1989
"... .We assume a parallel RAM model which allows both concurrent reads and concurrent writes of a global memory. Our main result is an optimal randomized parallel algorithm for INTEGER SORT (i.e., for sorting n integers in the range [1; n]). Our algorithm costs only logarithmic time and is the first kno ..."
Abstract
-
Cited by 60 (12 self)
- Add to MetaCart
.We assume a parallel RAM model which allows both concurrent reads and concurrent writes of a global memory. Our main result is an optimal randomized parallel algorithm for INTEGER SORT (i.e., for sorting n integers in the range [1; n]). Our algorithm costs only logarithmic time and is the first known that is optimal: the product of its time and processor bounds is upper bounded by a linear function of the input size. We also give a deterministic sub-logarithmic time algorithm for prefix sum. In addition we present a sub-logarithmic time algorithm for obtaining a random permutation of n elements in parallel. And finally, we present sub-logarithmic time algorithms for GENERAL SORT and INTEGER SORT. Our sublogarithmic GENERAL SORT algorithm is also optimal. Key words. Randomized algorithms, parallel sorting, parallel random access machines, random permutations, radix sort, prefix sum, optimal algorithms. AMS(MOS) subject classifications. 68Q25. 1 A preliminary version of this paper ...
Communication-Efficient Parallel Sorting
, 1996
"... We study the problem of sorting n numbers on a p-processor bulk-synchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processor-to-processor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sort ..."
Abstract
-
Cited by 60 (2 self)
- Add to MetaCart
We study the problem of sorting n numbers on a p-processor bulk-synchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processor-to-processor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sorting methods that use internal computation time that is O( n log n p ) and a number of communication rounds that is O( log n log(h+1) ) for h = \Theta(n=p). The internal computation bound is optimal for any comparison-based sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when p n 1\Gamma1=c for a constant c 1. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for p, for we show that just computing the "or" of n bits distributed evenly to the first O(n=h) of an arbitrary number of processors in a BSP computer requires\Omega\Gammaqui n= log(h...
Parallel Symmetry-Breaking in Sparse Graphs
- SIAM J. Disc. Math
, 1987
"... We describe efficient deterministic techniques for breaking symmetry in parallel. These techniques work well on rooted trees and graphs of constant degree or genus. Our primary technique allows us to 3-color a rooted tree in O(lg n) time on an EREW PRAM using a linear number of processors. We use th ..."
Abstract
-
Cited by 60 (2 self)
- Add to MetaCart
We describe efficient deterministic techniques for breaking symmetry in parallel. These techniques work well on rooted trees and graphs of constant degree or genus. Our primary technique allows us to 3-color a rooted tree in O(lg n) time on an EREW PRAM using a linear number of processors. We use these techniques to construct fast linear processor algorithms for several problems, including (\Delta + 1)-coloring constantdegree graphs and 5-coloring planar graphs. We also prove lower bounds for 2-coloring directed lists and for finding maximal independent sets in arbitrary graphs. 1 Introduction Some problems for which trivial sequential algorithms exist appear to be much harder to solve in a parallel framework. When converting a sequential algorithm to a parallel one, at each step of the parallel algorithm we have to choose a set of operations which may be executed in parallel. Often, we have to choose these operations from a large set A preliminary version of this paper appear...
Efficient parallel graph algorithms for coarse grained multicomputers and BSP (Extended Abstract)
- in Proc. 24th International Colloquium on Automata, Languages and Programming (ICALP'97
, 1997
"... In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulk-synchronous parallel computer (BSP) models which solve the following well known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and s ..."
Abstract
-
Cited by 55 (23 self)
- Add to MetaCart
In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulk-synchronous parallel computer (BSP) models which solve the following well known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, (7) 2-edge connectivity and biconnectivity (testing and component computation), and (8) cordal graph recognition (finding a perfect elimination ordering). The algorithms for Problems 1-7 require O(log p) communication rounds and linear sequential work per round. Our results for Problems 1 and 2, i.e.they are fully scalable, and for Problems hold for arbitrary ratios n p 3-8 it is assumed that n p,>0, which is true for all commercially
Parallel Construction of Quadtrees and Quality Triangulations
, 1999
"... We describe e#cient PRAM algorithms for constructing unbalanced quadtrees, balanced quadtrees, and quadtree-based finite element meshes. Our algorithms take time O(log n) for point set input and O(log n log k) time for planar straight-line graphs, using O(n + k/ log n) processors, where n measure ..."
Abstract
-
Cited by 54 (5 self)
- Add to MetaCart
We describe e#cient PRAM algorithms for constructing unbalanced quadtrees, balanced quadtrees, and quadtree-based finite element meshes. Our algorithms take time O(log n) for point set input and O(log n log k) time for planar straight-line graphs, using O(n + k/ log n) processors, where n measures input size and k output size. 1. Introduction A crucial preprocessing step for the finite element method is mesh generation, and the most general and versatile type of two-dimensional mesh is an unstructured triangular mesh. Such a mesh is simply a triangulation of the input domain (e.g., a polygon), along with some extra vertices, called Steiner points. Not all triangulations, however, serve equally well; numerical and discretization error depend on the quality of the triangulation, meaning the shapes and sizes of triangles. A typical quality guarantee gives a lower bound on the minimum angle in the triangulation. Baker et al. 1 first proved the existence of quality triangulations fo...

