Results 1  10
of
221
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract

Cited by 193 (9 self)
 Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
Scans as Primitive Parallel Operations
 IEEE Transactions on Computers
, 1987
"... In most parallel randomaccess machine (PRAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can executed in no more time than these parallel memory references. This paper outline an extensive study of ..."
Abstract

Cited by 157 (12 self)
 Add to MetaCart
In most parallel randomaccess machine (PRAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can executed in no more time than these parallel memory references. This paper outline an extensive study of the effect of including in the PRAM models, such scan operations as unittime primitives. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(lg n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. We therefore argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design: a radixsort algorithm, a quicksort algorithm, a minimumspanning tree algorithm, a linedrawing algorithm and a mergi...
Simple linear work suffix array construction
, 2003
"... Abstract. Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency while theoreticians use suffix trees due to lineartime construction algorithms and more exp ..."
Abstract

Cited by 149 (6 self)
 Add to MetaCart
Abstract. Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency while theoreticians use suffix trees due to lineartime construction algorithms and more explicit structure. We narrow this gap between theory and practice with a simple lineartime construction algorithm for suffix arrays. The simplicity is demonstrated with a C++ implementation of 50 effective lines of code. The algorithm is called DC3, which stems from the central underlying concept of difference cover. This view leads to a generalized algorithm, DC, that allows a spaceefficient implementation and, moreover, supports the choice of a space–time tradeoff. For any v ∈ [1, √ n], it runs in O(vn) time using O(n / √ v) space in addition to the input string and the suffix array. We also present variants of the algorithm for several parallel and hierarchical memory models of computation. The algorithms for BSP and EREWPRAM models are asymptotically faster than all previous suffix tree or array construction algorithms.
Geometric Pattern Matching under Euclidean Motion
, 1993
"... Given two planar sets A and B, we examine the problem of determining the smallest " such that there is a Euclidean motion (rotation and translation) of A that brings each member of A within distance " of some member of B. We establish upper bounds on the combinatorial complexity of this subproblem i ..."
Abstract

Cited by 73 (2 self)
 Add to MetaCart
Given two planar sets A and B, we examine the problem of determining the smallest " such that there is a Euclidean motion (rotation and translation) of A that brings each member of A within distance " of some member of B. We establish upper bounds on the combinatorial complexity of this subproblem in modelbased computer vision, when the sets A and B contain points, line segments, or (filledin) polygons. We also show how to use our methods to substantially improve on existing algorithms for finding the minimum Hausdorff distance under Euclidean motion. 1 Author's address: Department of Computer Science, Cornell University, Ithaca, NY 14853. This work was supported by the Advanced Research Projects Agency of the Department of Defense under ONR Contract N0001492J1989, and by ONR Contract N0001492J1839, NSF Contract IRI9006137, and AFOSR Contract AFOSR910328. 2 Author's address: Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218. This work was suppo...
Parallel SymmetryBreaking in Sparse Graphs
 SIAM J. Disc. Math
, 1987
"... We describe efficient deterministic techniques for breaking symmetry in parallel. These techniques work well on rooted trees and graphs of constant degree or genus. Our primary technique allows us to 3color a rooted tree in O(lg n) time on an EREW PRAM using a linear number of processors. We use th ..."
Abstract

Cited by 73 (2 self)
 Add to MetaCart
We describe efficient deterministic techniques for breaking symmetry in parallel. These techniques work well on rooted trees and graphs of constant degree or genus. Our primary technique allows us to 3color a rooted tree in O(lg n) time on an EREW PRAM using a linear number of processors. We use these techniques to construct fast linear processor algorithms for several problems, including (\Delta + 1)coloring constantdegree graphs and 5coloring planar graphs. We also prove lower bounds for 2coloring directed lists and for finding maximal independent sets in arbitrary graphs. 1 Introduction Some problems for which trivial sequential algorithms exist appear to be much harder to solve in a parallel framework. When converting a sequential algorithm to a parallel one, at each step of the parallel algorithm we have to choose a set of operations which may be executed in parallel. Often, we have to choose these operations from a large set A preliminary version of this paper appear...
CommunicationEfficient Parallel Sorting
, 1996
"... We study the problem of sorting n numbers on a pprocessor bulksynchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processortoprocessor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sort ..."
Abstract

Cited by 64 (2 self)
 Add to MetaCart
We study the problem of sorting n numbers on a pprocessor bulksynchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processortoprocessor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sorting methods that use internal computation time that is O( n log n p ) and a number of communication rounds that is O( log n log(h+1) ) for h = \Theta(n=p). The internal computation bound is optimal for any comparisonbased sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when p n 1\Gamma1=c for a constant c 1. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for p, for we show that just computing the "or" of n bits distributed evenly to the first O(n=h) of an arbitrary number of processors in a BSP computer requires\Omega\Gammaqui n= log(h...
Optimal and Sublogarithmic Time Randomized Parallel Sorting Algorithms
 SIAM Journal on Computing
, 1989
"... .We assume a parallel RAM model which allows both concurrent reads and concurrent writes of a global memory. Our main result is an optimal randomized parallel algorithm for INTEGER SORT (i.e., for sorting n integers in the range [1; n]). Our algorithm costs only logarithmic time and is the first kno ..."
Abstract

Cited by 64 (12 self)
 Add to MetaCart
.We assume a parallel RAM model which allows both concurrent reads and concurrent writes of a global memory. Our main result is an optimal randomized parallel algorithm for INTEGER SORT (i.e., for sorting n integers in the range [1; n]). Our algorithm costs only logarithmic time and is the first known that is optimal: the product of its time and processor bounds is upper bounded by a linear function of the input size. We also give a deterministic sublogarithmic time algorithm for prefix sum. In addition we present a sublogarithmic time algorithm for obtaining a random permutation of n elements in parallel. And finally, we present sublogarithmic time algorithms for GENERAL SORT and INTEGER SORT. Our sublogarithmic GENERAL SORT algorithm is also optimal. Key words. Randomized algorithms, parallel sorting, parallel random access machines, random permutations, radix sort, prefix sum, optimal algorithms. AMS(MOS) subject classifications. 68Q25. 1 A preliminary version of this paper ...
Parallel Construction of Quadtrees and Quality Triangulations
, 1999
"... We describe e#cient PRAM algorithms for constructing unbalanced quadtrees, balanced quadtrees, and quadtreebased finite element meshes. Our algorithms take time O(log n) for point set input and O(log n log k) time for planar straightline graphs, using O(n + k/ log n) processors, where n measure ..."
Abstract

Cited by 61 (5 self)
 Add to MetaCart
We describe e#cient PRAM algorithms for constructing unbalanced quadtrees, balanced quadtrees, and quadtreebased finite element meshes. Our algorithms take time O(log n) for point set input and O(log n log k) time for planar straightline graphs, using O(n + k/ log n) processors, where n measures input size and k output size. 1. Introduction A crucial preprocessing step for the finite element method is mesh generation, and the most general and versatile type of twodimensional mesh is an unstructured triangular mesh. Such a mesh is simply a triangulation of the input domain (e.g., a polygon), along with some extra vertices, called Steiner points. Not all triangulations, however, serve equally well; numerical and discretization error depend on the quality of the triangulation, meaning the shapes and sizes of triangles. A typical quality guarantee gives a lower bound on the minimum angle in the triangulation. Baker et al. 1 first proved the existence of quality triangulations fo...
Efficient parallel graph algorithms for coarse grained multicomputers and BSP (Extended Abstract)
 in Proc. 24th International Colloquium on Automata, Languages and Programming (ICALP'97
, 1997
"... In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulksynchronous parallel computer (BSP) models which solve the following well known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and s ..."
Abstract

Cited by 59 (23 self)
 Add to MetaCart
In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulksynchronous parallel computer (BSP) models which solve the following well known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, (7) 2edge connectivity and biconnectivity (testing and component computation), and (8) cordal graph recognition (finding a perfect elimination ordering). The algorithms for Problems 17 require O(log p) communication rounds and linear sequential work per round. Our results for Problems 1 and 2, i.e.they are fully scalable, and for Problems hold for arbitrary ratios n p 38 it is assumed that n p,>0, which is true for all commercially