Results 1 - 10
of
66
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract
-
Cited by 163 (7 self)
- Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
Applications of parametric searching in geometric optimization
- J. Algorithms
, 1994
"... z Sivan Toledo x ..."
Sorting in Linear Time?
, 1995
"... We show that a unit-cost RAM with a word length of w bits can sort n integers in the range 0 : : 2 w \Gamma1 in O(n log log n) time, for arbitrary w log n, a significant improvement over the bound of O(n p log n) achieved by the fusion trees of Fredman and Willard. Provided that w (log n) 2+f ..."
Abstract
-
Cited by 73 (15 self)
- Add to MetaCart
We show that a unit-cost RAM with a word length of w bits can sort n integers in the range 0 : : 2 w \Gamma1 in O(n log log n) time, for arbitrary w log n, a significant improvement over the bound of O(n p log n) achieved by the fusion trees of Fredman and Willard. Provided that w (log n) 2+ffl for some fixed ffl ? 0, the sorting can even be accomplished in linear expected time with a randomized algorithm. Both of our algorithms parallelize without loss on a unit-cost PRAM with a word length of w bits. The first one yields an algorithm that uses O(logn) time and O(n log log n) operations on a deterministic CRCW PRAM. The second one yields an algorithm that uses O(log n) expected time and O(n) expected operations on a randomized EREW PRAM, provided that w (log n) 2+ffl for some fixed ffl ? 0. Our deterministic and randomized sequential and parallel algorithms generalize to the lexicographic sorting problem of sorting multiple-precision integers represented in several words. ...
Biconnectivity Approximations and Graph Carvings
, 1994
"... A spanning tree in a graph is the smallest connected spanning subgraph. Given a graph, how does one find the smallest (i.e., least number of edges) 2-connected spanning subgraph (connectivity refers to both edge and vertex connectivity, if not specified) ? Unfortunately, the problem is known to be ..."
Abstract
-
Cited by 67 (3 self)
- Add to MetaCart
A spanning tree in a graph is the smallest connected spanning subgraph. Given a graph, how does one find the smallest (i.e., least number of edges) 2-connected spanning subgraph (connectivity refers to both edge and vertex connectivity, if not specified) ? Unfortunately, the problem is known to be NP -hard. We consider the problem of finding a better approximation to the smallest 2-connected subgraph, by an efficient algorithm. For 2-edge connectivity our algorithm guarantees a solution that is no more than 3 2 times the optimal. For 2-vertex connectivity our algorithm guarantees a solution that is no more than 5 3 times the optimal. The previous best approximation factor is 2 for each of these problems. The new algorithms (and their analyses) depend upon a structure called a carving of a graph, which is of independent interest. We show that approximating the optimal solution to within an additive constant is NP -hard as well. We also consider the case where the graph has edge weigh...
Cache-oblivious priority queue and graph algorithm applications
- In Proc. 34th Annual ACM Symposium on Theory of Computing
, 2002
"... In this paper we develop an optimal cache-oblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in O ( 1 B logM/B N) amortized memory B transfers, where M and B are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hi ..."
Abstract
-
Cited by 56 (10 self)
- Add to MetaCart
In this paper we develop an optimal cache-oblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in O ( 1 B logM/B N) amortized memory B transfers, where M and B are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hierarchy. In a cache-oblivious data structure, M and B are not used in the description of the structure. The bounds match the bounds of several previously developed external-memory (cache-aware) priority queue data structures, which all rely crucially on knowledge about M and B. Priority queues are a critical component in many of the best known external-memory graph algorithms, and using our cache-oblivious priority queue we develop several cacheoblivious graph algorithms.
Efficient parallel graph algorithms for coarse grained multicomputers and BSP (Extended Abstract)
- in Proc. 24th International Colloquium on Automata, Languages and Programming (ICALP'97
, 1997
"... In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulk-synchronous parallel computer (BSP) models which solve the following well known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and s ..."
Abstract
-
Cited by 55 (23 self)
- Add to MetaCart
In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulk-synchronous parallel computer (BSP) models which solve the following well known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, (7) 2-edge connectivity and biconnectivity (testing and component computation), and (8) cordal graph recognition (finding a perfect elimination ordering). The algorithms for Problems 1-7 require O(log p) communication rounds and linear sequential work per round. Our results for Problems 1 and 2, i.e.they are fully scalable, and for Problems hold for arbitrary ratios n p 3-8 it is assumed that n p,>0, which is true for all commercially
Parallel Ear Decomposition Search (EDS) And ST-Numbering In Graphs
, 1986
"... [LEC-67] linear time serial algorithm for testing planarity of graphs uses the linear time serial algorithm of [ET-76] for st-numbering. This st-numbering algorithm is based on depth-first search (DFS). A known conjecture states that DFS, which is a key technique in designing serial algorithms, is n ..."
Abstract
-
Cited by 38 (2 self)
- Add to MetaCart
[LEC-67] linear time serial algorithm for testing planarity of graphs uses the linear time serial algorithm of [ET-76] for st-numbering. This st-numbering algorithm is based on depth-first search (DFS). A known conjecture states that DFS, which is a key technique in designing serial algorithms, is not amenable to poly-log time parallelism using "around linearly" (or even polynomially) many processors. The first contribution of this paper is a general method for searching efficiently in parallel undirected graphs, called ear-decomposition search (EDS). The second contribution demonstrates the applicability of this search method. We present an efficient parallel algorithm for st-numbering in a biconnected graph. The algorithm runs in logarithmic time using a linear number of processors on a concurrentread concurrent-write (CRCW) PRAM. An efficient parallel algorithm for the problem did not exist before. The problem was not even known to be in NC. 1. Introduction We define the problems ...
Analyzing the Behavior and Performance of Parallel Programs
- Univ. of Wisconsin-Madison, UW CS Tech. Rep
, 1993
"... An analytical performance model for parallel programs can provide qualitative insight as well as efficient quantitative evaluation and prediction of parallel program performance. While stochastic models for parallel programs can represent execution time variance due to communication and resource con ..."
Abstract
-
Cited by 37 (5 self)
- Add to MetaCart
An analytical performance model for parallel programs can provide qualitative insight as well as efficient quantitative evaluation and prediction of parallel program performance. While stochastic models for parallel programs can represent execution time variance due to communication and resource contention delays, a qualitative assessment of previous models shows that the stochastic assumption makes it extremely difficult to compute synchronization costs and overall execution times. This thesis first re-evaluates the need for the stochastic assumption by examining the influence of non-deterministic communication and resource contention delays on execution times in parallel programs. An analytical model of program behavior, combined with detailed program measurements, provides compelling evidence that in shared-memory programs on current systems as well as programs with similar granularity on foreseeable future systems, such delays introduce extremely low variance into the execution tim...
The Accelerated Centroid Decomposition Technique For Optimal Parallel Tree Evaluation In Logarithmic Time
, 1986
"... A new general parallel algorithmic technique for computations on trees is presented. The new technique performs a reduction of the tree expression evaluation problem to list ranking; then, the list ranking provides a schedule for evaluating the tree operations. The technique needs logarithmic tim ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
A new general parallel algorithmic technique for computations on trees is presented. The new technique performs a reduction of the tree expression evaluation problem to list ranking; then, the list ranking provides a schedule for evaluating the tree operations. The technique needs logarithmic time using an optimal number of processors and has applications to other tree problems. This new technique enables us to systematically order four basic ideas and techniques for parallel algorithms on tree: (1) The list ranking problem. (2) The Euler tour technique on trees. (3) The centroid decomposition technique. (4) The new accelerated centroid decomposition (ACD) technique. 1. Introduction The model of parallel computation used in this paper is the concurrent-read exclusive-write (CREW) parallel random access machine (PRAM). A PRAM employs p synchronous processors all having access to a common memory. A CREW PRAM allows concurrent access by several processors to the same common memo...
A new succinct representation of RMQ-information and improvements in the enhanced suffix array
- PROC. ESCAPE. LNCS
, 2007
"... The Range-Minimum-Query-Problem is to preprocess an array of length n in O(n) time such that all subsequent queries asking for the position of a minimal element between two specified indices can be obtained in constant time. This problem was first solved by Berkman and Vishkin [1], and Sadakane [2] ..."
Abstract
-
Cited by 31 (13 self)
- Add to MetaCart
The Range-Minimum-Query-Problem is to preprocess an array of length n in O(n) time such that all subsequent queries asking for the position of a minimal element between two specified indices can be obtained in constant time. This problem was first solved by Berkman and Vishkin [1], and Sadakane [2] gave the first succinct data structure that uses 4n+o(n) bits of additional space. In practice, this method has several drawbacks: it needs O(nlog n) bits of intermediate space when constructing the data structure, and it builds on previous results on succinct data structures. We overcome these problems by giving the first algorithm that never uses more than 2n + o(n) bits, and does not rely on rank- and select-queries or other succinct data structures. We stress the importance of this result by simplifying and reducing the space consumption of the Enhanced Suffix Array [3], while retaining its capability of simulating top-down-traversals of the suffix tree, used, e.g., to locate all occ positions of a pattern p in a text in optimal O(|p | + occ) time (assuming constant alphabet size). We further prove a lower bound of 2n − o(n) bits, which makes our algorithm asymptotically optimal.

