Extending The Scalable Coherent Interface For Large-Scale Shared-Memory Multiprocessors
, 1993
Cited by 21 (0 self)
Massively parallel machines promise to provide enormous computing power using an amalgamation of low-cost parts. We believe many of these will be shared-memory machines, since they do not burden the programmer with data placement and nonuniform access semantics. However, an efficient kiloprocessor solution for the shared-memory paradigm has proven elusive due to bottlenecks associated with parallel accesses to rapidly changing data. The Scalable Coherent Interface (SCI) is an IEEE and ANSI standard for multiprocessors, specifying a topology-independent network and a cache-coherence protocol. The goal of this dissertation is to investigate ways to efficiently share frequently changing data among thousands of processors. SCI is the platform in which these methods are investigated. Before investigating cache-coherence protocols, we demonstrate that an arbitrary topology can be constructed from a set of interwoven rings, such as SCI rings. This result is important because it would be impos...
Parallel Tree Contraction Part 2: Further Applications
- SIAM JOURNAL ON COMPUTING
, 1991
Cited by 20 (3 self)
This paper applies the parallel tree contraction techniques developed in Miller and Reif's paper [Randomness and Computation, 5, S. Micali, ed., JAI Press, 1989, pp. 47-72] to a number of fundamental graph problems. The paper presents an O(log n) time, n/log n processor, 0-sided randomized algorithm for testing the isomorphism of trees, and an O(log n) time, n-processor algorithm for maximal subtree isomorphism and for common subexpression elimination. An O(log n) time, n-processor algorithm for computing the canonical forms of trees and subtrees is given. An O(log n) time algorithm for computing the tree of 3-connected components of a graph, an O(log^2 n) time algorithm for computing an explicit planar embedding of a planar graph, and an O(log^3 n) time algorithm for computing a canonical form for a planar graph are also given. All these latter algorithms use only n^O(1) processors on a Parallel Random Access Machine (PRAM) model with concurrent writes and concurrent reads.
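As a point of reference for the canonical forms mentioned in this abstract, the following is a minimal sequential sketch, not the paper's parallel tree-contraction algorithm. It assumes a rooted tree given as a hypothetical `children` dictionary and encodes each subtree as a parenthesis string built from the sorted encodings of its children, so that two rooted trees are isomorphic exactly when their strings match.

```python
def canonical_form(children, root):
    """Return a canonical parenthesis string for a rooted tree.

    `children` maps a node to a list of its children (an assumed input
    format); nodes without an entry are treated as leaves.
    """
    # Collect nodes so that every parent appears before its descendants.
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # Process in reverse so children are encoded before their parent.
    code = {}
    for node in reversed(order):
        child_codes = sorted(code[c] for c in children.get(node, []))
        code[node] = "(" + "".join(child_codes) + ")"
    return code[root]


if __name__ == "__main__":
    t1 = {0: [1, 2], 1: [3]}
    t2 = {"a": ["b", "c"], "c": ["d"]}  # same shape, children listed in a different order
    print(canonical_form(t1, 0) == canonical_form(t2, "a"))
```

Sorting the child encodings is what makes the string independent of the order in which children are listed.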
Efficient Matrix Chain Ordering in Polylog Time
- IN PROC. OF INT'L PARALLEL PROCESSING SYMP
, 1998
Cited by 9 (3 self)
The matrix chain ordering problem is to find the cheapest way to multiply a chain of n matrices, where the matrices are pairwise compatible but of varying dimensions. Here we give several new parallel algorithms including O(lg^3 n)-time and n/lg n-processor algorithms for solving the matrix chain ordering problem and for solving an optimal triangulation problem of convex polygons on the common CRCW PRAM model. Next, by using efficient algorithms for computing row minima of totally monotone matrices, this complexity is improved to O(lg^2 n) time with n processors on the EREW PRAM and to O(lg^2 n lg lg n) time with n/lg lg n processors on a common CRCW PRAM. A new algorithm for computing the row minima of totally monotone matrices improves our parallel MCOP algorithm to O(n lg^1.5 n) work and polylog time on a CREW PRAM. Optimal log-time algorithms for computing row minima of totally monotone matrices will improve our algorithm and enable it to have the same work as the sequential algorithm of Hu and
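To make the problem statement concrete, here is a sketch of the textbook sequential dynamic program for matrix chain ordering. It is not one of the parallel algorithms from the paper; it only illustrates the O(n^3) baseline that such algorithms start from. The function name and the `dims` convention (matrix i has shape dims[i] x dims[i+1]) are assumptions made for this example.

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications to multiply a matrix chain.

    Matrix i (0-based) is assumed to have shape dims[i] x dims[i + 1].
    """
    n = len(dims) - 1  # number of matrices in the chain
    if n <= 0:
        return 0
    # cost[i][j] = cheapest way to multiply matrices i..j inclusive.
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # Try every split point k between matrix k and matrix k + 1.
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]


if __name__ == "__main__":
    # Three matrices: 10x30, 30x5, 5x60.
    print(matrix_chain_cost([10, 30, 5, 60]))
```

For the three-matrix chain in the usage example, multiplying the first two matrices first costs 1500 + 3000 = 4500 scalar multiplications, against 27000 for the other order.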
Efficient Parallel Dynamic Programming
, 1994
Cited by 6 (1 self)
In 1983, Valiant, Skyum, Berkowitz and Rackoff showed that many problems with simple O(n^3) sequential dynamic programming solutions are in the class NC. They used straight line programs to show that these problems can be solved in O(lg^2 n) time with n^9 processors. In 1988, Rytter used pebbling games to show that these same problems can be solved on a CREW PRAM in O(lg^2 n) time with n^6/lg n processors. Recently, Huang, Liu and Viswanathan [23] and Galil and Park [15] gave algorithms that improve this processor complexity by polylog factors. Using a graph structure that is analogous to the classical dynamic programming table, this paper improves these results. First, this graph characterization leads to a polylog time and n^6/lg n processor algorithm that solves these problems. Second, there follows a subpolylog time and sublinear processor parallel approximation algorithm for the matrix chain ordering problem. Finally, this paper presents an n^3/lg n processor and O...
Matrix Chain Ordering in Polylog Time with n/lg n Processors
- Proceedings of the 8th Annual IEEE International Parallel Processing Symposium (IPPS), Cancun
, 1993
Cited by 5 (4 self)
This paper gives an O(lg^4 n) time and n/lg n processor algorithm for solving the matrix chain ordering problem and for finding optimal triangulations of a convex polygon on the Common CRCW PRAM model. This algorithm works by finding shortest paths in special digraphs modeling dynamic programming tables. These shortest paths are found cheaply using new and efficient techniques for exploiting monotonic problem constraints.
1 Introduction
Recently, much research has gone into designing efficient parallel algorithms for problems with elementary serial dynamic programming solutions. These problems include string editing [1, 3], context free grammar recognition [22, 21], and optimal tree building [2, 19]. Polylog time parallel algorithms for solving these problems use new approaches since straightforward parallelization of sequential dynamic programming algorithms produces very slow (linear-time) parallel algorithms. Many efficient parallel algorithms designed to date rely on monotonicity...
A Work Efficient Parallel Algorithm for Constructing Huffman Codes
Cited by 1 (0 self)
In this paper, we present a work-efficient PRAM CREW algorithm for constructing Huffman codes. An important feature of the algorithm is its simplicity. This algorithm is a direct parallelization of Huffman's algorithm.
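For readers who want the sequential starting point in front of them, the sketch below implements Huffman's classic greedy algorithm with a binary heap and returns only the code length of each symbol. It is an illustration of the sequential algorithm, not of the paper's CREW PRAM parallelization; the function name and the list-of-weights input are assumptions for this example.

```python
import heapq


def huffman_code_lengths(weights):
    """Return the Huffman code length of each symbol, given its weight."""
    n = len(weights)
    if n == 0:
        return []
    if n == 1:
        return [1]  # a lone symbol still needs one bit
    # Heap entries: (weight, tie-breaker, leaf indices in this subtree).
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * n
    next_id = n
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)
        w2, _, leaves2 = heapq.heappop(heap)
        merged = leaves1 + leaves2
        # Merging two subtrees pushes every leaf in them one level deeper.
        for leaf in merged:
            lengths[leaf] += 1
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return lengths


if __name__ == "__main__":
    weights = [5, 9, 12, 13, 16, 45]
    lengths = huffman_code_lengths(weights)
    print(lengths, sum(w * l for w, l in zip(weights, lengths)))
```

Carrying the leaf lists through the heap keeps the sketch short at the price of extra copying; a production version would build explicit tree nodes instead.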
Efficient reorganization of binary search trees
- Lecture Notes in Computer Science
, 1994
Cited by 1 (0 self)
We consider the problem of maintaining a binary search tree (BST) that minimizes the average access cost needed to satisfy randomly generated requests. We analyze scenarios in which the accesses are generated according to a vector of fixed probabilities which is unknown. Our approach is statistical. We devise policies for modifying the tree structure dynamically, using rotations of accessed records. The aim is to produce good approximations of the optimal structure of the tree, while keeping the number of rotations as small as possible. The heuristics that we propose achieve a close approximation to the optimal BST, with lower organization costs than any previously studied. We introduce the MOVE ONCE rule. The average access cost to the tree under this rule is shown to equal the value achieved by the common rule Move to the Root (MTR). The advantage of MOVE ONCE over MTR and similar rules is that it relocates each of the items in the tree at most once. We show that the total expected cost of modifying the tree by the MOVE ONCE rule is bounded from above by 2(n+1)H_n - 4n rotations (in a tree with n records), where H_n is the nth harmonic number. Extensive experiments show that this value is an over-estimate, and in fact the number of rotations is linear for all the access probability vectors we tested. An approximate analysis is shown to match the experimental results, producing the expected number n - ...
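To give a feel for the size of the rotation bound quoted in this abstract, the following sketch evaluates it exactly, reading the bound as 2(n+1)H_n - 4n with H_n the nth harmonic number. The function names are hypothetical and the code only does the arithmetic; it does not implement the MOVE ONCE rule itself.

```python
from fractions import Fraction


def harmonic(n):
    """Exact nth harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def move_once_rotation_bound(n):
    """Upper bound 2(n+1)H_n - 4n on expected rotations, as read from the abstract."""
    return 2 * (n + 1) * harmonic(n) - 4 * n


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 100):
        print(n, float(move_once_rotation_bound(n)))
```

Because H_n grows like ln n, the bound grows like n ln n, which is consistent with the abstract's remark that the linear behaviour seen in experiments makes it an over-estimate.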
Parallel Searching in Generalized Monge Arrays
- Algorithmica
, 1997
This paper investigates the parallel time and processor complexities of several searching problems involving Monge, staircase-Monge, and Monge-composite arrays. We present array-searching algorithms for concurrent-read-exclusive-write (CREW) PRAMs, hypercubes, and several hypercubic networks. All these algorithms run in near-optimal time, and their processor-time products are all within an O(lg n) factor of the worst-case sequential bounds. Several applications of these algorithms are also given. Two applications improve previous results substantially, and the others provide novel parallel algorithms for problems not previously considered.
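The abstract does not restate what a Monge array is, so the sketch below uses the standard definition as an assumption: a[i][j] + a[i+1][j+1] <= a[i][j+1] + a[i+1][j] for all adjacent rows and columns. Under that property the column of the leftmost minimum never moves left as the row index grows, which a simple sequential divide-and-conquer search can exploit. This is only an illustration of the searching problem, not any of the paper's CREW PRAM or hypercube algorithms.

```python
def is_monge(a):
    """Check the Monge condition on every 2x2 block of adjacent rows and columns."""
    return all(
        a[i][j] + a[i + 1][j + 1] <= a[i][j + 1] + a[i + 1][j]
        for i in range(len(a) - 1)
        for j in range(len(a[0]) - 1)
    )


def row_minima(a):
    """Column index of the leftmost minimum in each row of a Monge array."""
    rows, cols = len(a), len(a[0])
    result = [0] * rows

    def solve(top, bottom, left, right):
        if top > bottom:
            return
        mid = (top + bottom) // 2
        # min() returns the first minimal column, i.e. the leftmost one.
        best = min(range(left, right + 1), key=lambda j: a[mid][j])
        result[mid] = best
        # Minima of rows above lie at or left of best, rows below at or right.
        solve(top, mid - 1, left, best)
        solve(mid + 1, bottom, best, right)

    solve(0, rows - 1, 0, cols - 1)
    return result


if __name__ == "__main__":
    a = [[(i - j) ** 2 for j in range(5)] for i in range(4)]
    print(is_monge(a), row_minima(a))
```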
On the Parallel Complexity of Digraph Reachability
We formally show that the directed graph reachability problem can be reduced to several problems using a linear number of processors; hence an efficient parallel algorithm to solve any of these problems would imply an efficient parallel algorithm for the directed graph reachability problem. This formally establishes that all these problems are at least as hard as the s-t reachability problem.
1 Introduction
Many problems are hard to solve efficiently in parallel. Despite belonging to the class NC (problems that can be solved in poly-logarithmic time with a polynomial number of processors), they have eluded work-efficient parallel algorithms that run in poly-logarithmic time. One such example is: given a directed graph, is there a path from s to t (also called the directed graph reachability problem)? The only known poly-logarithmic time algorithms for this problem use M(n) processors on a PRAM, where M(n) is the number of processors required to do matrix multiplication in O(log n) time (...
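The link between reachability and matrix multiplication mentioned in the introduction can be illustrated with a small sequential sketch: squaring a Boolean "reachable within k steps" matrix doubles k, so about log2(n) squarings suffice. The adjacency-matrix input and the function name are assumptions for this example, and the code runs the squarings one after another rather than on a PRAM.

```python
def reachable(adj, s, t):
    """Decide whether t is reachable from s by repeated Boolean matrix squaring.

    `adj` is assumed to be an n x n adjacency matrix of truthy/falsy entries.
    """
    n = len(adj)
    # reach[i][j]: j is reachable from i using at most `steps` edges.
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    steps = 1
    while steps < n:
        # One Boolean matrix product doubles the path length covered.
        reach = [
            [any(reach[i][k] and reach[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        steps *= 2
    return reach[s][t]


if __name__ == "__main__":
    # Edges 0 -> 1 and 1 -> 2; vertex 3 is isolated.
    adj = [
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(reachable(adj, 0, 2), reachable(adj, 2, 0))
```

Each squaring is an independent matrix product, which is why the processor count of parallel versions is tied to the cost of matrix multiplication.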

