Sorting in Linear Time?
1995
Cited by 73 (15 self)
We show that a unit-cost RAM with a word length of w bits can sort n integers in the range 0..2^w − 1 in O(n log log n) time, for arbitrary w ≥ log n, a significant improvement over the bound of O(n √(log n)) achieved by the fusion trees of Fredman and Willard. Provided that w ≥ (log n)^(2+ε) for some fixed ε > 0, the sorting can even be accomplished in linear expected time with a randomized algorithm. Both of our algorithms parallelize without loss on a unit-cost PRAM with a word length of w bits. The first one yields an algorithm that uses O(log n) time and O(n log log n) operations on a deterministic CRCW PRAM. The second one yields an algorithm that uses O(log n) expected time and O(n) expected operations on a randomized EREW PRAM, provided that w ≥ (log n)^(2+ε) for some fixed ε > 0. Our deterministic and randomized sequential and parallel algorithms generalize to the lexicographic sorting problem of sorting multiple-precision integers represented in several words. ...
On RAM priority queues
1996
Cited by 69 (9 self)
Priority queues are some of the most fundamental data structures. They are used directly for, say, task scheduling in operating systems. Moreover, they are essential to greedy algorithms. We study the complexity of priority queue operations on a RAM with arbitrary word size. We present exponential improvements over previous bounds, and we show tight relations to sorting. Our first result is a RAM priority queue supporting insert and extract-min operations in worst case time O(log log n) where n is the current number of keys in the queue. This is an exponential improvement over the O(√(log n)) bound of Fredman and Willard from STOC'90. Our algorithm is simple, and it only uses AC^0 operations, meaning that there is no hidden time dependency on the word size. Plugging this priority queue into Dijkstra's algorithm gives an O(m log log m) algorithm for the single source shortest path problem on a graph with m edges, as compared with the previous O(m √(log m)) bound based on Fredman...
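To illustrate where the priority queue enters Dijkstra's algorithm, here is a minimal sketch. It is not the RAM priority queue from the abstract: it assumes Python's `heapq` binary heap as a stand-in, which costs O(log m) per operation rather than O(log log n). The function name `dijkstra` and the adjacency-dictionary input format are illustrative choices. The sketch performs O(m) insert and extract-min operations in total, which is why a faster queue translates directly into a faster shortest-path bound.

```python
import heapq


def dijkstra(adj, source):
    """Single-source shortest paths with non-negative edge weights.

    adj maps each node to a list of (neighbor, weight) pairs.
    heapq is a stand-in priority queue (assumption), not the
    O(log log n) RAM queue described in the abstract.
    """
    dist = {source: 0}
    heap = [(0, source)]  # entries are (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)  # extract-min
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))  # insert instead of decrease-key
    return dist
```

Because stale heap entries are skipped rather than updated in place, only insert and extract-min are needed, which matches the two operations the abstract names.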
Sorting Selection and Routing on the Array with Reconfigurable Optical Buses
Cited by 30 (4 self)
In this paper we present efficient algorithms for sorting, selection and packet routing on the AROB (Array with Reconfigurable Optical Buses) model.
Optimal Parallel Algorithms for Periods, Palindromes and Squares (Extended Abstract)
1992
Cited by 28 (13 self)
Alberto Apostolico (Purdue University and Università di Padova), Dany Breslauer (Columbia University), Zvi Galil (Columbia University and Tel-Aviv University).
Summary of results: Optimal concurrent-read concurrent-write parallel algorithms for two problems are presented:
• Finding all the periods of a string. The period of a string can be computed by previous efficient parallel algorithms only if it is shorter than half of the length of the string. Our new algorithm computes all the periods in optimal O(log log n) time, even if they are longer. The algorithm can be used to compute all initial palindromes of a string within the same bounds.
• Testing if a string is square-free. We present an optimal O(log log n) time algorithm for testing if a string is square-free, improving the previous bound of O(log n) given by Apostolico [1] and Crochemore and Rytter [12].
We show matching lower bounds for the optimal parallel algorithms that solve the problems above on a general alphab...
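To make the two problems concrete, the following sequential sketch computes all periods of a string and tests square-freeness. It is not the parallel CRCW algorithm from the abstract: the period computation assumes the classic prefix-function (border) technique, and the square-free test is a naive brute-force check. The function names `all_periods` and `is_square_free` are hypothetical, and the string's own length is reported as a trivial period by convention.

```python
def all_periods(s):
    """Return every period p of s (s[i] == s[i + p] wherever both exist),
    in increasing order. The length of s is included as the trivial period."""
    n = len(s)
    if n == 0:
        return []
    # pi[i] = length of the longest proper border of s[:i + 1]
    pi = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    # Each border of length b corresponds to the period n - b.
    periods = []
    b = pi[n - 1]
    while b > 0:
        periods.append(n - b)
        b = pi[b - 1]
    periods.append(n)
    return periods


def is_square_free(s):
    """Naive check: True if s has no substring of the form xx with x non-empty."""
    n = len(s)
    for length in range(1, n // 2 + 1):
        for i in range(n - 2 * length + 1):
            if s[i:i + length] == s[i + length:i + 2 * length]:
                return False
    return True
```

Note that `all_periods("abaab")` includes the period 3, which is longer than half the string length, the case the abstract singles out as hard for earlier parallel algorithms.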
Ultra-fast expected time parallel algorithms
Proc. of the 2nd SODA, 1991
Cited by 19 (3 self)
It has been shown previously that sorting n items into n locations with a polynomial number of processors requires Ω(log n/log log n) time. We sidestep this lower bound with the idea of Padded Sorting, or sorting n items into n + o(n) locations. Since many problems do not rely on the exact rank of sorted items, a Padded Sort is often just as useful as an unpadded sort. Our algorithm for Padded Sort runs on the Tolerant CRCW PRAM and takes Θ(log log n/log log log n) expected time using n log log log n/log log n processors, assuming the items are taken from a uniform distribution. Using similar techniques we solve some computational geometry problems, including Voronoi Diagram, with the same processor and time bounds, assuming points are taken from a uniform distribution in the unit square. Further, we present an Arbitrary CRCW PRAM algorithm to solve the Closest Pair problem in constant expected time with n processors regardless of the distribution of points. All of these algorithms achieve linear speedup in expected time over their optimal serial counterparts.
(Footnote 1: Research done while at the University of Michigan and supported by an AT&T Fellowship.)
Integer Priority Queues with Decrease Key in . . .
STOC'03, 2003
Cited by 16 (1 self)
We consider Fibonacci heap style integer priority queues supporting insert and decrease key operations in constant time. We present a deterministic linear space solution that with n integer keys supports delete in O(log log n) time. If the integers are in the range [0,N), we can also support delete in O(log log N) time. Even for the special case of monotone priority queues, where the minimum has to be non-decreasing, the best previous bounds on delete were O((log n)^(1/(3−ε))) and O((log N)^(1/(4−ε))). These previous bounds used both randomization and amortization. Our new bounds are deterministic, worst-case, with no restriction to monotonicity, and exponentially faster. As a classical application, for a directed graph with n nodes and m edges with non-negative integer weights, we get single source shortest paths in O(m + n log log n) time, or O(m + n log log C) if C is the maximal edge weight. The latter solves an open problem of Ahuja, Mehlhorn, Orlin, and
Modeling parallel bandwidth: Local vs. global restrictions
Cited by 15 (4 self)
Recently there has been an increasing interest in models of parallel computation that account for the bandwidth limitations in communication networks. Some models (e.g., BSP and LogP) account for bandwidth limitations using a per-processor parameter g > 1, such that each processor can send/receive at most h messages in g·h time. Other models (e.g., PRAM(m)) account for bandwidth limitations as an aggregate parameter m < p, such that the p processors can send at most m messages in total at each step. This paper provides the first detailed study of the algorithmic implications of modeling parallel bandwidth as a per-processor (local) limitation versus an aggregate (global) limitation. We consider a number of basic problems
Randomized sorting in O(n log log n) time and linear space using addition, shift, and bit-wise boolean operations
1996
Cited by 15 (3 self)
A randomized sorting algorithm is presented, doing as described in the title. 1 Introduction. In this paper we consider sorting on a very simple RAM where the only word-operations are addition, shift, and bit-wise boolean operations. Besides these word-operations, we have direct and indirect addressing, jumps, and conditional statements. Such a RAM has been referred to as a Practical RAM [Mil96]. In this paper we show: Theorem 1. On a Practical RAM, there is a randomized algorithm sorting n words in O(n log log n) time and linear space. The above algorithm only makes shifts by powers of two, and it only needs O(log n) random words. Our time bound matches that of the current fastest sorting algorithm by Andersson, Hagerup, Raman, and Nilsson [AHNR95]. Their algorithm has two variants: one is deterministic and uses space 2^(εw), where w is the word length and ε is a positive constant. Thus the space is unbounded in terms of n. The other variant is randomized and uses linear space like ou...
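As a point of reference for sorting with only shift and bit-wise operations on words, here is a sketch of the classic least-significant-digit radix sort. This is a swapped-in baseline, not the randomized algorithm of the abstract: it runs in O(n · w / d) time for w-bit keys and d-bit digits, not O(n log log n). The function name `radix_sort` and the parameters `word_bits` and `digit_bits` are assumptions made for illustration, and keys are assumed to be non-negative integers below 2^word_bits.

```python
def radix_sort(keys, word_bits=32, digit_bits=8):
    """LSD radix sort of non-negative integers smaller than 2**word_bits.

    Digits are extracted with a right shift and a bit-wise AND only.
    Baseline sketch (assumption), not the O(n log log n) algorithm.
    """
    mask = (1 << digit_bits) - 1
    keys = list(keys)
    for shift in range(0, word_bits, digit_bits):
        buckets = [[] for _ in range(1 << digit_bits)]
        for x in keys:
            buckets[(x >> shift) & mask].append(x)  # stable distribution pass
        keys = [x for bucket in buckets for x in bucket]
    return keys
```

Each pass is stable, so after the final pass the keys are ordered by all digits from most to least significant.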
Selection on the Reconfigurable Mesh
Proc. Frontiers of Massively Parallel Computation, 1992
Cited by 13 (0 self)
Our main result is a Θ(log n) time algorithm to select the kth smallest element in a set of n elements on a reconfigurable mesh with n processors. This improves on the previous fastest algorithm's running time by a factor of log n. We also show that some variants of this problem can be solved even faster. First we show that a good approximation to the median of n elements can be found in Θ(log log n) time. This can be used to solve two-dimensional linear programming over n equations in Θ(log n log log n) time, an improvement of log n / log log n time over the previous fastest algorithm. Next, we show that, for any constant ε > 0, selecting the kth smallest element in a set of n^(1−ε) elements evenly spaced throughout the mesh can be done in constant time. We also show that one can select the kth smallest element from n b-bit words in Θ((b / log b) max{log n − log b, 1}) time, which implies that if the elements come from a polynomial range, one can...
Feasible Time-Optimal Algorithms for Boolean Functions on Exclusive-Write PRAMs
1994
Cited by 13 (3 self)
It was shown some years ago that the computation time for many important Boolean functions of n arguments on concurrent-read exclusive-write parallel random-access machines (CREW PRAMs) of unlimited size is at least φ(n) ≈ 0.72 log₂ n. On the other hand, it is known that every Boolean function of n arguments can be computed in φ(n) + 1 steps on a CREW PRAM with n · 2^(n−1) processors and memory cells. In the case of the OR of n bits, n processors and cells are sufficient. In this paper it is shown that for many important functions there are CREW PRAM algorithms that almost meet the lower bound in that they take φ(n) + o(log n) steps, but use only a small number of processors and memory cells (in most cases, n). In addition, the cells only have to store binary words of bounded length (in most cases, length 1). We call such algorithms "feasible". The functions concerned include: the PARITY function and, more generally, all symmetric functions; a large class of Boolean formulas...

