External Memory Algorithms and Data Structures
, 1998
Cited by 286 (24 self)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "out-of-core" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machine-independent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets
- In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA
Cited by 149 (5 self)
We consider the indexable dictionary problem, which consists of storing a set S ⊆ {0,...,m − 1} for some integer m, while supporting the operations of rank(x), which returns the number of elements in S that are less than x if x ∈ S, and −1 otherwise; and select(i), which returns the i-th smallest element in S. We give a data structure that supports both operations in O(1) time on the RAM model and requires B(n,m) + o(n) + O(lg lg m) bits to store a set of size n, where B(n,m) = ⌈lg (m choose n)⌉ is the minimum number of bits required to store any n-element subset from a universe of size m. Previous dictionaries taking this space only supported (yes/no) membership queries in O(1) time. In the cell probe model we can remove the O(lg lg m) additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including:
• an information-theoretically optimal representation of a k-ary cardinal tree that supports standard operations in constant time,
• a representation of a multiset of size n from {0,...,m − 1} in B(n,m+n) + o(n) bits that supports (appropriate generalizations of) rank and select operations in constant time, and
• a representation of a sequence of n non-negative integers summing up to m in B(n,m+n) + o(n) bits that supports prefix sum queries in constant time.
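The rank/select interface and the space bound described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's data structure: it stores a plain sorted list (so it is not succinct) and answers rank by binary search in O(log n) time rather than O(1). The class name, the function name, and the choice to count select positions from 0 are assumptions made for illustration.

```python
from bisect import bisect_left
from math import comb


class NaiveIndexableDictionary:
    """Sorted-list stand-in for the rank/select interface (not succinct)."""

    def __init__(self, elements, m):
        self.m = m
        self.keys = sorted(set(elements))
        if self.keys and not (0 <= self.keys[0] and self.keys[-1] < m):
            raise ValueError("elements must lie in {0, ..., m - 1}")

    def rank(self, x):
        # Number of elements of S smaller than x if x is in S, and -1 otherwise.
        i = bisect_left(self.keys, x)
        if i < len(self.keys) and self.keys[i] == x:
            return i
        return -1

    def select(self, i):
        # i-th smallest element, counting from 0 (an indexing assumption).
        return self.keys[i]


def information_bound_bits(n, m):
    # B(n, m) = ceil(lg C(m, n)), where C(m, n) is the binomial coefficient.
    # For an integer k >= 1, (k - 1).bit_length() equals ceil(lg k).
    return (comb(m, n) - 1).bit_length()
```

For a 3-element subset of a 16-element universe there are C(16, 3) = 560 possible sets, so `information_bound_bits(3, 16)` evaluates to 10.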
On RAM priority queues
, 1996
Cited by 69 (9 self)
Priority queues are some of the most fundamental data structures. They are used directly for, say, task scheduling in operating systems. Moreover, they are essential to greedy algorithms. We study the complexity of priority queue operations on a RAM with arbitrary word size. We present exponential improvements over previous bounds, and we show tight relations to sorting. Our first result is a RAM priority queue supporting insert and extract-min operations in worst case time O(log log n) where n is the current number of keys in the queue. This is an exponential improvement over the O(√(log n)) bound of Fredman and Willard from STOC'90. Our algorithm is simple, and it only uses AC^0 operations, meaning that there is no hidden time dependency on the word size. Plugging this priority queue into Dijkstra's algorithm gives an O(m log log m) algorithm for the single source shortest path problem on a graph with m edges, as compared with the previous O(m √(log m)) bound based on Fredman...
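To show where a priority queue plugs into Dijkstra's algorithm, here is a minimal Python sketch that uses only insert and extract-min. It relies on the standard-library binary heap (`heapq`), whose operations cost O(log n), not on the RAM priority queue of the abstract. The adjacency-dictionary input format and the function name are assumptions made for illustration.

```python
import heapq


def dijkstra(adj, source):
    """adj maps a vertex to a list of (neighbor, non-negative weight) pairs."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)  # extract-min
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already settled
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))  # insert
    return dist
```

Because improved distances are re-inserted instead of decreased in place, the sketch performs at most one insert per edge (plus one for the source) and as many extract-min operations. The total running time is therefore O(m) queue operations plus linear overhead, which is how a faster queue translates into a faster shortest-path algorithm.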
Optimal Bounds for the Predecessor Problem
- In Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing
Cited by 57 (0 self)
We obtain matching upper and lower bounds for the amount of time to find the predecessor of a given element among the elements of a fixed efficiently stored set. Our algorithms are for the unit-cost word-level RAM with multiplication and extend to give optimal dynamic algorithms. The lower bounds are proved in a much stronger communication game model, but they apply to the cell probe and RAM models and to both static and dynamic predecessor problems.
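As a baseline for the predecessor problem, the following Python sketch answers queries by binary search over a sorted list, using comparisons only. It is the classical O(log n) method, not the word-RAM algorithm of the abstract. The function name is hypothetical, and defining the predecessor as the largest stored element less than or equal to the query is an assumption (some texts require it to be strictly smaller).

```python
from bisect import bisect_right


def predecessor(sorted_keys, x):
    """Largest element of sorted_keys that is <= x, or None if there is none."""
    i = bisect_right(sorted_keys, x)  # number of stored keys <= x
    return sorted_keys[i - 1] if i > 0 else None
```

With `keys = [2, 5, 9, 14]`, the call `predecessor(keys, 10)` returns 9 and `predecessor(keys, 1)` returns `None`.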
Optimal Bounds for the Predecessor Problem and Related Problems
- Journal of Computer and System Sciences
, 2001
Cited by 44 (0 self)
We obtain matching upper and lower bounds for the amount of time to find the predecessor of a given element among the elements of a fixed compactly stored set. Our algorithms are for the unit-cost word RAM with multiplication and are extended to give dynamic algorithms. The lower bounds are proved for a large class of problems, including both static and dynamic predecessor problems, in a much stronger communication game model, but they apply to the cell probe and RAM models.
Lower bounds for Union-Split-Find related problems on random access machines
, 1994
Cited by 41 (3 self)
We prove Ω(√(log log n)) lower bounds on the random access machine complexity of several dynamic, partially dynamic and static data structure problems, including the union-split-find problem, dynamic prefix problems and one-dimensional range query problems. The proof techniques include a general technique using perfect hashing for reducing static data structure problems (with a restriction of the size of the structure) into partially dynamic data structure problems (with no such restriction), thus providing a way to transfer lower bounds. We use a generalization of a method due to Ajtai for proving the lower bounds on the static problems, but describe the proof in terms of communication complexity, revealing a striking similarity to the proof used by Karchmer and Wigderson for proving lower bounds on the monotone circuit depth of connectivity.
1 Introduction and summary of results
In this paper we give lower bounds for the complexity of implementing several dynamic and sta...
On the Cell Probe Complexity of Polynomial Evaluation
, 1995
Cited by 14 (4 self)
We consider the cell probe complexity of the polynomial evaluation problem with preprocessing of coefficients, for polynomials of degree at most n over a finite field K. We show that the trivial cell probe algorithm for the problem is optimal if K is sufficiently large compared to n. As an application, we give a new proof of the fact that P ≠ incr-TIME(o(log n / log log n)).
1 Introduction
Let K be a field. We consider the polynomial evaluation problem with preprocessing of coefficients. This problem is as follows: Given a polynomial f(X) ∈ K[X], preprocess it, so that later, for any field element a ∈ K, f(a) can be computed efficiently. It is a classical problem in the theory of algebraic complexity and has been intensively investigated in the model of arithmetic straight line programs. In this model, a solution for the polynomials of degree at most n is given by two objects:
• A map φ from the set of polynomials of degree at most n into K^s, where s is any integer, called t...
Rank-sensitive data structures
- In Proc. 12th International Symposium on String Processing and Information Retrieval (SPIRE), LNCS v. 3772
, 2005
Cited by 7 (0 self)
Output-sensitive data structures result from preprocessing n items and are capable of reporting the items satisfying an on-line query in O(t(n) + ℓ) time, where t(n) is the cost of traversing the structure and ℓ ≤ n is the number of reported items satisfying the query. In this paper we focus on rank-sensitive data structures, which are additionally given a ranking of the n items, so that just the top k best-ranking items should be reported at query time, sorted in rank order, at a cost of O(t(n) + k) time. Note that k is part of the query as a parameter under the control of the user (as opposed to ℓ which is query-dependent). We explore the problem of adding rank-sensitivity to data structures such as suffix trees or range trees, where the ℓ items satisfying the query form O(polylog(n)) intervals of consecutive entries from which we choose the top k best-ranking ones. Letting s(n) be the number of items (including their copies) stored in the original data structures, we increase the space by an additional term of O(s(n) lg^ε n) memory words of space, each of O(lg n) bits, for any positive constant ε < 1. We allow for changing the ranking on the fly during the lifetime of the data structures, with ranking values in 0 ... O(n). In this case, query time becomes O(t(n) + k) plus O(lg n / lg lg n) per interval; each change in the ranking and each insertion/deletion of an item takes O(lg n) time; the additional term in space occupancy increases to O(s(n) lg n / lg lg n).
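The query described in this abstract (pick the top k best-ranking entries out of a few intervals of consecutive entries, in rank order) can be stated as a naive Python sketch. It scans all ℓ candidate entries, so it costs roughly O(ℓ log k) time and does not achieve the O(t(n) + k) bound of the paper. The function name, the half-open `(lo, hi)` interval format, the assumption that intervals are disjoint, and the convention that a smaller ranking value means a better rank are all illustrative assumptions.

```python
import heapq
from itertools import chain


def top_k_from_intervals(rank, intervals, k):
    """Return the indices of the k best-ranking entries, best first.

    rank[i] is the ranking value of entry i (smaller is better, by assumption);
    intervals is a list of disjoint half-open (lo, hi) index ranges.
    """
    candidates = chain.from_iterable(range(lo, hi) for lo, hi in intervals)
    return heapq.nsmallest(k, candidates, key=lambda i: rank[i])
```

With `rank = [5, 1, 4, 0, 3, 2]` and intervals `[(0, 2), (4, 6)]`, the candidates are entries 0, 1, 4 and 5, and the top two are entries 1 and 5.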
Orthogonal Range Searching on the RAM, Revisited
, 2011
Cited by 6 (0 self)
We present a number of new results on one of the most extensively studied topics in computational geometry, orthogonal range searching. All our results are in the standard word RAM model:
1. We present two data structures for 2-d orthogonal range emptiness. The first achieves O(n lg lg n) space and O(lg lg n) query time, assuming that the n given points are in rank space. This improves the previous results by Alstrup, Brodal, and Rauhe (FOCS’00), with O(n lg^ε n) space and O(lg lg n) query time, or with O(n lg lg n) space and O(lg^2 lg n) query time. Our second data structure uses O(n) space and answers queries in O(lg^ε n) time. The best previous O(n)-space data structure, due to Nekrich (WADS’07), answers queries in O(lg n / lg lg n) time.
2. We give a data structure for 3-d orthogonal range reporting with O(n lg^(1+ε) n) space and O(lg lg n + k) query time for points in rank space, for any constant ε > 0. This improves the previous results by Afshani (ESA’08), Karpinski and Nekrich (COCOON’09), and Chan (SODA’11), with O(n lg^3 n) space and O(lg lg n + k) query time, or with O(n lg^(1+ε) n) space and O(lg^2 lg n + k) query time. Consequently, we obtain improved upper bounds for orthogonal range reporting in all constant dimensions above 3.
On Searching Sorted Lists: A Near-Optimal Lower Bound
, 1997
Cited by 5 (0 self)
We obtain improved lower bounds for a class of static and dynamic data structure problems that includes several problems of searching sorted lists as special cases. These lower bounds nearly match the upper bounds given by recent striking improvements in searching algorithms given by Fredman and Willard's fusion trees [9] and Andersson's search data structure [5]. Thus they show sharp limitations on the running time improvements obtainable using the unit-cost word-level RAM operations that those algorithms employ.
1 Introduction
Traditional analysis of problems such as sorting and searching is often schizophrenic in dealing with the operations one is permitted to perform on the input data. In one view, the elements being sorted are seen as abstract objects which may only be compared. In the other view, one is able to perform certain word-level operations, such as indirect addressing using the elements themselves, in algorithms like bucket and radix sorting. Traditionally, the second v...

