Results 11–20 of 30
More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries
, 2009
"... We consider the problem of representing, in a compressed format, a bitvector S of m bits with n 1s, supporting the following operations, where b ∈ {0,1}: • rankb(S, i) returns the number of occurrences of bit b in the prefix S [1..i]; • selectb(S, i) returns the position of the ith occurrence of bi ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
We consider the problem of representing, in a compressed format, a bitvector S of m bits with n 1s, supporting the following operations, where b ∈ {0,1}: • rank_b(S, i) returns the number of occurrences of bit b in the prefix S[1..i]; • select_b(S, i) returns the position of the ith occurrence of bit b in S. Such a data structure is called a fully indexable dictionary (fid) [Raman, Raman, and Rao, 2007], and is at least as powerful as predecessor data structures. Viewing S as a set X = {x1, x2, ..., xn} of n distinct integers drawn from a universe [m] = {1, ..., m}, the predecessor of integer y ∈ [m] in X is given by select1(S, rank1(S, y − 1)). fids have many applications in succinct and compressed data structures, as they are often involved in the construction of succinct representations for a variety of abstract data types. Our focus is on space-efficient fids on the RAM model with word size Θ(lg m) and constant time for all operations, so that the time cost is independent of the input size. Given the bitstring S to be encoded, having length m and containing n ones, the minimal amount of information that needs to be stored is B(n, m) = ⌈lg (m choose n)⌉. The state of the art in building a fid for S is given in [Pătrașcu, 2008] using B(n, m) + O(m/(lg m/t)^t) + O(m^{3/4}) bits, to support the operations in O(t) time. Here, we propose a parametric data structure exhibiting a time/space trade-off such that, for any real constants 0 < δ ≤ 1/2, 0 < ε ≤ 1, and integer s > 0, it uses …
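The rank/select interface and the predecessor identity select1(S, rank1(S, y − 1)) quoted above can be sketched with a naive, uncompressed reference model (O(n)-time operations, purely illustrative; the function names are not the paper's API — real fids answer these in O(1) time within compressed space):

```python
def rank(S, b, i):
    """Number of occurrences of bit b in the prefix S[1..i] (1-indexed)."""
    return sum(1 for bit in S[:i] if bit == b)

def select(S, b, i):
    """Position (1-indexed) of the i-th occurrence of bit b in S."""
    count = 0
    for pos, bit in enumerate(S, start=1):
        if bit == b:
            count += 1
            if count == i:
                return pos
    raise ValueError("fewer than i occurrences of b in S")

def predecessor(S, y):
    """Largest x < y with S[x] = 1, via the abstract's identity
    select1(S, rank1(S, y - 1)); None if no such x exists."""
    r = rank(S, 1, y - 1)
    return select(S, 1, r) if r > 0 else None
```

For S = [0,1,0,0,1,1,0,1] (ones at positions 2, 5, 6, 8), predecessor(S, 6) returns 5, matching the set-of-integers view of S.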
The limits of buffering: A tight lower bound for dynamic membership in the external memory model
 In Proc. ACM Symposium on Theory of Computing
, 2010
"... We study the dynamic membership (or dynamic dictionary) problem, which is one of the most fundamental problems in data structures. We study the problem in the external memory model with cell size b bits and cache size m bits. We prove that if the amortized cost of updates is at most 0.999 (or any ot ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
We study the dynamic membership (or dynamic dictionary) problem, one of the most fundamental problems in data structures. We study the problem in the external memory model with cell size b bits and cache size m bits. We prove that if the amortized cost of updates is at most 0.999 (or any other constant < 1), then the query cost must be Ω(log_{b lg n}(n/m)), where n is the number of elements in the dictionary. In contrast, when the update time is allowed to be 1 + o(1), a bit vector or hash table gives query time O(1). Thus, this is a threshold phenomenon for data structures. This lower bound answers a folklore conjecture of the external memory community. Since almost any data structure task can solve membership, our lower bound implies a dichotomy between two alternatives: (i) make the amortized update time at least 1 (so the data structure does not buffer, and we lose one of the main potential advantages of the cache), or (ii) make the query time at least roughly logarithmic in n. Our result holds even when the updates and queries are chosen uniformly at random and there are no deletions; it holds for randomized data structures, holds when the universe size is O(n), and does not make any restrictive assumptions such as indivisibility. All of the lower bounds we prove hold regardless of the space consumption of the data structure, while the upper bounds only need linear space. The lower bound has some striking implications for external memory data structures. It shows that the query complexities of many problems, such as 1D range counting, predecessor, rank-select, and many others, are all the same …
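The non-buffering side of the dichotomy — that a bit vector over a universe of size O(n) gives O(1)-time membership when every update is written through immediately — can be illustrated directly (a minimal sketch, not from the paper):

```python
class BitVectorSet:
    """Direct-addressed bit vector over the universe [0, universe_size).
    Both insert and member touch a single byte, so each operation costs
    one memory access: no buffering, O(1) queries."""

    def __init__(self, universe_size):
        self.words = bytearray((universe_size + 7) // 8)

    def insert(self, x):
        self.words[x >> 3] |= 1 << (x & 7)

    def member(self, x):
        return bool(self.words[x >> 3] & (1 << (x & 7)))
```

The lower bound above says this trade is essentially forced: buffering updates below one I/O apiece pushes queries to roughly logarithmic cost.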
Entropy-Bounded Representation of Point Grids
"... Abstract. We give the first fully compressed representation of a set of m points on an n×n grid, taking H +o(H) bits of space, where H = lg ( n 2) m is the entropy of the set. This representation supports range counting, range reporting, and point selection queries, with a performance that is compar ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
We give the first fully compressed representation of a set of m points on an n × n grid, taking H + o(H) bits of space, where H = lg (n² choose m) is the entropy of the set. This representation supports range counting, range reporting, and point selection queries, with a performance that is comparable to that of uncompressed structures and that improves upon the only previous compressed structure. Operating within entropy-bounded space opens a new line of research on an otherwise well-studied area, and is becoming extremely important for handling large datasets.
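The entropy bound H = lg (n² choose m) is just the logarithm of the number of possible point sets, and is easy to evaluate (a small helper for intuition, not part of the paper):

```python
import math

def grid_entropy_bits(n, m):
    """Information-theoretic lower bound, in bits, for storing m distinct
    points on an n x n grid: lg of the number of m-subsets of n^2 cells."""
    return math.log2(math.comb(n * n, m))
```

For instance, a single point on a 2 × 2 grid needs lg C(4, 1) = 2 bits, as expected; for sparse sets (m ≪ n²) the bound is roughly m lg(n²/m) bits, far below the 2m lg n bits of an uncompressed coordinate list.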
Connectivity Oracles for Failure Prone Graphs
"... Dynamic graph connectivity algorithms have been studied for many years, but typically in the most general possible setting, where the graph can evolve in completely arbitrary ways. In this paper we consider a dynamic subgraph model. We assume there is some fixed, underlying graph that can be preproc ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
Dynamic graph connectivity algorithms have been studied for many years, but typically in the most general possible setting, where the graph can evolve in completely arbitrary ways. In this paper we consider a dynamic subgraph model. We assume there is some fixed, underlying graph that can be preprocessed ahead of time. The graph is subject only to vertices and edges flipping "off" (failing) and "on" (recovering), where queries naturally apply to the subgraph on edges/vertices currently flipped on. This model fits most real-world scenarios, where the topology of the graph in question (say a router network or road network) is constantly evolving due to temporary failures but never deviates too far from the ideal failure-free state. We present the first efficient connectivity oracle for graphs susceptible to vertex failures. Given vertices u and v and a set D of d failed vertices, we can determine if there is a path from u to v avoiding D in time polynomial in d log n. There is a trade-off in our oracle between the space, which is roughly mn^ε, for 0 < ε ≤ 1, and the polynomial query time, which depends on ε. If one wanted to achieve the same functionality with existing data structures (based on edge failures or twin vertex failures), the resulting connectivity oracle would either need exorbitant space (Ω(n^d)) or update time Ω(dn), that is, linear in the number of vertices. Our connectivity oracle is therefore the first of its kind. As a byproduct of our oracle for vertex failures, we reduce the problem of constructing an edge-failure oracle to 2D range searching over the integers. We show there is an Õ(m)-space oracle that processes any set of d failed edges in O(d² log log n) time and, thereafter, answers connectivity queries in O(log log n) time. Our update time is exponentially faster than a recent connectivity oracle of Pătrașcu and Thorup for bounded d, but slower as a function of d.
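The query being supported — is there a u–v path avoiding a failed set D? — has a trivial O(m)-per-query baseline by BFS over the surviving subgraph; the oracle's point is to beat this after preprocessing, answering in time polynomial in d log n. A sketch of the baseline semantics (illustrative names, not the paper's construction):

```python
from collections import deque

def connected_avoiding(adj, u, v, failed):
    """BFS from u to v in the subgraph with the vertices in `failed`
    flipped off. adj maps each vertex to a list of neighbours."""
    if u in failed or v in failed:
        return False
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj[x]:
            if y not in seen and y not in failed:
                seen.add(y)
                queue.append(y)
    return False
```

On the path 0–1–2–3, failing vertex 2 disconnects 0 from 3, which is exactly what a query with D = {2} must report.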
Higher cell probe lower bounds for evaluating polynomials
 In Proc. 53rd IEEE Symposium on Foundations of Computer Science
, 2012
"... Abstract—In this paper, we study the cell probe complexity of evaluating an ndegree polynomial P over a finite field F of size at least n 1+Ω(1). More specifically, we show that any static data structure for evaluating P (x), where x ∈ F, must use Ω(lg F / lg(Sw/n lg F)) cell probes to answer a ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
In this paper, we study the cell probe complexity of evaluating a degree-n polynomial P over a finite field F of size at least n^{1+Ω(1)}. More specifically, we show that any static data structure for evaluating P(x), where x ∈ F, must use Ω(lg |F| / lg(Sw/n lg |F|)) cell probes to answer a query, where S denotes the space of the data structure in number of cells and w the cell size in bits. This bound holds in expectation for randomized data structures with any constant error probability δ < 1/2. Our lower bound not only improves over the Ω(lg |F| / lg S) lower bound of Miltersen [TCS'95], but is in fact the highest static cell probe lower bound to date: for linear space (i.e. S = O(n lg |F|/w)), our query time lower bound simplifies to Ω(lg |F|), whereas the highest previous lower bound for any static data structure problem having d different queries is Ω(lg d / lg lg d), which was first achieved by Pătrașcu and Thorup [SICOMP'10]. We also use the recent technique of Larsen [STOC'12] to show a lower bound of t_q = Ω(lg |F| · lg n / (lg(w t_u / lg |F|) · lg(w t_u))) for dynamic data structures for polynomial evaluation over a finite field F of size Ω(n²). Here t_q denotes the expected query time and t_u the worst-case update time. This lower bound holds for randomized data structures with any constant error probability δ < 1/2. This is only the second time a lower bound beyond max{t_u, t_q} = Ω(max{lg n, lg d / lg lg d}) has been achieved for dynamic data structures, where d denotes the number of different queries and updates to the problem. Furthermore, it is the first such lower bound that holds for randomized data structures with a constant probability of error. Keywords: cell probe model, lower bounds, data structures, polynomials.
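The query being lower-bounded is ordinary polynomial evaluation over a finite field. For a prime field GF(p), the no-preprocessing baseline is Horner's rule, which uses n multiplications and additions mod p (a standard algorithm, shown here only to fix the problem being studied):

```python
def eval_poly_mod_p(coeffs, x, p):
    """Evaluate sum(coeffs[i] * x^i) mod p by Horner's rule.
    coeffs[i] is the coefficient of x^i; p is a prime."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc
```

The data structure question is what preprocessing of the n + 1 coefficients into S cells of w bits buys: the abstract's bound says that with linear space, Ω(lg |F|) probes per evaluation are unavoidable.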
Near-optimal range reporting structures for categorical data
 In Proc. 24th ACM/SIAM Symposium on Discrete Algorithms
, 2013
"... Range reporting on categorical (or colored) data is a wellstudied generalization of the classical range reporting problem in which each of the N input points has an associated color (category). A query then asks to report the set of colors of the points in a given rectangular query range, which may ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
Range reporting on categorical (or colored) data is a well-studied generalization of the classical range reporting problem in which each of the N input points has an associated color (category). A query then asks to report the set of colors of the points in a given rectangular query range, which may be far smaller than the set of all points in the query range. We study two-dimensional categorical range reporting in both the word-RAM and the I/O model. For the I/O model, we present two alternative data structures for three-sided queries. The first answers queries in optimal O(lg_B N + K/B) I/Os using O(N lg* N) space, where K is the number of distinct colors in the output, B is the disk block size, and lg* N is the iterated logarithm of N. Our second data structure uses linear space and answers queries in O(lg_B N + lg^(h) N + K/B) I/Os for any constant integer h ≥ 1. Here lg^(1) N = lg N and lg^(h) N = lg(lg^(h−1) N) when h > 1. Both solutions use only comparisons on the coordinates. We also show that the lg_B N terms in the query costs can be reduced to optimal lg lg_B U when the input points lie on a U × U grid and we allow word-level manipulations of the coordinates. We further reduce the query time to just O(1) if the points are given on an N × N grid. Both solutions also lead to improved data structures for four-sided queries. For the word-RAM, we obtain optimal data structures for three-sided range reporting, as well as improved upper bounds for four-sided range reporting. Finally, we show a tight lower bound on one-dimensional categorical range counting using an elegant reduction from (standard) two-dimensional range counting.
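The query semantics — report the distinct colors, not the points — is what makes the output size K potentially much smaller than the point count. A naive O(N) scan fixes the semantics (illustrative only; the structures above answer this in roughly O(lg_B N + K/B) I/Os):

```python
def colors_in_range(points, x1, x2, y1, y2):
    """points: iterable of (x, y, color) triples. Return the set of
    distinct colors among points inside [x1, x2] x [y1, y2]."""
    return {c for (x, y, c) in points if x1 <= x <= x2 and y1 <= y <= y2}
```

With points [(1,1,'r'), (2,2,'r'), (3,3,'b')], the range [1,3] × [1,3] contains three points but only K = 2 colors.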
Sorted Range Reporting
"... Abstract. In this paper we consider a variant of the orthogonal range reporting problem when all points should be reported in the sorted order of their xcoordinates. We show that reporting twodimensional points with this additional condition can be organized (almost) as efficiently as the standard ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
In this paper we consider a variant of the orthogonal range reporting problem in which all points must be reported in sorted order of their x-coordinates. We show that reporting two-dimensional points with this additional condition can be organized (almost) as efficiently as standard range reporting. Moreover, our results generalize and improve the previously known results for the orthogonal range successor problem and can be used to obtain better solutions for some stringology problems.
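The required output order can be pinned down with a simple baseline: if the points are pre-sorted by x, a binary search plus a filtered scan already emits the answer in x-sorted order, at cost proportional to the points whose x falls in range (a sketch of the problem statement, not of the paper's solution):

```python
import bisect

def sorted_range_report(points_by_x, x1, x2, y1, y2):
    """points_by_x: list of (x, y) pairs sorted by x. Return the points
    inside [x1, x2] x [y1, y2], in increasing order of x."""
    lo = bisect.bisect_left(points_by_x, (x1, float("-inf")))
    out = []
    for x, y in points_by_x[lo:]:
        if x > x2:
            break
        if y1 <= y <= y2:
            out.append((x, y))
    return out
```

The difficulty the paper addresses is paying only for the points actually reported, not for every point whose x lies in [x1, x2].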
Efficient IP table lookup via adaptive stratified trees with selective reconstructions
 In Proc. 12th European Symposium on Algorithms
"... IP address lookup is a critical operation for high bandwidth routers in packet switching networks such as Internet. The lookup is a nontrivial operation since it requires searching for the longest prefix, among those stored in a (large) given table, matching the IP address. Ever increasing routing ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
IP address lookup is a critical operation for high-bandwidth routers in packet-switching networks such as the Internet. The lookup is a non-trivial operation since it requires searching for the longest prefix, among those stored in a (large) given table, matching the IP address. Ever-increasing routing table sizes, traffic volumes, and link speeds demand new and more efficient algorithms. Moreover, the imminent move to IPv6 128-bit addresses will soon require a rethinking of previous technical choices. This article describes a new data structure for solving the IP table lookup problem, christened the Adaptive Stratified Tree (AST). The proposed solution is based on casting the problem in geometric terms and on repeated application of efficient local geometric optimization routines. Experiments with this approach have shown that, in terms of storage, query time, and update time, the AST is on a par with state-of-the-art algorithms based on data compression or string manipulations (and is often better on some of the measured quantities).
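The core operation, longest-prefix match, has a simple O(W) baseline (W = address width): probe a hash table keyed by (prefix, length), from the longest length down. A minimal sketch for 32-bit IPv4 addresses, not the AST itself:

```python
def longest_prefix_match(table, addr, width=32):
    """table maps (prefix_value, prefix_length) -> next hop;
    addr is an integer address of `width` bits. Probe lengths from
    longest to shortest and return the first hit, or None."""
    for length in range(width, -1, -1):
        prefix = addr >> (width - length) if length else 0
        if (prefix, length) in table:
            return table[(prefix, length)]
    return None
```

With prefixes 10/2 and 101/3 in the table, an address starting 101... matches the /3 entry even though the /2 entry also matches; this "longest wins" rule is what makes the problem harder than exact membership, and what the AST's geometric formulation targets.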
A Note on Predecessor Searching in the Pointer Machine Model
, 2009
"... Predecessor searching is a fundamental data structuring problem and at the core of countless algorithms: given a totally ordered universe U with n elements, maintain a subset S ⊆ U such that for each element x ∈ U its predecessor in S can be found efficiently. During the last thirty years the proble ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Predecessor searching is a fundamental data structuring problem and at the core of countless algorithms: given a totally ordered universe U with n elements, maintain a subset S ⊆ U such that for each element x ∈ U its predecessor in S can be found efficiently. During the last thirty years the problem has been studied extensively and optimal algorithms in many classical models of computation are known. In 1988, Mehlhorn, Näher, and Alt [1] showed an amortized lower bound of Ω(log log n) in the pointer machine model. We give a different proof for this bound which sheds new light on the question of how much power the adversary actually needs.
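The operation in question is easy to state concretely: predecessor(x) = max{s ∈ S : s ≤ x}. A sorted array with binary search gives the natural O(log n)-per-query comparison-based baseline against which the Ω(log log n) amortized pointer-machine bound should be read (a minimal sketch, not from the paper):

```python
import bisect

def predecessor(sorted_s, x):
    """Largest element of the sorted list sorted_s that is <= x,
    or None if every element exceeds x."""
    i = bisect.bisect_right(sorted_s, x)
    return sorted_s[i - 1] if i else None
```

For S = {2, 5, 9}, predecessor(6) = 5 and predecessor(1) is undefined, matching the definition above.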
Complexity of Union-Split-Find Problems
, 2007
"... In this thesis, we investigate various interpretations of the UnionSplitFind problem, an extension of the classic UnionFind problem. In the UnionSplitFind problem, we maintain disjoint sets of ordered elements subject to the operations of constructing singleton sets, merging two sets together, ..."
Abstract
 Add to MetaCart
In this thesis, we investigate various interpretations of the Union-Split-Find problem, an extension of the classic Union-Find problem. In the Union-Split-Find problem, we maintain disjoint sets of ordered elements subject to the operations of constructing singleton sets, merging two sets together, splitting a set by partitioning it around a specified value, and finding the set that contains a given element. The different interpretations of this problem arise from the different assumptions made regarding when sets can be merged and any special properties the sets may have. We define and analyze the Interval, Cyclic, Ordered, and General Union-Split-Find problems. Previous work implies optimal solutions to the Interval and Ordered Union-Split-Find problems and an Ω(log n / log log n) lower bound for the Cyclic Union-Split-Find problem in the cell-probe model. We present a new data …
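In the Interval variant mentioned above, every set is a contiguous interval of the universe {0, ..., n−1}, so the whole partition is determined by the sorted list of interval left endpoints: find is a predecessor query, split inserts a boundary, and union deletes one. A minimal O(log n)/O(n)-time sketch of these semantics (illustrative, not the thesis's data structure):

```python
import bisect

class IntervalUSF:
    """Interval Union-Split-Find over {0, ..., n-1}. Each set is a
    contiguous interval, identified by its left endpoint."""

    def __init__(self, n):
        # n is the universe size; initially one interval [0, n) exists,
        # so the boundary list holds just its left endpoint.
        self.starts = [0]

    def find(self, x):
        """Left endpoint of the interval containing x."""
        i = bisect.bisect_right(self.starts, x)
        return self.starts[i - 1]

    def split(self, v):
        """Split the interval containing v so that v begins a new interval."""
        i = bisect.bisect_left(self.starts, v)
        if i == len(self.starts) or self.starts[i] != v:
            self.starts.insert(i, v)

    def union(self, v):
        """Merge the interval starting at v with the interval to its left."""
        i = bisect.bisect_left(self.starts, v)
        if 0 < i < len(self.starts) and self.starts[i] == v:
            del self.starts[i]
```

After split(4) on a universe of 10 elements, find(3) = 0 and find(7) = 4; union(4) restores the single interval. The Cyclic and General variants relax the adjacency restriction on merges, which is where the lower bounds bite.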