Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses
Cited by 20 (8 self)
A minimal perfect hash function maps a set S of n keys into the set {0, 1, ..., n − 1} bijectively. Classical results state that minimal perfect hashing is possible in constant time using a structure occupying space close to the lower bound of log e bits per element. Here we consider the problem of monotone minimal perfect hashing, in which the bijection is required to preserve the lexicographical ordering of the keys. A monotone minimal perfect hash function can be seen as a very weak form of index that provides ranking just on the set S (and answers randomly outside of S). Our goal is to minimise the description size of the hash function: we show that, for a set S of n elements out of a universe of 2^w elements, O(n log log w) bits are sufficient to hash monotonically with evaluation time O(log w). Alternatively, we can get space O(n log w) bits with O(1) query time. Both of these data structures improve a straightforward construction with O(n log w) space and O(log w) query time. As a consequence, it is possible to search a sorted table with O(1) accesses to the table (using additional O(n log log w) bits). Our results are based on a structure (of independent interest) that represents a trie in a very compact way, but admits errors. As a further application of the same structure, we show how to compute the predecessor (in the sorted order of S) of an arbitrary element, using O(1) accesses in expectation and an index of O(n log w) bits, improving the trivial result of O(nw) bits. This implies an efficient index for searching a blocked memory.
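The interface described in this abstract can be illustrated with a minimal sketch. It assumes Python and a naive dictionary-based construction, which uses far more space than the O(n log log w) bits of the paper and is not the authors' structure. The names `build_mmph` and `search_sorted_table` are hypothetical. The sketch shows only two properties: the hash returns the rank of every key in S, and a single table access resolves a query, because a key outside S yields an arbitrary position that one comparison rejects.

```python
import zlib
from typing import Callable, Optional, Sequence


def build_mmph(sorted_keys: Sequence[str]) -> Callable[[str], int]:
    """Return h with h(key) == rank of key for every key in sorted_keys.

    Naive stand-in (an assumption of this sketch): the key-to-rank map is
    stored explicitly. For keys outside the set, h returns an arbitrary
    value in [0, n), mirroring "answers randomly outside of S".
    """
    if not sorted_keys:
        raise ValueError("the key set must be non-empty")
    if list(sorted_keys) != sorted(set(sorted_keys)):
        raise ValueError("keys must be distinct and in lexicographical order")

    n = len(sorted_keys)
    rank = {key: i for i, key in enumerate(sorted_keys)}

    def h(key: str) -> int:
        if key in rank:
            return rank[key]
        # Arbitrary but deterministic answer for keys outside the set.
        return zlib.crc32(key.encode("utf-8")) % n

    return h


def search_sorted_table(
    table: Sequence[str], h: Callable[[str], int], key: str
) -> Optional[int]:
    """Locate key in the sorted table with exactly one table access."""
    pos = h(key)
    # The single access: compare the stored key at the hashed position.
    return pos if table[pos] == key else None
```

For a table `["apple", "banana", "cherry", "date"]`, `search_sorted_table` returns the index of a stored key and `None` for any key that is not in the table.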
Succinct Data Structures for Retrieval and Approximate Membership
Cited by 13 (6 self)
Abstract. The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f: U → {0, 1}^r that has specified values on the elements of a given set S ⊆ U, |S| = n, but may have any value on elements outside S. All known methods (e.g., those based on perfect hash functions) induce a space overhead of Θ(n) bits over the optimum, regardless of the evaluation time. We show that for any k, query time O(k) can be achieved using space that is within a factor 1 + e^{−k} of optimal, asymptotically for large n. The time to construct the data structure is O(n), expected. If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits with high probability. A general reduction transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Thus we obtain space bounds arbitrarily close to the lower bound for this problem as well. The evaluation procedures of our data structures are extremely simple. For the results stated above we assume free access to fully random hash functions. This assumption can be justified using space o(n) to simulate full randomness on a RAM.
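The reduction from retrieval to approximate membership mentioned in this abstract can be sketched as follows. The idea is to store an r-bit fingerprint of each key in a retrieval structure and to answer "member" exactly when the retrieved value matches the fingerprint of the query. This is a minimal illustration in Python under stated assumptions: `ToyRetrieval` is a hypothetical stand-in that keeps an explicit dictionary and therefore saves no space, unlike the succinct structures of the paper, and SHA-256 with a salt stands in for the fully random hash functions the abstract assumes.

```python
import hashlib
from typing import Dict, Iterable


def _hash_bits(key: str, salt: str, r: int) -> int:
    """Hash key to an r-bit integer (r <= 64); salts select different functions."""
    digest = hashlib.sha256((salt + ":" + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << r)


class ToyRetrieval:
    """Stores f on S; returns an arbitrary r-bit value for keys outside S."""

    def __init__(self, values: Dict[str, int], r: int) -> None:
        self._values = dict(values)
        self._r = r

    def lookup(self, key: str) -> int:
        if key in self._values:
            return self._values[key]
        # Outside S a retrieval structure may return anything; emulate that
        # with a value unrelated to the fingerprint function.
        return _hash_bits(key, "outside", self._r)


class ApproxMembership:
    """Approximate membership built on top of a retrieval structure."""

    def __init__(self, keys: Iterable[str], r: int) -> None:
        self._r = r
        fingerprints = {k: _hash_bits(k, "fingerprint", r) for k in keys}
        self._f = ToyRetrieval(fingerprints, r)

    def might_contain(self, key: str) -> bool:
        # No false negatives: for keys in S the stored value is the fingerprint.
        # For other keys a match happens with probability about 2**-r.
        return self._f.lookup(key) == _hash_bits(key, "fingerprint", self._r)
```

With r = 8 the expected false positive rate is about 1/256, so choosing r close to log(1/ε) bits per key gives error rate ε.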
Theory and Practise of Monotone Minimal Perfect Hashing
Cited by 13 (6 self)
Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable Ω(n log n) lower bound on the number of bits required to store the function. Recently, it was observed [1] that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case of dictionaries of search engines, list of URLs of web graphs, etc. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyse experimentally the data structures proposed in [1], and along our way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practise, and provide a balance between access speed, ease of construction, and space usage.
Backyard Cuckoo Hashing: Constant Worst-Case Operations with a Succinct Representation
, 2010
Cited by 7 (3 self)
The performance of a dynamic dictionary is measured mainly by its update time, lookup time, and space consumption. In terms of update time and lookup time there are known constructions that guarantee constant-time operations in the worst case with high probability, and in terms of space consumption there are known constructions that use essentially optimal space. In this paper we settle two fundamental open problems:
• We construct the first dynamic dictionary that enjoys the best of both worlds: we present a two-level variant of cuckoo hashing that stores n elements using (1+ϵ)n memory words, and guarantees constant-time operations in the worst case with high probability. Specifically, for any ϵ = Ω((log log n / log n)^{1/2}) and for any sequence of polynomially many operations, with high probability over the randomness of the initialization phase, all operations are performed in constant time which is independent of ϵ. The construction is based on augmenting cuckoo hashing with a “backyard” that handles a large fraction of the elements, together with a de-amortized perfect hashing scheme for eliminating the dependency on ϵ.
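The two-level layout named in this abstract can be illustrated with a toy sketch: a first level of fixed-capacity bins, and a cuckoo-hashed backyard for keys whose bin is already full. This is not the paper's construction. It omits the de-amortization and the space analysis, the class name `BackyardTable` and all parameters are hypothetical, and the unbounded `stash` list for keys that fail to settle is a simplification of this sketch.

```python
import hashlib
from typing import List, Optional


def _h(key: str, salt: str, m: int) -> int:
    """Salted hash of key into the range [0, m)."""
    digest = hashlib.sha256((salt + ":" + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


class BackyardTable:
    """Toy two-level set: fixed-capacity bins plus a cuckoo-hashed backyard."""

    def __init__(self, num_bins: int, bin_capacity: int, backyard_size: int) -> None:
        self.bins: List[List[str]] = [[] for _ in range(num_bins)]
        self.bin_capacity = bin_capacity
        # Two cuckoo tables, one key per cell.
        self.backyard: List[List[Optional[str]]] = [
            [None] * backyard_size,
            [None] * backyard_size,
        ]
        # Simplification: keys that cannot be placed end up here.
        self.stash: List[str] = []

    def _cell(self, key: str, side: int) -> int:
        return _h(key, f"backyard{side}", len(self.backyard[side]))

    def contains(self, key: str) -> bool:
        # One bin probe and two backyard probes (plus the toy stash).
        if key in self.bins[_h(key, "bin", len(self.bins))]:
            return True
        if any(self.backyard[s][self._cell(key, s)] == key for s in (0, 1)):
            return True
        return key in self.stash

    def insert(self, key: str, max_kicks: int = 64) -> None:
        if self.contains(key):
            return
        bucket = self.bins[_h(key, "bin", len(self.bins))]
        if len(bucket) < self.bin_capacity:
            bucket.append(key)
            return
        # The bin is full: the key overflows into the backyard.
        side = 0
        for _ in range(max_kicks):
            i = self._cell(key, side)
            evicted = self.backyard[side][i]
            self.backyard[side][i] = key
            if evicted is None:
                return
            # Standard cuckoo step: re-place the evicted key in the other table.
            key = evicted
            side = 1 - side
        self.stash.append(key)
```

In this toy version most keys live in the bins, and only the overflow reaches the backyard, which is why the backyard can be kept comparatively small.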
Optimal Sampling Algorithms for Frequency Estimation in Distributed Data
Cited by 5 (1 self)
Abstract—Consider a distributed system with n nodes where each node holds a multiset of items. In this paper, we design sampling algorithms that allow us to estimate the global frequency of any item with a standard deviation of εN, where N denotes the total cardinality of all these multisets. Our algorithms have a communication cost of O(n + √n/ε), which is never worse than the O(n + 1/ε^2) cost of uniform sampling, and could be much better when n ≪ 1/ε^2. In addition, we prove that one version of our algorithm is instance-optimal in a fairly general sampling framework. We also design algorithms that achieve optimality on the bit level, by combining Bloom filters of various granularities. Finally, we present some simulation results comparing our algorithms with previous techniques. Other than the performance improvement, our algorithms are also much simpler and easily implementable in a large-scale distributed system.
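The comparison of the two communication bounds can be made concrete with a small calculation. The sketch below, in Python, evaluates only the leading terms with all hidden constants assumed to be 1, so the numbers indicate orders of magnitude rather than actual message counts. The function names are hypothetical.

```python
import math


def proposed_cost(n: int, eps: float) -> float:
    """Leading terms of O(n + sqrt(n)/eps), hidden constants taken as 1."""
    return n + math.sqrt(n) / eps


def uniform_cost(n: int, eps: float) -> float:
    """Leading terms of O(n + 1/eps^2), hidden constants taken as 1."""
    return n + 1.0 / eps ** 2


def improvement(n: int, eps: float) -> float:
    """How many times cheaper the proposed bound is than uniform sampling."""
    return uniform_cost(n, eps) / proposed_cost(n, eps)

# Why "never worse": sqrt(n)/eps is the geometric mean of n and 1/eps^2,
# so by the AM-GM inequality sqrt(n)/eps <= (n + 1/eps^2) / 2.
```

For example, with n = 100 nodes and ε = 0.001 the two expressions are about 1.0e4 and 1.0e6, an improvement of roughly two orders of magnitude, which matches the regime n ≪ 1/ε^2 named in the abstract.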
Some Open Questions Related to Cuckoo Hashing
Cited by 2 (1 self)
Abstract. The purpose of this brief note is to describe recent work in the area of cuckoo hashing, including a clear description of several open problems, with the hope of spurring further research.
The Context of Coordinating Groups in Dynamic Mobile Networks
Cited by 2 (2 self)
Abstract. Context-awareness in dynamic and unpredictable environments is a well-studied problem, and many approaches handle sensing, understanding, and acting upon context information. Entities in these environments are not in isolation, and oftentimes the manner in which entities coordinate depends on some (implicit) notion of their shared context. In this paper, we are motivated by the need to explicitly construct notions of the context of a group that can support better coordination within the group. First, we identify an efficient representation of context (both of an individual and of a group) that can be shared across wireless connections without incurring a significant communication overhead. Second, we provide precise semantics for different types of groups, each with compelling use cases in these dynamic computing environments. Finally, we define and demonstrate protocols for efficiently computing groups and their context in a distributed manner.
Grapevine: Efficient situational awareness in pervasive computing environments
 In Proceedings of the 2012 IEEE International Conference on Pervasive Computing and Communications (Work in Progress
, 2012
Cited by 1 (1 self)
Abstract—Many pervasive computing applications demand expressive situational awareness, which entails an entity learning detailed information about its immediate and surrounding context. Previous work has largely focused on individual entities’ context; in this paper we present Grapevine, a framework for efficiently sharing context information in a localized region of a pervasive computing network, using that information to dynamically form groups defined by their shared situations, and assessing the aggregate context of that group. We provide an implementation of Grapevine and benchmark its performance in a live pervasive computing network deployment. Keywords: context-awareness, network context, Bloom filters, context coordination.
in Pervasive Computing Environments
Abstract—Many pervasive computing applications demand expressive situational awareness, which entails an entity in the pervasive computing environment learning detailed information about its immediate and surrounding context. Much work over the past decade focused on how to acquire and represent context information. However, this work is largely egocentric, focusing on individual entities in the pervasive computing environment sensing their own context. Distributed acquisition of surrounding context information is much more challenging, largely because of the expense of communication among these resource-constrained devices. In this paper, we present Grapevine, a framework for efficiently sharing context information in a localized region of a pervasive computing network, using that information to dynamically form groups defined by their shared situations, and assessing the aggregate context of that group. We provide an implementation of Grapevine and benchmark its performance in a live pervasive computing network deployment. Keywords: context-awareness, network context, Bloom filters, context coordination.
How to Approximate A Set Without Knowing Its Size In Advance
The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an online fashion, supporting membership queries without false negatives and with a false positive rate at most ϵ. That is, the membership algorithm must be correct on each x ∈ S, and may err with probability at most ϵ on each x ∉ S. We study a well-motivated, yet insufficiently explored, variant of this problem where the size n of the set is not known in advance. Existing optimal approximate membership data structures require that the size is known in advance, but in many practical scenarios this is not a realistic assumption. Moreover, even if the eventual size n of the set is known in advance, it is desirable to have the smallest possible space usage also when the current number of inserted elements is smaller than n. Our contribution consists of the following results:
• We show a superlinear gap between the space complexity when the size is known in advance and the space complexity when the size is not known in advance. When the size is known in advance, it is well-known that Θ(n log(1/ϵ)) bits of space are necessary and sufficient (Bloom ’70, Carter et al. ’78). However, when the size is not known in advance, we prove