Results 1–10 of 10
An optimal bloom filter replacement based on matrix solving
In CSR, 2009
Abstract

Cited by 11 (0 self)
We suggest a method for holding a dictionary data structure, which maps keys to values, in the spirit of Bloom Filters. The space requirements of the dictionary we suggest are much smaller than those of a hashtable. We allow storing n keys, each mapped to a value which is a string of k bits. Our suggested method requires nk + o(n) bits of space to store the dictionary, O(n) time to produce the data structure, and allows answering a membership query in O(1) memory probes. The dictionary size does not depend on the size of the keys. However, reducing the space requirements of the data structure comes at a certain cost. Our dictionary has a small probability of a one-sided error. When attempting to obtain the value for a key that is stored in the dictionary we always get the correct answer. However, when testing for membership of an element that is not stored in the dictionary, we may get an incorrect answer, and when requesting the value of such an element we may get a random value. Our method is based on solving equations in GF(2^k) and using several hash functions. Another significant advantage of our suggested method is that we do not require sophisticated hash functions; we only require pairwise independent hash functions. We also suggest a data structure that requires only nk bits of space, has O(n^2) preprocessing time, and has O(log n) query time; however, this data structure requires uniform hash functions. To replace a Bloom Filter of n elements with an error probability of 2^−k, we require nk + o(n) memory bits, O(1) query time, O(n) preprocessing time, and only pairwise independent hash functions. Even the most advanced previously known Bloom Filter would require nk + O(n) space and uniform hash functions, so our method is significantly less space consuming, especially when k is small. Our suggested dictionary can replace Bloom Filters and has many applications. A few application examples are dictionaries for storing bad passwords, differential files in databases, Internet caching, and distributed storage systems.
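The XOR-of-cells idea behind such dictionaries can be sketched in a few lines: pick a few hash positions per key, and solve the resulting linear system so that the XOR of a key's cells equals its value. The sketch below is a simplification, not the paper's construction: it eliminates over GF(2) with the k-bit values carried along component-wise (rather than working in GF(2^k) directly), and `blake2b`-derived hashes stand in for the pairwise independent hash functions; all names and parameters are illustrative.

```python
import hashlib
from functools import reduce

def _positions(key, seed, num_hashes, m):
    """num_hashes pseudo-random table positions for a key (a stand-in for
    the paper's pairwise independent hash functions)."""
    def digest(i):
        return hashlib.blake2b(f"{seed}:{i}:{key}".encode(), digest_size=8).digest()
    return [int.from_bytes(digest(i), "big") % m for i in range(num_hashes)]

def build_xor_dict(items, c=1.5, num_hashes=3, max_tries=50):
    """Build a table of ~c*n cells so that each key's value is the XOR of
    its num_hashes cells, via Gaussian elimination over GF(2)."""
    n = len(items)
    m = max(num_hashes, int(c * n) + 1)
    for seed in range(max_tries):              # retry with fresh hashes if unlucky
        # One equation per key: XOR of its cells == its value.
        rows = []
        for key, value in items:
            mask = 0
            for p in _positions(key, seed, num_hashes, m):
                mask ^= 1 << p                 # repeated positions cancel, matching the query
            rows.append([mask, value])
        # Forward elimination over GF(2); XOR on the values is component-wise.
        pivots, ok = {}, True
        for row in rows:
            while row[0]:
                p = row[0].bit_length() - 1
                if p not in pivots:
                    pivots[p] = row
                    break
                row[0] ^= pivots[p][0]
                row[1] ^= pivots[p][1]
            else:
                ok = row[1] == 0               # 0 = v with v != 0: unsolvable this round
                if not ok:
                    break
        if not ok:
            continue
        # Back-substitution, lowest pivot first: a pivot row's other bits are
        # strictly smaller, so they are already solved or free (left as 0).
        table = [0] * m
        for p in sorted(pivots):
            mask, rhs = pivots[p]
            bits, acc = mask & ~(1 << p), rhs
            while bits:
                b = bits & -bits
                acc ^= table[b.bit_length() - 1]
                bits ^= b
            table[p] = acc
        return lambda key: reduce(lambda a, p: a ^ table[p],
                                  _positions(key, seed, num_hashes, m), 0)
    raise RuntimeError("no solvable system found")
```

Querying a stored key always returns its value; querying an absent key returns the XOR of essentially random cells, which is exactly the one-sided error described above.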
Bloom maps
In Proceedings of the Fourth Workshop on Analytic Algorithmics and Combinatorics (ANALCO), Society for Industrial and Applied Mathematics, 2008
Abstract

Cited by 3 (1 self)
We consider the problem of succinctly encoding a static map to support approximate queries. We derive upper and lower bounds on the space requirements in terms of the error rate and the entropy of the distribution of values over keys: our bounds differ by a factor log e. For the upper bound we introduce a novel data structure, the Bloom map, generalising the Bloom filter to this problem. The lower bound follows from an information theoretic argument.
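For intuition about the query model a Bloom map supports, here is a deliberately naive approximate map: one Bloom filter per distinct value, with a lookup returning the value whose filter claims the key. This hypothetical baseline is far from the paper's space-optimal construction; every class name and parameter below is illustrative.

```python
import hashlib

class NaiveBloomMap:
    """One Bloom filter per distinct value; get() returns the value whose
    filter contains the key (possibly a false positive, as in a Bloom map)."""

    def __init__(self, m=256, k=4):
        self.m, self.k = m, k          # bits per filter, hashes per key
        self.filters = {}              # value -> bit array

    def _bits(self, key):
        for i in range(self.k):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.m

    def put(self, key, value):
        bits = self.filters.setdefault(value, bytearray(self.m))
        for b in self._bits(key):
            bits[b] = 1

    def get(self, key):
        for value, bits in self.filters.items():
            if all(bits[b] for b in self._bits(key)):
                return value           # may be wrong with small probability
        return None                    # "not in the map", again approximate
```

This layout wastes space linearly in the number of distinct values, which is precisely the inefficiency the entropy-based bounds in the paper quantify and the Bloom map avoids.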
Rewiring unstructured p2p networks using bloom filters to optimize recall
, 2006
Abstract

Cited by 2 (1 self)
While structured P2P networks are very efficient for key-based lookup, they are less suitable for keyword search. On the other hand, unstructured …
Succincter
Abstract
We can represent an array of n values from {0, 1, 2} using ⌈n log₂ 3⌉ bits (arithmetic coding), but then we cannot retrieve a single element efficiently. Instead, we can encode every block of t elements using ⌈t log₂ 3⌉ bits, and bound the retrieval time by t. This gives a linear tradeoff between the redundancy of the representation and the query time. In fact, this type of linear tradeoff is ubiquitous in known succinct data structures, and in data compression. The folk wisdom is that if we want to waste one bit per block, the encoding is so constrained that it cannot help the query in any way. Thus, the only thing a query can do is to read the entire block and unpack it. We break this limitation and show how to use recursion to improve redundancy. It turns out that if a block is encoded with two (!) bits of redundancy, we can decode a single element, and answer many other interesting queries, in time logarithmic in the block size. Our technique allows us to revisit classic problems in succinct data structures, and give surprising new upper bounds. We also construct a locally-decodable version of arithmetic coding.
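The baseline tradeoff described above is easy to reproduce: treat each block of t ternary values as a base-3 number stored in ⌈t log₂ 3⌉ bits, and pay O(t) time to unpack a block on retrieval. This sketch shows only that baseline, not the paper's logarithmic-time, two-bit-redundancy scheme; function names are illustrative.

```python
from math import ceil, log2

def encode_ternary(values, t=10):
    """Pack values from {0, 1, 2} into blocks of t elements, each block a
    base-3 integer fitting in ceil(t * log2(3)) bits."""
    bits_per_block = ceil(t * log2(3))
    blocks = []
    for i in range(0, len(values), t):
        code = 0
        for v in reversed(values[i:i + t]):   # element j becomes base-3 digit j
            code = code * 3 + v
        blocks.append(code)                   # fits in bits_per_block bits
    return blocks, bits_per_block

def get(blocks, t, i):
    """Retrieve element i by unpacking its whole block digit by digit:
    O(t) time, illustrating the linear redundancy/query-time tradeoff."""
    code = blocks[i // t]
    for _ in range(i % t):
        code //= 3
    return code % 3
```

Larger t lowers the per-block redundancy (at most one wasted bit per block) but makes every retrieval proportionally slower, which is exactly the tradeoff the paper breaks.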
Theory and Practice of Monotone Minimal Perfect Hashing, Djamal Belazzougui
Abstract
Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable Ω(n log n) lower bound on the number of bits required to store the function. Recently, it was observed [1] that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case of dictionaries of search engines, lists of URLs of web graphs, etc. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyse experimentally the data structures proposed in [1], and along our way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practice, and provide a balance between access speed, ease of construction, and space usage.
Astrometry.net: Automatic recognition and calibration of astronomical images, Dustin Lang
, 2009
The Bitwise Bloom Filter
, 2007
Abstract
We present the Bitwise Bloom Filter, a data structure for maintaining counts for a large number of items. The bitwise filter is an extension of the Bloom filter, a space-efficient data structure that stores a large set by discarding the identity of the items being held while still being able to determine, with high probability, whether a given item is in the set. We show how this idea can be extended to maintaining counts of items by keeping a separate Bloom filter for every position in the bit representations of all the counts. We give both a theoretical analysis of the accuracy of the Bitwise filter and validation via experiments on real network data.
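The one-filter-per-bit-position idea can be sketched as a binary counter over filters: filter j holds the items whose count has bit j set, and an increment ripple-carries through the levels. The sketch below uses small counting filters so a carry can clear a bit; the parameters, the update rule, and the use of counting cells are illustrative assumptions, not the paper's exact scheme.

```python
import hashlib

class BitwiseBloom:
    """Filter j holds the items whose count has bit j set; an item's count is
    read off as the binary number formed by its membership across levels."""

    def __init__(self, levels=8, m=1024, k=3):
        self.m, self.k = m, k
        # Counting cells (not plain bits) so a carry can remove an item.
        self.filters = [[0] * m for _ in range(levels)]

    def _slots(self, item):
        return [int.from_bytes(hashlib.blake2b(f"{i}:{item}".encode(),
                digest_size=8).digest(), "big") % self.m for i in range(self.k)]

    def _has(self, j, slots):
        return all(self.filters[j][s] > 0 for s in slots)

    def increment(self, item):
        slots = self._slots(item)
        j = 0
        # Ripple-carry: clear each set bit... (saturates at the top level)
        while j < len(self.filters) - 1 and self._has(j, slots):
            for s in slots:
                self.filters[j][s] -= 1
            j += 1
        for s in slots:                 # ...then set the first clear bit
            self.filters[j][s] += 1

    def count(self, item):
        slots = self._slots(item)
        return sum(1 << j for j, f in enumerate(self.filters)
                   if all(f[s] > 0 for s in slots))
```

As with any Bloom-filter scheme, a false positive at some level inflates the reported count; the paper's analysis bounds how often this happens.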
On Dynamic Range Reporting in One Dimension, Christian Worm Mortensen, IT U. Copenhagen
, 2005
Abstract
We consider the problem of maintaining a dynamic set of integers and answering queries of the form: report a point (equivalently, all points) in a given interval. Range searching is a natural and fundamental variant of integer search, and can be solved using predecessor search. However, for a RAM with w-bit words, we show how to perform updates in O(lg w) time and answer queries in O(lg lg w) time. The update time is identical to the van Emde Boas structure, but the query time is exponentially faster. Existing lower bounds show that achieving our query time for predecessor search requires doubly exponentially slower updates. We present some arguments supporting the conjecture that our solution is optimal. Our solution is based on a new and interesting recursion idea which is “more extreme” than the van Emde Boas recursion. Whereas van Emde Boas uses a simple recursion (repeated halving) on each path in a trie, we use a nontrivial, van Emde Boas-like recursion on every such path. Despite this, our algorithm is quite clean when seen from the right angle. To achieve linear space for our data structure, we solve a problem which is of independent interest: we develop the first scheme for dynamic perfect hashing requiring sublinear space. This gives a dynamic Bloomier filter (an approximate storage scheme for sparse vectors) which uses low space. We strengthen previous lower bounds to show that these results are optimal.
Path Query Routing in Unstructured Peer-to-Peer Networks
, 2010
Abstract
In this article, we introduce a way to distribute an index database of XML documents on an unstructured peer-to-peer network with a flat topology (i.e. with no super-peer). We then show how to perform content path query routing in such networks. Nodes in the network maintain a set of Multi Level Bloom Filters that summarises structural properties of XML documents. They propagate part of this information to their neighbor nodes, allowing efficient path query routing in the peer-to-peer network, as shown by the evaluation tests presented.
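The summarisation step can be imitated with a depth-aware Bloom filter: hash each path element together with its depth, and forward a query only to peers whose summary contains every (element, depth) pair of the query path. This is an illustrative stand-in, not the paper's Multi Level Bloom Filter layout; all names and parameters are assumptions.

```python
import hashlib

class PathSummary:
    """Bloom-filter summary of the root-to-leaf paths of hosted XML documents;
    a query path is forwarded only to peers whose summary may match it."""

    def __init__(self, m=512, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _slots(self, element, depth):
        # Hashing (element, depth) pairs keeps levels apart, so "book" at
        # depth 1 does not match a query for "book" at the root.
        for i in range(self.k):
            h = hashlib.blake2b(f"{i}:{depth}:{element}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.m

    def add_path(self, path):
        """Record a root-to-node path, e.g. ["library", "book", "title"]."""
        for depth, element in enumerate(path):
            for s in self._slots(element, depth):
                self.bits[s] = 1

    def may_match(self, path):
        """True if every element of the query path may occur at its depth
        (one-sided error: false positives only, never false negatives)."""
        return all(self.bits[s]
                   for depth, element in enumerate(path)
                   for s in self._slots(element, depth))
```

A node would keep one such summary per neighbor, pruning any neighbor whose summary rules out the query path, in the spirit of the routing scheme evaluated in the paper.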