Results 1  10
of
24
LOW REDUNDANCY IN STATIC DICTIONARIES WITH CONSTANT QUERY TIME
 SIAM J. COMPUT.
, 2001
"... A static dictionary is a data structure storing subsets of a finite universe U, answering membership queries. We show that on a unit cost RAM with word size Θ(log U), a static dictionary for nelement sets with constant worst case query time can be obtained using B +O(log log U)+o(n) (U) bits ..."
Abstract

Cited by 50 (7 self)
 Add to MetaCart
A static dictionary is a data structure storing subsets of a finite universe U, answering membership queries. We show that on a unit cost RAM with word size Θ(log U), a static dictionary for nelement sets with constant worst case query time can be obtained using B +O(log log U)+o(n) (U) bits of storage, where B = ⌈log2 ⌉ is the minimum number of bits needed to represent all nn element subsets of U.
Logarithmic lower bounds in the cellprobe model
 SIAM Journal on Computing
"... Abstract. We develop a new technique for proving cellprobe lower bounds on dynamic data structures. This enables us to prove Ω(lg n) bounds, breaking a longstanding barrier of Ω(lg n/lg lg n). We can also prove the first Ω(lgB n) lower bound in the external memory model, without assumptions on the ..."
Abstract

Cited by 34 (4 self)
 Add to MetaCart
Abstract. We develop a new technique for proving cellprobe lower bounds on dynamic data structures. This enables us to prove Ω(lg n) bounds, breaking a longstanding barrier of Ω(lg n/lg lg n). We can also prove the first Ω(lgB n) lower bound in the external memory model, without assumptions on the data structure. We use our technique to prove better bounds for the partialsums problem, dynamic connectivity and (by reductions) other dynamic graph problems. Our proofs are surprisingly simple and clean. The bounds we obtain are often optimal, and lead to a nearly complete understanding of the problems. We also present new matching upper bounds for the partialsums problem. Key words. cellprobe complexity, lower bounds, data structures, dynamic graph problems, partialsums problem AMS subject classification. 68Q17
Implementing signatures for transactional memory
 40th Intl. Symp. on Microarchitecture
, 2007
"... Transactional Memory (TM) systems must track the read and write sets—items read and written during a transaction—to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded hardware at a performance cost of false positives (conf ..."
Abstract

Cited by 33 (6 self)
 Add to MetaCart
Transactional Memory (TM) systems must track the read and write sets—items read and written during a transaction—to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded hardware at a performance cost of false positives (conflicts detected when none exists). This paper examines different organizations to achieve hardwareefficient and accurate TM signatures. First, we find that implementing each signature with a single khashfunction Bloom filter (True Bloom signature) is inefficient, as it requires multiported SRAMs. Instead, we advocate using k singlehashfunction Bloom filters in parallel (Parallel Bloom signature), using areaefficient singleported SRAMs. Our formal analysis shows that both organizations perform equally well in theory and our simulationbased evaluation shows this to hold approximately in practice. We also show that by choosing highquality hash functions we can achieve signature designs noticeably more accurate than the previously proposed implementations. Finally, we adapt Pagh and Rodler’s cuckoo hashing to implement CuckooBloom signatures. While this representation does not support set intersection, it mitigates false positives for the common case of small read/write sets and performs like a Bloom filter for large sets. 1.
Randomized language models via perfect hash functions
 In Proc. of ACL08: HLT
, 2008
"... We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of ngrams and their associated probabilities, backoff weights, or other parameters. The scheme can represent any standard ngram model and is easily combined with existing model reduction te ..."
Abstract

Cited by 27 (0 self)
 Add to MetaCart
We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of ngrams and their associated probabilities, backoff weights, or other parameters. The scheme can represent any standard ngram model and is easily combined with existing model reduction techniques such as entropypruning. We demonstrate the spacesavings of the scheme via machine translation experiments within a distributed language modeling framework. 1
Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses
"... A minimal perfect hash function maps a set S of n keys into the set { 0, 1,..., n − 1} bijectively. Classical results state that minimal perfect hashing is possible in constant time using a structure occupying space close to the lower bound of log e bits per element. Here we consider the problem of ..."
Abstract

Cited by 20 (8 self)
 Add to MetaCart
A minimal perfect hash function maps a set S of n keys into the set { 0, 1,..., n − 1} bijectively. Classical results state that minimal perfect hashing is possible in constant time using a structure occupying space close to the lower bound of log e bits per element. Here we consider the problem of monotone minimal perfect hashing, in which the bijection is required to preserve the lexicographical ordering of the keys. A monotone minimal perfect hash function can be seen as a very weak form of index that provides ranking just on the set S (and answers randomly outside of S). Our goal is to minimise the description size of the hash function: we show that, for a set S of n elements out of a universe of 2 w elements, O(n log log w) bits are sufficient to hash monotonically with evaluation time O(log w). Alternatively, we can get space O(n log w) bits with O(1) query time. Both of these data structures improve a straightforward construction with O(n log w) space and O(log w) query time. As a consequence, it is possible to search a sorted table with O(1) accesses to the table (using additional O(n log log w) bits). Our results are based on a structure (of independent interest) that represents a trie in a very compact way, but admits errors. As a further application of the same structure, we show how to compute the predecessor (in the sorted order of S) of an arbitrary element, using O(1) accesses in expectation and an index of O(n log w) bits, improving the trivial result of O(nw) bits. This implies an efficient index for searching a blocked memory.
To randomize or not to randomize: Space optimal summaries for hyperlink analysis
 In Proceedings of the 15th World Wide Web Conference (WWW
, 2006
"... Personalized PageRank expresses linkbased page quality around user selected pages in a similar way as PageRank expresses quality over the entire Web. The only previous personalized PageRank algorithm that can serve online queries for an unrestricted choice of pages on large graphs is our Monte Car ..."
Abstract

Cited by 18 (4 self)
 Add to MetaCart
Personalized PageRank expresses linkbased page quality around user selected pages in a similar way as PageRank expresses quality over the entire Web. The only previous personalized PageRank algorithm that can serve online queries for an unrestricted choice of pages on large graphs is our Monte Carlo algorithm [WAW 2004]; all earlier methods restrict the choice of personalization in some way. In this paper we achieve unrestricted personalization by combining rounding and randomized sketching techniques in the dynamic programming algorithm of Jeh and Widom [WWW 2003]. We evaluate the precision of approximation experimentally on large scale realworld data and find significant improvement over previous results. As a key theoretical contribution we show that our algorithms use an optimal amount of space by also improving earlier asymptotic worstcase lower bounds. Our lower bounds and algorithms apply to the SimRank linkbased similarity function as well; of independent interest is the reduction of the SimRank computation to personalized PageRank.
Succinct Data Structures for Retrieval and Approximate Membership
"... Abstract. The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f: U → {0, 1} r that has specified values on the elements of a given set S ⊆ U, S  = n, but may have any value on elements outside S. All known methods (e. g. ..."
Abstract

Cited by 13 (6 self)
 Add to MetaCart
Abstract. The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f: U → {0, 1} r that has specified values on the elements of a given set S ⊆ U, S  = n, but may have any value on elements outside S. All known methods (e. g. those based on perfect hash functions), induce a space overhead of Θ(n) bits over the optimum, regardless of the evaluation time. We show that for any k, query time O(k) can be achieved using space that is within a factor 1 + e −k of optimal, asymptotically for large n. The time to construct the data structure is O(n), expected. If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits whp. A general reduction transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Thus we obtain space bounds arbitrarily close to the lower bound for this problem as well. The evaluation procedures of our data structures are extremely simple. For the results stated above we assume free access to fully random hash functions. This assumption can be justified using space o(n) to simulate full randomness on a RAM. 1
On the falsepositive rate of Bloom filters
 REPORT, SCHOOL OF COMP. SCI., CARLETON UNIV., 2007.HTTP://CG.SCS.CARLETON.CA/ Â¼ MORIN/PUBLICATIONS/DS/ BLOOMSUBMITTED.PDF
, 2007
"... Abstract. Bloom filters are a randomized data structure for membership queries dating back to 1970. Bloom filters sometimes give erroneous answers to queries, called false positives. Bloom analyzed the probability of such erroneous answers, called the falsepositive rate, and Bloom's analysis has ap ..."
Abstract

Cited by 12 (0 self)
 Add to MetaCart
Abstract. Bloom filters are a randomized data structure for membership queries dating back to 1970. Bloom filters sometimes give erroneous answers to queries, called false positives. Bloom analyzed the probability of such erroneous answers, called the falsepositive rate, and Bloom's analysis has appeared in many publications throughout the years. We show that Bloom's analysis is incorrect and give a correct analysis.
Backyard Cuckoo Hashing: Constant WorstCase Operations with a Succinct Representation
, 2010
"... The performance of a dynamic dictionary is measured mainly by its update time, lookup time, and space consumption. In terms of update time and lookup time there are known constructions that guarantee constanttime operations in the worst case with high probability, and in terms of space consumption ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
The performance of a dynamic dictionary is measured mainly by its update time, lookup time, and space consumption. In terms of update time and lookup time there are known constructions that guarantee constanttime operations in the worst case with high probability, and in terms of space consumption there are known constructions that use essentially optimal space. In this paper we settle two fundamental open problems: • We construct the first dynamic dictionary that enjoys the best of both worlds: we present a twolevel variant of cuckoo hashing that stores n elements using (1+ϵ)n memory words, and guarantees constanttime operations in the worst case with high probability. Specifically, for any ϵ = Ω((log log n / log n) 1/2) and for any sequence of polynomially many operations, with high probability over the randomness of the initialization phase, all operations are performed in constant time which is independent of ϵ. The construction is based on augmenting cuckoo hashing with a “backyard ” that handles a large fraction of the elements, together with a deamortized perfect hashing scheme for eliminating the dependency on ϵ.
Fast evaluation of unionintersection expressions
, 2007
"... Abstract. We show how to represent sets in a linear space data structure such that expressions involving unions and intersections of sets can be computed in a worstcase efficient way. This problem has applications in e.g. information retrieval and database systems. We mainly consider the RAM model ..."
Abstract

Cited by 5 (1 self)
 Add to MetaCart
Abstract. We show how to represent sets in a linear space data structure such that expressions involving unions and intersections of sets can be computed in a worstcase efficient way. This problem has applications in e.g. information retrieval and database systems. We mainly consider the RAM model of computation, and sets of machine words, but also state our results in the I/O model. On a RAM with word size w, a special case of our result is that the intersection of m (preprocessed) sets, containing n elements in total, can be computed in expected time O(n(log w) 2 /w + km), where k is the number of elements in the intersection. If the first of the two terms dominates, this is a factor w 1−o(1) faster than the standard solution of merging sorted lists. We show a log k cell probe lower bound of time Ω(n/(wm log m) + (1 −)k), meaning w that our upper bound is nearly optimal for small m. Our algorithm uses a novel combination of approximate set representations and wordlevel parallelism. 1