Results 1  10
of
226
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
"... The nearest neighbor problem is the following: Given a set of n points P = fp 1 ; : : : ; png in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q 2 X. We focus on the particularly interesting case of the ddimens ..."
Abstract

Cited by 909 (38 self)
 Add to MetaCart
The nearest neighbor problem is the following: Given a set of n points P = fp 1 ; : : : ; png in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q 2 X. We focus on the particularly interesting case of the ddimensional Euclidean space where X = ! d under some l p norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the bruteforce algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point p 2 P that is an fflapproximate nearest neighbor of the query q in that for all p 0 2 P , d(p; q) (1 + ffl)d(p 0 ; q). We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
Balanced Allocations
 SIAM Journal on Computing
, 1994
"... Suppose that we sequentially place n balls into n boxes by putting each ball into a randomly chosen box. It is well known that when we are done, the fullest box has with high probability (1 + o(1)) ln n/ ln ln n balls in it. Suppose instead that for each ball we choose two boxes at random and place ..."
Abstract

Cited by 288 (8 self)
 Add to MetaCart
(Show Context)
Suppose that we sequentially place n balls into n boxes by putting each ball into a randomly chosen box. It is well known that when we are done, the fullest box has with high probability (1 + o(1)) ln n/ ln ln n balls in it. Suppose instead that for each ball we choose two boxes at random and place the ball into the one which is less full at the time of placement. We show that with high probability, the fullest box contains only ln ln n/ ln 2 +O(1) balls  exponentially less than before. Furthermore, we show that a similar gap exists in the infinite process, where at each step one ball, chosen uniformly at random, is deleted, and one ball is added in the manner above. We discuss consequences of this and related theorems for dynamic resource allocation, hashing, and online load balancing.
Approximate distance oracles
 J. ACM
"... Let G = (V, E) be an undirected weighted graph with V  = n and E  = m. Let k ≥ 1 be an integer. We show that G = (V, E) can be preprocessed in O(kmn 1/k) expected time, constructing a data structure of size O(kn 1+1/k), such that any subsequent distance query can be answered, approximately, in ..."
Abstract

Cited by 250 (9 self)
 Add to MetaCart
Let G = (V, E) be an undirected weighted graph with V  = n and E  = m. Let k ≥ 1 be an integer. We show that G = (V, E) can be preprocessed in O(kmn 1/k) expected time, constructing a data structure of size O(kn 1+1/k), such that any subsequent distance query can be answered, approximately, in O(k) time. The approximate distance returned is of stretch at most 2k − 1, i.e., the quotient obtained by dividing the estimated distance by the actual distance lies between 1 and 2k−1. A 1963 girth conjecture of Erdős, implies that Ω(n 1+1/k) space is needed in the worst case for any real stretch strictly smaller than 2k + 1. The space requirement of our algorithm is, therefore, essentially optimal. The most impressive feature of our data structure is its constant query time, hence the name “oracle”. Previously, data structures that used only O(n 1+1/k) space had a query time of Ω(n 1/k). Our algorithms are extremely simple and easy to implement efficiently. They also provide faster constructions of sparse spanners of weighted graphs, and improved tree covers and distance labelings of weighted or unweighted graphs. 1
Succinct indexable dictionaries with applications to encoding kary trees and multisets
 In Proceedings of the 13th Annual ACMSIAM Symposium on Discrete Algorithms (SODA
"... We consider the indexable dictionary problem, which consists of storing a set S ⊆ {0,...,m − 1} for some integer m, while supporting the operations of rank(x), which returns the number of elements in S that are less than x if x ∈ S, and −1 otherwise; and select(i) which returns the ith smallest ele ..."
Abstract

Cited by 238 (9 self)
 Add to MetaCart
(Show Context)
We consider the indexable dictionary problem, which consists of storing a set S ⊆ {0,...,m − 1} for some integer m, while supporting the operations of rank(x), which returns the number of elements in S that are less than x if x ∈ S, and −1 otherwise; and select(i) which returns the ith smallest element in S. We give a data structure that supports both operations in O(1) time on the RAM model and requires B(n,m)+ o(n)+O(lg lg m) bits to store a set of size n, where B(n,m) = ⌈ lg ( m) ⌉ n is the minimum number of bits required to store any nelement subset from a universe of size m. Previous dictionaries taking this space only supported (yes/no) membership queries in O(1) time. In the cell probe model we can remove the O(lg lg m) additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including: • an informationtheoretically optimal representation of a kary cardinal tree that supports standard operations in constant time, • a representation of a multiset of size n from {0,...,m − 1} in B(n,m+n) + o(n) bits that supports (appropriate generalizations of) rank and select operations in constant time, and • a representation of a sequence of n nonnegative integers summing up to m in B(n,m + n) + o(n) bits that supports prefix sum queries in constant time. 1
Compact routing schemes
 in SPAA ’01: Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
"... We describe several compact routing schemes for general weighted undirected networks. Our schemes are simple and easy to implement. The routing tables stored at the nodes of the network are all very small. The headers attached to the routed messages, including the name of the destination, are extrem ..."
Abstract

Cited by 217 (5 self)
 Add to MetaCart
(Show Context)
We describe several compact routing schemes for general weighted undirected networks. Our schemes are simple and easy to implement. The routing tables stored at the nodes of the network are all very small. The headers attached to the routed messages, including the name of the destination, are extremely short. The routing decision at each node takes constant time. Yet, the stretch of these routing schemes, i.e., the worst ratio between the cost of the path on which a packet is routed and the cost of the cheapest path from source to destination, is a small constant. Our schemes achieve a nearoptimal tradeoff between the size of the routing tables used and the resulting stretch. More specifically, we obtain: 1. A routing scheme that uses only ~ O(n 1=2) bits of memory at each node of an nnode network that has stretch 3. The space is optimal, up to logarithmic factors, in the sense that
Compressed suffix arrays and suffix trees with applications to text indexing and string matching
, 2005
"... The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for spaceefficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. ..."
Abstract

Cited by 211 (18 self)
 Add to MetaCart
The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for spaceefficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. The text T can be represented in n lg Σ  bits by encoding each symbol with lg Σ  bits. The goal is to support fast online queries for searching any string pattern P of m symbols, with T being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require Ω(n lg n) additional bits of space in the worst case. For example, in the standard unit cost RAM, suffix trees and suffix arrays need Ω(n) memory words, each of Ω(lg n) bits. These indexes are larger than the text itself by a multiplicative factor of Ω(lg Σ  n), which is significant when Σ is of constant size, such as in ascii or unicode. On the other hand, these indexes support fast searching, either in O(m lg Σ) timeorinO(m +lgn) time, plus an outputsensitive cost O(occ) for listing the occ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast O(m / lg Σ  n +lgɛ Σ  n) search time in the worst case, for any constant
Tracing Traitors
, 1994
"... We give cryptographic schemes that help trace the source of leaks when sensitive or proprietary data is made available to a large set of parties. A very relevant application is in the context of pay television, where only paying customers should be able to view certain programs. In this application ..."
Abstract

Cited by 167 (10 self)
 Add to MetaCart
We give cryptographic schemes that help trace the source of leaks when sensitive or proprietary data is made available to a large set of parties. A very relevant application is in the context of pay television, where only paying customers should be able to view certain programs. In this application the programs are normally encrypted and then the sensitive data is the decryption keys that are given to paying customers. If a pirate decoder is found it is desirable to reveal the source of its decryption keys. We describe fully resilient schemes which can be used against any decoder which decrypts with nonnegligible probability. Since there is typically little demand for decoders which decrypt only a small fraction of the transmissions (even if it is nonnegligible), we further introduce threshold tracing schemes which can only be used against decoders which succeed in decryption with probability greater than some threshold. Threshold schemes are considerably more efficient than fully resilient schemes.
Dynamic Perfect Hashing: Upper and Lower Bounds
, 1990
"... The dynamic dictionary problem is considered: provide an algorithm for storing a dynamic set, allowing the operations insert, delete, and lookup. A dynamic perfect hashing strategy is given: a randomized algorithm for the dynamic dictionary problem that takes O(1) worstcase time for lookups and ..."
Abstract

Cited by 140 (14 self)
 Add to MetaCart
The dynamic dictionary problem is considered: provide an algorithm for storing a dynamic set, allowing the operations insert, delete, and lookup. A dynamic perfect hashing strategy is given: a randomized algorithm for the dynamic dictionary problem that takes O(1) worstcase time for lookups and O(1) amortized expected time for insertions and deletions; it uses space proportional to the size of the set stored. Furthermore, lower bounds for the time complexity of a class of deterministic algorithms for the dictionary problem are proved. This class encompasses realistic hashingbased schemes that use linear space. Such algorithms have amortized worstcase time complexity \Omega(log n) for a sequence of n insertions and
On Data Structures and Asymmetric Communication Complexity
 JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1994
"... In this paper we consider two party communication complexity when the input sizes of the two players differ significantly, the "asymmetric" case. Most of previous work on communication complexity only considers the total number of bits sent, but we study tradeoffs between the number of ..."
Abstract

Cited by 93 (9 self)
 Add to MetaCart
In this paper we consider two party communication complexity when the input sizes of the two players differ significantly, the "asymmetric" case. Most of previous work on communication complexity only considers the total number of bits sent, but we study tradeoffs between the number of bits the first player sends and the number of bits the second sends. These
The Bloomier filter: An efficient data structure for static support lookup tables
 in Proc. Symposium on Discrete Algorithms
, 2004
"... “Oh boy, here is another David Nelson” ..."
(Show Context)