Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces
, 1998
Cited by 202 (9 self)
We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high-dimensional Euclidean space, we want to construct a space-efficient data structure that would allow us to search, given a query vector, for the closest or nearly closest vector in the database. We also address this problem when distances are measured by the L1 norm, and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database, and search algorithms that run in time nearly linear or nearly quadratic in the dimension (depending on the case; the extra factors are polylogarithmic in the size of the database).
Computer Science Department, Technion-IIT, Haifa 32000, Israel. Email: eyalk@cs.technion.ac.il. Bell Communications Research, MCC1C365B, 445 South Street, Morristown, NJ ...
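The abstract poses approximate nearest-neighbor search in the Hamming cube. Purely as a point of reference, the sketch below shows the classic bit-sampling locality-sensitive hashing scheme (several hash tables, each keyed by a random subset of coordinates). This is a swapped-in technique, not the data structure constructed in this paper. Points are assumed to be d-bit vectors packed into Python ints, and the class name and the parameters `k` and `num_tables` are illustrative.

```python
import random


def hamming(a: int, b: int) -> int:
    """Hamming distance between two d-bit vectors packed into ints."""
    return bin(a ^ b).count("1")


class BitSamplingLSH:
    """Bit-sampling LSH for the Hamming cube {0,1}^d (illustrative sketch).

    Each of `num_tables` tables keys a point by k randomly chosen bit
    positions, so points at small Hamming distance tend to collide.
    """

    def __init__(self, d: int, k: int, num_tables: int, seed: int = 0):
        rng = random.Random(seed)
        # Coordinates are sampled with replacement, independently per table.
        self.coords = [[rng.randrange(d) for _ in range(k)] for _ in range(num_tables)]
        self.tables = [{} for _ in range(num_tables)]

    @staticmethod
    def _key(point: int, coords: list) -> tuple:
        return tuple((point >> i) & 1 for i in coords)

    def insert(self, point: int) -> None:
        for coords, table in zip(self.coords, self.tables):
            table.setdefault(self._key(point, coords), []).append(point)

    def query(self, q: int):
        """Return the closest point sharing a bucket with q in any table, else None."""
        best = None
        for coords, table in zip(self.coords, self.tables):
            for p in table.get(self._key(q, coords), []):
                if best is None or hamming(q, p) < hamming(q, best):
                    best = p
        return best


# Usage: index random 64-bit vectors, then query with a slightly corrupted copy.
rng = random.Random(1)
db = [rng.getrandbits(64) for _ in range(1000)]
index = BitSamplingLSH(d=64, k=16, num_tables=10, seed=2)
for p in db:
    index.insert(p)
noisy = db[7] ^ (1 << 3) ^ (1 << 40)  # flip two bits of db[7]
print(index.query(noisy) == db[7])
```

Using more tables raises the chance that a near neighbor collides with the query in at least one of them, at the cost of space.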
Approximate Range Selection Queries in Peer-to-Peer
 In CIDR
, 2002
Cited by 99 (6 self)
We present an architecture for a data-sharing peer-to-peer system where the data is shared in the form of database relations. In general, peer-to-peer systems try to locate exact-match data objects for simple user queries.
Order-Preserving Symmetric Encryption
Cited by 63 (1 self)
We initiate the cryptographic study of order-preserving symmetric encryption (OPE), a primitive suggested in the database community by Agrawal et al. (SIGMOD ’04) for allowing efficient range queries on encrypted data. Interestingly, we first show that a straightforward relaxation of standard security notions for encryption such as indistinguishability against chosen-plaintext attack (IND-CPA) is unachievable by a practical OPE scheme. Instead, we propose a security notion in the spirit of pseudorandom functions (PRFs) and related primitives asking that an OPE scheme look “as-random-as-possible” subject to the order-preserving constraint. We then design an efficient OPE scheme and prove its security under our notion based on pseudorandomness of an underlying blockcipher. Our construction is based on a natural relation we uncover between a random order-preserving function and the hypergeometric probability distribution. In particular, it makes black-box use of an efficient sampling algorithm for the latter.
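A uniformly random strictly increasing map from {1, ..., M} into {1, ..., N} is the same object as a uniformly random M-element subset of the range, listed in sorted order. The toy sketch below samples one eagerly that way to show why order preservation enables range queries. It is not the paper's scheme, which samples the function lazily through hypergeometric sampling with coins derived from a blockcipher, and it is not secure: Python's `random` module is not a cryptographic generator, and the integer `seed` stands in for a key purely for illustration. The function names are hypothetical.

```python
import bisect
import random


def random_opf(domain_size: int, range_size: int, seed: int) -> list:
    """Eagerly sample a uniformly random order-preserving injection [1..M] -> [1..N].

    A sorted uniform M-subset of [1..N] is exactly such a function; entry m-1
    is the image of plaintext m. Toy only: `random` is not a secure generator.
    """
    if domain_size > range_size:
        raise ValueError("range must be at least as large as the domain")
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, range_size + 1), domain_size))


def toy_encrypt(table: list, m: int) -> int:
    return table[m - 1]


def toy_decrypt(table: list, c: int) -> int:
    i = bisect.bisect_left(table, c)
    if i == len(table) or table[i] != c:
        raise ValueError("not a valid ciphertext")
    return i + 1


# Range query over "encrypted" values: order is preserved, so a server can
# select ciphertexts between two encrypted bounds without decrypting them.
table = random_opf(domain_size=100, range_size=10_000, seed=42)
values = [12, 57, 33, 90]
enc = [toy_encrypt(table, v) for v in values]
lo, hi = toy_encrypt(table, 30), toy_encrypt(table, 60)
hits = [toy_decrypt(table, c) for c in enc if lo <= c <= hi]
print(hits)  # [57, 33]
```

The eager table costs memory linear in the domain size, which is one reason a lazily sampled construction is attractive for large domains.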
Distributed Clustering Using Collective Principal Component Analysis
 Knowledge and Information Systems
, 1999
Cited by 59 (9 self)
This paper considers distributed clustering of high-dimensional heterogeneous data using a distributed Principal Component Analysis (PCA) technique called the Collective PCA. It presents the Collective PCA technique, which can be used independently of the clustering application. It shows a way to integrate the Collective PCA with a given off-the-shelf clustering algorithm in order to develop a distributed clustering technique. It also presents experimental results using different test data sets, including an application for web mining.
Locality-Preserving Hashing in Multidimensional Spaces
 In Proceedings of the 29th ACM Symposium on Theory of Computing
, 1997
Cited by 52 (4 self)
This paper was published in Proceedings of the 29th Annual ACM Symposium on Theory of Computing, pages 618-625, 1997.
Authenticated hash tables
 In ACM Conference on Computer and Communications Security (CCS ’08)
, 2008
Cited by 40 (11 self)
Hash tables are fundamental data structures that optimally answer membership queries. Suppose a client stores n elements in a hash table that is outsourced at a remote server so that the client can save space or achieve load balancing. Authenticating the hash table functionality, i.e., verifying the correctness of queries answered by the server and ensuring the integrity of the stored data, is crucial because the server, lying outside the administrative control of the client, can be malicious. We design efficient and secure protocols for optimally authenticating membership queries on hash tables: for any fixed constants $0 < \varepsilon < 1$ and $\kappa > 1/\varepsilon$, the server can provide a proof of integrity of the answer to a (non-)membership query in constant time, requiring $O(n^{\varepsilon} / \log^{\kappa\varepsilon - 1} n)$ time to treat updates, yet keeping the communication and verification costs constant. This is the first construction for authenticating a hash table with constant query cost and sublinear update cost. Our solution employs the RSA accumulator in a nested way over the stored data, strictly improving upon previous accumulator-based solutions. Our construction applies to two concrete data authentication models and lends itself to a scheme that achieves different tradeoffs, namely, constant update time and $O(n^{\varepsilon} / \log^{\kappa\varepsilon} n)$ query time for fixed $\varepsilon > 0$ and $\kappa > 0$. An experimental evaluation of our solution shows very good scalability.
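The construction builds on the RSA accumulator. As a rough sketch of that primitive alone (one flat accumulator with membership proofs only; not the paper's nested construction, and without non-membership proofs or efficient updates), each element is mapped to a prime representative x, the accumulator is G raised to the product of all representatives mod N, and a witness for x is G raised to that product divided by x. The modulus built from the Mersenne primes 2^31 - 1 and 2^61 - 1, the base `G = 3`, and the 32-bit `hash_to_prime` helper are toy assumptions that are far too small for security; in a real deployment the factorization of N must not be known to the untrusted server.

```python
import hashlib
from math import isqrt, prod

# Toy modulus: product of two Mersenne primes. Insecure; real use needs a
# 2048-bit or larger RSA modulus whose factors the server does not know.
N = (2**31 - 1) * (2**61 - 1)
G = 3


def is_prime(n: int) -> bool:
    """Trial division; adequate for the 32-bit candidates used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for f in range(3, isqrt(n) + 1, 2):
        if n % f == 0:
            return False
    return True


def hash_to_prime(element: bytes) -> int:
    """Map an element to a prime representative (toy 32-bit version)."""
    n = int.from_bytes(hashlib.sha256(element).digest()[:4], "big") | 1
    while not is_prime(n):
        n += 2
    return n


def accumulate(primes) -> int:
    return pow(G, prod(primes), N)


def witness(primes, x: int) -> int:
    """Membership witness: G raised to (product of accumulated primes) / x."""
    if x not in primes:
        raise ValueError("x is not accumulated")
    return pow(G, prod(primes) // x, N)


def verify(acc: int, x: int, w: int) -> bool:
    # w^x equals G^(product of all primes) = acc exactly when w is a valid witness.
    return pow(w, x, N) == acc


items = [b"alice", b"bob", b"carol"]
primes = [hash_to_prime(e) for e in items]
acc = accumulate(primes)
x_bob = hash_to_prime(b"bob")
print(verify(acc, x_bob, witness(primes, x_bob)))  # True
```

The proof and the verification cost are constant-size (one group element and one exponentiation), which is the property the abstract's constant communication and verification costs rely on.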
Entropy based nearest neighbor search in high dimensions
 In SODA ’06: Proceedings of the seventeenth annual ACM-SIAM Symposium on Discrete Algorithms
Cited by 37 (5 self)
In this paper we study the problem of finding the approximate nearest neighbor of a query point in high-dimensional space, focusing on Euclidean space. Earlier approaches use locality-preserving hash functions (which tend to map nearby points to the same value) to construct several hash tables, ensuring that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different: we use one (or a few) hash tables and hash several randomly chosen points in the neighborhood of the query point, showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of randomly chosen points in the neighborhood of the query point q required depends on the entropy of the hash value h(p) of a random point p at the same distance from q as its nearest neighbor, given q and the locality-preserving hash function h chosen randomly from the hash family. Precisely, we show that if the entropy $I(h(p) \mid q, h) = M$ and g is a bound on the probability that two far-off points will hash to the same bucket, then we can find the approximate nearest neighbor in $O(n^{\rho})$ time and near-linear $\tilde{O}(n)$ space, where $\rho = M / \log(1/g)$. Alternatively, we can build a data structure of size $\tilde{O}(n^{1/(1-\rho)})$ to answer queries in $\tilde{O}(d)$ time. By applying this analysis to the locality-preserving hash functions in [15, 19, 6] and adjusting the parameters, we show that the c-approximate nearest neighbor can be computed in time $\tilde{O}(n^{\rho})$ and near-linear space, where $\rho \approx 2.06/c$ as c becomes large.
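The single-table idea can be sketched as follows, assuming the Gaussian (2-stable) projection hash h(v) = floor((a·v + b)/w), concatenated k times, as the locality-preserving family, and assuming the nearest-neighbor distance `r` is known in advance. Rather than building many tables, the query hashes q plus `num_probes` random points at distance `r` from q and scans the union of those buckets. The class and parameter names are illustrative, and `num_probes` is a free parameter here rather than being set from the paper's O(n^ρ) analysis.

```python
import math
import random


class EntropyLSH:
    """One hash table plus randomly perturbed query probes (illustrative sketch)."""

    def __init__(self, d: int, k: int, w: float, seed: int = 0):
        rng = random.Random(seed)
        self.d, self.w = d, w
        # k Gaussian projection vectors with random offsets in [0, w).
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
        self.b = [rng.uniform(0.0, w) for _ in range(k)]
        self.table = {}
        self.rng = random.Random(seed + 1)  # source of query perturbations

    def _hash(self, v) -> tuple:
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )

    def insert(self, p) -> None:
        self.table.setdefault(self._hash(p), []).append(p)

    def _random_point_at(self, q, r: float) -> list:
        """Uniformly random point on the sphere of radius r centred at q."""
        u = [self.rng.gauss(0.0, 1.0) for _ in range(self.d)]
        norm = math.sqrt(sum(x * x for x in u))
        return [qi + r * ui / norm for qi, ui in zip(q, u)]

    def query(self, q, r: float, num_probes: int):
        """Closest point in the buckets of q and of num_probes perturbed copies, or None."""
        keys = {self._hash(q)}
        for _ in range(num_probes):
            keys.add(self._hash(self._random_point_at(q, r)))
        candidates = [p for key in keys for p in self.table.get(key, [])]
        return min(candidates, key=lambda p: math.dist(p, q), default=None)


rng = random.Random(3)
data = [[rng.uniform(0, 10) for _ in range(16)] for _ in range(500)]
lsh = EntropyLSH(d=16, k=4, w=4.0, seed=5)
for p in data:
    lsh.insert(p)
q = [x + 0.05 for x in data[0]]  # a query at distance 0.2 from data[0]
nearest = lsh.query(q, r=0.2, num_probes=20)  # best candidate found, or None
```

Probing perturbed points trades extra hash evaluations per query for fewer tables, which is the space saving the abstract describes.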
Low Latency Photon Mapping Using Block Hashing
 In Proceedings of the Conference on Graphics Hardware 2002
, 2002
Cited by 24 (1 self)
Photon mapping is useful in the acceleration of global illumination and caustic effects computed by path tracing. For hardware-accelerated rendering, photon maps would be especially useful for simulating caustic lighting effects on non-Lambertian surfaces. For this to be possible, an efficient hardware algorithm for the computation of the k nearest neighbours to a sample point is required. Existing ...
SemPFS: Integrating semantics-based access mechanisms with P2P file systems
 University of Cincinnati
, 2003
Cited by 14 (6 self)
We present an architecture for a peer-to-peer (P2P) file system which supports semantics-based access. Central to this work is providing semantic indexing and retrieval capabilities. Our semantic indexing and locating approach is based on distributed hash tables (DHTs), where the indices of semantically close files are clustered to the same peers with high probability (nearly 100%) by the use of locality-sensitive hash functions. A query for finding semantically close files can be answered by consulting only a small number of peer nodes which are most responsible for such a query, instead of by query flooding. Our approach only adds index information to peer nodes, thus imposing only a small storage overhead. This paper constitutes an initial step toward integrating semantics-based access mechanisms into a P2P file system.
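To make the indexing idea concrete, here is a toy sketch under several assumptions that do not come from the paper: each file is described by a numeric feature vector (for example, term weights), the locality-sensitive hash is random-hyperplane SimHash, and the DHT is a plain consistent-hashing ring over peer names. Files whose vectors point in similar directions tend to receive the same signature and therefore the same responsible peer. Every name below, including the `peer-i` labels, is hypothetical.

```python
import bisect
import hashlib
import random


def ring_position(data: bytes) -> int:
    """Position on a 160-bit identifier ring (SHA-1, as in Chord-style DHTs)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class ToyRing:
    """Consistent hashing: a key belongs to the first peer at or after it, wrapping around."""

    def __init__(self, peers):
        self.ring = sorted((ring_position(p.encode()), p) for p in peers)
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key: int) -> str:
        i = bisect.bisect_left(self.positions, key) % len(self.ring)
        return self.ring[i][1]


def simhash(vec, planes) -> int:
    """One bit per random hyperplane: which side of it the vector lies on."""
    bits = 0
    for i, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0:
            bits |= 1 << i
    return bits


def responsible_peer(vec, planes, ring: ToyRing) -> str:
    sig = simhash(vec, planes)
    return ring.lookup(ring_position(sig.to_bytes(8, "big")))


rng = random.Random(7)
dim, num_bits = 20, 8  # num_bits must be <= 64 so the signature fits in 8 bytes
planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]
ring = ToyRing([f"peer-{i}" for i in range(16)])
doc = [rng.random() for _ in range(dim)]
similar = [x * 1.01 + 0.001 for x in doc]  # a slightly perturbed copy of doc
print(responsible_peer(doc, planes, ring), responsible_peer(similar, planes, ring))
```

Fewer signature bits make close vectors collide more often but also lump unrelated files together; the trade-off is the usual one for LSH parameters.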
An Improved Algorithm Finding Nearest Neighbor Using Kd-trees
, 2008
Cited by 12 (0 self)
We suggest a simple modification to the Kd-tree search algorithm for nearest neighbor search that improves its performance. The Kd-tree data structure seems to work well for finding nearest neighbors in low dimensions, but its performance degrades even when the number of dimensions increases to more than two. Since the exact nearest neighbor search problem suffers from the curse of dimensionality, we focus on approximate solutions; a c-approximate nearest neighbor is any neighbor within distance at most c times the distance to the nearest neighbor. We show that for a randomly constructed database of points, if the query point is chosen close to one of the points in the database, the traditional Kd-tree search algorithm has a very low probability of finding an approximate nearest neighbor; the probability of success drops exponentially in the number of dimensions d as $e^{-\Omega(d/c)}$. However, a simple change to the search algorithm results in a much higher chance of success. Instead of searching for the query point in the Kd-tree, we search for a random set of points in the neighborhood of the query point. It turns out that searching for $e^{\Omega(d/c)}$ such points can find the c-approximate nearest neighbor with a much higher chance of success.
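The modified search can be sketched with a simple median-split Kd-tree whose lookup descends to a single leaf without backtracking; treating the baseline search this way is a simplifying assumption. The query then also descends for several random points near q and keeps whichever visited leaf point is closest to q. Drawing the probes from an isotropic Gaussian of scale `radius` is likewise an assumption, since the abstract does not specify the neighborhood distribution, and all function names are illustrative.

```python
import math
import random


def build(points, leaf_size: int = 8, depth: int = 0):
    """Median-split Kd-tree: ("leaf", points) or ("node", axis, split, left, right)."""
    if len(points) <= max(1, leaf_size):
        return ("leaf", points)
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return ("node", axis, pts[mid][axis],
            build(pts[:mid], leaf_size, depth + 1),
            build(pts[mid:], leaf_size, depth + 1))


def descend(tree, q):
    """Follow the splitting planes down to one leaf (no backtracking)."""
    while tree[0] == "node":
        _, axis, split, left, right = tree
        tree = left if q[axis] < split else right
    return tree[1]


def closest(points, q):
    return min(points, key=lambda p: math.dist(p, q))


def perturbed_search(tree, q, radius: float, num_probes: int, rng: random.Random):
    """Search for q and for num_probes random points near q; keep the best hit."""
    best = closest(descend(tree, q), q)
    for _ in range(num_probes):
        probe = [x + rng.gauss(0.0, radius) for x in q]
        cand = closest(descend(tree, probe), q)  # distances are measured to q itself
        if math.dist(cand, q) < math.dist(best, q):
            best = cand
    return best


rng = random.Random(0)
data = [[rng.random() for _ in range(16)] for _ in range(2000)]
tree = build(data)
q = [x + rng.gauss(0.0, 0.01) for x in data[42]]  # query close to a stored point
plain = closest(descend(tree, q), q)
better = perturbed_search(tree, q, radius=0.05, num_probes=32, rng=rng)
print(math.dist(plain, q), math.dist(better, q))
```

Because `perturbed_search` starts from the plain descent result and only replaces it with closer points, its answer is never worse than the single-descent answer; the extra probes simply visit additional leaves on the other side of nearby splitting planes.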