Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces
, 1998
Cited by 173 (9 self)
We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high dimensional Euclidean space, we want to construct a space-efficient data structure that would allow us to search, given a query vector, for the closest or nearly closest vector in the database. We also address this problem when distances are measured by the L1 norm, and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database, and search algorithms that run in time nearly linear or nearly quadratic in the dimension (depending on the case; the extra factors are polylogarithmic in the size of the database). Computer Science Department, Technion - IIT, Haifa 32000, Israel. Email: eyalk@cs.technion.ac.il. Bell Communications Research, MCC-1C365B, 445 South Street, Morristown, NJ ...
Approximate Range Selection Queries in Peer-to-Peer
- In CIDR
, 2002
Cited by 76 (6 self)
We present an architecture for a data sharing peer-to-peer system where the data is shared in the form of database relations. In general, peer-to-peer systems try to locate exact-match data objects for simple user queries.
Locality-Preserving Hashing in Multidimensional Spaces
- In Proceedings of the 29th ACM Symposium on Theory of Computing
, 1997
Cited by 46 (2 self)
This paper was published in Proceedings of the 29th Annual ACM Symposium on Theory of Computing, pages 618--625, 1997.
Distributed Clustering Using Collective Principal Component Analysis
- Knowledge and Information Systems
, 1999
Cited by 38 (8 self)
This paper considers distributed clustering of high dimensional heterogeneous data using a distributed Principal Component Analysis (PCA) technique called the Collective PCA. It presents the Collective PCA technique, which can be used independently of the clustering application. It shows a way to integrate the Collective PCA with a given off-the-shelf clustering algorithm in order to develop a distributed clustering technique. It also presents experimental results using different test data sets, including an application for web mining.
Entropy based Nearest Neighbor Search in High Dimensions
, 2005
Cited by 19 (5 self)
In this paper we study the problem of finding the approximate nearest neighbor of a query point in a high dimensional space, focusing on Euclidean space. Earlier approaches use locality-preserving hash functions (which tend to map nearby points to the same value) to construct several hash tables, ensuring that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different: we use one (or a few) hash tables and hash several randomly chosen points in the neighborhood of the query point, showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of randomly chosen points in the neighborhood of the query point q required depends on the entropy of the hash value h(p) of a random point p at the same distance from q as its nearest neighbor, given q and the locality-preserving hash function h chosen randomly from the hash family. Precisely, we show that if the entropy I(h(p)|q, h) = M and g is a bound on the probability that two far-off points will hash to the same bucket, then we can find the approximate nearest neighbor in O(n^ρ) time and near-linear Õ(n) space, where ρ = M/log(1/g). Alternatively, we can build a data structure of size Õ(n^{1/(1−ρ)}) to answer queries in Õ(d) time. By applying this analysis to the locality-preserving hash functions in [17, 21, 6] and adjusting the parameters, we show that the c nearest neighbor can be computed in time Õ(n^ρ) and near-linear space, where ρ ≈ 2.06/c as c becomes large.
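The query-side idea in this abstract (a single hash table, probed with several random points near the query) can be illustrated with a minimal Python sketch. This is not the paper's construction: the random-projection hash, the bucket `width`, the probe radius `r`, and all function names are assumptions chosen for illustration, and no entropy-based parameter tuning is attempted.

```python
import math
import random


def make_hash(dim, width, rng):
    """One random-projection hash h(v) = floor((a.v + b) / width) (assumed family)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    return lambda v: math.floor((sum(x * y for x, y in zip(a, v)) + b) / width)


def bucket_key(v, hashes):
    """Concatenate several hash values into one bucket key."""
    return tuple(h(v) for h in hashes)


def build_table(points, hashes):
    """Build a single hash table mapping bucket key -> list of point indices."""
    table = {}
    for idx, p in enumerate(points):
        table.setdefault(bucket_key(p, hashes), []).append(idx)
    return table


def random_point_at_distance(q, r, rng):
    """Sample a point uniformly from the sphere of radius r centred at q."""
    g = [rng.gauss(0.0, 1.0) for _ in q]
    norm = math.sqrt(sum(x * x for x in g)) or 1.0
    return [qi + r * gi / norm for qi, gi in zip(q, g)]


def query(q, points, table, hashes, r, probes, rng):
    """Probe the bucket of q and of `probes` random points near q.

    Returns the index of the closest candidate found, or None if every
    probed bucket is empty.
    """
    candidates = set(table.get(bucket_key(q, hashes), []))
    for _ in range(probes):
        p = random_point_at_distance(q, r, rng)
        candidates.update(table.get(bucket_key(p, hashes), []))
    if not candidates:
        return None
    return min(candidates, key=lambda i: math.dist(q, points[i]))
```

In this sketch the number of probes trades query time against the chance of reaching the neighbor's bucket, which is the quantity the abstract relates to the entropy of h(p).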
Low Latency Photon Mapping Using Block Hashing
- In Proceedings of the Conference on Graphics Hardware 2002
, 2002
Cited by 18 (1 self)
Photon mapping is useful in the acceleration of global illumination and caustic effects computed by path tracing. For hardware accelerated rendering, photon maps would be especially useful for simulating caustic lighting effects on non-Lambertian surfaces. For this to be possible, an efficient hardware algorithm for the computation of the k nearest neighbours to a sample point is required. Existing
Authenticated hash tables
- In ACM Conference on Computer and Communications Security (CCS ’08)
, 2008
Cited by 15 (2 self)
Hash tables are fundamental data structures that optimally answer membership queries. Suppose a client stores n elements in a hash table that is outsourced at a remote server so that the client can save space or achieve load balancing. Authenticating the hash table functionality, i.e., verifying the correctness of queries answered by the server and ensuring the integrity of the stored data, is crucial because the server, lying outside the administrative control of the client, can be malicious. We design efficient and secure protocols for optimally authenticating membership queries on hash tables: for any fixed constants 0 < ε < 1 and κ > 1/ε, the server can provide a proof of integrity of the answer to a (non-)membership query in constant time, requiring O(n^ε / log^{κε−1} n) time to treat updates, yet keeping the communication and verification costs constant. This is the first construction for authenticating a hash table with constant query cost and sublinear update cost. Our solution employs the RSA accumulator in a nested way over the stored data, strictly improving upon previous accumulator-based solutions. Our construction applies to two concrete data authentication models and lends itself to a scheme that achieves different trade-offs, namely constant update time and O(n^ε / log^{κε} n) query time for fixed ε > 0 and κ > 0. An experimental evaluation of our solution shows very good scalability.
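For readers unfamiliar with the RSA accumulator mentioned in this abstract, the following Python sketch shows the basic (flat, non-nested) primitive only: accumulate a set of primes into one value, produce a membership witness, and verify it. The modulus, base, and function names are toy assumptions for illustration; the parameters are far too small to be secure, and the paper's nested construction and its update costs are not modeled here.

```python
from math import prod

# Toy parameters (assumptions for illustration only; not secure).
N = 1019 * 1031  # product of two small primes standing in for an RSA modulus
G = 3            # public base


def accumulate(elements, g=G, n=N):
    """Accumulate a set of distinct primes: acc = g^(product of elements) mod n."""
    return pow(g, prod(elements), n)


def witness(elements, x, g=G, n=N):
    """Membership witness for x: g raised to the product of all other elements."""
    return pow(g, prod(e for e in elements if e != x), n)


def verify(acc, x, wit, n=N):
    """Check the witness: raising it to x must reproduce the accumulator."""
    return pow(wit, x, n) == acc


elements = [5, 7, 11, 13]        # elements are assumed to be distinct primes
acc = accumulate(elements)
w7 = witness(elements, 7)
print(verify(acc, 7, w7))        # membership check for 7
```

The verifier needs only the accumulator value, the element, and the witness, which is why the communication and verification costs can stay constant regardless of the set size.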
Order-Preserving Symmetric Encryption
Cited by 8 (0 self)
We initiate the cryptographic study of order-preserving symmetric encryption (OPE), a primitive suggested in the database community by Agrawal et al. (SIGMOD ’04) for allowing efficient range queries on encrypted data. Interestingly, we first show that a straightforward relaxation of standard security notions for encryption such as indistinguishability against chosen-plaintext attack (IND-CPA) is unachievable by a practical OPE scheme. Instead, we propose a security notion in the spirit of pseudorandom functions (PRFs) and related primitives asking that an OPE scheme look “as-random-as-possible” subject to the order-preserving constraint. We then design an efficient OPE scheme and prove its security under our notion based on pseudorandomness of an underlying blockcipher. Our construction is based on a natural relation we uncover between a random order-preserving function and the hypergeometric probability distribution. In particular, it makes black-box use of an efficient sampling algorithm for the latter.
Nearest Neighbor Search in Multidimensional Spaces
, 1999
Cited by 7 (0 self)
The Nearest Neighbor Search problem is defined as follows: given a set P of n points, preprocess the points so as to efficiently answer queries that require finding the closest point in P to a query point q. If we are willing to settle for a point that is almost as close as the nearest neighbor, then we can relax the problem to the approximate Nearest Neighbor Search. Nearest Neighbor Search (exact or approximate) is an integral component in a wide range of applications that include multimedia databases, computational biology, data mining, and information retrieval. The common thread in all these applications is similarity search: given a database of objects, we want to return the object in the database that is most similar to a query object. The objects are mapped onto points in a high dimensional metric space, and similarity search reduces to a nearest neighbor search. The dimension of the underlying space may be on the order of a few hundred or even thousands; therefore, we r...
Locality preserving dictionaries: Theory and application to clustering in databases
, 1999
Cited by 6 (0 self)
We discuss strategies for building locality preserving dictionaries (LPDs) in which all data items within a range lie together, within a space that is a small function of the number of items in the range. We describe an approach where the memory space is partitioned and items are placed in sorted order, with judiciously placed gaps between them, resulting in efficient insert, delete, and search operations. We adapt our algorithms to the particular application of storing database relations on disk via LPDs. By providing a natural clustering mechanism for data in a sorted order instead of simply clustering data at a page granularity, LPDs provide much better I/O performance than traditional clustered indexes on range searches, as well as on access of data in sorted order. Analytical studies of LPDs and clustered B-Trees show that using LPDs results in up to 5 to 13 times faster range searches and sorted order accesses over using a clustered B-Tree, at the expense of 0 to 75% overhead i...

