Index-driven similarity search in metric spaces
 ACM Transactions on Database Systems
, 2003
Cited by 192 (8 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied on each of the methods presented, provided a suitable search hierarchy is defined.
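The distance-based idea can be illustrated with a minimal sketch. It assumes a single pivot object, a range query, and 2D points under Euclidean distance; these are illustrative choices, not the article's own framework, and the names `build_pivot_table` and `range_search` are hypothetical. Because d is a metric, the triangle inequality gives d(q, x) >= |d(q, p) - d(x, p)|, so some objects can be discarded without computing their distance to the query.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def euclidean(a: Point, b: Point) -> float:
    """Stand-in metric; any function obeying the triangle inequality works."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def build_pivot_table(data: Sequence[Point], pivot: Point,
                      d: Callable[[Point, Point], float]) -> List[float]:
    """Precompute d(x, pivot) for every object (done once, at index time)."""
    return [d(x, pivot) for x in data]


def range_search(data: Sequence[Point], table: Sequence[float], pivot: Point,
                 d: Callable[[Point, Point], float], q: Point,
                 radius: float) -> Tuple[List[Point], int]:
    """Return the objects within `radius` of q and the number of
    query-to-object distances that actually had to be computed."""
    dq = d(q, pivot)
    hits: List[Point] = []
    evaluated = 0
    for x, dx in zip(data, table):
        # Lower bound from the triangle inequality: d(q, x) >= |dq - dx|.
        if abs(dq - dx) > radius:
            continue  # x cannot be within the radius; skip it
        evaluated += 1
        if d(q, x) <= radius:
            hits.append(x)
    return hits, evaluated


data = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
pivot = (0.0, 0.0)
table = build_pivot_table(data, pivot, euclidean)
hits, evaluated = range_search(data, table, pivot, euclidean, (9.0, 0.0), 1.5)
```

In this toy run only one of the four objects needs its distance to the query computed; the other three are pruned by the pivot bound alone.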
Virtual Landmarks for the Internet
, 2003
Cited by 187 (3 self)
Internet coordinate schemes have been proposed as a method for estimating minimum round trip time between hosts without direct measurement. In such a scheme, each host is assigned a set of coordinates, and Euclidean distance is used to form the desired estimate. Two key questions are: How accurate are coordinate schemes across the Internet as a whole? And: are coordinate assignment schemes fast enough, and scalable enough, for large scale use? In this paper we make contributions toward answering both those questions. Whereas the coordinate assignment problem has in the past been approached by nonlinear optimization, we develop a faster method based on dimensionality reduction of the Lipschitz embedding. We show that this method is reasonably accurate, even when applied to measurements spanning the Internet, and that it naturally leads to a scalable measurement strategy based on the notion of virtual landmarks.
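A rough sketch of the Lipschitz embedding step follows, under stated assumptions: each host's coordinate vector is simply its measured round trip time to a fixed set of landmarks, and the RTT values, host names, and the function names `lipschitz_embed` and `euclidean` are hypothetical. The dimensionality reduction that the paper applies on top of this embedding is deliberately left out, so the distances below only indicate relative proximity and are not RTT estimates.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical RTT measurements in milliseconds: host -> landmark -> RTT.
rtt: Dict[str, Dict[str, float]] = {
    "a": {"L1": 10.0, "L2": 40.0},
    "b": {"L1": 12.0, "L2": 38.0},
    "c": {"L1": 45.0, "L2": 8.0},
}


def lipschitz_embed(host: str, landmarks: Sequence[str],
                    measurements: Dict[str, Dict[str, float]]) -> List[float]:
    """Coordinate i of a host is its measured RTT to landmark i."""
    return [measurements[host][lm] for lm in landmarks]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


landmarks = ["L1", "L2"]
coords = {h: lipschitz_embed(h, landmarks, rtt) for h in rtt}
d_ab = euclidean(coords["a"], coords["b"])
d_ac = euclidean(coords["a"], coords["c"])
```

Hosts with similar RTTs to every landmark (here "a" and "b") land close together, which is the property a subsequent reduction step would try to preserve in fewer dimensions.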
Estimating 3D Hand Pose From a Cluttered Image
, 2003
Cited by 173 (7 self)
A method is proposed that can generate a ranked list of plausible three-dimensional hand configurations that best match an input image. Hand pose estimation is formulated as an image database indexing problem, where the closest matches for an input hand image are retrieved from a large database of synthetic hand images. In contrast to previous approaches, the system can function in the presence of clutter, thanks to two novel clutter-tolerant indexing methods. First, a computationally efficient approximation of the image-to-model chamfer distance is obtained by embedding binary edge images into a high-dimensional Euclidean space. Second, a general-purpose, probabilistic line matching method identifies those line segment correspondences between model and input images that are the least likely to have occurred by chance. The performance of this clutter-tolerant approach is demonstrated in quantitative experiments with hundreds of real hand images.
BoostMap: A Method for Efficient Approximate Similarity Rankings
, 2003
Cited by 112 (13 self)
This paper introduces BoostMap, a method that can significantly reduce retrieval time in image and video database systems that employ computationally expensive distance measures, metric or nonmetric. Database and query objects are embedded into a Euclidean space, in which similarities can be rapidly measured using a weighted Manhattan distance. Embedding construction is formulated as a machine learning task, where AdaBoost is used to combine many simple, 1D embeddings into a multidimensional embedding that preserves a significant amount of the proximity structure in the original space. Performance is evaluated in a hand pose estimation system, and a dynamic gesture recognition system, where the proposed method is used to retrieve approximate nearest neighbors under expensive image and video similarity measures. In both systems, BoostMap significantly increases efficiency, with minimal losses in accuracy. Moreover, the experiments indicate that BoostMap compares favorably with existing embedding methods that have been employed in computer vision and database applications, i.e., FastMap and Bourgain embeddings.
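The retrieval side of such an embedding can be sketched as a filter-and-refine search. Everything below is an illustrative assumption rather than the paper's implementation: edit distance stands in for the expensive measure, each 1D embedding is the distance to one reference object, and the weights are fixed by hand where BoostMap would learn them with AdaBoost. The names `embed`, `weighted_l1`, and `approximate_nn` are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as a stand-in 'expensive' measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def embed(x: str, refs: Sequence[str]) -> List[int]:
    """One 1D embedding per reference object r: F_r(x) = d(x, r)."""
    return [edit_distance(x, r) for r in refs]


def weighted_l1(u: Sequence[float], v: Sequence[float],
                w: Sequence[float]) -> float:
    """Weighted Manhattan distance in the embedded space."""
    return sum(wi * abs(ui - vi) for wi, ui, vi in zip(w, u, v))


def approximate_nn(query: str, database: Sequence[str], refs: Sequence[str],
                   weights: Sequence[float], p: int) -> str:
    """Filter: keep the p closest objects in the embedded space.
    Refine: rerank only those p with the exact (expensive) distance."""
    q_vec = embed(query, refs)
    # In practice the database vectors would be precomputed offline.
    ranked = sorted(database,
                    key=lambda x: weighted_l1(q_vec, embed(x, refs), weights))
    return min(ranked[:p], key=lambda x: edit_distance(query, x))


database = ["cat", "cart", "horse", "house"]
refs = ["cat", "horse"]   # hypothetical reference objects
weights = [0.7, 0.3]      # hypothetical; BoostMap would learn these
best = approximate_nn("cats", database, refs, weights, p=2)
```

At query time only the distances to the reference objects and to the p filtered candidates use the expensive measure; the rest of the ranking happens in the cheap embedded space.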
Zero-Configuration, Robust Indoor Localization: Theory and Experimentation
Cited by 70 (0 self)
With the technical advances in ubiquitous computing and wireless networking, there has been an increasing need to capture the context information (such as the location) and to figure it into applications. In this paper, we establish the theoretical base and develop a localization algorithm for building a zero-configuration and robust indoor localization and tracking system to support location-based network services and management. The localization algorithm takes as input the online measurements of received signal strengths (RSSs) between 802.11 APs and between a client and its neighboring APs, and estimates the location of the client. The online RSS measurements among 802.11 APs are used to capture (in real time) the effects of RF multipath fading, temperature and humidity variations, opening and closing of doors, furniture relocation, and human mobility on the RSS measurements, and to create, based on the truncated singular value decomposition (SVD) technique, a mapping between the RSS measure and the actual geographical distance. The proposed system requires zero configuration because the online calibration of the effect of wireless physical characteristics on RSS measurement is automated and no on-site survey or initial training is required to bootstrap the system. It is also quite responsive to environmental dynamics, as the impacts of physical characteristics changes have been explicitly figured in the mapping between the RSS measures and the actual geographical distances. We have implemented the proposed system with inexpensive off-the-shelf WiFi hardware and sensory functions of IEEE 802.11, and carried out a detailed empirical study in our division building. The empirical results show the proposed system is quite robust and gives accurate localization results (i.e., with the localization error within 3 meters).
Searching in Metric Spaces with User-Defined and Approximate Distances
 ACM Transactions on Database Systems
, 2002
Cited by 34 (6 self)
Metric access methods (MAMs), such as the M-tree, are powerful index structures for supporting similarity queries on metric spaces, which represent a common abstraction for the searching problems that arise in many modern application areas, such as multimedia, data mining, decision support, pattern recognition, and genomic databases. As compared to multidimensional (spatial) access methods (SAMs), MAMs are more general, yet they are reputed to lose in flexibility, since it is commonly deemed that they can only answer queries using the same distance function used to build the index. In this paper we show that this limitation is only apparent (thus MAMs are far more flexible than believed) and extend the M-tree so as to be able to support user-defined distance criteria, approximate distance functions to speed up query evaluation, as well as dissimilarity functions which are not metrics. The so-extended M-tree, also called QIC-M-tree, can deal with three distinct distances at a time: 1) a query (user-defined) distance, 2) an index distance (used to build the tree), and 3) a comparison (approximate) distance (used to quickly discard from the search uninteresting parts of the tree). We develop an analytical cost model that accurately characterizes the performance of QIC-M-tree and validate such model through extensive experimentation on real metric data sets. In particular, our analysis is able to predict the best evaluation strategy (i.e., which distances to use) under a variety of configurations, by properly taking into account relevant factors such as the distribution of distances, the cost of computing distances, and the actual index structure. We also prove that the overall saving in CPU search costs when using an approximate distance can be estimated by using information on the data set only, thus...
Nearest Neighbor Retrieval Using Distance-Based Hashing
Cited by 27 (1 self)
A method is proposed for indexing spaces with arbitrary distance measures, so as to achieve efficient approximate nearest neighbor retrieval. Hashing methods, such as Locality Sensitive Hashing (LSH), have been successfully applied for similarity indexing in vector spaces and string spaces under the Hamming distance. The key novelty of the hashing technique proposed here is that it can be applied to spaces with arbitrary distance measures, including non-metric distance measures. First, we describe a domain-independent method for constructing a family of binary hash functions. Then, we use these functions to construct multiple multi-bit hash tables. We show that the LSH formalism is not applicable for analyzing the behavior of these tables as index structures. We present a novel formulation that uses statistical observations from sample data to analyze retrieval accuracy and efficiency for the proposed indexing method. Experiments on several real-world data sets demonstrate that our method produces good trade-offs between accuracy and efficiency, and significantly outperforms VP-trees, which are a well-known method for distance-based indexing.
Techniques for Similarity Searching in Multimedia Databases
, 2010
Cited by 27 (3 self)
Techniques for similarity searching in multimedia databases are reviewed. This includes a discussion of the curse of dimensionality, as well as multidimensional indexing, distance-based indexing, and the actual search process which is realized by nearest neighbor finding.
Query-sensitive embeddings
In ACM International Conference on Management of Data (SIGMOD), 706–717. ACM Transactions on Database Systems, Vol. ?, No. ?, ? 20?. Vassilis Athitsos et al.
Cited by 24 (11 self)
A common problem in many types of databases is retrieving the most similar matches to a query object. Finding those matches in a large database can be too slow to be practical, especially in domains where objects are compared using computationally expensive similarity (or distance) measures. Embedding methods can significantly speed up retrieval by mapping objects into a vector space, where distances can be measured rapidly using a Minkowski metric. In this paper we present a novel way to improve embedding quality. In particular, we propose to construct embeddings that use a “query-sensitive” distance measure for the target space of the embedding. This distance measure is used to compare the vectors that the query and database objects are mapped to. The term “query-sensitive” means that the distance measure changes depending on the current query object. We demonstrate theoretically that using a query-sensitive distance measure increases the modeling power of embeddings and allows them to capture more of the structure of the original space. We also demonstrate experimentally that query-sensitive embeddings can significantly improve retrieval performance. In experiments with an image database of handwritten digits and a time-series database, the proposed method outperforms existing state-of-the-art non-Euclidean indexing methods, meaning that it provides significantly better trade-offs between efficiency and retrieval accuracy.
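What a distance measure that "changes depending on the current query object" might look like can be sketched as follows. The form used here is an assumption made for illustration and is not taken from the paper: a weighted Manhattan distance in which coordinate i contributes only when the query's own coordinate i falls inside an interval. The `Rule` tuples and the function names `query_weights` and `query_sensitive_l1` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-coordinate rule: (weight, low, high). The weight applies
# only when the query's coordinate lies in [low, high]; otherwise it is 0.
Rule = Tuple[float, float, float]


def query_weights(q_vec: Sequence[float], rules: Sequence[Rule]) -> List[float]:
    """Derive the coordinate weights from the query vector itself."""
    return [w if lo <= qi <= hi else 0.0
            for qi, (w, lo, hi) in zip(q_vec, rules)]


def query_sensitive_l1(q_vec: Sequence[float], x_vec: Sequence[float],
                       rules: Sequence[Rule]) -> float:
    """Weighted Manhattan distance whose weights depend on the query."""
    weights = query_weights(q_vec, rules)
    return sum(w * abs(qi - xi) for w, qi, xi in zip(weights, q_vec, x_vec))


rules = [(1.0, 0.0, 5.0), (2.0, 0.0, 1.0)]
x = [4.0, 1.5]                                       # embedded database object
d1 = query_sensitive_l1([2.0, 0.5], x, rules)        # both coordinates active
d2 = query_sensitive_l1([2.0, 3.0], x, rules)        # second coordinate ignored
```

The same database vector is compared under different weights for the two queries, so the measure is asymmetric by construction: it is defined from the query's point of view.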