Incremental Similarity Search in Multimedia Databases (2000)

by G R Hjaltason, H Samet

Results 1-10 of 18

Index-driven similarity search in metric spaces

by Gisli R. Hjaltason, Hanan Samet - ACM Transactions on Database Systems, 2003
Cited by 118 (6 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied on each of the methods presented, provided a suitable search hierarchy is defined.
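The search-hierarchy framework above can be illustrated with a best-first traversal that keeps nodes and objects in a single priority queue. The following is a minimal sketch, not the authors' code: it assumes a hypothetical two-level hierarchy of balls (a center plus a covering radius) over Euclidean points, and uses max(0, d(q, center) - radius) as the lower bound for a node. Because every node key is a lower bound on its members' distances, objects come off the queue in nondecreasing distance order, so the caller can stop after any number of results.

```python
import heapq
import math
from itertools import islice
from typing import Callable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


class Ball:
    """A node of a hypothetical two-level search hierarchy: a center and its
    members, with a covering radius so every member lies inside the ball."""

    def __init__(self, center: Point, members: List[Point], d: Metric):
        self.center = center
        self.members = members
        self.radius = max((d(center, m) for m in members), default=0.0)


def incremental_nn(q: Point, balls: Sequence[Ball],
                   d: Metric = math.dist) -> Iterator[Tuple[float, Point]]:
    """Yield (distance, object) pairs in nondecreasing distance from q.

    Nodes and objects share one priority queue. A node's key is a lower bound
    on the distance from q to any of its members, so when an object reaches
    the front of the queue nothing still unreported can be closer.
    """
    heap: list = []
    tick = 0  # unique tie-breaker so heapq never compares the payloads
    for b in balls:
        lower = max(0.0, d(q, b.center) - b.radius)  # triangle-inequality bound
        heapq.heappush(heap, (lower, tick, False, b))
        tick += 1
    while heap:
        key, _, is_object, item = heapq.heappop(heap)
        if is_object:
            yield key, item  # key is the exact distance d(q, item)
        else:
            for m in item.members:  # expand the node: enqueue its objects
                heapq.heappush(heap, (d(q, m), tick, True, m))
                tick += 1


# Usage: three small balls; ask for the two nearest neighbors of (4, 4).
pts = [(0, 0), (1, 0), (5, 5), (6, 5), (10, 0)]
balls = [Ball((0, 0), pts[0:2], math.dist),
         Ball((5, 5), pts[2:4], math.dist),
         Ball((10, 0), pts[4:], math.dist)]
print([p for _, p in islice(incremental_nn((4, 4), balls), 2)])  # [(5, 5), (6, 5)]
```

Deeper hierarchies work the same way: expanding a node pushes its children with their own lower bounds instead of pushing objects directly.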

Properties of embedding methods for similarity searching in metric spaces

by Gísli R. Hjaltason, Hanan Samet - PAMI, 2003
Cited by 70 (4 self)
Complex data types, such as images, documents, and DNA sequences, are becoming increasingly important in modern database applications. A typical query in many of these applications seeks to find objects that are similar to some target object, where (dis)similarity is defined by some distance function. Often, the cost of evaluating the distance between two objects is very high. Thus, the number of distance evaluations should be kept to a minimum while (ideally) maintaining the quality of the result. One way to approach this goal is to embed the data objects in a vector space so that the distances of the embedded objects approximate the actual distances. Queries can then be performed (for the most part) on the embedded objects. In this paper, we are especially interested in examining whether or not the embedding methods ensure that no relevant objects are left out (i.e., there are no false dismissals and, hence, the correct result is reported). Particular attention is paid to the SparseMap, FastMap, and MetricMap embedding methods. SparseMap is a variant of Lipschitz embeddings, while FastMap and MetricMap are inspired by dimension reduction methods for Euclidean spaces (using KLT or the related PCA and SVD). We show that, in general, none of these embedding methods guarantees that queries on the embedded objects have no false dismissals, while also demonstrating the limited cases in which the guarantee does hold. Moreover, we describe a variant of SparseMap that allows queries with no false dismissals. In addition, we show that with FastMap and MetricMap, the distances of the embedded objects can be much greater than the actual distances. This makes it impossible (or at least impractical) to modify FastMap and MetricMap to guarantee no false dismissals.
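The no-false-dismissals property discussed above can be seen in a plain Lipschitz embedding compared under the L-infinity norm. This is a minimal sketch under stated assumptions, not SparseMap and not the paper's SparseMap variant: Euclidean points stand in for an expensive metric, the reference sets are chosen at random, and the functions `embed`, `build_index`, and `range_query` are hypothetical names. Each coordinate is d(o, A_i), and by the triangle inequality |d(o, A) - d(q, A)| <= d(o, q), so the embedded distance never exceeds the true one and filtering with it cannot drop a true answer.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def embed(o: Point, ref_sets: Sequence[Sequence[Point]], d: Metric) -> List[float]:
    """Lipschitz embedding: coordinate i is d(o, A_i) = min over a in A_i of d(o, a)."""
    return [min(d(o, a) for a in A) for A in ref_sets]


def linf(u: Sequence[float], v: Sequence[float]) -> float:
    """L-infinity (max-coordinate) distance between two embedded vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def build_index(data: Sequence[Point], ref_sets, d: Metric):
    """Precompute the embedding of every object once, at indexing time."""
    return [(o, embed(o, ref_sets, d)) for o in data]


def range_query(q: Point, r: float, index, ref_sets, d: Metric) -> List[Point]:
    """Filter in the embedded space, then refine with the true distance.

    linf(embedded o, embedded q) <= d(o, q) (the embedding is contractive),
    so any o with d(o, q) <= r survives the filter: no false dismissals.
    """
    eq = embed(q, ref_sets, d)
    eps = 1e-9  # guards the filter only against floating-point rounding
    candidates = [o for o, eo in index if linf(eo, eq) <= r + eps]
    return [o for o in candidates if d(q, o) <= r]  # exact check on survivors


# Usage with Euclidean points and random reference sets (both assumptions).
random.seed(0)
data = [(random.random(), random.random()) for _ in range(200)]
ref_sets = [random.sample(data, 3) for _ in range(4)]
index = build_index(data, ref_sets, math.dist)
hits = range_query((0.5, 0.5), 0.1, index, ref_sets, math.dist)
```

Using a different norm on the embedded vectors (e.g. L2 with scaling) can break the contractive bound, which is the kind of issue the abstract examines.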

A compact space decomposition for effective metric indexing

by Gonzalo Navarro - Pattern Recognition Letters, 2005
Cited by 23 (6 self)
The metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that speeds up proximity queries. However, indexes lose their efficiency as the intrinsic data dimensionality increases. In this paper we present a simple index called list of clusters (LC), which is based on a compact partitioning of the data set. The LC is shown to require little space, to be suitable for both main and secondary memory implementations, and, most importantly, to be very resistant to the intrinsic dimensionality of the data set. In this respect our structure is unbeaten. We finish with a discussion of the role of unbalancing in metric space searching, and how it permits trading memory space for construction time.

1 Introduction

The problem of proximity searching has received much attention in recent times, due to an increasing interest in manipulating and retrieving the ever more common multimedia data. Multimedia data have to be classified, forecasted, filtered, organized, and so on. Their manipulation poses new challenges to classifiers and function approximators. The well-known k-nearest neighbor (knn) classifier is a favorite candidate for this task because it is simple and well understood. However, one of the main obstacles to using this classifier for massive data classification is the linear cost of finding the k neighbors of a given query.
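The list of clusters (LC) described in this entry can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's tuned implementation: it uses a fixed bucket size m, picks the first remaining element as the next center (a naive heuristic), and works on Euclidean points. A range query scans the clusters in construction order; if the query ball lies strictly inside a cluster's ball, no later cluster can hold an answer, because later clusters only contain elements at distance at least the covering radius from that center.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


class Cluster:
    def __init__(self, center: Point, radius: float, bucket: List[Tuple[Point, float]]):
        self.center = center  # cluster center c
        self.radius = radius  # covering radius: max d(x, c) over the bucket
        self.bucket = bucket  # (x, d(x, c)) pairs, distances kept for filtering


def build_lc(data: Sequence[Point], m: int, d: Metric) -> List[Cluster]:
    """Fixed-size variant: each center takes the m remaining elements closest to it."""
    rest = list(data)
    clusters = []
    while rest:
        c = rest.pop(0)  # naive center choice, an assumption of this sketch
        by_dist = sorted(((x, d(c, x)) for x in rest), key=lambda t: t[1])
        bucket = by_dist[:m]
        rest = [x for x, _ in by_dist[m:]]
        clusters.append(Cluster(c, bucket[-1][1] if bucket else 0.0, bucket))
    return clusters


def range_search(clusters: Sequence[Cluster], q: Point, r: float, d: Metric) -> List[Point]:
    """Scan clusters in construction order; return every element within r of q."""
    out = []
    for cl in clusters:
        dqc = d(q, cl.center)
        if dqc <= r:
            out.append(cl.center)
        if dqc <= cl.radius + r:  # query ball may intersect the cluster ball
            for x, dxc in cl.bucket:
                # |d(q, c) - d(x, c)| > r already proves d(q, x) > r.
                if abs(dqc - dxc) <= r and d(q, x) <= r:
                    out.append(x)
        if dqc + r < cl.radius:
            # Every answer y has d(y, c) < radius, so it was placed in this or an
            # earlier cluster; later clusters cannot contain answers.
            break
    return out


random.seed(1)
data = [(random.random(), random.random()) for _ in range(300)]
lc = build_lc(data, 20, math.dist)
found = range_search(lc, (0.3, 0.7), 0.15, math.dist)
```

The early `break` is what makes construction order matter: clusters built first are scanned first and can cut the search short.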

Probabilistic Proximity Searching Algorithms Based on Compact Partitions

by Benjamin Bustos, Gonzalo Navarro - Journal of Discrete Algorithms, 2002
Cited by 16 (7 self)
The main bottleneck of research in metric space searching is the so-called curse of dimensionality, which makes the task of searching some metric spaces intrinsically difficult, whatever algorithm is used. A recent trend to break this bottleneck resorts to probabilistic algorithms, where it has been shown that one can find 99% of the relevant objects at a fraction of the cost of the exact algorithm. These algorithms are welcome in most applications because resorting to metric space searching already involves some fuzziness in the retrieval requirements.

Using the k-nearest neighbor graph for proximity searching in metric spaces

by Rodrigo Paredes, Edgar Chávez - In Proc. SPIRE’05, LNCS 3772, 2005
Cited by 7 (3 self)
Proximity searching consists of retrieving from a database the objects that are close to a query. For this type of searching problem, the most general model is the metric space, where proximity is defined in terms of a distance function. A solution for this problem consists in building an offline index to quickly satisfy online queries. The ultimate goal is to use as few distance computations as possible to satisfy queries, since the distance is considered expensive to compute. Proximity searching is central to several applications, ranging from multimedia indexing and querying to data compression and clustering. In this paper we present a new approach to solve the proximity searching problem. Our solution is based on indexing the database with the k-nearest neighbor graph (knng), which is a directed graph connecting each element to its k closest neighbors. We present two search algorithms for both range and nearest neighbor queries that use navigational and metric features of the knng. We show that our approach is competitive with current ones. For instance, in the document metric space our nearest neighbor search algorithms perform 30% more distance evaluations than AESA while using only 0.25% of its space requirement. In the same space, the pivot-based technique is completely useless.

Practical Construction of k-Nearest Neighbor Graphs in Metric Spaces

by Rodrigo Paredes, Edgar Chávez, Karina Figueroa, Gonzalo Navarro
Cited by 6 (3 self)
Let U be a set of elements and d a distance function defined among them. Let $NN_k(u)$ be the k elements in $U - \{u\}$ that have the smallest distance to u. The k-nearest neighbor graph (knng) is a weighted directed graph $G(U, E)$ such that $E = \{(u, v) : v \in NN_k(u)\}$. We focus on the metric space context, so d is a metric. Several knng construction algorithms are known, but they are not suitable for general metric spaces. We present a general methodology to construct knngs that exploits several features of metric spaces, empirically requiring around $O(n^{1.27})$ distance computations for low- and medium-dimensional spaces and $O(n^{1.90})$ for high-dimensional ones.
Keywords: Graph Algorithms, Metric Spaces, Nearest Neighbors.
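For comparison with the construction costs quoted above, the definition of E translates directly into the naive construction that measures every pair once, n(n-1)/2 distance computations. This sketch is only that quadratic baseline, not the methodology of the paper; the function `knng_bruteforce`, the distance counter, and the use of Euclidean points are illustrative assumptions.

```python
import heapq
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def knng_bruteforce(U: Sequence[Point], k: int,
                    d: Callable[[Point, Point], float]) -> Dict[int, List[Tuple[int, float]]]:
    """Map each index u to its k out-edges (v, d(u, v)), v != u, nearest first.

    Every unordered pair is measured exactly once, n(n-1)/2 distance
    computations in total: the quadratic baseline for general metric spaces.
    """
    n = len(U)
    dist = {(i, j): d(U[i], U[j]) for i in range(n) for j in range(i + 1, n)}
    graph = {}
    for u in range(n):
        edges = ((v, dist[(min(u, v), max(u, v))]) for v in range(n) if v != u)
        graph[u] = heapq.nsmallest(k, edges, key=lambda e: e[1])
    return graph


calls = 0


def counted_dist(a: Point, b: Point) -> float:
    """Euclidean distance that also counts how many times it is evaluated."""
    global calls
    calls += 1
    return math.dist(a, b)


random.seed(2)
U = [(random.random(), random.random()) for _ in range(30)]
G = knng_bruteforce(U, 4, counted_dist)
```

With 30 elements this spends 435 distance computations; the point of metric-aware constructions is to bring that count well below the quadratic curve as n grows.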

Dynamic Spatial Approximation Trees for Massive Data

by Gonzalo Navarro, Nora Reyes
Cited by 3 (1 self)
Metric space searching is an emerging technique to address the problem of efficient similarity searching in many applications, including multimedia databases and other repositories handling complex objects. Although promising, the metric space approach is still immature in several aspects that are well established in traditional databases. In particular, most indexing schemes are not dynamic; that is, few of them tolerate insertion of elements at reasonable cost over an existing index, and only a few work efficiently in secondary memory. In this paper we introduce a secondary-memory variant of the Dynamic Spatial Approximation Tree, which has been shown to be competitive in main memory. The resulting index handles the secondary-memory scenario well and is competitive with the state of the art, becoming a useful alternative in a wide range of database applications. Moreover, our ideas are applicable to other secondary-memory trees where there is little control over the tree shape.

Practical construction of k nearest neighbor graphs in metric spaces

by Rodrigo Paredes, Gonzalo Navarro, 2005
Cited by 2 (1 self)
Let U be a set of elements and d a distance function defined among them. Let $NN_k(u)_d$ be the k elements in $U - \{u\}$ that have the smallest distance to u. The k-nearest neighbors graph (knng) is a directed graph $G(U, E)$ such that $E = \{(u, v, d(u, v)) : v \in NN_k(u)_d\}$. We focus on the metric space context, so d is a metric. Several knng construction algorithms are known, but they are not suitable for general metric spaces. We present two practical algorithms to construct knngs that exploit several features of metric spaces, obtaining time costs of the form $O(n^{1.63..2.24} k^{0.02..0.59})$ and using $O(n^{0.91..1.96} k^{0.04..0.66})$ distance computations.

EFFICIENT ARCHIVAL DATA STORAGE

by Lawrence You, 2006
Cited by 1 (0 self)
Abstract not found

Memory-Adaptative Dynamic Spatial Approximation Trees

by Diego Arroyuelo, Francisca Muñoz, Gonzalo Navarro, Nora Reyes, 2003
Cited by 1 (1 self)
Dynamic spatial approximation trees (dsa-trees for short) have been shown to be suitable data structures for searching in high-dimensional metric spaces. However, if sufficient storage space is available, pivoting schemes beat dsa-trees in any metric space. In this paper we present new data structures for proximity searching in metric spaces, based on combining the concepts of spatial approximation and pivot-based algorithms.
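The storage trade-off mentioned above comes from how pivot-based schemes work: they precompute the distance from every object to a set of pivots and use the triangle inequality to discard objects without computing their distance to the query. This is a generic pivot table in the spirit of such schemes, not the combined structure proposed in the paper; the class name `PivotTable`, the random pivot choice, and the Euclidean data are assumptions of the sketch.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


class PivotTable:
    """Generic pivot-based index: stores d(o, p) for every object o and pivot p.

    More pivots mean more memory (n * k stored distances) but tighter lower
    bounds, hence fewer distance computations at query time.
    """

    def __init__(self, data: Sequence[Point], pivots: Sequence[Point], d: Metric):
        self.data = list(data)
        self.pivots = list(pivots)
        self.d = d
        self.table = [[d(o, p) for p in self.pivots] for o in self.data]

    def range_search(self, q: Point, r: float) -> Tuple[List[Point], int]:
        """Return (objects within r of q, number of distances computed)."""
        dq = [self.d(q, p) for p in self.pivots]
        computed = len(dq)
        result = []
        for o, row in zip(self.data, self.table):
            # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)| for every pivot p.
            if max((abs(a - b) for a, b in zip(dq, row)), default=0.0) > r:
                continue  # discarded without evaluating d(q, o)
            computed += 1
            if self.d(q, o) <= r:
                result.append(o)
        return result, computed


random.seed(3)
data = [(random.random(), random.random()) for _ in range(500)]
index = PivotTable(data, random.sample(data, 8), math.dist)
hits, cost = index.range_search((0.5, 0.5), 0.05)
```

Doubling the number of pivots doubles the table size, which is the memory cost that makes such schemes win only when storage is plentiful.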