Index-driven similarity search in metric spaces
ACM Transactions on Database Systems, 2003
Cited by 118 (6 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, together with algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied to each of the methods presented, provided a suitable search hierarchy is defined.
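Because this setting assumes only that d is a metric, the tool that distance-based indexes have in common is the triangle inequality. The sketch below is a generic pivot-table filter, not any specific method from the survey: distances from each object to a few pivots are precomputed, and a range query discards an object whenever some pivot proves d(q, o) > r, without computing d(q, o). The function names, the pivot choice and the 1-D example metric are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_pivot_table(data: Sequence[T], pivots: Sequence[T],
                      d: Callable[[T, T], float]) -> List[List[float]]:
    # Precompute d(p, o) for every pivot p and object o (done once, offline).
    return [[d(p, o) for o in data] for p in pivots]


def range_query(q: T, r: float, data: Sequence[T], pivots: Sequence[T],
                table: List[List[float]],
                d: Callable[[T, T], float]) -> Tuple[List[int], int]:
    """Return indices of objects o with d(q, o) <= r and the number of d() calls."""
    calls = len(pivots)
    dq = [d(q, p) for p in pivots]  # query-to-pivot distances
    hits = []
    for i, o in enumerate(data):
        # Triangle inequality: d(q, o) >= |d(q, p) - d(p, o)| for every pivot p,
        # so a lower bound above r rules o out without computing d(q, o).
        if any(abs(dq[j] - table[j][i]) > r for j in range(len(pivots))):
            continue
        calls += 1
        if d(q, o) <= r:
            hits.append(i)
    return hits, calls
```

For example, with data [0, 1, 2, 5, 9, 10], pivots [0, 10] and d(x, y) = |x - y|, the query q = 4 with r = 1.5 returns only index 3 (the object 5) after three distance computations: two to the pivots and one to the surviving candidate.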
Searching in Metric Spaces by Spatial Approximation
1999
Cited by 62 (20 self)
We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them, which satisfies the triangle inequality. The goal is, given a set of objects and a query, to retrieve those objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data structure, called sa-tree ("spatial approximation tree"), is based on approaching the searched objects spatially, that is, getting closer and closer to them, rather than on the classical divide-and-conquer approach of other data structures. We analyze our method and show that the number of distance evaluations needed to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that require no parameter tuning, which makes it appealing for use by non-experts.
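The idea of approaching the query spatially can be sketched as follows. This is a simplified reading under stated assumptions, not the authors' code: the root is the first object, neighbors of a node are chosen greedily in increasing distance to it (an object joins only if it is closer to the node than to every neighbor chosen so far), every other object is handed to its closest neighbor, and range search uses only the rule that an object below neighbor b is at least as close to b as to the node or any sibling. Further pruning used in published versions, such as covering radii, is omitted. The names SANode, build_sa_tree and sa_range_search are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


class SANode:
    """One sa-tree node: an object and the subtrees rooted at its neighbors."""

    def __init__(self, obj):
        self.obj = obj
        self.children: List["SANode"] = []


def build_sa_tree(a, rest: Sequence, d: Callable) -> SANode:
    node = SANode(a)
    neigh, others = [], []
    # Visit the remaining objects in increasing distance to a; x joins N(a)
    # only if it is closer to a than to every neighbor chosen so far.
    for x in sorted(rest, key=lambda y: d(a, y)):
        if all(d(x, a) < d(x, b) for b in neigh):
            neigh.append(x)
        else:
            others.append(x)
    # Every non-neighbor is handed to the bag of its closest neighbor.
    bags: List[list] = [[] for _ in neigh]
    for x in others:
        j = min(range(len(neigh)), key=lambda i: d(x, neigh[i]))
        bags[j].append(x)
    node.children = [build_sa_tree(b, bag, d) for b, bag in zip(neigh, bags)]
    return node


def sa_range_search(node: SANode, q, r: float, d: Callable, out: list,
                    dq_node: Optional[float] = None) -> None:
    """Append to out every object in the subtree with d(q, obj) <= r."""
    if dq_node is None:
        dq_node = d(q, node.obj)
    if dq_node <= r:
        out.append(node.obj)
    if not node.children:
        return
    dq = [d(q, c.obj) for c in node.children]
    dmin = min([dq_node] + dq)
    for child, dqc in zip(node.children, dq):
        # An object x under child satisfies d(x, child) <= d(x, c) for c = node
        # or any sibling, hence d(q, child) <= d(q, c) + 2r whenever d(q, x) <= r.
        if dqc <= dmin + 2 * r:
            sa_range_search(child, q, r, d, out, dqc)
```

The distance to each child is computed once at the parent and passed down, since the number of distance evaluations is the cost being minimized. Note that no parameter appears in either function, in line with the abstract's point about tuning.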
A compact space decomposition for effective metric indexing
Pattern Recognition Letters, 2005
Cited by 23 (6 self)
The metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that speeds up proximity queries. However, indexes lose their efficiency as the intrinsic dimensionality of the data increases. In this paper we present a simple index called list of clusters (LC), which is based on a compact partitioning of the data set. The LC is shown to require little space, to be suitable for both main and secondary memory implementations and, most importantly, to be very resistant to the intrinsic dimensionality of the data set. In this respect our structure is unbeaten. We finish with a discussion of the role of unbalancing in metric space searching, and of how it permits trading memory space for construction time.
1 Introduction
The problem of proximity searching has received much attention in recent times, owing to growing interest in manipulating and retrieving increasingly common multimedia data. Multimedia data have to be classified, forecast, filtered, organized, and so on. Their manipulation poses new challenges to classifiers and function approximators. The well-known k-nearest neighbor (knn) classifier is a favorite candidate for this task because it is simple and well understood. However, one of the main obstacles to using this classifier for massive data classification is its linear complexity in finding the k neighbors of a given query.
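A list of clusters can be sketched as a sequence of (center, radius, bucket) triples built greedily: take a center, put its m nearest remaining objects in its bucket, record the covering radius, remove them, and repeat. The sketch assumes a fixed bucket size m and simply takes the next remaining object as each center; the paper's own center-selection and partitioning policies may differ, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Cluster = Tuple[object, float, list]  # (center, covering radius, bucket)


def build_list_of_clusters(data: Sequence, m: int, d: Callable) -> List[Cluster]:
    """Fixed bucket size m: each center keeps its m nearest remaining objects."""
    rest = list(data)
    clusters: List[Cluster] = []
    while rest:
        c = rest.pop(0)  # assumption: the next remaining object becomes a center
        rest.sort(key=lambda x: d(c, x))
        bucket, rest = rest[:m], rest[m:]
        radius = d(c, bucket[-1]) if bucket else 0.0
        clusters.append((c, radius, bucket))
    return clusters


def lc_range_search(clusters: List[Cluster], q, r: float, d: Callable) -> list:
    out = []
    for c, radius, bucket in clusters:
        dqc = d(q, c)
        if dqc <= r:
            out.append(c)
        # The query ball can intersect this cluster's ball: scan its bucket.
        if dqc <= radius + r:
            out.extend(x for x in bucket if d(q, x) <= r)
        # Query ball strictly inside this cluster's ball: every later object x
        # has d(c, x) >= radius, so d(q, x) > r and the scan can stop.
        if dqc + r < radius:
            break
    return out
```

The early exit is what the sequential order buys: objects placed in later clusters are known to lie outside (or on) the ball of every earlier cluster, so a query ball that fits strictly inside one cluster never needs to look further down the list.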
Incremental Similarity Search in Multimedia Databases
2000
Cited by 22 (2 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some distance measure d, usually a distance metric. Existing methods for handling similarity search in this setting fall into one of two classes. The first is based on mapping to a low-dimensional vector space (making use of data structures such as the R-tree), while the second directly indexes the objects based on distances (making use of data structures such as the M-tree). We introduce a general framework for performing search based on distances, and present an incremental nearest neighbor algorithm that operates on an arbitrary "search hierarchy". We show how this framework can be applied in both classes of similarity search methods by defining a suitable search hierarchy for a number of different indexing structures. Given an appropriate search hierarchy, our algorithm performs incremental similarity search, wherein the result objects are reported one by one in order of similarity to the query object, with as little effort as possible expended to produce each new result object. This is especially important in interactive database applications, as it makes it possible to display partial query results early. The incremental aspect also provides significant benefits when the number of desired neighbors is not known in advance. Furthermore, our algorithm is at least as efficient as existing k-nearest neighbor algorithms in terms of the number of distance computations and index node accesses. In fact, provided that the search hierarchy is properly defined, our algorithm can be shown to be optimal in the sense of performing as few distance ...
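Incremental nearest neighbor search over a hierarchy can be sketched with a single priority queue that holds both hierarchy nodes and objects, each keyed by a lower bound on its distance to q (the exact distance for an object). When an object reaches the front of the queue, nothing still queued can be closer, so it can be reported immediately and the caller can stop at any time. The Ball node type below, with a center and a covering radius, is an assumed example hierarchy rather than one of the structures defined in the paper.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Iterator, Tuple


@dataclass
class Ball:
    """Hypothetical hierarchy node: every object below lies within radius of center.
    Children are either Ball nodes or raw objects."""
    center: Tuple[float, ...]
    radius: float
    children: list = field(default_factory=list)


def incremental_nn(root: Ball, q, d: Callable) -> Iterator[Tuple[float, object]]:
    """Yield (distance, object) pairs in nondecreasing distance from q."""
    tie = itertools.count()  # tie-breaker so heapq never compares payloads
    # Entries: (key, tie, is_object, payload). A node's key lower-bounds the
    # distance from q to everything below it.
    heap = [(max(0.0, d(q, root.center) - root.radius), next(tie), False, root)]
    while heap:
        key, _, is_obj, item = heapq.heappop(heap)
        if is_obj:
            yield key, item  # every remaining entry has key >= this distance
            continue
        for child in item.children:
            if isinstance(child, Ball):
                lb = max(0.0, d(q, child.center) - child.radius)
                heapq.heappush(heap, (lb, next(tie), False, child))
            else:
                heapq.heappush(heap, (d(q, child), next(tie), True, child))
```

Because the function is a generator, `itertools.islice(incremental_nn(root, q, d), k)` retrieves the k nearest neighbors, and asking for more later simply resumes from the same queue instead of restarting the search.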
Fully Dynamic Spatial Approximation Trees
In Proceedings of the 9th International Symposium on String Processing and Information Retrieval (SPIRE 2002), LNCS 2476, 2002
Cited by 22 (12 self)
The Spatial Approximation Tree (sa-tree) is a recently proposed data structure for searching in metric spaces. It has been shown to compare favorably against alternative data structures in high-dimensional spaces or for queries with low selectivity. Its main drawbacks are its costly construction time, its poor performance in low-dimensional spaces or for queries with high selectivity, and the fact that it is a static data structure: once it is built, elements cannot be added or deleted.
An Effective Clustering Algorithm to Index High Dimensional Metric Spaces
Cited by 18 (8 self)
A metric space consists of a collection of objects and a distance function defined among them that satisfies the triangle inequality. The goal is to preprocess the set so that, given a query, the objects close enough to it can be retrieved. The complexity measure is the number of distances computed to achieve this goal. The problem is very difficult in so-called high-dimensional metric spaces, where the histogram of distances has a large mean and a small variance. A recent survey of methods for indexing metric spaces has shown that so-called clustering algorithms are better suited than their competitors, pivot-based algorithms, to cope with high-dimensional metric spaces. In this paper we present a new clustering method that achieves much better performance than all existing data structures. We present analytical and experimental results that support our claims and give users the tuning parameters needed to make optimal use of this data structure.
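The difficulty described above, a distance histogram with a large mean and a small variance, is often quantified in the metric-space searching literature as the intrinsic dimensionality ρ = μ²/(2σ²), where μ and σ² are the mean and variance of the distance histogram. The sketch below estimates ρ from randomly sampled pairs; the sample size, the seed and the helper name are assumptions, and the estimate is independent of the clustering method the paper proposes.

```python
import random
from statistics import mean, pvariance
from typing import Callable, Sequence


def intrinsic_dimensionality(data: Sequence, d: Callable, pairs: int = 2000,
                             seed: int = 0) -> float:
    """Estimate rho = mu^2 / (2 sigma^2) from distances between random pairs."""
    rng = random.Random(seed)
    # Distances between distinct random pairs approximate the distance histogram.
    sample = [d(*rng.sample(data, 2)) for _ in range(pairs)]
    mu = mean(sample)
    var = pvariance(sample, mu)  # assumes the sampled distances are not all equal
    # A large mean with a small variance gives a large rho: a harder space to index.
    return mu * mu / (2.0 * var)
```

For uniformly distributed points in a unit cube under Euclidean distance, the estimate grows markedly with the number of coordinates, which is the regime where the abstract reports that clustering-based indexes outperform pivot-based ones.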
A Fast Nearest Neighbor Algorithm Based on a Principal Axis Search Tree
2001
Cited by 16 (0 self)
A new fast nearest neighbor algorithm is described that uses principal component analysis to build an efficient search tree. At each node in the tree, the data set is partitioned along the direction of maximum variance. The search algorithm uses an efficient depth-first search and a new elimination criterion. The new algorithm was compared with sixteen other fast nearest neighbor algorithms on three types of common benchmark data sets, including problems from time series prediction and image vector quantization. This comparative study illustrates the strengths and weaknesses of all of the leading algorithms. The new algorithm performed very well on all of the data sets and was consistently ranked among the top three algorithms.
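The construction described above (split each node along its direction of maximum variance, then search depth-first with an elimination test) can be sketched in plain Python. Several choices are assumptions rather than the paper's method: binary median splits, power iteration to find the principal axis, and a simple elimination bound based on the gap between the query's projection and the split value, in place of the paper's new elimination criterion, which the abstract does not spell out. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dot(a: Point, b: Point) -> float:
    return sum(x * y for x, y in zip(a, b))


def principal_axis(points: List[Point], iters: int = 50) -> List[float]:
    """Power iteration for the direction of maximum variance (unit length)."""
    n, dim = len(points), len(points[0])
    mu = [sum(p[i] for p in points) / n for i in range(dim)]
    centered = [[p[i] - mu[i] for i in range(dim)] for p in points]
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # w = C v, accumulated as sum_x (x . v) x without forming C explicitly.
        w = [0.0] * dim
        for x in centered:
            s = dot(x, v)
            for i in range(dim):
                w[i] += s * x[i]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break  # degenerate direction; keep the current unit vector
        v = [a / norm for a in w]
    return v


def build(points: List[Point], leaf_size: int = 8) -> dict:
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = principal_axis(points)
    ordered = sorted(points, key=lambda p: dot(axis, p))
    mid = len(ordered) // 2
    return {"axis": axis, "split": dot(axis, ordered[mid]),
            "left": build(ordered[:mid], leaf_size),
            "right": build(ordered[mid:], leaf_size)}


def nearest(node: dict, q: Point,
            best: Tuple[float, object] = (math.inf, None)) -> Tuple[float, object]:
    """Depth-first search that visits the query's side of each split first."""
    if "leaf" in node:
        for p in node["leaf"]:
            dist = math.dist(q, p)
            if dist < best[0]:
                best = (dist, p)
        return best
    t = dot(node["axis"], q) - node["split"]
    near, far = (node["left"], node["right"]) if t < 0 else (node["right"], node["left"])
    best = nearest(near, q, best)
    # Projection onto a unit vector never increases distances, so every point
    # on the far side is at least |t| from q; skip it when that cannot win.
    if abs(t) < best[0]:
        best = nearest(far, q, best)
    return best
```

Splitting along the principal axis tends to make |t| large for points far from the split, so the elimination test above fires more often than it would with an arbitrary fixed coordinate.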
A Nearest Trajectory Strategy for Time Series Prediction
1998
Cited by 11 (2 self)
This paper proposes a nonparametric forecasting method for univariate time series that contain little or no noise. For practical purposes it is assumed that the time series is generated by a nonlinear dynamic system governed by the following equations,
Fast Nearest-Neighbor Search Algorithms Based on Approximation-Elimination Search
In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 2000
Cited by 6 (0 self)
In this paper, we provide an overview of fast nearest-neighbor search algorithms based on an 'approximation-elimination' framework under a class of elimination rules, namely, partial distance elimination, hypercube elimination and absolute-error-inequality elimination, derived from approximations of Euclidean distance. Previous algorithms based on these elimination rules are reviewed in the context of approximation-elimination search. The main emphasis in this paper is a comparative study of these elimination constraints with reference to their approximation-elimination efficiency set within different approximation schemes. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
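Two of the named rules can be shown in their commonly stated form inside a plain full-search nearest-neighbor loop under Euclidean distance. Hypercube elimination rejects a candidate when a single coordinate already differs from the query by more than the best distance found so far; partial distance elimination stops summing squared differences once the running sum reaches the best squared distance. This is a minimal sketch: it does not reproduce the paper's approximation schemes or candidate ordering, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def nn_with_elimination(q: Sequence[float],
                        codebook: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Full-search Euclidean nearest neighbor with two cheap rejection tests."""
    best_sq, best_i = math.inf, -1
    for i, c in enumerate(codebook):
        best = math.sqrt(best_sq)
        # Hypercube elimination: one coordinate differing by more than the current
        # best distance puts c outside the cube of that half-width around q.
        if any(abs(a - b) > best for a, b in zip(q, c)):
            continue
        # Partial distance elimination: abandon the sum of squared differences
        # as soon as it reaches the best squared distance found so far.
        acc = 0.0
        for a, b in zip(q, c):
            acc += (a - b) ** 2
            if acc >= best_sq:
                break
        else:  # loop finished without a break: c is strictly closer
            best_sq, best_i = acc, i
    return best_i, math.sqrt(best_sq)
```

Both tests only ever reject candidates that cannot be strictly closer than the current best, so the result matches exhaustive search; how much work they save depends on finding a good candidate early, which is where the approximation step of the framework comes in.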
Winning entry of the K.U. Leuven time series prediction competition
International Journal of Bifurcation and Chaos, 1999
Cited by 2 (0 self)
In this paper we describe the winning entry of the time series prediction competition which was part of the International Workshop on Advanced Black-Box Techniques for Nonlinear Modeling, held at K.U. Leuven, Belgium on July 8–10, 1998. We also describe the source of the data set, a nonlinear transform of a 5-scroll generalized Chua’s circuit. Participants were given 2000 data points and were asked to predict the next 200 points in the series. The winning entry exploited symmetry that was discovered during exploratory data analysis and a method of local modeling designed specifically for the prediction of chaotic time series. This method includes an exponentially weighted metric, a nearest trajectory algorithm, integrated local averaging, and a novel multi-step ahead cross-validation estimation of model error for the purpose of parameter optimization.
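Two ingredients listed above, an exponentially weighted metric and local averaging over nearest neighbors, can be illustrated with a generic delay-embedding forecaster: compare the most recent m values with every earlier window using a distance that down-weights older lags by λ^i, average the successors of the k closest windows, and feed each forecast back in for multi-step prediction. This is a minimal sketch, not the winning entry's method; its nearest trajectory algorithm, integrated local averaging and cross-validated parameter choice are not reproduced, and m, k, λ and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def weighted_dist(a: Sequence[float], b: Sequence[float], lam: float) -> float:
    # Exponentially weighted metric: lag i (0 = most recent) has weight lam**i.
    return math.sqrt(sum(lam ** i * (x - y) ** 2
                         for i, (x, y) in enumerate(zip(a, b))))


def local_average_forecast(series: Sequence[float], m: int = 4, k: int = 3,
                           lam: float = 0.8, steps: int = 1) -> List[float]:
    """Iterated nearest-neighbor forecast: average the successors of the k
    delay vectors closest to the current one."""
    train = list(series)
    # Library of (delay vector, successor) pairs, most recent value first.
    library: List[Tuple[List[float], float]] = [
        (train[t - m + 1:t + 1][::-1], train[t + 1])
        for t in range(m - 1, len(train) - 1)
    ]
    history = list(train)
    out: List[float] = []
    for _ in range(steps):
        query = history[-m:][::-1]
        nearest = sorted(library, key=lambda e: weighted_dist(query, e[0], lam))[:k]
        nxt = sum(succ for _, succ in nearest) / len(nearest)
        out.append(nxt)
        history.append(nxt)  # feed the forecast back for the next step
    return out
```

On the periodic series `[0, 1, 2, 3] * 10`, a four-step forecast returns `[0.0, 1.0, 2.0, 3.0]`, because each query window has exact matches in the library. The library is built only from observed data, so forecasts are never used as neighbors.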

