## Speeding up Spatial Approximation Search in Metric Spaces

### BibTeX

@MISC{Figueroa_speedingup,
    author = {Karina Figueroa and Edgar Chavez and Gonzalo Navarro and Rodrigo Paredes},
    title = {Speeding up Spatial Approximation Search in Metric Spaces},
    year = {}
}

### Abstract

Proximity searching consists of retrieving from a database those elements that are similar to a query object. The usual model for proximity searching is a metric space where the distance, which models the proximity, is expensive to compute. An index uses precomputed distances to speed up query processing. Among all known indices, AESA has been the performance baseline for about twenty years. This index uses an iterative procedure: at each iteration it first chooses the next promising element (“pivot”) to compare to the query, and then it discards the database elements that the pivot proves irrelevant to the query. AESA chooses as the next pivot the element that minimizes the sum of the lower bounds on its distance to the query proved by previous pivots. In this paper we introduce the new index iAESA, which establishes a new performance baseline for metric space searching. It differs from AESA in the method used to select the next pivot. In iAESA, each candidate sorts previous pivots by closeness to it, and the next pivot is the candidate whose order is most similar to that of the query. We also propose a modification to AESA-like algorithms to turn them into probabilistic algorithms. Our empirical results confirm a consistent improvement in query performance. For example, we perform as few as 60% of the distance evaluations of AESA in a database of documents, a ...
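The two pivot-selection rules described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `nn_search`, `footrule`, `strategy`, and `max_evals` are hypothetical, the first pivot and all ties are resolved by lowest index as a simplifying assumption, Spearman Footrule is used as the permutation dissimilarity (as the citation context for Fagin et al. indicates), and the probabilistic variant is approximated by a plain budget on distance evaluations.

```python
import math


def footrule(perm_a, perm_b):
    """Spearman Footrule: sum of |position in perm_a - position in perm_b| per item."""
    pos_b = {item: i for i, item in enumerate(perm_b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(perm_a))


def nn_search(db, D, q, dist, strategy="aesa", max_evals=None):
    """Nearest-neighbor search over db, where D[i][j] = dist(db[i], db[j]) is precomputed.

    strategy="aesa":  next pivot minimizes the sum of lower bounds on d(q, u).
    strategy="iaesa": next pivot is the candidate whose ordering of the previous
                      pivots is closest (by Footrule) to the query's ordering.
    max_evals: optional work budget (assumption: a simple stand-in for the
               probabilistic variant); None means run to completion (exact).
    Returns (index of best element found, its distance, distance evaluations used).
    """
    n = len(db)
    candidates = set(range(n))
    lower = [0.0] * n      # tightest lower bound on d(q, u) proved so far
    sum_lower = [0.0] * n  # AESA selection key
    pivots, dq = [], []    # pivot indices and their distances to q
    best, best_d, evals = None, math.inf, 0

    while candidates and (max_evals is None or evals < max_evals):
        if not pivots:
            p = min(candidates)  # arbitrary first pivot (assumption)
        elif strategy == "aesa":
            p = min(candidates, key=lambda u: (sum_lower[u], u))
        else:
            ranks = range(len(pivots))
            order_q = sorted(ranks, key=lambda k: (dq[k], k))

            def key(u):
                order_u = sorted(ranks, key=lambda k: (D[u][pivots[k]], k))
                return (footrule(order_u, order_q), u)

            p = min(candidates, key=key)

        candidates.discard(p)
        d = dist(q, db[p])  # the only expensive operation
        evals += 1
        pivots.append(p)
        dq.append(d)
        if d < best_d:
            best, best_d = p, d

        # Triangle inequality: d(q, u) >= |d(q, p) - d(u, p)|.
        for u in list(candidates):
            lb = abs(d - D[u][p])
            sum_lower[u] += lb
            lower[u] = max(lower[u], lb)
            if lower[u] > best_d:  # u cannot beat the current best
                candidates.discard(u)

    return best, best_d, evals
```

Calling `nn_search(points, D, q, math.dist, strategy="iaesa")` on a list of coordinate tuples returns the nearest element together with the number of distance evaluations, which is the cost measure the abstract compares. The sketch re-sorts the pivots for every candidate at each iteration, so it favors clarity over the side-computation cost that a real index would need to control.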

### Citations

2362 | Modern Information Retrieval
- Baeza-Yates, Ribeiro-Neto
- 1999
Citation Context ...closest object found so far. 4. EXPERIMENTAL RESULTS We experimented on different synthetic and real-life metric databases. The real-life metric spaces are TREC-3 documents under the cosine distance [Baeza-Yates and Ribeiro-Neto 1999], and a database of feature vectors of face images under Euclidean distance [Navarrete and Ruiz-del-Solar 2002]. The synthetic metric space examples are random vectors in the unitary cube under the E... |

773 | An optimal algorithm for approximate nearest neighbor searching in fixed dimensions
- Arya, Mount, et al.
- 1998
Citation Context ...urned into probabilistic, by letting it work until some predefined work threshold and measuring how many relevant answers it found. Probabilistic algorithms have been proposed both for vector spaces [Arya et al. 1994; White and Jain 1996] and for general metric spaces [Clarkson 1999; Ciaccia and Patella 2002; Chávez and Navarro 2003; Bustos and Navarro 2003]. Bustos and Navarro [2003] define a probabilistic algor... |

369 | The FERET database and evaluation procedure for face recognition algorithms
- Phillips, Wechsler, et al.
- 1998
Citation Context ... and n=3000. 4.2.2 FERET Face Images Database. Many real databases are composed of few objects, each of them with very high intrinsic dimension. This is the case of the FERET database of face images [Phillips et al. 1998]. We used a target set with 762 images of 254 different classes (three images per class), and a set of 254 queries (1 image per class). Here each class is a person, the three images in the class are ... |

322 | Searching in Metric Spaces
- Chávez, Navarro, et al.
- 2001
Citation Context ... among database elements. This information is used later to discard some elements without comparing them directly with the query object. Different indices store different information about distances [Chávez et al. 2001]. Some store a subset of the distances, e.g. all the distances between k chosen pivots and all the rest, or all the distances between an element and its subtree, in a tree-structured index. Some indic... |

228 | Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann - Samet - 2005 |

193 | Comparing top k lists
- Fagin, Kumar, et al.
- 2003
Citation Context ...n the literature for measuring dissimilarities between permutations. In previous work [Chávez et al. 2008] we experimented with a few of them, such as Kendall Tau, Spearman Rho, or Spearman Footrule [Fagin et al. 2003]. We found that their performance, for our purposes, were similar. Since Spearman Footrule is the least expensive to compute among those, we choose it as our dissimilarity measure. Spearman Footrule ... |

134 | Index-Driven Similarity Search in Metric Spaces - Hjaltason, Samet - 2003 |

112 | Nearest neighbor queries in metric spaces
- Clarkson
- 1999
Citation Context ...k threshold and measuring how many relevant answers it found. Probabilistic algorithms have been proposed both for vector spaces [Arya et al. 1994; White and Jain 1996] and for general metric spaces [Clarkson 1999; Ciaccia and Patella 2002; Chávez and Navarro 2003; Bustos and Navarro 2003]. Bustos and Navarro [2003] define a probabilistic algorithm using a technique relevant to this work. They use different cr... |

70 | Algorithms and strategies for similarity retrieval
- White, Jain
- 1996
Citation Context ...listic, by letting it work until some predefined work threshold and measuring how many relevant answers it found. Probabilistic algorithms have been proposed both for vector spaces [Arya et al. 1994; White and Jain 1996] and for general metric spaces [Clarkson 1999; Ciaccia and Patella 2002; Chávez and Navarro 2003; Bustos and Navarro 2003]. Bustos and Navarro [2003] define a probabilistic algorithm using a techniqu... |

56 | An algorithm for finding nearest neighbors in (approximately) constant average time
- Vidal
- 1986
Citation Context ...ed to exhaustive surveys [Chávez et al. 2001; Hjaltason and Samet 2003] or books [Zezula et al. 2006; Samet 2006]. We will focus on the canonical algorithm that uses all the possible distances, AESA [Vidal 1986]. For 20 years AESA has been the indexing technique requiring, by far, the least number of distance computations among all other indices (which require much less space). In this paper we show, for th... |

40 | A new version of the nearest-neighbor approximating and eliminating search (AESA) with linear preprocessing-time and memory requirements
- Micó, Oncina, et al.
- 1994
Citation Context ... years the algorithm that computes the least number of distance evaluations to answer proximity queries. There have been some algorithms aimed at reducing its preprocessing time or space used. LAESA [Micó et al. 1994] chooses k elements of U as potential pivots, then reducing the space to O(kn). An improved version of LAESA is Tree LAESA [Micó et al. 1996] which achieves sublinear side computations at query time ... |

31 | Searching in metric spaces with user-defined and approximate distances - Ciaccia, Patella |

28 | A compact space decomposition for effective metric indexing
- Chávez, Navarro
- 2005
Citation Context ... 14 and n = 5,000, and we retrieved different number of nearest neighbors. Just to confirm that AESA is by far the best performing index, we compare in Figure 4 AESA and iAESA with List of Clusters [Chávez and Navarro 2005], [running header: ACM Journal Name, Vol. V, No. N, Month 20YY. Speeding up Spatial Approximation Search] [figure residue: plot of distance evaluations for iAESA, AESA, and List of Clusters]... |

27 | A fast branch and bound nearest neighbour classifier in metric spaces
- Micó, Oncina, et al.
- 1996
Citation Context ...d at reducing its preprocessing time or space used. LAESA [Micó et al. 1994] chooses k elements of U as potential pivots, then reducing the space to O(kn). An improved version of LAESA is Tree LAESA [Micó et al. 1996] which achieves sublinear side computations at query time with just approximately twice the average number of distance evaluations. Reduced Overhead AESA [Vilar 1995] strictly calculates the same dis... |

22 | Searching in high-dimensional spaces: index structures for improving the per
- Böhm, Berchtold, et al.
Citation Context ...tary Cube. The performance of state-of-the-art proximity searching algorithms when answering both range and k-nearest neighbor queries worsens as the dimension of the space grows [Chávez et al. 2001; Böhm et al. 2001]. Therefore, it is interesting to experiment in spaces with different dimensions. A way to control the dimension of the space is to generate synthetic sets uniformly distributed in the unitary cube, ... |

21 | Effective proximity retrieval by ordering permutations
- Chávez, Figueroa, et al.
- 2007
Citation Context ...similarity between Πu and Πq. Tie breaking in permutations will be discussed shortly. There are several choices in the literature for measuring dissimilarities between permutations. In previous work [Chávez et al. 2008] we experimented with a few of them, such as Kendall Tau, Spearman Rho, or Spearman Footrule [Fagin et al. 2003]. We found that their performance, for our purposes, were similar. Since Spearman Footr... |

16 | Probabilistic proximity searching algorithms based on compact partitions - Bustos, Navarro - 2002 |

15 | Analysis and Comparison of Eigenspace-Based Face Recognition Approaches - Navarrete, Ruiz-del-Solar - 2002 |

13 | Probabilistic proximity search: Fighting the curse of dimensionality in metric spaces - Chávez, Navarro - 2003 |

11 | Reducing the overhead of the AESA metric-space nearest neighbor searching algorithm
- Vilar
- 1995
Citation Context ...ion of LAESA is Tree LAESA [Micó et al. 1996] which achieves sublinear side computations at query time with just approximately twice the average number of distance evaluations. Reduced Overhead AESA [Vilar 1995] strictly calculates the same distances as AESA but reduces the query processing time. Recently, graph t-spanner indices [Navarro et al. 2007] were used to simulate AESA, almost reaching its number o... |

8 | Graphs for Metric Space Searching
- Paredes
- 2008
Citation Context ...the average number of distance evaluations. Reduced Overhead AESA [Vilar 1995] strictly calculates the same distances as AESA but reduces the query processing time. Recently, graph t-spanner indices [Navarro et al. 2007] were used to simulate AESA, almost reaching its number of distance calculations using less memory. In fact, all ... |

7 | Optimal incremental sorting - Paredes, Navarro - 2006 |

5 | Engineering efficient metric indexes
- Fredriksson
- 2007
Citation Context ...ly avoided because O(n^2) space is unacceptable for realistic database applications. However, the space is affordable in some areas such as pattern recognition, as well as to index database subsets [Fredriksson 2007]. What is especially relevant of this approach is that the use of all the available information establishes a baseline on how good could an index possibly be. Actually, all the development on metric ... |

4 | Approximate similarity search: A multi-faceted problem - Patella, Ciaccia |