Fast Approximate String Matching in a Dictionary (1998) [16 citations — 7 self]

by Ricardo Baeza-Yates, Gonzalo Navarro
In Proc. SPIRE'98

Abstract:

A successful technique to search large textual databases allowing errors relies on an online search in the vocabulary of the text. To reduce the time of that online search, we index the vocabulary as a metric space. We show that with reasonable space overhead we can improve by a factor of two over the fastest online algorithms, when the tolerated error level is low (which is reasonable in text searching).

1 Introduction

Approximate string matching is a recurrent problem in many branches of computer science, with applications to text searching, computational biology, pattern recognition, signal processing, etc. The problem can be stated as follows: given a long text of length n, and a (comparatively short) pattern of length m, retrieve all the segments (or "occurrences") of the text whose edit distance to the pattern is at most k. The edit distance ed() between two strings is defined as the minimum number of character insertions, deletions and replacements needed to make them equal. I...
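To make the two ideas in the abstract concrete, here is a minimal Python sketch, offered as an illustration rather than as the authors' implementation. It computes the edit distance ed() with the standard dynamic-programming recurrence, then indexes a small vocabulary in a Burkhard-Keller tree, one classic metric-space structure. The abstract does not say which structure the paper uses, so the choice of BK-tree is an assumption, as are the names `edit_distance`, `BKTree`, `add`, `search` and the sample vocabulary.

```python
from typing import Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and replacements turning a into b."""
    # prev[j] is the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace, or match for free
        prev = curr
    return prev[-1]


Node = Tuple[str, Dict[int, "Node"]]  # (word, children keyed by distance)


class BKTree:
    """Hypothetical illustration of a metric-space index over a vocabulary."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, pattern: str, k: int) -> List[str]:
        """Return every indexed word within edit distance k of pattern."""
        results: List[str] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = edit_distance(pattern, word)
            if d <= k:
                results.append(word)
            # Triangle inequality: a match can only sit in a subtree whose
            # key lies in [d - k, d + k], so all other subtrees are skipped.
            for key, child in children.items():
                if d - k <= key <= d + k:
                    stack.append(child)
        return sorted(results)


vocabulary = ["survey", "surgery", "search", "serve", "text", "test", "tent"]
tree = BKTree()
for w in vocabulary:
    tree.add(w)
matches = tree.search("text", 1)
```

The pruning step is what lets an index avoid comparing the pattern against every vocabulary word. With a small k, the interval [d - k, d + k] is narrow and most subtrees are skipped, which is consistent with the abstract's remark that the gain appears when the tolerated error level is low.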

Citations

313 FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia databases – Faloutsos, Lin - 1995
249 Fast text search allowing errors – Manber, Wu - 1992
196 The theory and computation of evolutionary distances: pattern recognition – Sellers - 1980
175 Overview of the third text retrieval conference – Harman - 1994
171 GLIMPSE: A Tool to Search Through Entire File Systems – Manber, Wu - 1994
163 Data structures and algorithms for nearest neighbor search in general metric spaces – Yianilos - 1993
131 Near neighbor search in large metric spaces. 21st VLDB – Brin - 1995
127 Finding approximate patterns in strings – Ukkonen - 1985
114 Satisfying general proximity/similarity queries with metric trees – Uhlmann - 1991
103 A fast bit-vector algorithm for approximate string matching based on dynamic programming – Myers - 1999
97 Some approaches to best-match file searching – Burkhard, Keller - 1973
83 Information Retrieval, Computational and Theoretical Aspects – Heaps - 1978
46 Proximity matching using fixed-queries trees – Baeza-Yates, Cunto, et al.
46 A faster algorithm for approximate string matching – Baeza-Yates, Navarro - 1996
45 Theoretical and empirical comparisons of approximate string matching algorithms – Chang, Lampe - 1992
44 Fast string matching with k differences – Landau, Vishkin - 1988
42 Text retrieval: Theory and practice – Baeza-Yates - 1992
38 On using q-gram locations in approximate string matching – Sutinen, Tarhio - 1995
34 Block-addressing indices for approximate text retrieval – Baeza-Yates, Navarro - 1998
33 Large Text Searching Allowing Errors – Araújo, Navarro, et al. - 1997
29 Fast and practical approximate pattern matching – Baeza-Yates, Perleberg - 1996
28 The choice of reference points in best-match file searching – Shapiro - 1977
20 An algorithm for finding nearest neighbours in (approximately) constant average time – Vidal - 1986