## Fast and Practical Approximate String Matching (1992)

Venue: | Combinatorial Pattern Matching, Third Annual Symposium |

Citations: | 53 - 0 self |

### BibTeX

@INPROCEEDINGS{Baeza-yates92fastand,

author = {Ricardo A. Baeza-Yates and Chris H. Perleberg},

title = {Fast and Practical Approximate String Matching},

booktitle = {Combinatorial Pattern Matching, Third Annual Symposium},

year = {1992},

pages = {185--192},

publisher = {Springer-Verlag}

}

### Abstract

We present new algorithms for approximate string matching based on simple, but efficient, ideas. First, we present an algorithm for string matching with mismatches based on arithmetical operations that runs in linear worst-case time for most practical cases. This is a new approach to string searching. Second, we present an algorithm for string matching with errors based on partitioning the pattern that requires linear expected time for typical inputs.

1 Introduction

Approximate string matching is one of the main problems in combinatorial pattern matching. Recently, several new approaches emphasizing the expected search time and practicality have appeared [3, 4, 27, 32, 31, 17], in contrast to older results, most of which are only of theoretical interest. Here, we continue this trend by presenting two new simple and efficient algorithms for approximate string matching. First, we present an algorithm for string matching with k mismatches. This problem consists of finding all instances o...
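
The abstract only names the mismatch algorithm, so a small illustration may help. The Python sketch below follows the counting idea that the citation contexts on this page describe: a table of pattern offsets indexed by character plus an array of m counters. The function name `k_mismatch_search`, the dictionary-based offset table, and the circular-buffer layout are assumptions made for this example; this is not the authors' C code.

```python
from collections import defaultdict


def k_mismatch_search(text, pattern, k):
    """Return start indices where pattern occurs in text with at most k mismatches.

    Illustrative sketch only: each text character adds one match to every
    alignment in which the pattern has that same character at the right offset.
    """
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return []

    # offsets[c] lists every position j with pattern[j] == c.
    offsets = defaultdict(list)
    for j, c in enumerate(pattern):
        offsets[c].append(j)

    # Circular buffer: counts[s % m] holds the number of matching characters
    # seen so far for the alignment that starts at text position s.
    counts = [0] * m
    hits = []
    for i, c in enumerate(text):
        for j in offsets.get(c, ()):
            s = i - j              # alignment in which text[i] meets pattern[j]
            if s >= 0:
                counts[s % m] += 1
        s = i - m + 1              # the alignment that is completed at position i
        if s >= 0:
            if m - counts[s % m] <= k:
                hits.append(s)
            counts[s % m] = 0      # recycle the slot for alignment s + m
    return hits


if __name__ == "__main__":
    print(k_mismatch_search("abcabdabc", "abc", 1))  # [0, 3, 6]
```

When all pattern characters are distinct, each text character triggers at most one counter update, which is consistent with the linear worst-case behaviour the abstract claims for most practical cases; with repeated characters the inner loop can run several times per text character.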

### Citations

628 |
Fast pattern matching in strings
- Knuth, Morris, Pratt
- 1977
Citation Context ...rors. Here we combine the idea with traditional multiple string searching algorithms. The simplest algorithm is to build an Aho-Corasick machine [2] (the extension of the Knuth-Morris-Pratt algorithm [18] to search for multiple patterns) for the k + 1 blocks of length r (fewer blocks if some of them are equal). For every match found, we extend the match, checking if there are at most k errors, by using... |

531 |
Efficient string matching: An aid to bibliographic search
- Aho, Corasick
- 1975
Citation Context ...f the shift-or algorithm [3] to string matching with errors. Here we combine the idea with traditional multiple string searching algorithms. The simplest algorithm is to build an Aho-Corasick machine [2] (the extension of the Knuth-Morris-Pratt algorithm [18] to search for multiple patterns) for the k + 1 blocks of length r (fewer blocks if some of them are equal). For every match found, we extend the... |

318 |
Fast text search allowing errors
- Manber, Wu
- 1992
Citation Context ...duction Approximate string matching is one of the main problems in combinatorial pattern matching. Recently, several new approaches emphasizing the expected search time and practicality have appeared [3, 4, 27, 32, 31, 17], in contrast to older results, most of which are only of theoretical interest. Here, we continue this trend by presenting two new simple and efficient algorithms for approximate string matching. Firs... |

224 | A new approach to text searching
- Baeza-Yates, Gonnet
- 1992
Citation Context ...duction Approximate string matching is one of the main problems in combinatorial pattern matching. Recently, several new approaches emphasizing the expected search time and practicality have appeared [3, 4, 27, 32, 31, 17], in contrast to older results, most of which are only of theoretical interest. Here, we continue this trend by presenting two new simple and efficient algorithms for approximate string matching. Firs... |

162 |
Handbook of Algorithms and Data Structures
- Gonnet
- 1984
Citation Context ...ralizes to any algorithm based on partitioning the pattern, as in [32], and is simpler than Chang and Lawler's algorithm [9]. For a complete set of references on these problems we refer the reader to [14]. A preliminary version of this paper was presented in [7]. 2 String Matching with Mismatches First let us describe the string matching with k mismatches algorithm using the best (and practical) case ... |

150 |
Finding approximate patterns in strings
- Ukkonen
- 1985
Citation Context ...log(k) or c = sqrt(k). For these cases the expected time is better than the best worst case known of O(kn). Figure 3 gives experimental results for the following algorithms in addition to ours: Ukkonen's [28], Chang's [8], Wu and Manber's [32] and Wu, Manber and Myers's [33]. Our algorithm uses a multiple string searching algorithm based on the Boyer-Moore-Sunday [26] string searching algorithm and Ukkone... |

130 | agrep - A Fast Approximate Pattern-Matching Tool
- Wu, Manber
- 1991
Citation Context ...duction Approximate string matching is one of the main problems in combinatorial pattern matching. Recently, several new approaches emphasizing the expected search time and practicality have appeared [3, 4, 27, 32, 31, 17], in contrast to older results, most of which are only of theoretical interest. Here, we continue this trend by presenting two new simple and efficient algorithms for approximate string matching. Firs... |

114 |
String matching and other products
- Fischer, Paterson
- 1974
Citation Context ... underlying alphabet (typically ASCII symbols). The only algorithm which is similar to ours is an O(n + R log log m) result for insertions and deletions, reported in [21]. Related ideas can be found in [11, 1]. String matching with errors consists of finding all substrings of the text that have at most k errors with the pattern, where the errors are counted as the minimal number of insertions, deletions, o... |

107 |
A very fast substring search algorithm
- Sunday
- 1990
Citation Context ...rithms in addition to ours: Ukkonen's [28], Chang's [8], Wu and Manber's [32] and Wu, Manber and Myers's [33]. Our algorithm uses a multiple string searching algorithm based on the Boyer-Moore-Sunday [26] string searching algorithm and Ukkonen's [28] dynamic programming algorithm for the checking phase. We used a random text of size 1Mb with an alphabet of size 32. The values are the average of ten se... |

104 |
Generalized string matching
- Abrahamson
- 1987
Citation Context ... underlying alphabet (typically ASCII symbols). The only algorithm which is similar to ours is an O(n + R log log m) result for insertions and deletions, reported in [21]. Related ideas can be found in [11, 1]. String matching with errors consists of finding all substrings of the text that have at most k errors with the pattern, where the errors are counted as the minimal number of insertions, deletions, o... |

84 |
A string matching algorithm fast on the average
- Commentz-Walter
- 1979
Citation Context ...ngth). For larger k, Chang's and Wu, Manber and Myers's algorithms are the fastest. We can improve the searching phase by using multiple string searching algorithms based on the Boyer-Moore algorithm [10, 25, 6] or the shift-or algorithm [3, 31]. This improvement is significant when the number of blocks found is small (or in other words, when the alphabet is large). To improve the check phase, we need to dec... |

78 |
An improved algorithm for approximate string matching
- Galil, Park
- 1990
Citation Context ...pattern, where the errors are counted as the minimal number of insertions, deletions, or substitutions of characters needed to convert one string to another. The best worst case running time is O(kn) [20, 13, 29]. Recently, practical algorithms have been proposed [27, 32, 31], the latter... [Footnote: This work was partially supported by Grant 1950622 from Fondecyt. E-mail: rbaeza@dcc.uchile.cl.] |

53 | A faster algorithm for approximate string matching
- Baeza-Yates, Navarro
- 1996
Citation Context ...have a good algorithm to search patterns with at most 1 or 2 errors. This could be used to search for blocks of length 2r or 3r, decreasing the number of potential matches. This idea has been used in [5]. Our algorithm shows that reducing a text searching problem to simpler problems to find potential answers may lead to simpler and fast algorithms, if the number of potential matches to check is small... |

53 |
Fast string matching with k differences
- Landau, Vishkin
- 1988
Citation Context ...pattern, where the errors are counted as the minimal number of insertions, deletions, or substitutions of characters needed to convert one string to another. The best worst case running time is O(kn) [20, 13, 29]. Recently, practical algorithms have been proposed [27, 32, 31], the latter... |

48 |
Theoretical and empirical comparisons of approximate string matching algorithms
- Chang, Lampe
- 1992
Citation Context ...p k. For these cases the expected time is better than the best worst case known of O(kn). Figure 3 gives experimental results for the following algorithms in addition to ours: Ukkonen's [28], Chang's [8], Wu and Manber's [32] and Wu, Manber and Myers's [33]. Our algorithm uses a multiple string searching algorithm based on the Boyer-Moore-Sunday [26] string searching algorithm and Ukkonen's [28] dyna... |

48 | A subquadratic algorithm for approximate limited expression matching
- Wu, Manber, et al.
- 1996
Citation Context ... the best worst case known of O(kn). Figure 3 gives experimental results for the following algorithms in addition to ours: Ukkonen's [28], Chang's [8], Wu and Manber's [32] and Wu, Manber and Myers's [33]. Our algorithm uses a multiple string searching algorithm based on the Boyer-Moore-Sunday [26] string searching algorithm and Ukkonen's [28] dynamic programming algorithm for the checking phase. We u... |

46 |
Approximate string matching in sublinear expected time
- Chang, Lawler
- 1990
Citation Context ...ected search time for k ≤ O(m / log m) using O(m^2) extra space. This result generalizes to any algorithm based on partitioning the pattern, as in [32], and is simpler than Chang and Lawler's algorithm [9]. For a complete set of references on these problems we refer the reader to [14]. A preliminary version of this paper was presented in [7]. 2 String Matching with Mismatches First let us describe the ... |

46 | Fast string searching - Hume, Sunday - 1991 |

40 |
Efficient string matching with k mismatches
- Landau, Vishkin
- 1986
Citation Context ...m, solvable in O(n) time. Various algorithms have been developed to solve the problem of string matching with k mismatches. Running times have ranged from O(mn) for the brute force algorithm to O(kn) [19, 12] or O(n log m) [4, 15]. In this paper we present a simple algorithm (one page of C code) that runs in O(n) time, worst case, if all the characters p_i in P are distinct (i.e. none of the characters in... |

31 |
Fast and practical approximate pattern matching
- Baeza-Yates, Perleberg
- 1996
Citation Context ... as in [32], and is simpler than Chang and Lawler's algorithm [9]. For a complete set of references on these problems we refer the reader to [14]. A preliminary version of this paper was presented in [7]. 2 String Matching with Mismatches First let us describe the string matching with k mismatches algorithm using the best (and practical) case of only distinct characters in P. In this case, each char... |

27 |
Improved string matching with k mismatches
- Galil, Giancarlo
- 1986
Citation Context ...m, solvable in O(n) time. Various algorithms have been developed to solve the problem of string matching with k mismatches. Running times have ranged from O(mn) for the brute force algorithm to O(kn) [19, 12] or O(n log m) [4, 15]. In this paper we present a simple algorithm (one page of C code) that runs in O(n) time, worst case, if all the characters p_i in P are distinct (i.e. none of the characters in... |

24 |
Simple and efficient string matching with k mismatches
- Grossi, Luccio
- 1989
Citation Context ...e. Various algorithms have been developed to solve the problem of string matching with k mismatches. Running times have ranged from O(mn) for the brute force algorithm to O(kn) [19, 12] or O(n log m) [4, 15]. In this paper we present a simple algorithm (one page of C code) that runs in O(n) time, worst case, if all the characters p_i in P are distinct (i.e. none of the characters in P are identical) and ... |

20 | Fast two dimensional pattern matching
- Baeza-Yates, Régnier
- 1993
Citation Context ...ngth). For larger k, Chang's and Wu, Manber and Myers's algorithms are the fastest. We can improve the searching phase by using multiple string searching algorithms based on the Boyer-Moore algorithm [10, 25, 6] or the shift-or algorithm [3, 31]. This improvement is significant when the number of blocks found is small (or in other words, when the alphabet is large). To improve the check phase, we need to dec... |

20 |
Approximate string matching with suffix automata, Algorithmica 10
- Ukkonen, Wood
- 1993
Citation Context ...pattern, where the errors are counted as the minimal number of insertions, deletions, or substitutions of characters needed to convert one string to another. The best worst case running time is O(kn) [20, 13, 29]. Recently, practical algorithms have been proposed [27, 32, 31], the latter... |

19 | Boyer-Moore approach to approximate string matching - Tarhio, Ukkonen - 1990 |

18 | Fast string matching with mismatches - Baeza-Yates, Gonnet - 1994 |

18 | Experiments with a very fast substring search algorithm - Smith - 1991 |

4 |
An O(ND) difference algorithm and its variants
- Myers
- 1986
Citation Context ...ew algorithm which is based on two classical algorithms and the partition approach mentioned in [32]. We show that this very simple algorithm has linear expected running time for most values of k. An algorithm that achieves expected linear time when one of the strings is random is presented in [22]. The partition approach is based on the following fact: an occurrence with at most k errors of a pattern... |

3 |
On saving space in parallel computation
- Hagerup
- 1988
Citation Context ... of the alphabet Σ, and the array of m counters. Running time for preprocessing is O(2m + |Σ|) as each entry of the array of offsets (size |Σ|) is initialized (this can be avoided, see [16]), m entries for the m characters of P are written into the array, and the m counters are initialized. The space used for the case of non-distinct characters in P is O(2m + |Σ|) (we use up to m \G... |

2 |
An algorithm for approximate string matching with non uniform costs
- Manber, Wu
- 1989
Citation Context ...O(2m + |Σ|) where Σ is the underlying alphabet (typically ASCII symbols). The only algorithm which is similar to ours is an O(n + R log log m) result for insertions and deletions, reported in [21]. Related ideas can be found in [11, 1]. String matching with errors consists of finding all substrings of the text that have at most k errors with the pattern, where the errors are counted as the min... |

2 |
Three Longest Substring Algorithms
- Perleberg
- 1993
Citation Context ...ing algorithms for similar problems by changing the computation model. In fact, this idea has been used for simple and fast algorithms to find the longest common substring between a text and a pattern [23], and variations of string matching with errors. The second algorithm achieves linear expected time in most cases, and the bound on k is similar to the algorithms of Chang et al. [9]. One way to impro... |

2 |
Efficient algorithms for multiple pattern matching
- Sridhar
- 1986
Citation Context ...ngth). For larger k, Chang's and Wu, Manber and Myers's algorithms are the fastest. We can improve the searching phase by using multiple string searching algorithms based on the Boyer-Moore algorithm [10, 25, 6] or the shift-or algorithm [3, 31]. This improvement is significant when the number of blocks found is small (or in other words, when the alphabet is large). To improve the check phase, we need to dec... |