## Faster Bit-parallel Approximate String Matching (2002)

Venue: Proc. 13th Combinatorial Pattern Matching (CPM'2002), LNCS 2373

Citations: 32 (18 self)

### BibTeX

@INPROCEEDINGS{Hyyrö02fasterbit-parallel,
    author = {Heikki Hyyrö and Gonzalo Navarro},
    title = {Faster Bit-parallel Approximate String Matching},
    booktitle = {In Proc. 13th Combinatorial Pattern Matching (CPM'2002), LNCS 2373},
    year = {2002},
    pages = {203--224}
}

### Abstract

We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first one [Myers, J. of the ACM, 1999] searches for a pattern of length m in a text of length n permitting k differences in O(mn/w) time, where w is the width of the computer word. The second one [Navarro and Raffinot, ACM JEA, 2000] extends a sublinear-time exact algorithm to approximate searching. The latter technique makes use of an O(kmn/w) time algorithm [Wu and Manber, Comm. ACM, 1992] for its internal workings.

### Citations

1212 | Binary codes capable of correcting deletions, insertions and reversals
- Levenshtein
- 1966

Citation Context: ...he task of approximate string matching is to find from the text all indices j for which ed(P, T_{h..j}) ≤ k for some h ≤ j. Perhaps the most common form of edit distance is the Levenshtein edit distance [6], which is defined as the minimum number of single-character insertions, deletions and substitutions (Fig. 1a) needed in order to make A and B equal. Another common form of edit distance is the Damera...
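
The Levenshtein definition quoted above can be sketched as a short dynamic program. This is a minimal illustration rather than code from any of the cited papers; the function name `levenshtein` is a hypothetical choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to make a and b equal."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


print(levenshtein("cat", "cast"))  # the insertion example from Fig. 1a
```

Only two rows are kept at a time, so the sketch uses O(|b|) space while still taking O(|a||b|) time.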

413 | A Guided Tour to Approximate String Matching
- Navarro

Citation Context: ...evenshtein edit distance is used. In particular the algorithms of Wu & Manber [15], Baeza-Yates & Navarro [1] and Myers [7] dominate the field when the pattern length and the error level are moderate [8]. In this paper we showed how these algorithms can be modified to use the Damerau edit distance, which is an important distance especially in natural language [5]. Our modification adds only a constan...

354 | Techniques for automatically correcting words in text
- Kukich
- 1992

Citation Context: ... (Fig. 1b). The Damerau edit distance is important for example in spelling error applications [5]. (Footnote ⋆: Supported by the Academy of Finland and Tampere Graduate School in Information Science and Engineering.) In this paper we use the notation edL(A, B) to denote the Levenshtein edit distance and edD(A, B) to denote the Damerau edit distance between A and B. During the last decade, algorithms based on bit...

318 | Fast text search allowing errors
- Manber, Wu
- 1992

Citation Context: ...on bit-parallelism have emerged as the fastest approximate string matching algorithms in practice for the Levenshtein edit distance [6]. The first of these was the O(kn⌈m/w⌉) algorithm of Wu & Manber [15], where w is the computer word size. Later Wright [14] presented an O(mn log(σ)/w) algorithm, where σ is the alphabet size. Then Baeza-Yates & Navarro followed with their O(⌈km/w⌉n) algorithm. Finally...
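
To illustrate where the factor k in the O(kn⌈m/w⌉) bound comes from, here is a hedged sketch of a Wu & Manber style search: one bit vector per error level, each updated once per text character. It is written from the usual textbook description of the row-wise NFA simulation, not copied from the cited paper, and the name `wu_manber_search` is hypothetical. Python integers are unbounded, so a single integer plays the role of the ⌈m/w⌉ machine words.

```python
def wu_manber_search(pattern: str, text: str, k: int) -> list:
    """Return 1-based end positions j where pattern matches with <= k errors."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1
    # B[c] has bit i set when pattern[i] == c.
    B = {}
    for i, ch in enumerate(pattern):
        B[ch] = B.get(ch, 0) | (1 << i)
    # R[d] bit i set: pattern[:i+1] matches a suffix of the text read so far
    # with at most d errors. Initially d pattern characters can be deleted.
    R = [((1 << d) - 1) & mask for d in range(k + 1)]
    ends = []
    for j, ch in enumerate(text, start=1):
        eq = B.get(ch, 0)
        prev_old = R[0]
        R[0] = ((R[0] << 1) | 1) & eq
        for d in range(1, k + 1):
            cur_old = R[d]
            R[d] = ((((cur_old << 1) | 1) & eq)  # match
                    | ((prev_old << 1) | 1)      # substitution
                    | prev_old                   # insertion (extra text char)
                    | ((R[d - 1] << 1) | 1)      # deletion (skipped pattern char)
                    ) & mask
            prev_old = cur_old
        if R[k] & (1 << (m - 1)):
            ends.append(j)
    return ends


print(wu_manber_search("abc", "xxabcxx", 1))  # [4, 5, 6]
```

The inner loop over d is the k-dependent cost that the later algorithms in this list avoid.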

234 | The theory and computation of evolutionary distances: Pattern recognition
- Sellers
- 1980

Citation Context: ...Baeza-Yates & Navarro followed with their O(⌈km/w⌉n) algorithm. Finally Myers [7] achieved an O(⌈m/w⌉n) algorithm, which is an optimal speedup from the basic O(mn) dynamic programming algorithm (e.g. [11]). With the exception of the algorithm of Wright, the bit-parallel algorithms dominate the other verification capable algorithms with moderate pattern lengths [8]. a) insertion: cat → cast b) transp...
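
The basic O(mn) dynamic programming search that the bit-parallel algorithms speed up can be sketched as follows. This is a minimal illustration under the standard assumption that the top row of the table is zero, so a match may start at any text position; the function name `dp_search` is hypothetical.

```python
def dp_search(pattern: str, text: str, k: int) -> list:
    """Return 1-based end positions j such that some substring of text
    ending at j is within Levenshtein distance k of pattern."""
    m = len(pattern)
    # col[i] = D[i, j] for the current text position j; D[i, 0] = i.
    col = list(range(m + 1))
    ends = []
    for j, tc in enumerate(text, start=1):
        diag = col[0]  # D[0, j-1], always 0: a match can start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            new = min(col[i] + 1,      # D[i, j-1] + 1
                      col[i - 1] + 1,  # D[i-1, j] + 1
                      diag + cost)     # D[i-1, j-1] + cost
            diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends


print(dp_search("abc", "xxabcxx", 1))  # [4, 5, 6]
```

Each text character costs m cell updates here; the bit-parallel algorithms pack up to w of those updates into a constant number of word operations.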

207 | A Technique for Computer Detection and Correction of Spelling Errors
- Damerau
- 1964

Citation Context: ...ed as the minimum number of single-character insertions, deletions and substitutions (Fig. 1a) needed in order to make A and B equal. Another common form of edit distance is the Damerau edit distance [2], which is in principle an extension of the Levenshtein distance by permitting also the operation of transposing two adjacent characters (Fig. 1b). The Damerau edit ⋆ Supported by the Academy of Finla...

193 | The Third Text REtrieval Conference (TREC-3)
- Harman, Ed
- 1994

Citation Context: ...tions are assumed to grow from right to left. In addition we use superscript to denote bit-repetition. As an example let V = 1001110 be a bit vector. Then V[1] = V[5] = V[6] = 0, V[2] = V[3] = V[4] = V[7] = 1, and we could also write V = 10^2 1^3 0. 3.1 The Bit-parallel NFA of Wu & Manber The bit-parallel approximate string matching algorithm of Wu & Manber [15] is based on representing a non-...
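
The bit-vector notation in this excerpt (positions counted from 1 at the rightmost bit, superscripts for repetition) can be checked with a few lines of Python. This is only an illustration of the notation; the helper name `bit` is hypothetical.

```python
def bit(v: int, i: int) -> int:
    """Return V[i], where position 1 is the rightmost (least significant) bit."""
    return (v >> (i - 1)) & 1


V = int("1001110", 2)

# Bit-repetition: 1 0^2 1^3 0 spells out the same vector.
V_rep = int("1" + "0" * 2 + "1" * 3 + "0", 2)

zeros = [i for i in range(1, 8) if bit(V, i) == 0]
ones = [i for i in range(1, 8) if bit(V, i) == 1]
print(zeros, ones, V == V_rep)
```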

187 | Algorithms for approximate string matching
- Ukkonen
- 1985

Citation Context: ...set A = P and B = T, the situation corresponds to the earlier definition of approximate string matching. From now on we assume that the dynamic programming table D is filled in this manner. Ukkonen ([12, 13]) has studied the properties of the dynamic programming matrix. Among these there were the following two, which apply to both the edit distance and the approximate string matching versions of D: - The ...

150 | Finding approximate patterns in strings
- Ukkonen
- 1985

Citation Context: ...set A = P and B = T, the situation corresponds to the earlier definition of approximate string matching. From now on we assume that the dynamic programming table D is filled in this manner. Ukkonen ([12, 13]) has studied the properties of the dynamic programming matrix. Among these there were the following two, which apply to both the edit distance and the approximate string matching versions of D: - The ...

139 | A fast bit-vector algorithm for approximate string matching based on dynamic programming
- Myers
- 1999

Citation Context: ... w is the computer word size. Later Wright [14] presented an O(mn log(σ)/w) algorithm, where σ is the alphabet size. Then Baeza-Yates & Navarro followed with their O(⌈km/w⌉n) algorithm. Finally Myers [7] achieved an O(⌈m/w⌉n) algorithm, which is an optimal speedup from the basic O(mn) dynamic programming algorithm (e.g. [11]). With the exception of the algorithm of Wright, the bit-parallel algorithms...
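
The O(⌈m/w⌉n) bound comes from encoding the differences between adjacent dynamic programming cells as bit vectors and updating a whole column with a constant number of word operations. The sketch below follows the widely circulated formulation of Myers' bit-vector search (vertical and horizontal delta vectors); it is an illustration under that assumption, not a transcription of the cited paper, and all variable names are hypothetical. Because Python integers are unbounded, one integer stands in for the ⌈m/w⌉ words and is masked to m bits.

```python
def myers_search(pattern: str, text: str, k: int) -> list:
    """Return 1-based end positions j where pattern matches with <= k errors."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1
    top = 1 << (m - 1)
    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    # pv / mv: positions where the column value rises / falls by one.
    pv, mv, score = mask, 0, m  # column 0 is 0, 1, ..., m
    ends = []
    for j, ch in enumerate(text, start=1):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask  # horizontal delta +1
        mh = pv & xh                   # horizontal delta -1
        # Track the bottom cell D[m, j] through its horizontal delta.
        if ph & top:
            score += 1
        elif mh & top:
            score -= 1
        # Shift in 0: the top row is all zeros when searching.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            ends.append(j)
    return ends


print(myers_search("abc", "xxabcxx", 1))  # [4, 5, 6]
```

Unlike the row-per-error-level approach, the work per text character here does not depend on k.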

72 | Faster approximate string matching
- Baeza-Yates, Navarro
- 1999

Citation Context: ...ds O(k⌈m/w⌉) work to the original algorithm, whereas the additional cost of our method is only O(⌈m/w⌉). Our method is also more general in that its principle works with also the other two algorithms [1, 7] with very little changes. We begin by discussing the basic dynamic programming solutions for the Levenshtein and Damerau distances. In this part we also reformulate the dynamic programming solution f...

35 | NR-grep: a fast and flexible pattern-matching tool
- Navarro

25 | Approximate String Matching using Within-Word Parallelism
- Wright

Citation Context: ...mate string matching algorithms in practice for the Levenshtein edit distance [6]. The first of these was the O(kn⌈m/w⌉) algorithm of Wu & Manber [15], where w is the computer word size. Later Wright [14] presented an O(mn log(σ)/w) algorithm, where σ is the alphabet size. Then Baeza-Yates & Navarro followed with their O(⌈km/w⌉n) algorithm. Finally Myers [7] achieved an O(⌈m/w⌉n) algorithm, which is a...

17 | A model and a fast algorithm for multiple errors spelling correction
- Du, Chang
- 1992

Citation Context: ...amerau edit distance can be computed in basically the same way, but Recurrence 1 needs a slight change. The following Recurrence 2 for the Damerau edit distance is derived from the work of Du & Chang [3]. The superscript R denotes the reverse of a string (that is, if A = “abc”, then A^R = “cba”). Recurrence 2: D[i, −1] = D[−1, j] = max(|A|, |B|). D[i, 0] = i, D[0, j] = j. D[i, j] = { D[i − 1, j − 1], if A_i = ...
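
Since Recurrence 2 is cut off in this excerpt, the sketch below does not attempt to reproduce it. It shows instead the common textbook "restricted" Damerau distance (also called optimal string alignment), which adds one transposition case to the Levenshtein recurrence; whether it matches the cited recurrence term by term is an assumption. The function name `damerau_restricted` is hypothetical.

```python
def damerau_restricted(a: str, b: str) -> int:
    """Levenshtein distance extended with transposition of two adjacent
    characters, where a transposed pair is not edited again."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(D[i - 1][j] + 1,         # deletion
                       D[i][j - 1] + 1,         # insertion
                       D[i - 1][j - 1] + cost)  # match or substitution
            # Transposition: the last two characters of a[:i] are the
            # reverse of the last two characters of b[:j].
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                best = min(best, D[i - 2][j - 2] + 1)
            D[i][j] = best
    return D[n][m]


print(damerau_restricted("cat", "act"))  # one transposition
```

The explicit bounds check on i and j plays the role that the large boundary values D[i, −1] = D[−1, j] = max(|A|, |B|) play in the quoted recurrence: it keeps the transposition case from being chosen at the table edge.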

16 | Improving an algorithm for approximate pattern matching
- Navarro, Baeza-Yates
- 1998

Citation Context: ...h (m, k)-combination. The searched text was a 10 MB sample from Wall Street Journal articles taken from the TREC-collection [4]. The version of the algorithm of Baeza-Yates & Navarro was the one from [10], which includes a smart mechanism to keep only a required part of the automaton active when it needs several bit-vectors. As the pattern lengths were ≤ w = 32, the other two algorithms did not need ...