Computing the Threshold for q-Gram Filters (2002)
| Venue: | Proceedings of the 8th Scandinavian Workshop on Algorithm Theory (SWAT 2002), LNCS 2368, pp. 348–357 |
| Citations: | 4 - 0 self |
BibTeX
@INPROCEEDINGS{Kärkkäinen02computingthe,
author = {Juha Kärkkäinen},
title = {Computing the Threshold for q-Gram Filters},
booktitle = {Proceedings of the 8th Scandinavian Workshop on Algorithm Theory (SWAT 2002)},
series = {LNCS},
volume = {2368},
year = {2002},
pages = {348--357},
publisher = {Springer}
}
Abstract
A popular and much studied class of filters for approximate string matching is based on finding common q-grams, substrings of length q, between the pattern and the text. A variation of the basic idea uses gapped q-grams and has recently been shown to provide significant improvements in practice. A major difficulty with gapped q-gram filters is the computation of the so-called threshold, which defines the filter criterion. We describe the first general method for computing the threshold for q-gram filters. The method is based on a carefully chosen precise statement of the problem, which is then transformed into a constrained shortest path problem. In its generic form the method leaves certain parts open, but it is applicable to a large variety of q-gram filters and may be extensible even to other classes of filters. We also give a full algorithm for a specific subclass. For this subclass, the algorithm has been implemented and used successfully in an experimental comparison.
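As a rough illustration of what the threshold means, the sketch below computes it by brute force. It is not the constrained shortest path method of the paper. The setting is an assumption made here for illustration: matching under Hamming distance with at most k mismatches, a q-gram shape given as a tuple of offsets, and the threshold taken as the smallest number of shape placements in a pattern of length m that avoid every mismatch. The name `threshold_bruteforce` and its parameters are hypothetical.

```python
from itertools import combinations


def threshold_bruteforce(m, shape, k):
    """Brute-force q-gram threshold (illustrative assumption, not the paper's method).

    m     -- pattern length
    shape -- offsets of the sampled positions, e.g. (0, 1, 2) for a
             contiguous 3-gram or (0, 1, 3) for the gapped shape ##-#
    k     -- number of mismatches allowed (Hamming distance assumed)

    Returns the minimum, over all ways of placing k mismatches, of the
    number of shape placements that touch no mismatch.
    """
    span = max(shape) + 1                  # total width covered by the shape
    starts = range(m - span + 1)           # every placement that fits in the pattern
    best = len(starts)
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        # A placement survives if none of its sampled positions is a mismatch.
        surviving = sum(
            1 for i in starts if all(i + offset not in bad for offset in shape)
        )
        best = min(best, surviving)
    return best


if __name__ == "__main__":
    print(threshold_bruteforce(10, (0, 1, 2), 1))  # contiguous 3-grams
    print(threshold_bruteforce(10, (0, 1, 3), 1))  # gapped shape ##-#
```

For the contiguous shape (0, 1, 2) with m = 10 and k = 1 this gives 5, which agrees with the classical bound m - q + 1 - kq for contiguous q-grams. The enumeration grows combinatorially with m and k, so it only serves to make the definition concrete.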







