## Efficient approximate dictionary look-up over small alphabets (2005)

Citations: 3 (1 self)

### BibTeX

```bibtex
@TECHREPORT{Arslan05efficientapproximate,
  author = {Abdullah N. Arslan},
  title = {Efficient approximate dictionary look-up over small alphabets},
  institution = {},
  year = {2005}
}
```


### Abstract

Given a dictionary W consisting of n binary strings of length m each, a d-query asks whether there exists a string in W within Hamming distance d of a given binary query string q. The problem was posed by Minsky and Papert in 1969 [10] as a challenge to data structure design. Efficient solutions have been developed only for the special case d = 1 (the 1-query problem). We assume the standard RAM model of computation and consider the case where the alphabet size is arbitrary but finite and d is small. We preprocess the dictionary and construct an edge-labelled tree with bounded branching factor and height. We present an algorithm that answers dictionary look-up within a given distance d of a given query string q. The algorithm is efficient when the alphabet size is small or the dictionary is sparse. In particular, for the d-query problem the algorithm takes time $O\big(m(\log_{4/3} n - 1)^{d} (\log_2 n)^{d+1}\big)$. This improves on previously known algorithms for the d-query problem when d > 1. We also generalize the results to the case where edit distances are used. The algorithm can be modified to allow dictionary words of different lengths, as well as query strings of different lengths.
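As a baseline for the d-query defined above, here is a minimal sketch of a generic trie-based look-up. It traverses the dictionary trie depth-first and abandons a branch once more than d mismatches have accumulated. This is not the report's edge-labelled tree or its algorithm. The names `build_trie` and `d_query` are hypothetical, and the sketch assumes all words share q's length and that `$` never occurs as a symbol.

```python
from typing import List


def build_trie(words: List[str]) -> dict:
    """Build a nested-dict trie; the key '$' marks the end of a word (assumed unused as a symbol)."""
    root: dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = w
    return root


def d_query(trie: dict, q: str, d: int) -> bool:
    """Return True if some dictionary word is within Hamming distance d of q.

    Assumes every dictionary word has the same length as q.
    """
    # Stack entries: (trie node, characters consumed so far, mismatches so far).
    stack = [(trie, 0, 0)]
    while stack:
        node, depth, mism = stack.pop()
        if depth == len(q):
            if "$" in node:
                return True
            continue
        for ch, child in node.items():
            if ch == "$":
                continue
            cost = mism + (ch != q[depth])
            if cost <= d:  # prune any branch that already exceeds d mismatches
                stack.append((child, depth + 1, cost))
    return False
```

For example, with the dictionary `["0000", "0110", "1011"]`, the query `"0111"` succeeds for d = 1 (it is one bit away from `"0110"`). The query `"1100"` needs d = 2.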

### Citations

1279 | Binary codes capable of correcting deletions, insertions, and reversals - Levenshtein - 1966 |

691 | The string-to-string correction problem - Wagner, Fischer - 1974 |

Citation context: ...minimum-cost path from (0, 0) to (i, j), and can be computed from the minimum costs achieved at nodes (i − 1, j), (i − 1, j − 1), and (i, j − 1). Hence it has a simple dynamic programming formulation [13]:

$$D_{i,j} = \min\{\, D_{i-1,j} + 1,\; D_{i-1,j-1} + H(x_i, y_j),\; D_{i,j-1} + 1 \,\} \qquad (2)$$

for all i, j with 0 ≤ i, j ≤ m, with boundary conditions $D_{i,-1} = D_{-1,j} = \infty$ and $D_{0,0} = 0$. With respect to a given query string q, let e be a fu...
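Recurrence (2) above maps directly onto an O(|x|·|y|) table computation. The sketch below assumes H(x_i, y_j) is the character mismatch indicator: 0 when the characters are equal and 1 otherwise. Row and column 0 stand for the empty prefix. The `i > 0` and `j > 0` guards play the role of the ∞ boundary at index −1. The function name `edit_distance` is hypothetical, and unlike the quoted formulation the sketch also accepts strings of different lengths.

```python
def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit distance computed with recurrence (2)."""
    n, m = len(x), len(y)
    INF = float("inf")
    # D[i][j] = edit distance between the prefixes x[:i] and y[:j].
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = INF
            if i > 0:
                best = min(best, D[i - 1][j] + 1)  # delete x_i
            if j > 0:
                best = min(best, D[i][j - 1] + 1)  # insert y_j
            if i > 0 and j > 0:
                h = 0 if x[i - 1] == y[j - 1] else 1  # H(x_i, y_j)
                best = min(best, D[i - 1][j - 1] + h)  # match or substitution
            D[i][j] = best
    return int(D[n][m])
```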

415 | Algorithms on strings, trees, and sequences: Computer science and computational biology (Cambridge Univ.) - Gusfield - 1997 |

Citation context: ...e complexity analysis. They assume a trie representation for the dictionary W. For the approximate dictionary look-up problem they present algorithms that use a hybrid tree/dynamic programming approach [8] that combines tree traversal with partial computation of distances. Their method allows for the use of simple edit distance as well as the Hamming distance. The simple edit distance between two strin...
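The hybrid tree/dynamic programming idea quoted above can be sketched as follows:

- Walk the dictionary trie depth-first.
- Extend the edit-distance table by one row per trie edge.
- Abandon a subtree as soon as every entry in the current row exceeds d. With unit costs, row minima never decrease along a root-to-leaf path, so no extension of that prefix can match.

This shows the general technique only, not the specific algorithm of [8] or of this report. `build_trie` and `approx_lookup` are hypothetical names, and `$` is assumed not to be a symbol of the alphabet.

```python
from typing import List


def build_trie(words: List[str]) -> dict:
    root: dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = w  # end-of-word marker holding the word itself
    return root


def approx_lookup(trie: dict, q: str, d: int) -> List[str]:
    """Return all dictionary words within unit-cost edit distance d of q."""
    matches: List[str] = []
    first_row = list(range(len(q) + 1))  # distances from the empty prefix to q[:j]

    def visit(node: dict, prev_row: List[int]) -> None:
        if "$" in node and prev_row[-1] <= d:
            matches.append(node["$"])
        for ch, child in node.items():
            if ch == "$":
                continue
            # One new DP row for the trie edge labelled ch.
            row = [prev_row[0] + 1]
            for j in range(1, len(q) + 1):
                h = 0 if q[j - 1] == ch else 1
                row.append(min(prev_row[j] + 1, row[j - 1] + 1, prev_row[j - 1] + h))
            if min(row) <= d:  # otherwise the whole subtree is out of range
                visit(child, row)

    visit(trie, first_row)
    return matches
```

With the words `["cat", "cart", "dog", "cut"]` and d = 1, the query `"cat"` reports `cat`, `cart` and `cut`. The `dog` branch is cut off after its first edge.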

406 | Error detecting and error correcting codes - Hamming - 1950 |

219 | Storing a Sparse Table with O(1) Worst Case Access Time - Fredman, Komlós, et al. - 1984 |

Citation context: ...This yields a recursive data structure that takes space O(mn log m), and using this data structure a 1-query can be answered in time O(m log log n). This recursive data structure uses FKS dictionaries [7]. An FKS dictionary requires O(n) m-bit words as storage and answers an exact query by accessing O(1) words. Brodal and Gąsieniec [2] construct a data structure for answering 1-queries that requires s...

218 | Average Case Analysis of Algorithms on Sequences - Szpankowski - 2001 |

197 | Algorithms for approximate string matching - Ukkonen - 1985 |

Citation context: ...uring which the entries of the dynamic programming matrix are partially computed. To determine whether two strings are within edit distance d, it is sufficient to consider a diagonal band of the edit graph [12]. Algorithm DFT-LOOK-UP$_{\mathrm{ed}}$ uses this observation (see Figure 8). For a given node v in S rooted at r, we define $D_{v,j}$, where $\max\{0, i - \lfloor d/2 \rfloor\} \le j \le \min\{m, i + \lfloor d/2 \rfloor\}$ and $i = |p_{r,v}|$ (see Figure 8), as f...
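The diagonal-band observation above can be sketched for two strings of the same length m. Only cells with |i − j| ≤ ⌊d/2⌋ are computed. A path that drifts k cells off the main diagonal needs at least k insertions and k deletions to return to it, so it costs at least 2k. Every path of cost at most d therefore stays inside the band. This is a standalone check on a pair of strings, not the trie-based DFT-LOOK-UP$_{\mathrm{ed}}$, and the function name `within_edit_distance` is hypothetical.

```python
def within_edit_distance(x: str, y: str, d: int) -> bool:
    """Decide whether equal-length strings x and y are within edit distance d,
    filling only the diagonal band |i - j| <= d // 2 of the DP table."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes strings of equal length")
    m = len(x)
    w = d // 2  # band half-width
    INF = float("inf")
    # Each row is a dict over the in-band columns; missing cells count as infinity.
    prev = {j: j for j in range(0, min(m, w) + 1)}  # row 0: D[0][j] = j
    for i in range(1, m + 1):
        cur = {}
        for j in range(max(0, i - w), min(m, i + w) + 1):
            if j == 0:
                cur[j] = i  # D[i][0] = i
                continue
            h = 0 if x[i - 1] == y[j - 1] else 1
            cur[j] = min(
                prev.get(j, INF) + 1,      # deletion (from row i - 1)
                cur.get(j - 1, INF) + 1,   # insertion (same row)
                prev.get(j - 1, INF) + h,  # match or substitution
            )
        prev = cur
    return prev.get(m, INF) <= d
```

With d = 0 or 1 the band has half-width 0 and the check reduces to Hamming distance. This is why `"0101"` and `"1010"` are rejected at d = 1 but accepted at d = 2.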

137 | Should tables be sorted? - Yao - 1981 |

Citation context: ...ure of size O(mn) words that supports 1-queries in O(m) memory accesses. Yao and Yao [15] present a method for 1-queries in the bitwise complexity model [10, 6] (or, equivalently, the cell-probe model [14] with word size 1). The method is based on the observation that two strings differing in one bit match in one half. This yields a recursive data structure that takes space O(mn log m), and using this ...
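The observation quoted above, that two equal-length strings differing in at most one position agree exactly on one of their halves, already gives a simple one-level filter for 1-queries. Index the dictionary by left half and by right half, then verify only the candidates that share a half with q. This non-recursive illustration assumes equal-length strings. It is not the recursive structure of Yao and Yao [15], and `HalfIndex` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


class HalfIndex:
    """1-query filter: a string within Hamming distance 1 of q must agree
    with q exactly on its left half or on its right half."""

    def __init__(self, words: List[str]) -> None:
        self.m = len(words[0]) if words else 0
        self.h = self.m // 2  # split point between the two halves
        self.left: Dict[str, List[str]] = defaultdict(list)
        self.right: Dict[str, List[str]] = defaultdict(list)
        for w in words:
            assert len(w) == self.m, "sketch assumes equal-length words"
            self.left[w[: self.h]].append(w)
            self.right[w[self.h :]].append(w)

    def one_query(self, q: str) -> bool:
        # Candidates share a half with q; verify the full string for <= 1 mismatch.
        candidates = self.left.get(q[: self.h], []) + self.right.get(q[self.h :], [])
        for w in candidates:
            if sum(a != b for a, b in zip(w, q)) <= 1:
                return True
        return False
```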

121 | The string B-tree: a new data structure for string search in external memory and its applications - Ferragina, Grossi - 1999 |

62 | Distinguishing string selection problems - Lanctot, Li, et al. - 1999 |

60 | Asymptotic growth of a class of random trees - Pittel - 1985 |

54 | Indexing methods for approximate string matching - Navarro, Baeza-Yates, et al. |

53 | Dictionary matching and indexing with errors and don’t cares - Cole, Gottlieb, et al. |

31 | Tries for approximate string matching - Shang, Merrett - 1996 |

29 | An algorithm for approximate membership checking with application to password security - Manber, Wu - 1994 |

Citation context: ...apert in 1969 [10] in which they asked if there is a data structure that supports fast d-queries. Algorithms for answering d-queries and their variations have been a topic of interest in the literature [1, 2, 3, 4, 5, 9, 15]. The approximate dictionary look-up [1, 2, 3, 9, 15] and the approximate dictionary query [4, 5] problems are variations of the d-query problem. Approximate dictionary look-up is a problem of dictio...

26 | A Study of Trie-Like Structures Under the Density Model - Devroye - 1992 |

23 | Efficient storage and retrieval by content and address of static files - Elias - 1974 |

Citation context: ... Brodal and Venkatesh [3] present a data structure of size O(mn) words that supports 1-queries in O(m) memory accesses. Yao and Yao [15] present a method for 1-queries in the bitwise complexity model [10, 6] (or, equivalently, the cell-probe model [14] with word size 1). The method is based on the observation that two strings differing in one bit match in one half. This yields a recursive data structure t...

15 | Neighborhood Preserving Hashing and Approximate Queries - Dolev, Harari, et al. - 1994 |

15 | Finding the neighborhood of a query in a dictionary - Dolev, Harari, et al. - 1993 |

15 | Dictionary look-up with one error - Yao, Yao - 1997 |

15 | A Note on the Probabilistic Analysis of Patricia Trees - Devroye - 1992 |

15 | A practical algorithm to find the best subsequence patterns - Hirao, Hoshino, et al. |

13 | Approximate dictionary queries - Brodal, Gąsieniec - 1996 |

11 | Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison - Sankoff, Kruskal (eds.) - 1983 |

Citation context: ...9]. Other applications include spell-checking, speech recognition, the study of birdsong, and searching biological sequence databases for an approximate match to a given query pattern (or motif) (see [11] for many possible applications). A naive method for answering a d-query is to generate the whole set of $\sum_{k=0}^{d} \binom{m}{k}$ strings, each differing from q in at most d positions, and with every string gene...
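The naive method described above can be sketched for binary strings. Enumerate every string obtained by flipping at most d of the m bits of q, which gives $\sum_{k=0}^{d}\binom{m}{k}$ candidates. Then test each candidate with an exact look-up. The names `neighborhood` and `naive_d_query` are hypothetical, and Python's built-in `set` stands in for an exact dictionary such as an FKS table.

```python
from itertools import combinations
from typing import Iterator, Set


def neighborhood(q: str, d: int) -> Iterator[str]:
    """Yield every binary string within Hamming distance d of q, each exactly once."""
    for k in range(d + 1):
        for positions in combinations(range(len(q)), k):
            bits = list(q)
            for p in positions:
                bits[p] = "1" if bits[p] == "0" else "0"  # flip bit p
            yield "".join(bits)


def naive_d_query(dictionary: Set[str], q: str, d: int) -> bool:
    """Answer a d-query by exact look-ups over the whole Hamming ball around q."""
    return any(s in dictionary for s in neighborhood(q, d))
```

For m = 4 and d = 2 the ball holds 1 + 4 + 6 = 11 strings. The candidate count grows roughly like $m^d$, which is why this approach is only practical for very small d.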

8 | Universal asymptotics for random tries and PATRICIA trees - Devroye - 2005 |

8 | Text indexing with errors - Maass, Nowak - 2005 |

7 | Improved bounds for dictionary look-up with one error - Brodal, Venkatesh - 2000 |

7 | PAST: Fast structure-based searching - Täubig, Buchner, et al. |

5 | Obtaining provably good performance from suffix trees in secondary storage - Ko, Aluru |

4 | Dictionary look-up within small edit distance - Arslan, Egecioglu - 2002 |

3 | Height in generalized tries and PATRICIA tries - Szpankowski, Knessl - 2000 |

1 | Mining minimal distinguishing subsequence with gap constraints - Ji, Bailey, et al. - 2005 |

1 | The art of computer programming, Vol. 3: Sorting and searching - Knuth - 1973 |

1 | Average-case analysis of approximate trie search - Maass - 2004 |

1 | Perceptrons - Minsky, Papert - 1969 |