## A linear size index for approximate pattern matching (2006)

Venue: Proc. 17th Annual Symposium on Combinatorial Pattern Matching

Citations: 12 (1 self)

### BibTeX

@INPROCEEDINGS{Chan06alinear,
  author = {Ho-leung Chan and Tak-wah Lam and Wing-kin Sung and Siu-lung Tam and Swee-seong Wong},
  title = {A linear size index for approximate pattern matching},
  booktitle = {Proc. 17th Annual Symposium on Combinatorial Pattern Matching},
  year = {2006},
  pages = {49--59},
  publisher = {Springer}
}

### Abstract

This paper revisits the problem of indexing a text $S[1..n]$ to support searching substrings in $S$ that match a given pattern $P[1..m]$ with at most $k$ errors. A naive solution either has a worst-case matching time complexity of $\Omega(m^k)$ or requires $\Omega(n^k)$ space. Devising a solution with better performance has been a challenge until Cole et al. [5] showed an $O(n \log^k n)$-space index that can support $k$-error matching in $O(m + \mathrm{occ} + \log^k n \log\log n)$ time, where $\mathrm{occ}$ is the number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the feasibility of devising a linear-size index that still has a time complexity linear in $m$. In particular, we give an $O(n)$-space index that supports $k$-error matching in $O(m + \mathrm{occ} + (\log n)^{k(k+1)} \log\log n)$ worst-case time. Furthermore, the index can be compressed from $O(n)$ words into $O(n)$ bits with a slight increase in the time complexity.
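To make the problem statement concrete, the following is a minimal sketch of k-error matching by brute force. It is not the index described in the paper: it builds no index at all, scans the text directly in O(nm) time, and assumes that "errors" means mismatches only (Hamming distance). The function name `k_mismatch_occurrences` and the example strings are illustrative assumptions, and positions are 0-based, unlike the 1-based S[1..n] notation of the abstract.

```python
def k_mismatch_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Return 0-based start positions where `pattern` matches a substring
    of `text` with at most `k` mismatches (Hamming distance only).

    Brute-force baseline for illustration; assumption: no insertions or
    deletions are considered, and no index is built.
    """
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for offset in range(m):
            if text[start + offset] != pattern[offset]:
                mismatches += 1
                if mismatches > k:
                    break  # too many errors at this alignment
        else:
            # Inner loop finished without exceeding k mismatches.
            hits.append(start)
    return hits


if __name__ == "__main__":
    # "ACGT" occurs exactly at position 0 and with one mismatch at position 4.
    print(k_mismatch_occurrences("ACGTACGA", "ACGT", 1))  # [0, 4]
```

A scan like this touches every alignment of the pattern for every query, which is the cost an index is meant to avoid: the bounds quoted in the abstract depend on n only through polylogarithmic terms once the index is built.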

### Citations

644 | Suffix arrays: a new method for on-line string searches
- Manber, Myers
- 1990

Citation Context: ...ays are the most well-known indexes. Suffix trees [12,15] occupy O(n) space and achieve the optimal matching time, i.e., O(m + occ), where occ is the number of occurrences of P in S. For suffix arrays [11], the space requirement is also O(n) space (but with a smaller constant), and the matching time is O(m + occ + log n). Recently, two... [Footnote 3: Unless otherwise stated, the space complexity is measured in term...]

548 | A space-economical suffix tree construction algorithm
- McCreight
- 1976

Citation Context: ...cations include the indexing of DNA or protein sequences for biological research. To support exact matching (i.e., k = 0), suffix trees and suffix arrays are the most well-known indexes. Suffix trees [12,15] occupy O(n) space and achieve the optimal matching time, i.e., O(m + occ), where occ is the number of occurrences of P in S. For suffix arrays [11], the space requirement is also O(n) space (but with ...

426 | Linear pattern matching algorithms
- Weiner
- 1973

Citation Context: ...cations include the indexing of DNA or protein sequences for biological research. To support exact matching (i.e., k = 0), suffix trees and suffix arrays are the most well-known indexes. Suffix trees [12,15] occupy O(n) space and achieve the optimal matching time, i.e., O(m + occ), where occ is the number of occurrences of P in S. For suffix arrays [11], the space requirement is also O(n) space (but with ...

188 | Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
- Grossi, Vitter

Citation Context: ...Recently, two compressed solutions, namely, compressed suffix arrays [7] and FM-index [6], have been proposed; they require O(n) bits only and the matching time is $O(m + \mathrm{occ} \log^{\epsilon} n)$, where $\epsilon > 0$. Indexing a string for approximate matching is a challenging problem. Even ... [Footnote 3: Unless otherwise stated, the space complexity is measured in terms of the number of words, where a word can store O(log n) bits.]

180 | Opportunistic Data Structures with Applications
- Ferragina, Manzini
- 2000

56 | A hybrid indexing method for approximate string matching
- Navarro, Baeza-Yates
- 2000

Citation Context: ...g n) $\log^{\epsilon} n$) time. Other related results. Note that the above results concern worst-case performance. The literature also contains several interesting results on average-case performance (see, e.g., [3, 10, 13]). [Section 2: An O(n)-word index for k-error matching] This section considers Hamming distance only and presents an O(n)-word index for a text S[1..n]. Given any pattern P[1..m], the index finds all substrings...

52 | Compressed suffix trees with full functionality
- Sadakane

Citation Context: ... O(n)-bit space. We can choose $\beta = k 3^k \log^{k+2} n$. Then, the error-trees and the tree-cross-product data structures take O(n)-bit space. We can replace the suffix tree of S by a compressed suffix tree [14], which still supports the preprocessing of P in O(m) time. The matching time for patterns of length at least $k 3^k \log^{k+2} n$ is $O(m + \mathrm{occ} + k^2 18^k \log^{2k+2} n \log\log n)$. For patterns of length less than ...

51 | Dictionary matching and indexing with errors and don’t cares
- Cole, Gottlieb, et al.
- 2004

Citation Context: ...st k errors. A naive solution either has a worst-case matching time complexity of $\Omega(m^k)$ or requires $\Omega(n^k)$ space. Devising a solution with better performance has been a challenge until Cole et al. [5] showed an $O(n \log^k n)$-space index that can support k-error matching in $O(m + \mathrm{occ} + \log^k n \log\log n)$ time, where occ is the number of occurrences. Motivated by the indexing of DNA, we investigate in th...

38 | Fast approximate matching using suffix trees
- Cobbs
- 1995

Citation Context: ...tention. A simple solution is to use the suffix tree of S and repeatedly search for every 1-error modification of the query pattern; this solution uses O(n) space and the matching time is $O(m^2 + \mathrm{occ})$ [4]. With a bigger index of size O(n log n), the matching time complexity has been improved tremendously by a chain of results to O(m log n log log n + occ) [1], O(m log log n + occ) [2], and finally O(m...

26 | Approximate string matching using compressed suffix arrays
- Huynh, Hon, et al.
- 2004

Citation Context: ... chain of results to O(m log n log log n + occ) [1], O(m log log n + occ) [2], and finally O(m + occ + log n log log n) [5]. It is also known that indexes using O(n) space take O(m log n + occ) time [8] and O(m log log n + occ) time [9] for 1-error matching. These two indexes can also be compressed to O(n) bits, and the matching times are $O(m \log^2 n + \mathrm{occ} \log n)$ and $O((m \log\log n + \mathrm{occ}) \log^{\epsilon} n)$, r...

24 | Text indexing and dictionary matching with one error
- Amir, Keselman, et al.

Citation Context: ...space and the matching time is $O(m^2 + \mathrm{occ})$ [4]. With a bigger index of size O(n log n), the matching time complexity has been improved tremendously by a chain of results to O(m log n log log n + occ) [1], O(m log log n + occ) [2], and finally O(m + occ + log n log log n) [5]. It is also known that indexes using O(n) space take O(m log n + occ) time [8] and O(m log log n + occ) time [9] for 1-error m...

20 | A Metric Index for Approximate String Matching
- Chávez, Navarro
- 2002

Citation Context: ...g n) $\log^{\epsilon} n$) time. Other related results. Note that the above results concern worst-case performance. The literature also contains several interesting results on average-case performance (see, e.g., [3, 10, 13]). [Section 2: An O(n)-word index for k-error matching] This section considers Hamming distance only and presents an O(n)-word index for a text S[1..n]. Given any pattern P[1..m], the index finds all substrings...

18 | Range searching over tree cross products
- Buchsbaum, Goodrich, et al.
- 2000

Citation Context: ...e is $O(m^2 + \mathrm{occ})$ [4]. With a bigger index of size O(n log n), the matching time complexity has been improved tremendously by a chain of results to O(m log n log log n + occ) [1], O(m log log n + occ) [2], and finally O(m + occ + log n log log n) [5]. It is also known that indexes using O(n) space take O(m log n + occ) time [8] and O(m log log n + occ) time [9] for 1-error matching. These two indexes...

11 | Improved approximate string matching using compressed suffix data structures
- Lam, Sung, et al.
- 2005

Citation Context: ... log n + occ) [1], O(m log log n + occ) [2], and finally O(m + occ + log n log log n) [5]. It is also known that indexes using O(n) space take O(m log n + occ) time [8] and O(m log log n + occ) time [9] for 1-error matching. These two indexes can also be compressed to O(n) bits, and the matching times are $O(m \log^2 n + \mathrm{occ} \log n)$ and $O((m \log\log n + \mathrm{occ}) \log^{\epsilon} n)$, respectively, where $\epsilon < 1$. To cater...

8 | Text indexing with errors
- Maass, Nowak
- 2005

6 | Text indexing with errors
- Maaß, Nowak
- 2005