A linear size index for approximate pattern matching
In Proc. 17th Annual Symposium on Combinatorial Pattern Matching, 2006

Cited by 14 (1 self)
Abstract. This paper revisits the problem of indexing a text $S[1..n]$ to support searching substrings in $S$ that match a given pattern $P[1..m]$ with at most $k$ errors. A naive solution either has a worst-case matching time complexity of $\Omega(m^k)$ or requires $\Omega(n^k)$ space. Devising a solution with better performance has been a challenge until Cole et al. [5] showed an $O(n \log^k n)$-space index that can support $k$-error matching in $O(m + occ + \log^k n \log\log n)$ time, where $occ$ is the number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the feasibility of devising a linear-size index that still has a time complexity linear in $m$. In particular, we give an $O(n)$-space index that supports $k$-error matching in $O(m + occ + (\log n)^{k(k+1)} \log\log n)$ worst-case time. Furthermore, the index can be compressed from $O(n)$ words into $O(n)$ bits with a slight increase in the time complexity.
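To make the k-error matching problem in this abstract concrete, here is a minimal index-free baseline in Python. It assumes that "errors" means edit operations (insertion, deletion, substitution); the abstract does not say which error model is used, and the function name `k_error_end_positions` is hypothetical. It is the classical column-wise dynamic program, not the index described in the paper, and it takes O(nm) time per query.

```python
def k_error_end_positions(text, pattern, k):
    """Return 1-based end positions j such that some substring of `text`
    ending at j is within edit distance k of `pattern`."""
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any substring
    # of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]   # old value of col[i - 1]
        col[0] = 0           # a match may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


print(k_error_end_positions("abcdefg", "bcd", 0))
print(k_error_end_positions("abcdefg", "bcd", 1))
```

An index for this problem aims to answer the same query without scanning the whole text for every pattern.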
Approximate String Matching with Lempel-Ziv Compressed Indexes
2007

Cited by 5 (4 self)
A compressed full-text self-index for a text T is a data structure that requires reduced space and is capable of searching for patterns P in T. Furthermore, the structure can reproduce any substring of T, thus it actually replaces T. Despite the explosion of interest in self-indexes in recent years, there has not been much progress on search functionalities beyond the basic exact search. In this paper we focus on indexed approximate string matching (ASM), which is of great interest, say, in computational biology applications. We present an ASM algorithm that works on top of a Lempel-Ziv self-index. We consider the so-called hybrid indexes, which are the best in practice for this problem. We show that a Lempel-Ziv index can be seen as an extension of the classical q-samples index. We give new insights on this type of index, which can be of independent interest, and then apply them to the Lempel-Ziv index. We show experimentally that our algorithm has a competitive performance and provides a useful space-time tradeoff compared to classical indexes.
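Indexed ASM schemes of the kind mentioned in this abstract commonly rely on filtration: if P matches with at most k errors, then after splitting P into k+1 pieces at least one piece must occur exactly. The Python sketch below illustrates only that general pigeonhole idea, assuming edit-distance errors and a pattern longer than k; it is not the algorithm of the paper, the helper names `split_pattern` and `candidate_windows` are hypothetical, and plain `str.find` stands in for the exact search that an index would perform.

```python
def split_pattern(pattern, pieces):
    """Split `pattern` into `pieces` contiguous parts of near-equal length.
    Returns (offset_in_pattern, piece) pairs."""
    m = len(pattern)
    bounds = [round(i * m / pieces) for i in range(pieces + 1)]
    return [(bounds[i], pattern[bounds[i]:bounds[i + 1]])
            for i in range(pieces)]


def candidate_windows(text, pattern, k):
    """Text windows [lo, hi) that may contain a k-error match of `pattern`.
    Each window is derived from an exact occurrence of one of k+1 pieces;
    a separate verification step is still needed inside every window."""
    m, n = len(pattern), len(text)
    windows = set()
    for offset, piece in split_pattern(pattern, k + 1):
        pos = text.find(piece)
        while pos != -1:
            lo = max(0, pos - offset - k)       # earliest possible match start
            hi = min(n, pos - offset + m + k)   # latest possible match end
            windows.add((lo, hi))
            pos = text.find(piece, pos + 1)
    return sorted(windows)


print(candidate_windows("xxabcdefxx", "abcXef", 1))
```

Only the reported windows need to be checked with a full approximate-matching routine, which is where the practical savings of filtration come from.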
String indexing for patterns with wildcards
Theory Comput. Syst.

Cited by 3 (1 self)
Abstract. We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results.
– A linear space index with query time $O(m + \sigma^j \log\log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
– An index with query time $O(m + j + occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first nontrivial bound with this query time.
– A time-space tradeoff, generalizing the index by Cole et al. [STOC 2004].
Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
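As a reference point for the query that these indexes answer, the following Python sketch reports the occurrences of a pattern with wildcards by brute force. It assumes that a wildcard matches exactly one arbitrary character and is written as `*`; both the symbol and the function name `wildcard_occurrences` are hypothetical choices for illustration. It runs in O(nm) time and is not one of the indexes from the abstract.

```python
def wildcard_occurrences(t, p, wildcard="*"):
    """Return 0-based start positions where pattern `p` occurs in `t`,
    with each wildcard in `p` matching any single character."""
    n, m = len(t), len(p)
    return [i for i in range(n - m + 1)
            if all(pc == wildcard or pc == t[i + d]
                   for d, pc in enumerate(p))]


print(wildcard_occurrences("abracadabra", "a*a"))
print(wildcard_occurrences("abracadabra", "a*ra"))
```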
Approximate String Matching with Ziv-Lempel Compressed Indexes
Abstract. A compressed full-text self-index for a text T is a data structure that requires reduced space and is capable of searching for patterns P in T. Furthermore, the structure can reproduce any substring of T, thus it actually replaces T. Despite the explosion of interest in self-indexes in recent years, there has not been much progress on search functionalities beyond the basic exact search. In this paper we focus on indexed approximate string matching (ASM), which is of great interest, say, in computational biology applications. We present an ASM algorithm that works on top of a Lempel-Ziv self-index. We consider the so-called hybrid indexes, which are the best in practice for this problem. We show that a Lempel-Ziv index can be seen as an extension of the classical q-samples index. We give new insights on this type of index, which can be of independent interest, and then apply them to the Ziv-Lempel index. We show experimentally that our algorithm has a competitive performance and provides a useful space-time tradeoff compared to classical indexes.
1 Introduction and Related Work
Approximate string matching (ASM) is an important problem that arises in applications related to text searching, pattern recognition, signal processing, and computational biology,
Master's Thesis: String Indexing for Patterns with Wildcards
2011
We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results.
• A linear space index with query time $O(m + \sigma^j \log\log n + occ)$. This significantly improves the previously best known linear space index described by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
• An index with optimal query time $O(m + j + occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first nontrivial bound with this query time.
• A time-space tradeoff for the problem which generalizes the index described by Cole et al. [STOC 2004].
The Longest Common Prefix (LCP) data structure introduced by Cole et al. is a key component in our results. We give a detailed explanation and show several new properties of the LCP data structure. Most importantly, we show that not only suffixes,
Optimal Prefix and Suffix Queries on Texts
2013
Abstract. In this paper, we study a restricted version of the position-restricted pattern matching problem introduced and studied by Mäkinen and Navarro [Position-Restricted Substring Searching, LATIN 2006]. In the problem handled in this paper, we are interested in those occurrences of the pattern that lie in a suffix or in a prefix of the given text. We achieve optimal query time for our problem against a data structure which is an extension of the classic suffix tree data structure. The time and space complexity of the data structure is dominated by that of the suffix tree. Notably, the (best) algorithm by Mäkinen and Navarro, if applied to our problem, gives suboptimal query time and the corresponding data structure also requires more time and space.
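The prefix and suffix queries described in this abstract can be stated as a short brute-force baseline in Python. The sketch assumes 0-based positions, that a prefix is given by its length `r`, that a suffix is given by its start `l`, and that an occurrence counts only if it lies entirely inside the prefix or suffix; these conventions and all function names are hypothetical, and the suffix-tree-based structure from the paper is not reproduced here.

```python
def occurrences(t, p):
    """Return all 0-based start positions of `p` in `t` (overlaps allowed)."""
    hits, i = [], t.find(p)
    while i != -1:
        hits.append(i)
        i = t.find(p, i + 1)
    return hits


def occurrences_in_prefix(t, p, r):
    """Occurrences of `p` lying entirely inside the prefix t[:r]."""
    return [i for i in occurrences(t, p) if i + len(p) <= r]


def occurrences_in_suffix(t, p, l):
    """Occurrences of `p` lying entirely inside the suffix t[l:]."""
    return [i for i in occurrences(t, p) if i >= l]


text = "abracadabra"
print(occurrences_in_prefix(text, "abra", 10))
print(occurrences_in_suffix(text, "abra", 1))
```

This baseline scans the whole text on every query, whereas the abstract reports optimal query time on top of a suffix tree extension.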