Results 1-10 of 52
Dictionary Matching and Indexing with Errors and Don't Cares
In Proceedings of STOC, 2004
Transposition invariant string matching
2003
Abstract

Cited by 29 (8 self)
Given strings A = a_1a_2...a_m and B = b_1b_2...b_n over an alphabet Σ ⊆ U, where U is some numerical universe closed under addition and subtraction, and a distance function d(A, B) that gives the score of the best (partial) matching of A and B, the transposition invariant distance is ...
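The abstract is cut off before the definition. Assuming the usual form d'(A, B) = min over t in U of d(A + t, B), where A + t = (a_1 + t)(a_2 + t)...(a_m + t), and choosing plain Hamming distance on equal-length integer sequences as d (an illustrative assumption, not the distance studied in the paper), the minimisation reduces to a vote over per-position differences. A minimal sketch; the function name is hypothetical:

```python
from collections import Counter


def transposition_invariant_hamming(a: list[int], b: list[int]) -> int:
    """min over integer shifts t of Hamming(a + t, b), for equal-length sequences.

    Hamming distance stands in for the generic d(A, B); this is an
    illustrative choice, not the distance studied in the paper.
    """
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    if not a:
        return 0
    # Position i agrees under shift t exactly when t == b[i] - a[i], so the
    # best shift is the most frequent difference; any other shift agrees
    # at no more positions.
    votes = Counter(bi - ai for ai, bi in zip(a, b))
    return len(a) - max(votes.values())


# A melody transposed up 7 semitones, with its last note altered.
print(transposition_invariant_hamming([60, 62, 64, 65], [67, 69, 71, 73]))  # 1
```

Only the differences b_i - a_i need to be tried as shifts, which is why the vote is enough for this particular choice of d.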
Finding patterns with variable length gaps or don’t cares
Lecture Notes in Computer Science, 2006
Abstract

Cited by 17 (3 self)
Abstract. In this paper we present new algorithms for the pattern matching problem where the pattern can contain variable length gaps. Given a pattern P with variable length gaps and a text T, our algorithm works in O(n + m + α log(max_{1≤i≤l}(b_i − a_i))) time, where n is the length of the text, m is the sum of the lengths of the component subpatterns, α is the total number of occurrences of the component subpatterns in the text, and a_i and b_i are, respectively, the minimum and maximum number of don’t cares allowed between the i-th and (i+1)-st components of the pattern. We also present another algorithm which, given a suffix array of the text, can report whether P occurs in T in O(m + α log log n) time. Both algorithms record enough information to report all the occurrences of P in T. Furthermore, the techniques used in our algorithms are shown to be useful in many other contexts.
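A brute-force sketch of the problem statement, assuming gaps are measured from the end of one component to the start of the next: find every occurrence of each component with str.find, then chain occurrences whose separation lies in [a_i, b_i]. It is not the algorithm from the paper; its cost grows with products of occurrence counts rather than with α. The function names are hypothetical.

```python
def occurrences(text: str, sub: str) -> list[int]:
    """All (possibly overlapping) start indices of a non-empty sub in text."""
    starts, i = [], text.find(sub)
    while i != -1:
        starts.append(i)
        i = text.find(sub, i + 1)
    return starts


def match_with_gaps(text: str, components: list[str],
                    gaps: list[tuple[int, int]]) -> list[int]:
    """Start positions of full occurrences of P_1 g(a_1,b_1) P_2 ... P_l.

    gaps[i-1] = (a_i, b_i): min/max number of don't cares between the end of
    component i and the start of component i+1 (1-based, as in the abstract).
    """
    if not components or any(not c for c in components):
        raise ValueError("components must be non-empty strings")
    if len(gaps) != len(components) - 1:
        raise ValueError("need exactly one gap bound between consecutive components")
    # frontier maps the start of an occurrence of the current component to the
    # set of P_1 start positions from which a valid partial match reaches it.
    frontier = {s: {s} for s in occurrences(text, components[0])}
    for i in range(1, len(components)):
        lo, hi = gaps[i - 1]
        prev_len = len(components[i - 1])
        nxt: dict[int, set[int]] = {}
        for s in occurrences(text, components[i]):
            for p, origins in frontier.items():
                if lo <= s - (p + prev_len) <= hi:
                    nxt.setdefault(s, set()).update(origins)
        frontier = nxt
    return sorted(set().union(*frontier.values()))


print(match_with_gaps("abxxcdabcd", ["ab", "cd"], [(0, 2)]))  # [0, 6]
```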
k-mismatch with don’t cares
In ESA, 2007
Abstract

Cited by 17 (6 self)
Abstract. We give the first nontrivial algorithms for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols and a bound k, our algorithms find all the places that the pattern matches the text with at most k mismatches. We first give an O(n(k + log n log log n) log m) time randomised solution which finds the correct answer with high probability. We then present a new deterministic O(nk² log³ m) time solution that uses tools developed for group testing, and finally an approach based on k-selectors that runs in O(nk polylog m) time but requires O(poly m) time preprocessing. In each case, the location of the mismatches at each alignment is also given at no extra cost.
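For reference, the naive O(nm) solution to the problem as stated, which also reports where the mismatches fall at each alignment. It is a baseline, not one of the paper's randomised or deterministic algorithms; letting a don't care appear in either string, and the "?" symbol, are assumptions of this sketch.

```python
def k_mismatch_with_dont_cares(text: str, pattern: str, k: int,
                               dont_care: str = "?") -> dict[int, list[int]]:
    """Map each alignment i with at most k mismatches to its mismatch positions.

    A don't care in either string matches any character (an assumption of this
    sketch). Runs in O(nm) time; the early exit only helps in practice.
    """
    n, m = len(text), len(pattern)
    result: dict[int, list[int]] = {}
    for i in range(n - m + 1):
        mismatches: list[int] = []
        for j, pc in enumerate(pattern):
            tc = text[i + j]
            if pc == dont_care or tc == dont_care:
                continue
            if pc != tc:
                mismatches.append(j)
                if len(mismatches) > k:
                    break  # this alignment can no longer qualify
        if len(mismatches) <= k:
            result[i] = mismatches
    return result


print(k_mismatch_with_dont_cares("abcabd", "a?d", 1))  # {0: [2], 3: []}
```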
A fast, randomised, maximal subset matching algorithm for document-level music retrieval
Ministry of Energy, Telecommunications and Posts, 2006
Abstract

Cited by 14 (5 self)
We present MSM, a new maximal subset matching algorithm for music information retrieval (MIR) at score level with polyphonic texts and patterns. First, we argue that the problem solved by MSM and its ancestors, the SIA family of algorithms, is 3SUM-hard and, therefore, subquadratic solutions must involve approximation. MSM is such a solution; we describe it and argue that, at O(n log n) time with no large constants, it is orders of magnitude more time-efficient than its closest competitor. We also evaluate MSM’s performance on a retrieval problem addressed by the OMRAS project and show that it outperforms OMRAS on this task by a considerable margin.
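Assuming the usual geometric formulation, in which notes are points (onset, pitch) and a match is the translation that maps the most pattern points onto text points, the exact problem can be solved by brute force: vote over all O(nm) difference vectors. This is the quadratic baseline that the 3SUM-hardness argument says cannot be beaten substantially without approximation; it is not MSM. The note encoding and names are hypothetical.

```python
from collections import Counter

Point = tuple[int, int]  # (onset time, pitch), a hypothetical note encoding


def best_translation(pattern: list[Point], text: list[Point]) -> tuple[Point, int]:
    """Translation vector that maps the most pattern notes onto text notes.

    Brute force over all pattern/text pairs: O(nm) vectors, one vote each.
    """
    if not pattern or not text:
        raise ValueError("pattern and text must be non-empty")
    pattern_set, text_set = set(pattern), set(text)  # ignore duplicate notes
    # With distinct text points, each pattern point votes for a given vector at
    # most once, so a vector's count is the size of the matched subset.
    votes = Counter((tx - px, ty - py)
                    for px, py in pattern_set for tx, ty in text_set)
    vector, count = votes.most_common(1)[0]
    return vector, count


print(best_translation([(0, 60), (1, 64), (2, 67)],
                       [(10, 62), (11, 66), (12, 69), (13, 50)]))  # ((10, 2), 3)
```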
Finding patterns in given intervals
Lecture Notes in Computer Science, 2007
Abstract

Cited by 9 (2 self)
Abstract. In this paper, we study the pattern matching problem in given intervals. Depending on whether the intervals are given a priori for preprocessing, during the query along with the pattern, or both, we develop solutions for different variants of this problem. In particular, we present efficient indexing schemes for each of these variants.
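A sketch of the simplest variant, assuming a fixed pattern whose intervals arrive as queries: sort the occurrences once, then answer each interval with two binary searches. It is not one of the paper's indexing schemes; the names are hypothetical.

```python
import bisect


def occurrences(text: str, pattern: str) -> list[int]:
    """Sorted start indices of a non-empty pattern in text (overlaps allowed)."""
    starts, i = [], text.find(pattern)
    while i != -1:
        starts.append(i)
        i = text.find(pattern, i + 1)
    return starts


def occurrences_in_interval(occ: list[int], m: int, lo: int, hi: int) -> list[int]:
    """Occurrences of a length-m pattern lying entirely in text[lo..hi].

    Indices are 0-based and inclusive; occ must be sorted.
    Each query costs O(log |occ| + output size).
    """
    left = bisect.bisect_left(occ, lo)            # first start >= lo
    right = bisect.bisect_right(occ, hi - m + 1)  # past the last start that still fits
    return occ[left:right]


occ = occurrences("abababa", "aba")           # [0, 2, 4]
print(occurrences_in_interval(occ, 3, 1, 5))  # [2]
```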
A Black Box for Online Approximate Pattern Matching
Abstract

Cited by 7 (3 self)
Abstract. We present a deterministic black box solution for online approximate matching. Given a pattern of length m and a streaming text of length n that arrives one character at a time, the task is to report the distance between the pattern and a sliding window of the text as soon as each new character arrives. Our solution requires O(Σ_{j=1}^{log₂ m} T(n, 2^{j−1})/n) time for each input character, where T(n, m) is the total running time of the best offline algorithm. The supported types of approximation include exact matching with wildcards, matching under the Hamming norm, approximating the Hamming norm, k-mismatch, and numerical measures such as the L2 and L1 norms. For these examples, the resulting online algorithms take O(log² m), O(√(m log m)), O(log² m/ε²), O(√(k log k) log m), O(log² m) and O(√(m log m)) time per character, respectively. The space overhead is O(m), which we show is optimal.
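For comparison, the trivial online algorithm for one of the listed measures (Hamming distance, with wildcards allowed in the pattern) recomputes the distance over a sliding window in O(m) time per character and O(m) space; the per-character bounds listed above improve on this. It is a baseline, not the paper's black box construction, and the "?" wildcard is an assumption.

```python
from collections import deque
from typing import Iterable, Iterator


def online_hamming(pattern: str, stream: Iterable[str],
                   wildcard: str = "?") -> Iterator[int]:
    """After each arriving character (once m have been seen), yield the Hamming
    distance between the pattern and the last m characters.

    A wildcard in the pattern matches anything. Naive O(m) work per character.
    """
    m = len(pattern)
    window: deque[str] = deque(maxlen=m)  # keeps only the last m characters
    for ch in stream:
        window.append(ch)
        if len(window) == m:
            yield sum(1 for p, t in zip(pattern, window)
                      if p != wildcard and p != t)


print(list(online_hamming("ab?", "abcabd")))  # [0, 2, 2, 0]
```

A distance of 0 corresponds to an exact match with wildcards, the first measure in the list.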
Flexible music retrieval in sublinear time
In Proc. 10th Prague Stringology Conference (PSC'05), 2005
Abstract

Cited by 7 (4 self)
Music sequences can be treated as texts in order to perform music retrieval tasks on them. However, the text search problems that result from this modeling are unique to music retrieval. To date, several approaches derived from classical string matching have been proposed to cope with the new search problems, yet each problem has had its own algorithms. In this paper we show that a technique recently developed for multi-pattern approximate string matching is flexible enough to be successfully extended to many different music retrieval problems, as well as to combinations of them not addressed before. We show that the resulting algorithms are close to optimal and much better than existing approaches in many practical cases.
Faster pattern matching with character classes using prime number encoding
Journal of Computer and System Sciences
Abstract

Cited by 7 (1 self)
In pattern matching with character classes the goal is to find all occurrences of a pattern of length m in a text of length n, where each pattern position consists of an allowed set of characters from a finite alphabet Σ. We present an FFT-based algorithm that uses a novel prime-number encoding scheme and is log n/log m times faster than the fastest extant approaches, which are based on Boolean convolutions. In particular, if m|Σ| = n^O(1), our algorithm runs in O(n log m) time, matching the complexity of the fastest techniques for wildcard matching, a special case of our problem. A major advantage of our algorithm is that it allows a tradeoff between the running time and the RAM word size. Our algorithm also speeds up solutions to approximate matching with character classes, namely matching with k mismatches and Hamming distance, as well as to the subset matching problem.
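The encoding idea can be shown without the FFT: give each alphabet symbol a distinct prime and encode a character class as the product of its members' primes. By unique factorisation, a text symbol belongs to a class exactly when its prime divides the class code. The sketch below checks every alignment directly in O(nm) time, so it illustrates only the encoding, not the paper's FFT-based algorithm or its word-size tradeoff; the function names are hypothetical.

```python
from math import prod


def first_primes(k: int) -> list[int]:
    """The first k primes, by trial division against the primes found so far."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def class_match(text: str, pattern_classes: list[set[str]], alphabet: str) -> list[int]:
    """Positions i where text[i + j] is in pattern_classes[j] for every j."""
    code = dict(zip(alphabet, first_primes(len(alphabet))))
    # A class becomes the product of its members' primes.
    pattern_codes = [prod(code[c] for c in cls) for cls in pattern_classes]
    text_codes = [code[c] for c in text]
    m = len(pattern_codes)
    return [i for i in range(len(text_codes) - m + 1)
            if all(pattern_codes[j] % text_codes[i + j] == 0 for j in range(m))]


print(class_match("acgtac", [{"a", "g"}, {"c", "t"}], "acgt"))  # [0, 2, 4]
```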
Tree Pattern Matching to Subset Matching in Linear Time
In SIAM J. on Computing, 2000
Abstract

Cited by 6 (2 self)
This paper is the first of two papers describing an O(n polylog(m)) time algorithm for the Tree Pattern Matching problem on a pattern of size m and a text of size n. In this paper, we show an O(n + m) time Turing reduction from the Tree Pattern Matching problem to another problem called the Subset Matching problem. The second paper will give efficient deterministic and randomized algorithms for the Subset Matching problem. Together, these two papers will imply an O(n log³ m + m) time deterministic algorithm and an O(n log³ m / log log m + m) time randomized algorithm for the Tree Pattern Matching problem.
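Assuming the usual definition of Subset Matching, in which every pattern and text position holds a set and the pattern matches at position i when each pattern set is contained in the aligned text set, a direct check looks like this. It is a naive baseline, not the deterministic or randomized algorithms the abstract refers to; the function name is hypothetical.

```python
def subset_matches(text_sets: list[set[int]], pattern_sets: list[set[int]]) -> list[int]:
    """Positions i where pattern_sets[j] is a subset of text_sets[i + j] for all j."""
    m = len(pattern_sets)
    return [i for i in range(len(text_sets) - m + 1)
            if all(pattern_sets[j] <= text_sets[i + j] for j in range(m))]


print(subset_matches([{1, 2}, {3}, {1, 3}, {2, 3}], [{1}, {3}]))  # [0, 2]
```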