Dictionary matching and indexing with errors and don’t cares
In STOC ’04, 2004
Cited by 50 (1 self)
This paper considers various flavors of the following online problem: preprocess a text or collection of strings so that, given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of “don’t care” characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution. The performance bounds all have a similar character. For example, for the indexing problem with $n = |t|$ and $m = |p|$, the query time for $k$ substitutions is $O\big(m + (c_1 \log n)^k / k! + \#\text{matches}\big)$, with a data structure of size $O\big(n (c_2 \log n)^k / k!\big)$ and a preprocessing time of $O\big(n (c_2 \log n)^k / k!\big)$, where $c_1, c_2 > 1$ are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.
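As a point of reference for the dictionary-query variant, the brute-force scan below checks every dictionary string against p, treating a hypothetical wildcard symbol `?` in p as a don't care and allowing at most k substitutions. It is a sketch under those assumptions (wildcards only in the query, whole-string matches of equal length), not the paper's data structure, and it costs time proportional to the total dictionary length per query.

```python
from typing import List

WILDCARD = "?"  # hypothetical don't-care symbol, allowed only in the query here


def dictionary_query(dictionary: List[str], p: str, k: int) -> List[str]:
    """Return the dictionary strings that match p in their entirety with at
    most k substitutions; a WILDCARD in p matches any single character."""
    result = []
    for s in dictionary:
        if len(s) != len(p):  # whole-string match: lengths must agree
            continue
        mismatches = 0
        for a, b in zip(p, s):
            if a != WILDCARD and a != b:
                mismatches += 1
                if mismatches > k:  # stop early once the budget is exceeded
                    break
        if mismatches <= k:
            result.append(s)
    return result


print(dictionary_query(["cat", "cot", "dog", "cart"], "c?t", 0))  # ['cat', 'cot']
```

The data structures in the abstract exist precisely to avoid this linear scan over the whole collection for every query.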
k-mismatch with don’t cares
In ESA, 2007
Cited by 13 (6 self)
We give the first nontrivial algorithms for the $k$-mismatch pattern matching problem with don’t cares. Given a text $t$ of length $n$ and a pattern $p$ of length $m$ with don’t care symbols and a bound $k$, our algorithms find all the places that the pattern matches the text with at most $k$ mismatches. We first give an $O(n(k + \log n \log\log n)\log m)$ time randomised solution which finds the correct answer with high probability. We then present a new deterministic $O(nk^2 \log^3 m)$ time solution that uses tools developed for group testing and finally an approach based on $k$-selectors that runs in $O(nk \operatorname{polylog} m)$ time but requires $O(\operatorname{poly} m)$ time preprocessing. In each case, the location of the mismatches at each alignment is also given at no extra cost.
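The problem statement can be made concrete with the naive $O(nm)$ baseline below, which reports, for every alignment with at most k mismatches, the pattern offsets where those mismatches occur. The don't care symbol `?` and the choice to allow it in both text and pattern are assumptions for illustration; none of the randomised, group-testing, or k-selector techniques from the abstract are used here.

```python
from typing import Dict, List

DONT_CARE = "?"  # assumed don't-care symbol; may appear in text or pattern


def k_mismatch_naive(t: str, p: str, k: int) -> Dict[int, List[int]]:
    """Map each alignment i with at most k mismatches to the list of pattern
    offsets j where p[j] and t[i + j] disagree (don't cares never disagree)."""
    n, m = len(t), len(p)
    out: Dict[int, List[int]] = {}
    for i in range(n - m + 1):
        mism: List[int] = []
        for j in range(m):
            a, b = p[j], t[i + j]
            if a != DONT_CARE and b != DONT_CARE and a != b:
                mism.append(j)
                if len(mism) > k:  # this alignment already exceeds the bound
                    break
        if len(mism) <= k:
            out[i] = mism
    return out


print(k_mismatch_naive("abcabd", "abd", 1))  # {0: [2], 3: []}
```

As in the abstract, the mismatch locations come out alongside each reported alignment.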
A fast, randomised, maximal subset matching algorithm for document-level music retrieval
Ministry of Energy, Telecommunications and Posts, 2006
Cited by 10 (4 self)
We present MSM, a new maximal subset matching algorithm for music information retrieval (MIR) at score level with polyphonic texts and patterns. First, we argue that the problem MSM and its ancestors, the SIA family of algorithms, solve is 3SUM-hard and, therefore, subquadratic solutions must involve approximation. MSM is such a solution; we describe it, and argue that, at $O(n \log n)$ time with no large constants, it is orders of magnitude more time-efficient than its closest competitor. We also evaluate MSM’s performance on a retrieval problem addressed by the OMRAS project, and show that it outperforms OMRAS on this task by a considerable margin.
Finding patterns with variable length gaps or don’t cares
of Lecture Notes in Computer Science, 2006
Cited by 6 (3 self)
In this paper we present new algorithms to handle the pattern matching problem where the pattern can contain variable length gaps. Given a pattern $P$ with variable length gaps and a text $T$, our algorithm works in $O\big(n + m + \alpha \log(\max_{1 \le i \le l}(b_i - a_i))\big)$ time, where $n$ is the length of the text, $m$ is the sum of the lengths of the component subpatterns, $\alpha$ is the total number of occurrences of the component subpatterns in the text, and $a_i$ and $b_i$ are, respectively, the minimum and maximum number of don’t cares allowed between the $i$th and $(i+1)$st components of the pattern. We also present another algorithm which, given a suffix array of the text, can report whether $P$ occurs in $T$ in $O(m + \alpha \log\log n)$ time. Both algorithms record information to report all the occurrences of $P$ in $T$. Furthermore, the techniques used in our algorithms are shown to be useful in many other contexts.
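A simple way to see what the gapped-pattern problem asks, far slower than the stated bound: find all occurrences of each component, then keep only those that start between $a_i$ and $b_i$ characters after the end of a valid partial match of the earlier components. The function names and the `(a_i, b_i)` list layout are hypothetical, and the worst case is quadratic in $\alpha$ rather than the abstract's $O(n + m + \alpha \log(\cdot))$.

```python
from typing import List, Tuple


def occurrences(text: str, sub: str) -> List[int]:
    """All (possibly overlapping) start positions of sub in text."""
    out, pos = [], text.find(sub)
    while pos != -1:
        out.append(pos)
        pos = text.find(sub, pos + 1)
    return out


def gapped_match_ends(text: str, parts: List[str],
                      gaps: List[Tuple[int, int]]) -> List[int]:
    """parts[i] and parts[i + 1] must be separated by between gaps[i][0] and
    gaps[i][1] don't cares (inclusive). Returns the end positions (exclusive)
    of every full occurrence of the gapped pattern."""
    if len(gaps) != len(parts) - 1:
        raise ValueError("need exactly one (a_i, b_i) pair per gap")
    ends = {s + len(parts[0]) for s in occurrences(text, parts[0])}
    for (a, b), part in zip(gaps, parts[1:]):
        # A start s of the next component survives only if it lies a..b
        # characters after the end of some surviving partial match.
        starts = [s for s in occurrences(text, part)
                  if any(a <= s - e <= b for e in ends)]
        ends = {s + len(part) for s in starts}
    return sorted(ends)


print(gapped_match_ends("abxxcdyabzcd", ["ab", "cd"], [(1, 2)]))  # [6, 12]
```

An empty result means $P$ does not occur in $T$; recovering start positions as well would require carrying them along with each partial match.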
Finding patterns in given intervals
of Lecture Notes in Computer Science, 2007
Cited by 5 (2 self)
In this paper, we study the pattern matching problem in given intervals. Depending on whether the intervals are given a priori for preprocessing, during the query along with the pattern, or even in both cases, we develop solutions for different variants of this problem. In particular, we present efficient indexing schemes for each of the above variants of the problem.
Tree Pattern Matching to Subset Matching in Linear Time
In SIAM J. on Computing, 2000
Cited by 5 (2 self)
This paper is the first of two papers describing an $O(n \operatorname{polylog}(m))$ time algorithm for the Tree Pattern Matching problem on a pattern of size $m$ and a text of size $n$. In this paper, we show an $O(n+m)$ time Turing reduction from the Tree Pattern Matching problem to another problem called the Subset Matching problem. The second paper will give efficient deterministic and randomized algorithms for the Subset Matching problem. Together, these two papers will imply an $O(n \log^3 m + m)$ time deterministic algorithm and an $O\big(n \log^3 m / \log\log m + m\big)$ time randomized algorithm for the Tree Pattern Matching problem.
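For readers unfamiliar with Subset Matching, the sketch below uses the usual formulation as we understand it: text and pattern are sequences of sets, and the pattern occurs at position i when every pattern set is a subset of the aligned text set. It is a direct check using $O(nm)$ set comparisons, not the deterministic or randomized algorithms of the companion paper.

```python
from typing import List, Set


def subset_match(text: List[Set[str]], pattern: List[Set[str]]) -> List[int]:
    """Positions i such that pattern[j] is a subset of text[i + j] for all j."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1)
            if all(pattern[j] <= text[i + j] for j in range(m))]


text = [{"a", "b"}, {"c"}, {"a", "c"}, {"b"}]
print(subset_match(text, [{"a"}, {"c"}]))  # [0]
```

Ordinary exact string matching is the special case in which every set is a singleton.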
Sweepline the Music!
Computer Science in Perspective, 2003
Cited by 4 (1 self)
The problem of matching sets of points or sets of horizontal line segments in the plane under translations is considered. For finding the exact occurrences of a point set of size $m$ within another point set of size $n$ we give an algorithm with running time $O(mn)$, and for finding partial occurrences an algorithm with running time $O(mn \log m)$. To find the largest overlap between two line segment patterns we develop an algorithm with running time $O(mn \log(mn))$. All algorithms are based on a simple sweep-line traversal of one of the patterns in lexicographic order. The motivation for the problems studied comes from music retrieval and analysis.
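A common way to attack the point-set problems, shown here in place of the paper's sweep-line traversal, is translation voting: every pair of a pattern point and a text point votes for the translation that maps one onto the other. Translations with $m$ votes are exact occurrences, and the most-voted translation gives the largest partial occurrence. This hash-based sketch runs in expected $O(mn)$ time, assumes integer coordinates (duplicate points are ignored), and does not handle the line-segment variant.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[int, int]


def translation_votes(pattern: Iterable[Point], text: Iterable[Point]) -> Counter:
    """For each translation v, count the pattern points p with p + v in text.
    Both inputs are deduplicated so each pattern point votes at most once per v."""
    text_pts = set(text)
    return Counter((tx - px, ty - py)
                   for (px, py) in set(pattern) for (tx, ty) in text_pts)


def exact_occurrences(pattern: List[Point], text: List[Point]) -> List[Point]:
    """Translations that map every pattern point onto a text point."""
    m = len(set(pattern))
    return sorted(v for v, c in translation_votes(pattern, text).items() if c == m)


def largest_partial_occurrence(pattern: List[Point], text: List[Point]) -> Tuple[Point, int]:
    """A translation with the most matched pattern points, and that count."""
    return max(translation_votes(pattern, text).items(), key=lambda vc: vc[1])


pattern = [(0, 0), (1, 2)]
text = [(5, 5), (6, 7), (10, 0), (11, 2), (3, 3)]
print(exact_occurrences(pattern, text))  # [(5, 5), (10, 0)]
```

In a music setting the coordinates would typically be onset time and pitch, so a translation corresponds to a time shift combined with a transposition.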
Flexible music retrieval in sublinear time
In Proc. 10th Prague Stringology Conference (PSC'05), 2005
Cited by 4 (3 self)
Music sequences can be treated as texts in order to perform music retrieval tasks on them. However, the text search problems that result from this modeling are unique to music retrieval. To date, several approaches derived from classical string matching have been proposed to cope with the new search problems, yet each problem has had its own algorithms. In this paper we show that a technique recently developed for multi-pattern approximate string matching is flexible enough to be successfully extended to solve many different music retrieval problems, as well as combinations thereof not addressed before. We show that the resulting algorithms are close to optimal and much better than existing approaches in many practical cases.
A Black Box for Online Approximate Pattern Matching
Cited by 3 (2 self)
We present a deterministic black box solution for online approximate matching. Given a pattern of length $m$ and a streaming text of length $n$ that arrives one character at a time, the task is to report the distance between the pattern and a sliding window of the text as soon as the new character arrives. Our solution requires $O\big(\sum_{j=1}^{\log_2 m} T(n, 2^{j-1})/n\big)$ time for each input character, where $T(n, m)$ is the total running time of the best offline algorithm. The types of approximation that are supported include exact matching with wildcards, matching under the Hamming norm, approximating the Hamming norm, $k$-mismatch and numerical measures such as the $L_2$ and $L_1$ norms. For these examples, the resulting online algorithms take $O(\log^2 m)$, $O(\sqrt{m \log m})$, $O(\log^2 m/\varepsilon^2)$, $O(\sqrt{k \log k}\,\log m)$, $O(\log^2 m)$ and $O(\sqrt{m \log m})$ time per character, respectively. The space overhead is $O(m)$, which we show is optimal.
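The per-character bound can be instantiated by plugging an offline running time into the sum. Assuming offline Hamming distance takes $T(n, m') = O(n\sqrt{m' \log m'})$ (Abrahamson's bound) and offline exact matching with wildcards takes $T(n, m') = O(n \log m')$, a short calculation recovers two of the stated figures (logarithms base 2, $m$ a power of two for simplicity):

```latex
% Hamming distance, assuming T(n, m') = O(n \sqrt{m' \log m'}); note \log 2^{j-1} = j - 1 \le \log_2 m
\sum_{j=1}^{\log_2 m} \frac{T(n, 2^{j-1})}{n}
  = O\Big(\sum_{j=1}^{\log_2 m} \sqrt{2^{j-1}(j-1)}\Big)
  = O\Big(\sqrt{\log_2 m}\,\sum_{j=1}^{\log_2 m} 2^{(j-1)/2}\Big)
  = O\Big(\sqrt{\log_2 m}\cdot\frac{\sqrt{m}-1}{\sqrt{2}-1}\Big)
  = O\big(\sqrt{m \log m}\big)

% Exact matching with wildcards, assuming T(n, m') = O(n \log m')
\sum_{j=1}^{\log_2 m} \frac{T(n, 2^{j-1})}{n}
  = O\Big(\sum_{j=1}^{\log_2 m} (j-1)\Big)
  = O\Big(\frac{\log_2 m\,(\log_2 m - 1)}{2}\Big)
  = O(\log^2 m)
```

In the Hamming case the geometric growth of the window sizes means the largest window dominates, so the online cost matches the offline per-character cost; in the wildcard case every level contributes comparably, which costs an extra logarithmic factor.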
Deterministic length reduction: Fast convolution in sparse data and applications
In CPM, 2007
Cited by 2 (2 self)
In this paper a deterministic algorithm for the length reduction problem is presented. This algorithm enables a new tool for performing fast convolution in sparse data. While the regular fast convolution of vectors $V_1, V_2$ whose sizes are $N_1, N_2$ respectively takes $O(N_1 \log N_2)$ using FFT, the proposed algorithm performs the convolution in $O(n_1 \log^3 n_1)$, where $n_1$ is the number of nonzero values in $V_1$. This algorithm assumes that $V_1$ is given in advance and $V_2$ is given at run time. This running time is achieved using a preprocessing phase on $V_1$, which takes $O(n_1^2)$ if $N_1$ is polynomial in $n_1$, and $O(n_1^4)$ if $N_1$ is exponential in $n_1$ (which is rarely the case in practical applications). This tool is used to obtain faster results for several well-known problems, such as $d$-Dimensional Point Set Matching and Searching in Music Archives.
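To make the cost comparison concrete, the sketch below computes a sparse linear convolution directly from the nonzero entries of both vectors, using $n_1 n_2$ multiplications, where $n_2$ (a symbol introduced here) is the number of nonzeros in $V_2$. It is only a naive baseline, and the use of plain linear convolution is an assumption about the variant in question. It is not the paper's deterministic length-reduction technique, whose stated running time depends only on $n_1$.

```python
from collections import defaultdict
from typing import Dict, List


def sparse_convolution(v1: List[float], v2: List[float]) -> Dict[int, float]:
    """Nonzero entries {index: value} of the linear convolution of v1 and v2,
    computed from the nonzero entries of each vector only."""
    nz1 = [(i, x) for i, x in enumerate(v1) if x != 0]
    nz2 = [(j, y) for j, y in enumerate(v2) if y != 0]
    out: Dict[int, float] = defaultdict(float)
    for i, x in nz1:
        for j, y in nz2:
            out[i + j] += x * y  # contribution of v1[i] * v2[j] to index i + j
    return {idx: val for idx, val in out.items() if val != 0}


print(sparse_convolution([1, 0, 0, 2], [0, 3, 0, 0, 1]))  # {1: 3.0, 4: 7.0, 7: 2.0}
```

When both vectors are very sparse this baseline already beats the dense FFT bound; the paper targets the case where only $V_1$ is known to be sparse.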