Dictionary matching and indexing with errors and don’t cares
- In STOC ’04
, 2004
Cited by 42 (1 self)
This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of “don’t care” characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution. The performance bounds all have a similar character. For example, for the indexing problem with n = |t| and m = |p|, the query time for k substitutions is O(m + (c1 log n)^k / k! + #matches), with a data structure of size O(n (c2 log n)^k / k!) and a preprocessing time of O(n (c2 log n)^k / k!), where c1, c2 > 1 are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.
A fast, randomised, maximal subset matching algorithm for document-level music retrieval
- Ministry of Energy, Telecommunications and Posts
, 2006
Cited by 8 (3 self)
We present MSM, a new maximal subset matching algorithm, for MIR at score level with polyphonic texts and patterns. First, we argue that the problem MSM and its ancestors, the SIA family of algorithms, solve is 3SUM-hard and, therefore, subquadratic solutions must involve approximation. MSM is such a solution; we describe it, and argue that, at O(n log n) time with no large constants, it is orders of magnitude more time-efficient than its closest competitor. We also evaluate MSM’s performance on a retrieval problem addressed by the OMRAS project, and show that it outperforms OMRAS on this task by a considerable margin.
k-mismatch with don’t cares
- In ESA
, 2007
Cited by 8 (5 self)
We give the first non-trivial algorithms for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols and a bound k, our algorithms find all the places that the pattern matches the text with at most k mismatches. We first give an O(n(k + log n log log n) log m) time randomised solution which finds the correct answer with high probability. We then present a new deterministic O(nk^2 log^3 m) time solution that uses tools developed for group testing and finally an approach based on k-selectors that runs in O(nk polylog m) time but requires O(poly m) time preprocessing. In each case, the location of the mismatches at each alignment is also given at no extra cost.
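To make the problem statement above concrete, here is a minimal brute-force sketch in Python. It is not any of the algorithms from the paper: it runs in O(nm) time and only illustrates what is being computed. The function name `k_mismatch_naive` and the choice of `?` as the don’t care symbol are assumptions made for this illustration, as is the restriction of don’t cares to the pattern.

```python
from typing import List, Tuple

WILDCARD = "?"  # hypothetical don't-care symbol chosen for this sketch


def k_mismatch_naive(text: str, pattern: str, k: int) -> List[Tuple[int, List[int]]]:
    """Brute-force reference: for every alignment i (0-based) where the pattern
    matches text[i:i+m] with at most k mismatches, return i together with the
    pattern positions that mismatch. Don't cares in the pattern match anything."""
    n, m = len(text), len(pattern)
    results = []
    for i in range(n - m + 1):
        mismatches = []
        for j in range(m):
            if pattern[j] != WILDCARD and pattern[j] != text[i + j]:
                mismatches.append(j)
                if len(mismatches) > k:
                    break  # this alignment already exceeds the bound
        if len(mismatches) <= k:
            results.append((i, mismatches))
    return results


if __name__ == "__main__":
    print(k_mismatch_naive("abcabd", "a?d", 1))
```

A brute-force reference like this is mainly useful for checking faster implementations on small inputs.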
Tree Pattern Matching to Subset Matching in Linear Time
- In SIAM J. on Computing
, 2000
Cited by 5 (2 self)
This paper is the first of two papers describing an O(n polylog(m)) time algorithm for the Tree Pattern Matching problem on a pattern of size m and a text of size n. In this paper, we show an O(n + m) time Turing reduction from the Tree Pattern Matching problem to another problem called the Subset Matching problem. The second paper will give efficient deterministic and randomized algorithms for the Subset Matching problem. Together, these two papers will imply an O(n log^3 m + m) time deterministic algorithm and an O(n (log^3 m / log log m) + m) time randomized algorithm for the Tree Pattern Matching problem.
Finding patterns with variable length gaps or don’t cares
- of Lecture Notes in Computer Science
, 2006
Cited by 5 (3 self)
In this paper we have presented new algorithms to handle the pattern matching problem where the pattern can contain variable length gaps. Given a pattern P with variable length gaps and a text T, our algorithm works in O(n + m + α log(max_{1<=i<=l}(b_i − a_i))) time, where n is the length of the text, m is the summation of the lengths of the component subpatterns, α is the total number of occurrences of the component subpatterns in the text, and a_i and b_i are, respectively, the minimum and maximum number of don’t cares allowed between the ith and (i+1)st component of the pattern. We also present another algorithm which, given a suffix array of the text, can report whether P occurs in T in O(m + α log log n) time. Both the algorithms record information to report all the occurrences of P in T. Furthermore, the techniques used in our algorithms are shown to be useful in many other contexts.
Finding patterns in given intervals
- of Lecture Notes in Computer Science
, 2007
Cited by 5 (2 self)
In this paper, we study the pattern matching problem in given intervals. Depending on whether the intervals are given a priori for pre-processing, or during the query along with the pattern, or even in both cases, we develop solutions for different variants of this problem. In particular, we present efficient indexing schemes for each of the above variants of the problem.
Sweepline the Music!
- Computer Science in Perspective
, 2003
Cited by 4 (1 self)
The problem of matching sets of points or sets of horizontal line segments in the plane under translations is considered. For finding the exact occurrences of a point set of size m within another point set of size n we give an algorithm with running time O(mn), and for finding partial occurrences an algorithm with running time O(mn log m). To find the largest overlap between two line segment patterns we develop an algorithm with running time O(mn log(mn)). All algorithms are based on a simple sweepline traversal of one of the patterns in the lexicographic order. The motivation for the problems studied comes from music retrieval and analysis.
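The exact-occurrence problem described above can be illustrated with a short Python sketch. Note that this swaps in a hash-set lookup for the paper's sweepline traversal, so it shows the problem rather than the authors' method; with hashing it takes expected O(mn) time after sorting the text points. The function name `exact_translations` and the tuple representation of points are assumptions for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def exact_translations(pattern: Sequence[Point], text: Sequence[Point]) -> List[Point]:
    """Return every translation (dx, dy) that maps all pattern points onto
    points of the text. Hash-set based illustration, not the sweepline method."""
    if not pattern:
        return []
    text_set = set(text)
    # Anchor on the lexicographically smallest pattern point: any exact
    # occurrence must send it to some text point.
    px, py = min(pattern)
    found = []
    for tx, ty in sorted(text_set):
        dx, dy = tx - px, ty - py
        if all((x + dx, y + dy) in text_set for x, y in pattern):
            found.append((dx, dy))
    return found


if __name__ == "__main__":
    pattern = [(0, 0), (1, 2)]
    text = [(5, 5), (6, 7), (10, 10), (11, 12), (3, 3)]
    print(exact_translations(pattern, text))
```

In a music setting one might read each point as an (onset time, pitch) pair, so a translation corresponds to a time shift combined with a transposition.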
Flexible music retrieval in sublinear time
- IN PROC. 10TH PRAGUE STRINGOLOGY CONFERENCE (PSC'05)
, 2005
Cited by 3 (3 self)
Music sequences can be treated as texts in order to perform music retrieval tasks on them. However, the text search problems that result from this modeling are unique to music retrieval. To date, several approaches derived from classical string matching have been proposed to cope with the new search problems, yet each problem had its own algorithms. In this paper we show that a technique recently developed for multipattern approximate string matching is flexible enough to be successfully extended to solve many different music retrieval problems, as well as combinations thereof not addressed before. We show that the resulting algorithms are close to optimal and much better than existing approaches in many practical cases.
Deterministic length reduction: Fast convolution in sparse data and applications
- In CPM
, 2007
Cited by 1 (1 self)
In this paper a deterministic algorithm for the length reduction problem is presented. This algorithm enables a new tool for performing fast convolution in sparse data. While the regular fast convolution of vectors V1, V2 whose sizes are N1, N2 respectively, takes O(N1 log N2) using FFT, the proposed algorithm performs the convolution in O(n1 log^3 n1), where n1 is the number of non-zero values in V1. This algorithm assumes that V1 is given in advance, and that V2 is given at run time. This running time is achieved using a preprocessing phase on V1, which takes O(n1^2) if N1 is polynomial in n1, and O(n1^4) if N1 is exponential in n1 (which is rarely the case in practical applications). This tool is used to obtain faster results for several well known problems, such as the d-Dimensional Point Set Matching and Searching in Music Archives.
Efficient (δ, γ)-Pattern-Matching with Don’t Cares
Cited by 1 (1 self)
Here we consider string matching problems that arise naturally in applications to music retrieval. The δ-Matching problem calculates, for a given text T_{1..n} and a pattern P_{1..m} on an alphabet of integers, the list of all indices I_δ = {i : max_{j=1..m} |P_j − T_{i+j−1}| ≤ δ}. The γ-Matching problem computes, for given T and P, the list of all indices I_γ = {i : Σ_{j=1..m} |P_j − T_{i+j−1}| ≤ γ}. In this paper, we extend the current result on the different matching problems to handle the presence of “don’t care” symbols. We present efficient algorithms that calculate I_δ, I_γ, and I_(δ,γ) = I_δ ∩ I_γ, for pattern P with occurrences of “don’t cares”.
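The definitions of I_δ, I_γ and I_(δ,γ) above translate directly into a brute-force check, sketched here in Python. This is an O(nm) reference under stated assumptions, not the efficient algorithms of the paper: don’t cares are assumed to occur only in the pattern, are represented by `None`, and contribute nothing to either the maximum or the sum. The function name `delta_gamma_match` is hypothetical, and the returned indices are 0-based, whereas the abstract indexes from 1.

```python
from typing import List, Optional, Sequence


def delta_gamma_match(
    text: Sequence[int],
    pattern: Sequence[Optional[int]],
    delta: int,
    gamma: int,
) -> List[int]:
    """Return the 0-based alignments i at which both conditions hold:
    max_j |P_j - T_{i+j}| <= delta and sum_j |P_j - T_{i+j}| <= gamma,
    where don't-care positions (None) in the pattern are skipped."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        diffs = [abs(p - text[i + j]) for j, p in enumerate(pattern) if p is not None]
        # default=0 covers a pattern made only of don't cares
        if max(diffs, default=0) <= delta and sum(diffs) <= gamma:
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(delta_gamma_match([1, 3, 5, 4, 2], [2, None, 4], delta=1, gamma=2))
```

Setting `gamma` to a very large value recovers I_δ alone, and setting `delta` to a very large value recovers I_γ alone.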

