Results 1 - 8 of 8
Practical Algorithms for Transposition-Invariant String-Matching
Cited by 10 (3 self)
We consider the problems of (1) longest common subsequence (LCS) of two given strings in the case where the first may be shifted by some constant (that is, transposed) to match the second, and (2) transposition-invariant text searching using indel distance. These problems have applications in music comparison and retrieval. We introduce two novel techniques to solve these problems efficiently. The first is based on the branch and bound method, the second on bit-parallelism. Our branch and bound algorithm computes the longest common transposition-invariant subsequence (LCTS) in time O((m² + log log σ) log σ) in the best case and O((m² + log σ)σ) in the worst case, where m and σ, respectively, are the length of the strings and the size of the alphabet. On the other hand, we show that the same problem can be solved by using bit-parallelism and thus obtain a speedup of O(w / log m) over the classical algorithms, where the computer word has w bits. The advantage of this latter algorithm over the present bit-parallel ones is that it allows the use of more complex distances, including general integer weights. Since our branch and bound method is very flexible, it can be further improved by combining it with other efficient algorithms such as our novel bit-parallel algorithm. We experiment on several combination possibilities and discuss which are the best settings for each of those combinations. Our algorithms are easily extended to other musically relevant cases, such as δ-matching and polyphony (where there are several parallel texts to be considered). We also show how our bit-parallel algorithm is adapted to text searching and illustrate its effectiveness in complex cases where the only known competing method is the use of brute force.
Rotation and lighting invariant template matching
In Proc. 6th Latin American Symposium on Theoretical Informatics (LATIN 2004), LNCS 2976, 2003
Cited by 5 (1 self)
We address the problem of searching for a two-dimensional pattern in a two-dimensional text (or image), such that the pattern can be found even if it appears rotated and it is brighter or darker than its occurrence. Furthermore, we consider approximate matching under several tolerance models. We obtain algorithms that are almost optimal both in the worst and the average cases simultaneously. The complexities we obtain are very close to the best current results for the case where only rotations, but not lighting invariance, are supported. These are the first results for this problem under a combinatorial approach.
Sequential and indexed two-dimensional combinatorial template matching allowing rotations
Theoretical Computer Science A, 2005
Cited by 3 (2 self)
We present new and faster algorithms to search for a 2-dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. This has applications such as image databases and computational biology. We consider the cases of exact and approximate matching under several matching models, using a combinatorial approach that generalizes string matching techniques. We focus on sequential algorithms, where only the pattern can be preprocessed, as well as on indexed algorithms, where the text is preprocessed and an index built on it. On sequential searching we derive average-case lower bounds and then obtain optimal average-case algorithms for all the matching models. At the same time, these algorithms are worst-case optimal. On indexed searching we obtain search time polylogarithmic on the text size, as well as sublinear time in general for approximate searching.
Bit-parallel branch and bound algorithm for transposition invariant LCS
Proc. 11th International Symposium on String Processing and Information Retrieval (SPIRE'04), in: Lecture Notes in Comput. Sci., 2004
Cited by 3 (1 self)
Main Results. We consider the problem of longest common subsequence (LCS) of two given strings in the case where the first may be shifted by some constant (i.e. transposed) to match the second. For this longest common transposition invariant subsequence (LCTS) problem, which has applications for instance in music comparison, we develop a branch and bound algorithm with best case time O((m² + log log σ) log σ) and worst case time O((m² + log σ)σ), where m and σ are the length of the strings and the number of possible transpositions, respectively. This compares favorably against the O(σm²) naive algorithm in most cases and, for large m, against the O(m² log log m) time algorithm of [2]. Technical Details. Let A = a1,..., an and B = b1,..., bm be two strings, over a finite numeric alphabet Σ = {0...σ}. A subsequence of string A is obtained by deleting zero, one or several characters of A. The length of the longest common subsequence of A and B, denoted LCS(A, B), is the length of the longest string that is a subsequence both of A and B. The conventional dynamic programming approach computes LCS(A, B) in
O(mn log σ) Time Transposition Invariant LCS Computation
Abstract. Given strings A and B of lengths m and n over a finite alphabet Σ ⊂ Z of size O(σ), the length of the longest common transposition invariant subsequence is LCTS(A, B) = max_{t∈Z} LCS(A + t, B), where A + t = (a1 + t)(a2 + t)...(am + t) and LCS(A + t, B) is the length of the longest common subsequence between A + t and B. LCTS(A, B) can be computed naively in O(mnσ) time. We present a simple and easy to implement algorithm obtaining O(mn log σ) time. We also show that transposition invariant Levenshtein distance can be computed in O(mn√σ) time.
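The naive O(mnσ) computation mentioned in this abstract can be illustrated with a short sketch. This is not the O(mn log σ) algorithm of the paper, only the brute-force baseline it improves on; the function names `lcs_length` and `lcts_naive` are hypothetical and chosen for illustration. It relies on the observation that only transpositions of the form t = b_j - a_i can produce any match, so no other values of t need to be tried.

```python
def lcs_length(a, b):
    """Classical O(mn) dynamic program for the LCS length of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcts_naive(a, b):
    """Brute-force LCTS(A, B) = max over t of LCS(A + t, B).

    Returns (length, transposition). Only t = b_j - a_i is tried, since any
    other t makes every symbol of A + t differ from every symbol of B.
    """
    best_len, best_t = 0, 0
    for t in sorted({y - x for x in a for y in b}):
        length = lcs_length([x + t for x in a], b)
        if length > best_len:  # keep the smallest t among ties
            best_len, best_t = length, t
    return best_len, best_t


if __name__ == "__main__":
    # Two short note sequences (MIDI-like pitch numbers, made up for the example).
    print(lcts_naive([60, 62, 64, 65], [62, 64, 66, 67]))
```

For an alphabet {0...σ} there are at most 2σ + 1 distinct differences b_j - a_i, and each costs one O(mn) LCS computation, which gives the O(mnσ) bound quoted above.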
Practical Algorithms for Transposition-Invariant String-Matching
We consider the problems of (1) longest common subsequence (LCS) of two given strings in the case where the first may be shifted by some constant (that is, transposed) to match the second, and (2) transposition-invariant text searching using indel distance. These problems have applications in music comparison and retrieval. We introduce two novel techniques to solve these problems efficiently. The first is based on the branch and bound method, the second on bit-parallelism. Our branch and bound algorithm computes the longest common transposition-invariant subsequence (LCTS) in time O((m² + log log σ) log σ) in the best case and O((m² + log σ)σ) in the worst case, where m and σ, respectively, are the length of the strings and the size of the alphabet. On the other hand, we show that the same problem can be solved by using bit-parallelism and thus obtain a speedup of O(w / log m) over the classical algorithms, where the computer word has w bits. The advantage of this latter algorithm over the present bit-parallel ones is that it allows the use of more complex distances, including general integer weights. Since our branch and bound
Improved Time and Space Complexities for Transposition Invariant String Matching
Given strings A = a1a2...am and B = b1b2...bn over a finite alphabet Σ ⊂ Z of size O(σ), and a distance d() defined among strings, the transposition invariant version of d() is d^t(A,B) = min_{t∈Z} d(A+t,B), where A+t = (a1+t)(a2+t)...(am+t). Distances d() of most interest are Levenshtein distance and indel distance (the dual of the Longest Common Subsequence), which can be computed in O(mn) time. Recent algorithms compute d^t(A,B) in O(mn log log min(m,n)) time for those distances. In this paper we show how those complexities can be reduced to O(mn log log σ). Furthermore, we reduce the space requirements from O(mn) to O(σ² + min(m,n)). Key words: longest common subsequence, edit distance, music sequence comparison, transposition invariance, sparse dynamic programming
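As a rough illustration of the definition d^t(A,B) = min_{t∈Z} d(A+t,B), the sketch below evaluates it by brute force for the two distances named in the abstract. It is a baseline under stated assumptions, not the sparse dynamic programming method of the paper; the names `levenshtein`, `indel` and `transposition_invariant_distance` are hypothetical. It assumes that trying only t = b_j - a_i is sufficient, which holds for these two distances because any other t yields no matching symbols and therefore the largest possible distance.

```python
def levenshtein(a, b):
    """O(mn) edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


def indel(a, b):
    """O(mn) indel distance: insertions and deletions only (the dual of LCS)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1])
            else:
                curr.append(min(prev[j], curr[j - 1]) + 1)
        prev = curr
    return prev[-1]


def transposition_invariant_distance(a, b, dist=levenshtein):
    """Brute-force d^t(A, B): minimum of dist(A + t, B) over candidate shifts t."""
    # If either string is empty there are no candidate shifts; fall back to t = 0.
    candidates = {y - x for x in a for y in b} or {0}
    return min(dist([x + t for x in a], b) for t in candidates)


if __name__ == "__main__":
    a, b = [1, 2, 3], [11, 12, 14]
    print(transposition_invariant_distance(a, b))              # Levenshtein
    print(transposition_invariant_distance(a, b, dist=indel))  # indel
```

Passing the distance as a parameter mirrors the abstract's framing, where the transposition invariant version is defined for any distance d(); whether the candidate-shift shortcut stays valid for other distances would need to be checked case by case.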
International Journal of Document Analysis (2005) DOI 10.1007/s1003200501476 REGULAR PAPER
, 2005
Abstract. A significant portion of currently available documents exists in the form of images, for instance, as scanned documents. Electronic documents produced by scanning and OCR software contain recognition errors. This paper uses an automatic approach to examine the selection and the effectiveness of searching techniques for possible erroneous terms for query expansion. The proposed method consists of two basic steps. In the first step, confused characters in erroneous words are located and editing operations are applied to create a collection of erroneous errorgrams in the basic unit of the model. The second step uses query terms and errorgrams to generate additional query terms, identify appropriate matching terms, and determine the degree of relevance of retrieved document images to the user's query, based on a vector space IR model. The proposed approach has been trained on 979 document images to construct about 2,822 errorgrams and tested on 100 scanned Web pages,