Results 1-10 of 25
A Guided Tour to Approximate String Matching
ACM Computing Surveys, 1999
Cited by 404 (38 self)
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems.
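As an illustration of online searching under edit distance, the following is a minimal sketch of the classical column-by-column dynamic programming approach, not necessarily one of the algorithms the survey recommends. The function name `approx_find` and the 0-based end positions it reports are assumptions made for this example.

```python
def approx_find(pattern, text, k):
    """Return the 0-based end positions j in text such that some substring
    of text ending at j is within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any suffix
    # of the text read so far; before reading anything it is simply i.
    col = list(range(m + 1))
    hits = []
    for j, tc in enumerate(text):
        prev_diag = col[0]   # value of col[i-1] from the previous column
        col[0] = 0           # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == tc else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits
```

The sketch takes O(mn) time and O(m) space for a pattern of length m and a text of length n.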
An Algorithm for Approximate Tandem Repeats
In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science, 1993
Cited by 75 (2 self)
A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g. abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g. abcdaacd.
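The two definitions can be checked directly on the examples from the abstract. The sketch below is only an illustration: measuring similarity by edit distance and trying every split point are assumptions of this example, and the names `is_perfect_tandem_repeat` and `tandem_repeat_cost` are hypothetical, not taken from the paper.

```python
def edit_distance(a, b):
    """Standard dynamic programming edit distance (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def is_perfect_tandem_repeat(s):
    """True if s is nonempty and splits into two identical halves."""
    half, rem = divmod(len(s), 2)
    return len(s) > 0 and rem == 0 and s[:half] == s[half:]


def tandem_repeat_cost(s):
    """Smallest edit distance between s[:i] and s[i:] over all split
    points i (assumed similarity measure; requires len(s) >= 2)."""
    if len(s) < 2:
        raise ValueError("need at least two characters to split")
    return min(edit_distance(s[:i], s[i:]) for i in range(1, len(s)))
```

With these definitions, abcabc is a perfect repeat with cost 0, while abcdaacd is not perfect but has cost 1 (abcd against aacd).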
Faster Algorithms for String Matching with k Mismatches
J. of Algorithms, 2000
Cited by 52 (11 self)
The string matching with mismatches problem is that of finding the number of mismatches between a pattern P of length m and every length-m substring of the text T. Currently, the fastest algorithms for this problem are the following. The Landau-Vishkin algorithm finds all locations where the pattern has at most k errors (where k is part of the input) in time O(nk). The Abrahamson algorithm finds the number of mismatches at every location in time O(n√(m log m)). We present
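To make the problem statement concrete, here is a naive O(nm) baseline that counts mismatches at every alignment. It is a sketch for illustration only and is neither the Landau-Vishkin nor the Abrahamson algorithm; the function names are assumptions of this example.

```python
def mismatch_counts(text, pattern):
    """Number of mismatches between pattern and each length-m substring
    of text, by direct character comparison."""
    n, m = len(text), len(pattern)
    return [
        sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
        for i in range(n - m + 1)
    ]


def k_mismatch_locations(text, pattern, k):
    """Start positions where the pattern matches with at most k mismatches."""
    return [i for i, d in enumerate(mismatch_counts(text, pattern)) if d <= k]
```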
Pattern Matching with Swaps
1997
Cited by 19 (8 self)
Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version T′ of T is a length-n string derived from T by a series of local swaps (i.e. t′_ℓ ← t_{ℓ+1} and t′_{ℓ+1} ← t_ℓ), where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations i for which there exists a swapped version T′ of T where there is an exact matching of P in location i of T′. It has been an open problem whether swapped matching can be done in less than O(mn) time. In this paper we show the first algorithm that solves the pattern matching with swaps problem in time o(mn). We present an algorithm whose time complexity is O(n m^{1/3} log m log^2 min(m, |Σ|)) for a general alphabet Σ. Key Words: Design and analysis of algorithms, combinatorial algorithms on words, pattern matching, pattern matching with swaps, nonstandard pattern matching. Department of Mathematics...
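The definition above can be turned into a straightforward O(mn) checker, which is the baseline the paper improves on rather than the paper's algorithm. The sketch assumes that a swap may also involve a text position just outside the matched window, as the definition on T allows; the function names are hypothetical.

```python
def swap_match_at(text, pattern, i):
    """True if some swapped version of text matches pattern at position i."""
    n = len(text)
    # free:   position p is not part of a swap with p - 1
    # forced: position p was swapped with p - 1, so it now holds text[p - 1]
    free = True
    forced = i >= 1  # text[i] may have been swapped with text[i - 1]
    for j, c in enumerate(pattern):
        p = i + j
        new_free = (forced and text[p - 1] == c) or (free and text[p] == c)
        # Swap p with p + 1: position p then holds text[p + 1].
        new_forced = free and p + 1 < n and text[p + 1] == c
        free, forced = new_free, new_forced
        if not (free or forced):
            return False
    return True


def swapped_match_locations(text, pattern):
    """All start positions with a swapped match, checked one by one."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if swap_match_at(text, pattern, i)]
```

For example, in the text abcd the pattern ba matches only at location 0 (via the swapped version bacd), and cb matches only at location 1 (via acbd).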
Overlap Matching
Information and Computation, 2001
Cited by 19 (5 self)
We propose a new paradigm for string matching, namely structural matching. In structural matching, the text and pattern contents are not important. Rather, some areas in the text and patterns are singled out, say intervals. A "match" is a text location where a specified relation between the text and pattern areas is satisfied. In particular we define the structural matching problem of Overlap (Parity) Matching. We seek the text locations where all overlaps of the given pattern and text intervals have even length. We show that this problem can be solved in time O(n log m), where the text length is n and the pattern length is m. As an application of overlap matching, we show how to reduce the String Matching with Swaps problem to the overlap matching problem. The String Matching with Swaps problem is the problem of string matching in the presence of local swaps. The best known deterministic upper bound for this problem was O(n m^{1/3} log m log σ) for a general alphabet Σ, wher...
A simple algorithm for detecting circular permutations in proteins
Bioinformatics, 1999
Cited by 18 (0 self)
Results: A simple and efficient algorithm that runs in time N^2 is presented. The algorithm is based on duplication of one of the two sequences, and then performing a modified version of the standard dynamic programming algorithms. While the algorithm is not guaranteed to find the optimal results, we present data that indicate that in practice the algorithm performs very well. Availability: A Fortran program that calculates the optimal edit distance under circular permutation is available upon request from the authors. Contact:
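For comparison, the optimal edit distance under circular permutation can be computed by brute force: every rotation of a sequence is a window of the sequence concatenated with itself, and each rotation can be compared with the other sequence by ordinary dynamic programming. The sketch below is this exact O(N^3) baseline, not the N^2 heuristic of the paper; the function names and unit edit costs are assumptions of this example.

```python
def edit_distance(a, b):
    """Standard dynamic programming edit distance (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def circular_edit_distance(a, b):
    """Minimum edit distance between b and any rotation of a."""
    if not a:
        return len(b)
    doubled = a + a  # every rotation of a is a length-len(a) window of this
    return min(
        edit_distance(doubled[s:s + len(a)], b) for s in range(len(a))
    )
```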
All semilocal longest common subsequences in subquadratic time
In Proceedings of CSR, 2006
"... subquadratic time ..."
Approximating edit distance in near-linear time
2009
Cited by 6 (3 self)
We show how to compute the edit distance between two strings of length n up to a factor of 2^{Õ(√log n)} in n^{1+o(1)} time. This is the first subpolynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art n^{1/3+o(1)} approximation. Previously, approximation of 2^{Õ(√log n)} was known only for embedding edit distance into ℓ1, and it is not known if that embedding can be computed in less than quadratic time.
Approximate Periods of Strings
In Proc. Tenth Combinatorial Pattern Matching Conference, Lecture Notes in Computer Science 1645
Cited by 6 (2 self)
The study of approximately periodic strings is relevant to diverse applications such as molecular biology, data compression, and computer-assisted music analysis. Here we study different forms of approximate periodicity under a variety of distance functions. We consider three related problems, for two of which we derive polynomial-time algorithms; we then show that the third problem is NP-complete. Key words: periodicity, approximate periods, repetitions, distance function
Semilocal string comparison: Algorithmic techniques and applications
Mathematics in Computer Science 1(4) (2008) 571–603. See also arXiv:0707.3619
Cited by 6 (0 self)
The longest common subsequence (LCS) problem is a classical problem in computer science. The semilocal LCS problem is a generalisation of the LCS problem, arising naturally in the context of string comparison. In this work, we present a number of algorithmic techniques related to the semilocal LCS problem, and give a number of algorithmic applications of these techniques. Summarising the presented results, we conclude that semilocal string comparison turns out to be a useful algorithmic plugin, which unifies, and often improves on, a number of previous approaches to various substring and subsequence-related problems.
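As background, the classical LCS problem has a textbook dynamic programming solution, and one component of the semilocal problem (comparing a whole string against every substring of the other) can be answered naively by repeating it. The sketch below is that naive baseline only, with hypothetical function names; it does not use the techniques of the paper.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    row = [0] * (len(b) + 1)  # row[j] = LCS length of a-prefix and b[:j]
    for ca in a:
        prev_diag = 0         # value of row[j-1] before this pass
        for j, cb in enumerate(b, 1):
            old = row[j]
            row[j] = prev_diag + 1 if ca == cb else max(row[j], row[j - 1])
            prev_diag = old
    return row[-1]


def string_substring_lcs(a, b):
    """Naive table of LCS lengths of a against every nonempty substring
    b[i:j], keyed by (i, j)."""
    n = len(b)
    return {
        (i, j): lcs_length(a, b[i:j])
        for i in range(n)
        for j in range(i + 1, n + 1)
    }
```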