A Guided Tour to Approximate String Matching
ACM Computing Surveys, 1999
Cited by 585 (38 self)
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems.
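As a concrete illustration of online searching under edit distance, the following minimal Python sketch implements the classical column-by-column dynamic programming scheme usually attributed to Sellers. It is a baseline for orientation only, not any particular algorithm from the survey's experiments; the function name approximate_matches and the choice to report 0-based end positions are assumptions made for this example.

```python
def approximate_matches(pattern: str, text: str, k: int) -> list[int]:
    """Return the 0-based end positions j in text such that some substring
    ending at j is within edit distance k of pattern."""
    m = len(pattern)
    # column[i] = smallest edit distance between pattern[:i] and any substring
    # of text ending at the current position (empty text prefix at the start).
    column = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        diag = column[0]   # value of column[0] for the previous text position
        column[0] = 0      # an occurrence may start at any text position
        for i in range(1, m + 1):
            left = column[i]  # column[i] for the previous text position
            column[i] = min(left + 1,              # text character c left unmatched
                            column[i - 1] + 1,     # pattern[i - 1] left unmatched
                            diag + (pattern[i - 1] != c))  # match or substitution
            diag = left
        if column[m] <= k:
            ends.append(j)
    return ends


print(approximate_matches("abc", "xxabcxx", 0))  # [4]
print(approximate_matches("abc", "xxabcxx", 1))  # [3, 4, 5]
```

Each text character costs O(m) work, so a full scan takes O(mn) time.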
An Algorithm for Approximate Tandem Repeats
In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science, 1993
Cited by 89 (3 self)
A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g. abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g. abcdaacd.
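To make the definition concrete, this hypothetical Python check decides whether a whole string is an approximate single tandem repeat by trying every split point and comparing the two parts with edit distance. The choice of edit distance as the similarity measure, the threshold parameter k, and the function names are assumptions for illustration; this brute force is not the algorithm of the paper.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def best_tandem_split(s: str) -> Optional[tuple[int, int]]:
    """Return (split, distance) minimising edit_distance(s[:split], s[split:])
    over all splits into two nonempty parts, or None if len(s) < 2."""
    best = None
    for split in range(1, len(s)):
        d = edit_distance(s[:split], s[split:])
        if best is None or d < best[1]:
            best = (split, d)
    return best


def is_approx_tandem_repeat(s: str, k: int) -> bool:
    """True if s splits into two nonempty parts within edit distance k."""
    best = best_tandem_split(s)
    return best is not None and best[1] <= k


print(best_tandem_split("abcabc"))    # (3, 0): a perfect tandem repeat
print(best_tandem_split("abcdaacd"))  # (4, 1): abcd vs aacd, one substitution
```

The brute force costs O(n³) for a string of length n, and it only classifies a single string; it does not search a longer text for approximate tandem repeats.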
Faster Algorithms for String Matching with k Mismatches
Journal of Algorithms, 2000
Cited by 70 (15 self)
The string matching with mismatches problem is that of finding the number of mismatches between a pattern P of length m and every length-m substring of the text T. Currently, the fastest algorithms for this problem are the following. The Landau-Vishkin algorithm finds all locations where the pattern has at most k errors (where k is part of the input) in time O(nk). The Abrahamson algorithm finds the number of mismatches at every location in time O(n √(m log m)). We present
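For orientation, the naive O(nm) method simply compares the pattern with every length-m window of the text. The Python sketch below is that baseline, not the Landau-Vishkin or Abrahamson algorithm or the paper's new method; the function names are hypothetical.

```python
def mismatch_counts(pattern: str, text: str) -> list[int]:
    """Number of mismatches between pattern and each length-m window of text."""
    m = len(pattern)
    return [sum(p != t for p, t in zip(pattern, text[i:i + m]))
            for i in range(len(text) - m + 1)]


def k_mismatch_locations(pattern: str, text: str, k: int) -> list[int]:
    """Locations (0-based) where the pattern has at most k mismatches."""
    return [i for i, c in enumerate(mismatch_counts(pattern, text)) if c <= k]


print(mismatch_counts("abc", "abdabc"))          # [1, 3, 3, 0]
print(k_mismatch_locations("abc", "abdabc", 1))  # [0, 3]
```

Because only substitutions are counted, every candidate window has exactly length m, unlike the edit-distance setting.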
Overlap Matching
Information and Computation, 2001
Cited by 34 (9 self)
We propose a new paradigm for string matching, namely structural matching. In structural matching, the text and pattern contents are not important. Rather, some areas in the text and patterns are singled out, say intervals. A "match" is a text location where a specified relation between the text and pattern areas is satisfied. In particular, we define the structural matching problem of Overlap (Parity) Matching. We seek the text locations where all overlaps of the given pattern and text intervals have even length. We show that this problem can be solved in time O(n log m), where the text length is n and the pattern length is m. As an application of overlap matching, we show how to reduce the String Matching with Swaps problem to the overlap matching problem. The String Matching with Swaps problem is the problem of string matching in the presence of local swaps. The best known deterministic upper bound for this problem was O(n m^(1/3) log m log σ) for a general alphabet Σ, wher...
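The Overlap (Parity) Matching definition can be checked directly by brute force. The Python sketch below assumes that intervals are closed integer ranges [start, end], that placing the pattern at text location i shifts every pattern interval by i, and that an overlap's length is the number of shared positions (so an empty overlap counts as even). These modelling choices and the names are assumptions; the sketch takes time proportional to n times the product of the interval counts, far from the paper's O(n log m).

```python
Interval = tuple[int, int]  # closed integer interval [start, end] (assumed model)


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of integer positions shared by two closed intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def overlap_parity_matches(text_intervals: list[Interval],
                           pattern_intervals: list[Interval],
                           n: int, m: int) -> list[int]:
    """Text locations i (0 <= i <= n - m) where every overlap between a text
    interval and a pattern interval shifted by i has even length."""
    matches = []
    for i in range(n - m + 1):
        shifted = [(s + i, e + i) for s, e in pattern_intervals]
        if all(overlap_length(p, t) % 2 == 0
               for p in shifted for t in text_intervals):
            matches.append(i)
    return matches


print(overlap_parity_matches([(0, 1), (4, 5)], [(0, 1)], n=6, m=2))  # [0, 2, 4]
```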
Pattern Matching with Swaps
1997
Cited by 29 (9 self)
Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version T′ of T is a length-n string derived from T by a series of local swaps (i.e., t′_ℓ ← t_{ℓ+1} and t′_{ℓ+1} ← t_ℓ), where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations i for which there exists a swapped version T′ of T with an exact match of P at location i of T′. It has been an open problem whether swapped matching can be done in less than O(mn) time. In this paper we show the first algorithm that solves the pattern matching with swaps problem in time o(mn). We present an algorithm whose time complexity is O(n m^(1/3) log m log² min(m, |Σ|)) for a general alphabet Σ.
Key Words: Design and analysis of algorithms, combinatorial algorithms on words, pattern matching, pattern matching with swaps, nonstandard pattern matching.
Department of Mathematics...
A simple algorithm for detecting circular permutations in proteins
Bioinformatics, 1999
Cited by 29 (1 self)
Results: A simple and efficient algorithm that runs in time N² is presented. The algorithm is based on duplicating one of the two sequences and then performing a modified version of standard dynamic programming. While the algorithm is not guaranteed to find the optimal result, we present data indicating that in practice it performs very well. Availability: A Fortran program that calculates the optimal edit distance under circular permutation is available upon request from the authors.
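One way to see what "optimal edit distance under circular permutation" means is to minimise over every rotation of one sequence, reading each rotation off the duplicated sequence. The Python sketch below does exactly that in O(N³) time for sequences of length N; it is an assumed reference baseline, not the authors' N² heuristic, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def circular_edit_distance(a: str, b: str) -> tuple[int, int]:
    """Return (r, distance) for the rotation a[r:] + a[:r] of a that has the
    minimum edit distance to b (the smallest such r on ties)."""
    n = len(a)
    if n == 0:
        return (0, len(b))
    doubled = a + a  # every rotation of a is a length-n window of a + a
    best = (0, edit_distance(a, b))
    for r in range(1, n):
        d = edit_distance(doubled[r:r + n], b)
        if d < best[1]:
            best = (r, d)
    return best


print(circular_edit_distance("cdab", "abcd"))  # (2, 0)
```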
Approximate Text Searching
1998
Cited by 22 (6 self)
This thesis focuses on the problem of text retrieval allowing errors, also called "approximate" string matching. The problem is to find a pattern in a text, where the pattern and the text may have "errors". This problem has received a lot of attention in recent years because of its applications in many areas, such as information retrieval, computational biology and signal processing, to name a few. The aim of this work is the development and analysis of novel algorithms to deal with the problem under various conditions, as well as a better understanding of the problem itself and its statistical behavior. Although our results are valid in many different areas, we focus our attention on typical text searching for information retrieval applications. This makes some ranges of values for the parameters of the problem more interesting than others. We have divided this presentation into two parts. The first one deals with online approximate string matching, i.e. when there is no time or space to preprocess the text. These algorithms are the core of offline algorithms as well. Online searching is the area of the problem where better algorithms existed. We have obtained new bounds for the probability of an approximate match of a pattern in
Approximate String Matching with Compressed Indexes
2009
All semilocal longest common subsequences in subquadratic time
In Proceedings of CSR, 2006
Approximate Swapped Matching
Cited by 10 (3 self)
Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version P′ of P is a length-m string derived from P by a series of local swaps (i.e., p′_ℓ ← p_{ℓ+1} and p′_{ℓ+1} ← p_ℓ), where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations i of T for which there exists a swapped version P′ of P with an exact match of P′ at location i of T. Recently, some efficient algorithms were developed for this problem. Their time complexity is better than the best known algorithms for pattern matching with mismatches. However, the Approximate Pattern Matching with Swaps problem was not known to be solved faster than the pattern matching with mismatches problem. In the Approximate Pattern Matching with Swaps problem the output is, for every text location i where there is a swapped match of P, the number of swaps necessary to create the swapped vers...
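Under the pattern-side definition above, a swapped match at one text location can be tested greedily from left to right, which also yields the number of swaps used. The following Python sketch is a naive O(nm) illustration of the problem statement, not the algorithm of the paper; the names and the greedy formulation are assumptions of this example.

```python
from typing import Optional


def swap_count(pattern: str, window: str) -> Optional[int]:
    """Minimum number of disjoint adjacent swaps that turn pattern into window
    (equal lengths assumed), or None if no swapped version of pattern matches."""
    m = len(pattern)
    j = swaps = 0
    while j < m:
        if pattern[j] == window[j]:
            j += 1  # direct match; swapping here is never required
        elif (j + 1 < m and pattern[j] == window[j + 1]
              and pattern[j + 1] == window[j]):
            swaps += 1  # p'_j = p_{j+1} and p'_{j+1} = p_j
            j += 2      # both positions are now used by this swap
        else:
            return None
    return swaps


def approximate_swapped_matches(pattern: str, text: str) -> dict[int, int]:
    """Map each text location with a swapped match to its swap count (naive O(nm))."""
    m = len(pattern)
    result = {}
    for i in range(len(text) - m + 1):
        count = swap_count(pattern, text[i:i + m])
        if count is not None:
            result[i] = count
    return result


print(approximate_swapped_matches("abc", "bacxacb"))  # {0: 1, 4: 1}
```

Preferring a direct match is safe: if p_j equals the aligned text character and a swap of positions j and j+1 were also possible, all four characters would be equal, so matching both positions directly works as well and uses no swap.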