Results 1-10 of 49
A Guided Tour to Approximate String Matching
ACM Computing Surveys, 1999
Cited by 598 (36 self)
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems.
An Algorithm for Approximate Tandem Repeats
In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science, 1993
Cited by 88 (3 self)
A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g. abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g. abcdaacd.
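The two definitions above can be made concrete with a brute-force sketch. This is not the algorithm of the paper; it assumes unit-cost edit distance as the similarity measure, and the function names (`edit_distance`, `is_perfect_tandem_repeat`, `best_tandem_split`) are hypothetical, chosen only for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Unit-cost Levenshtein distance via the standard row-by-row DP."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        cur = [i]
        for j, cy in enumerate(y, 1):
            cur.append(min(prev[j - 1] + (cx != cy),  # match or substitute
                           prev[j] + 1,               # delete from x
                           cur[j - 1] + 1))           # insert into x
        prev = cur
    return prev[-1]


def is_perfect_tandem_repeat(s: str) -> bool:
    """True if s is nonempty and splits into two identical halves."""
    half, rem = divmod(len(s), 2)
    return len(s) > 0 and rem == 0 and s[:half] == s[half:]


def best_tandem_split(s: str) -> tuple[int, int]:
    """Return (distance, split index) for the split of s into two nonempty
    parts whose edit distance is smallest. Requires len(s) >= 2."""
    return min((edit_distance(s[:i], s[i:]), i) for i in range(1, len(s)))


print(is_perfect_tandem_repeat("abcabc"))  # True
print(best_tandem_split("abcdaacd"))       # (1, 4): "abcd" vs "aacd"
```

Trying every split point and running a full DP for each costs cubic time in the length of the string, so this is only useful for checking small examples.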
A subquadratic sequence alignment algorithm for unrestricted scoring matrices
SIAM J. Comput
Cited by 76 (5 self)
The classical algorithm for computing the similarity between two sequences [...] Our algorithm applies to both local and global alignment computations. The speedup is achieved by dividing the dynamic programming matrix into variable sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an O(n^2 / log n) algorithm for an input of constant alphabet size. For most texts, the time complexity is actually O(h n^2 / log n), where h ≤ 1 is the entropy of the text.
Faster Algorithms for String Matching with k Mismatches
Cited by 67 (14 self)
The string matching with mismatches problem is that of finding the number of mismatches between pattern P of length m and every length m substring of the text T. Currently, the best algorithms for this problem are the following. The Landau-Vishkin algorithm finds all locations where the pattern has at most k errors (where k is part of the input) in time O(nk). The Abrahamson algorithm finds the number of mismatches at every location in time O(n sqrt(m log m)). We present an algorithm that is faster than both. Our algorithm finds all locations where the pattern has at most k errors in time O(n sqrt(k log k)). We also show an algorithm that solves the above problem in time O((n + nk^3/m) log k).
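As a point of reference for the problem statement above, a naive O(nm) baseline can be sketched as follows. It is none of the algorithms named in the abstract (Landau-Vishkin, Abrahamson, or the paper's own); the function names are hypothetical and it only illustrates what is being computed.

```python
def mismatch_counts(text: str, pattern: str) -> list[int]:
    """Hamming distance between pattern and every length-m window of text.

    Returns an empty list if the pattern is empty or longer than the text.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    return [
        sum(a != b for a, b in zip(text[i:i + m], pattern))
        for i in range(len(text) - m + 1)
    ]


def k_mismatch_positions(text: str, pattern: str, k: int) -> list[int]:
    """Start positions where the pattern matches with at most k mismatches."""
    return [i for i, c in enumerate(mismatch_counts(text, pattern)) if c <= k]


print(mismatch_counts("abcabd", "abc"))          # [0, 3, 3, 1]
print(k_mismatch_positions("abcabd", "abc", 1))  # [0, 3]
```

The first function answers the "number of mismatches at every location" variant, and the second the "at most k errors" variant; the bounds quoted in the abstract improve on this quadratic baseline.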
Overlap Matching
Information and Computation, 2001
Cited by 33 (9 self)
We propose a new paradigm for string matching, namely structural matching. In structural matching, the text and pattern contents are not important. Rather, some areas in the text and patterns are singled out, say intervals. A "match" is a text location where a specified relation between the text and pattern areas is satisfied. In particular we define the structural matching problem of Overlap (Parity) Matching. We seek the text locations where all overlaps of the given pattern and text intervals have even length. We show that this problem can be solved in time O(n log m), where the text length is n and the pattern length is m. As an application of overlap matching, we show how to reduce the String Matching with Swaps problem to the overlap matching problem. The String Matching with Swaps problem is the problem of string matching in the presence of local swaps. The best known deterministic upper bound for this problem was O(n m^(1/3) log m log σ) for a general alphabet Σ, wher...
A simple algorithm for detecting circular permutations in proteins
Bioinformatics, 1999
Cited by 29 (1 self)
Results: A simple and efficient algorithm that runs in time N^2 is presented. The algorithm is based on duplication of one of the two sequences, and then performing a modified version of the standard dynamic programming algorithms. While the algorithm is not guaranteed to find the optimal results, we present data that indicate that in practice the algorithm performs very well. Availability: A Fortran program that calculates the optimal edit distance under circular permutation is available upon request from the authors. Contact:
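The duplication idea mentioned above can be illustrated with a minimal sketch. This is not the authors' program: it assumes unit edit costs and uses a standard approximate-matching DP (free start and end inside the doubled sequence) as a stand-in for the "modified version" that the abstract does not spell out. The function name is hypothetical.

```python
def circular_edit_distance_sketch(a: str, b: str) -> int:
    """Smallest edit distance between a and any substring of b + b.

    Every rotation of b occurs as a substring of b + b, so a rotated copy
    of a scores 0. The matched substring is not forced to have length
    len(b), so the value can underestimate the true distance to the best
    rotation of b; treat it as a heuristic.
    """
    doubled = b + b
    # Row for the empty prefix of a: a match may start anywhere at cost 0.
    prev = [0] * (len(doubled) + 1)
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(doubled)
        for j in range(1, len(doubled) + 1):
            cost = 0 if a[i - 1] == doubled[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitute
                         prev[j] + 1,         # skip a character of a
                         cur[j - 1] + 1)      # skip a character of b + b
        prev = cur
    # A match may end anywhere in the doubled sequence.
    return min(prev)


print(circular_edit_distance_sketch("cdeab", "abcde"))  # 0
print(circular_edit_distance_sketch("cdxab", "abcde"))  # 1
```

With sequences of length N the table has about 2N^2 cells, in line with the quadratic running time quoted in the abstract.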
Approximate Text Searching
1998
Cited by 25 (7 self)
This thesis focuses on the problem of text retrieval allowing errors, also called "approximate" string matching. The problem is to find a pattern in a text, where the pattern and the text may have "errors". This problem has received a lot of attention in recent years because of its applications in many areas, such as information retrieval, computational biology and signal processing, to name a few. The aim of this work is the development and analysis of novel algorithms to deal with the problem under various conditions, as well as a better understanding of the problem itself and its statistical behavior. Although our results are valid in many different areas, we focus our attention on typical text searching for information retrieval applications. This makes some ranges of values for the parameters of the problem more interesting than others. We have divided this presentation in two parts. The first one deals with online approximate string matching, i.e. when there is no time or space to preprocess the text. These algorithms are the core of offline algorithms as well. Online searching is the area of the problem where better algorithms existed. We have obtained new bounds for the probability of an approximate match of a pattern in
Approximate String Matching with Compressed Indexes
2009
"... algorithms ..."
All semilocal longest common subsequences in subquadratic time
In Proceedings of CSR, 2006
"... subquadratic time ..."