A Guided Tour to Approximate String Matching
 ACM Computing Surveys
, 1999
Cited by 447 (38 self)
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems.
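As a rough illustration of online searching under edit distance, the sketch below uses the classical column-wise dynamic programming approach to approximate matching. It is not taken from the survey itself; the function name `approx_find` and its parameters are assumptions made for this example.

```python
def approx_find(pattern: str, text: str, k: int) -> list[int]:
    """Return 0-based end positions j such that some substring of `text`
    ending at j is within edit distance k of `pattern`.

    Hypothetical helper for illustration; runs in O(len(pattern) * len(text)).
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any substring
    # of the text ending at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # old col[i-1], needed for the substitution case
        # col[0] stays 0 because a match may start anywhere in the text.
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # missing pattern character
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            hits.append(j)
    return hits
```

With k = 0 this reduces to exact matching; larger k reports every text position where an occurrence with at most k insertions, deletions, or substitutions ends.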
An Algorithm for Approximate Tandem Repeats
 In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science
, 1993
Cited by 73 (2 self)
A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g. abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g. abcdaacd.
Faster Algorithms for String Matching with k Mismatches
 J. of Algorithms
, 2000
Cited by 53 (12 self)
The string matching with mismatches problem is that of finding the number of mismatches between a pattern P of length m and every length m substring of the text T. Currently, the fastest algorithms for this problem are the following. The Landau-Vishkin algorithm finds all locations where the pattern has at most k errors (where k is part of the input) in time O(nk). The Abrahamson algorithm finds the number of mismatches at every location in time $O(n \sqrt{m \log m})$. We present
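To make the problem statement concrete, here is a naive O(nm) baseline that counts mismatches at every alignment. It is neither the Landau-Vishkin nor the Abrahamson algorithm, only a reference implementation of the definition; the function names are assumptions made for this example.

```python
def mismatch_counts(pattern: str, text: str) -> list[int]:
    """Number of mismatches between `pattern` and every length-m
    substring of `text`, by direct comparison (naive baseline)."""
    m, n = len(pattern), len(text)
    return [
        sum(p != t for p, t in zip(pattern, text[i:i + m]))
        for i in range(n - m + 1)
    ]


def k_mismatch_positions(pattern: str, text: str, k: int) -> list[int]:
    """Start positions where the pattern occurs with at most k mismatches."""
    return [i for i, d in enumerate(mismatch_counts(pattern, text)) if d <= k]
```

A faster algorithm for the problem should return the same lists, so a baseline like this is mainly useful for checking results on small inputs.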
Approximate Text Searching
, 1998
Cited by 22 (6 self)
This thesis focuses on the problem of text retrieval allowing errors, also called "approximate" string matching. The problem is to find a pattern in a text, where the pattern and the text may have "errors". This problem has received a lot of attention in recent years because of its applications in many areas, such as information retrieval, computational biology and signal processing, to name a few. The aim of this work is the development and analysis of novel algorithms to deal with the problem under various conditions, as well as a better understanding of the problem itself and its statistical behavior. Although our results are valid in many different areas, we focus our attention on typical text searching for information retrieval applications. This makes some ranges of values for the parameters of the problem more interesting than others. We have divided this presentation in two parts. The first one deals with online approximate string matching, i.e. when there is no time or space to preprocess the text. These algorithms are the core of offline algorithms as well. Online searching is the area of the problem where better algorithms existed. We have obtained new bounds for the probability of an approximate match of a pattern in
Pattern Matching with Swaps
, 1997
Cited by 20 (8 self)
Let a text string T of n symbols and a pattern string P of m symbols from alphabet $\Sigma$ be given. A swapped version $T'$ of T is a length n string derived from T by a series of local swaps (i.e. $t'_\ell \leftarrow t_{\ell+1}$ and $t'_{\ell+1} \leftarrow t_\ell$), where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations i for which there exists a swapped version $T'$ of T where there is an exact matching of P in location i of $T'$. It has been an open problem whether swapped matching can be done in less than O(mn) time. In this paper we show the first algorithm that solves the pattern matching with swaps problem in time o(mn). We present an algorithm whose time complexity is $O(n m^{1/3} \log m \log^2 \min(m, |\Sigma|))$ for a general alphabet $\Sigma$. Key Words: Design and analysis of algorithms, combinatorial algorithms on words, pattern matching, pattern matching with swaps, nonstandard pattern matching. Department of Mathematics...
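The problem definition can be illustrated with a naive O(mn) check, which is the baseline the paper improves on rather than its algorithm. The sketch assumes that swaps are confined to the text window being compared (a swap never straddles the window boundary); the function name is hypothetical.

```python
def swap_match_positions(pattern: str, text: str) -> list[int]:
    """Start positions i where `pattern` equals the text window at i after
    some set of disjoint adjacent swaps inside that window.

    Naive O(mn) baseline. Assumption: swaps stay inside the window.
    """
    m, n = len(pattern), len(text)
    hits = []
    for i in range(n - m + 1):
        j = 0
        while j < m:
            if pattern[j] == text[i + j]:
                # Direct match; swapping here could only help if the two
                # text symbols were equal, in which case it changes nothing.
                j += 1
            elif (j + 1 < m
                  and pattern[j] == text[i + j + 1]
                  and pattern[j + 1] == text[i + j]):
                # Swap text positions i+j and i+j+1; both are now used up.
                j += 2
            else:
                break
        if j == m:
            hits.append(i)
    return hits
```

The left-to-right greedy scan is enough here because each symbol may take part in at most one swap, so a mismatch at position j can only be repaired by swapping j with j+1.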
A simple algorithm for detecting circular permutations in proteins
 Bioinformatics
, 1999
Cited by 20 (0 self)
Results: A simple and efficient algorithm that runs in time $N^2$ is presented. The algorithm is based on duplication of one of the two sequences, and then performing a modified version of the standard dynamic programming algorithms. While the algorithm is not guaranteed to find the optimal results, we present data that indicate that in practice the algorithm performs very well. Availability: A Fortran program that calculates the optimal edit distance under circular permutation is available upon request from the authors. Contact:
Overlap Matching
 Information and Computation
, 2001
Cited by 19 (5 self)
We propose a new paradigm for string matching, namely structural matching. In structural matching, the text and pattern contents are not important. Rather, some areas in the text and patterns are singled out, say intervals. A "match" is a text location where a specified relation between the text and pattern areas is satisfied. In particular we define the structural matching problem of Overlap (Parity) Matching. We seek the text locations where all overlaps of the given pattern and text intervals have even length. We show that this problem can be solved in time O(n log m), where the text length is n and the pattern length is m. As an application of overlap matching, we show how to reduce the String Matching with Swaps problem to the overlap matching problem. The String Matching with Swaps problem is the problem of string matching in the presence of local swaps. The best known deterministic upper bound for this problem was $O(n m^{1/3} \log m \log \sigma)$ for a general alphabet $\Sigma$, wher...
All semilocal longest common subsequences in subquadratic time
 In Proceedings of CSR
, 2006
Semilocal string comparison: Algorithmic techniques and applications
 Mathematics in Computer Science 1(4) (2008) 571–603 See also arXiv: 0707.3619
Cited by 7 (1 self)
The longest common subsequence (LCS) problem is a classical problem in computer science. The semilocal LCS problem is a generalisation of the LCS problem, arising naturally in the context of string comparison. In this work, we present a number of algorithmic techniques related to the semilocal LCS problem, and give a number of algorithmic applications of these techniques. Summarising the presented results, we conclude that semilocal string comparison turns out to be a useful algorithmic plug-in, which unifies, and often improves on, a number of previous approaches to various substring- and subsequence-related problems.
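For orientation, the classical LCS problem mentioned above can be solved with the textbook quadratic dynamic program sketched below. The second function is a brute-force illustration of one semilocal component, comparing a whole string against every substring of the other. Neither is one of the paper's techniques, and both function names are assumptions made for this example.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b (textbook DP,
    O(len(a) * len(b)) time, O(len(b)) space)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def string_substring_lcs(a: str, b: str) -> dict[tuple[int, int], int]:
    """Brute force: LCS length of `a` against every substring b[i:j].
    Only meant to show what is being computed, not how to do it fast."""
    n = len(b)
    return {
        (i, j): lcs_length(a, b[i:j])
        for i in range(n + 1)
        for j in range(i, n + 1)
    }
```

The brute-force table recomputes the dynamic program for each of the quadratically many substrings, which is the kind of redundancy that more refined semilocal methods aim to avoid.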
Algorithms For Local Alignment With Length Constraints
 Proceedings of Latin American Theoretical Informatics (LATIN 02)
, 2001
Cited by 6 (2 self)
The local sequence alignment problem is the detection of similar subsequences in two given sequences of lengths n and m. Unfortunately the common notion of local alignment suffers from some well-known anomalies which result from not taking into account the lengths of the aligned subsequences. We introduce the length restricted local alignment problem which includes as a constraint an upper limit T on the length of one of the subsequences to be aligned. We propose an efficient approximation algorithm using which we can find a solution satisfying the length bound, and whose score is within difference $\Delta$ of the optimum score for any given positive integer $\Delta$ in time $O(nmT/\Delta)$ using $O(mT/\Delta)$ space. We also introduce the cyclic local alignment problem and show how our idea can be applied to this case as well. This is a dual approach to the well-known cyclic edit distance problem.