A Guided Tour to Approximate String Matching
 ACM COMPUTING SURVEYS
, 1999
Cited by 584 (38 self)
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems.
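To make "string matching allowing errors" concrete, here is a minimal sketch of online approximate search under unit-cost edit distance. It uses the classic column-by-column dynamic programming formulation and is not claimed to be any particular algorithm compared in the survey; the function name `approx_search` and its parameters are assumptions made for illustration.

```python
def approx_search(pattern, text, k):
    """Return 1-based end positions in `text` where some substring
    matches `pattern` with at most k edits (insert, delete, substitute).

    Illustrative sketch only: O(len(pattern) * len(text)) time,
    O(len(pattern)) space.
    """
    m = len(pattern)
    # col[i] = fewest edits to match pattern[:i] against a substring
    # ending at the current text position. Before any text: col[i] = i.
    col = list(range(m + 1))
    hits = []
    for j, tc in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra text character
                      col[i - 1] + 1)    # missing pattern character
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            hits.append(j)
    return hits


print(approx_search("abc", "xxabcxx", 0))  # [5]
print(approx_search("abc", "xxabcxx", 1))  # [4, 5, 6]
```

The only difference from a plain edit-distance computation is that the top cell is reset to 0 for every text position, which lets a match begin anywhere in the text.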
An Introduction to Machine Translation
, 1992
Cited by 406 (9 self)
In the last ten years there has been a significant amount of research in Machine Translation within a “new” paradigm of empirical approaches, often labelled collectively as “Example-based” approaches. The first manifestation of this approach caused some surprise and hostility among observers more used to different ways of working, but the techniques were quickly adopted and adapted by many researchers, often creating hybrid systems. This paper reviews the various research efforts within this paradigm reported to date, and attempts a categorisation of different manifestations of the general approach.
Simple fast algorithms for the editing distance between trees and related problems
 SIAM J. COMPUT
, 1989
Cited by 405 (12 self)
Ordered labeled trees are trees in which the left-to-right order among siblings is significant. The distance between two ordered trees is considered to be the weighted number of edit operations (insert, delete, and modify) to transform one tree to another. The problem of approximate tree matching is also considered. Specifically, algorithms are designed to answer the following kinds of questions: 1. What is the distance between two trees? 2. What is the minimum distance between T1 and T2 when zero or more subtrees can be removed from T2? 3. Let the pruning of a tree at node n mean removing all the descendants of node n. The analogous question for prunings as for subtrees is answered. A dynamic programming algorithm is presented to solve the three questions in sequential time O(|T1| x |T2| x min(depth(T1), leaves(T1)) x min(depth(T2), leaves(T2))) and space O(|T1| x |T2|), compared with O(|T1| x |T2| x (depth(T1))^2 x (depth(T2))^2) for the best previous published algorithm due to Tai [J. Assoc. Comput. Mach., 26 (1979), pp. 422-433]. Further, the algorithm presented here can be parallelized to give
Optimal alignments in linear space
 CABIOS
, 1988
Cited by 285 (4 self)
Space, not time, is often the limiting factor when computing optimal sequence alignments, and a number of recent papers in the biology literature have proposed space-saving strategies. However, a 1975 computer science paper by Hirschberg presented a method that is superior to the newer proposals, both in theory and in practice. The goal of this note is to give Hirschberg’s idea the visibility it deserves by developing a linear-space version of Gotoh’s algorithm, which accommodates affine gap penalties. A portable C-software package implementing this algorithm is available on the BIONET free of charge.
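Hirschberg's divide-and-conquer idea can be sketched on the simpler longest common subsequence problem. The code below is an illustration only: it does not handle affine gap penalties and is not the Gotoh-based algorithm or the C package the abstract describes. The names `lcs_lengths` and `hirschberg_lcs` are assumptions, and string slicing is used for brevity where an index-based version would avoid the copies.

```python
def lcs_lengths(a, b):
    """Last row of the LCS-length table: entry j is the LCS length of
    `a` and b[:j]. Keeps only two rows, so space is O(len(b))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def hirschberg_lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    # Forward pass on the first half of a, backward pass on the second.
    left = lcs_lengths(a[:mid], b)
    right = lcs_lengths(a[mid:][::-1], b[::-1])
    n = len(b)
    # Split b where the two halves together score best.
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    return (hirschberg_lcs(a[:mid], b[:split])
            + hirschberg_lcs(a[mid:], b[split:]))


result = hirschberg_lcs("ABCBDAB", "BDCABA")
print(len(result))  # 4
```

The full quadratic table is never stored: each level of the recursion needs only the two score rows used to pick the split point.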
Algorithms for the longest common subsequence problem
 J. ACM
, 1977
Cited by 221 (2 self)
Two algorithms are presented that solve the longest common subsequence problem. The first algorithm is applicable in the general case and requires O(pn + n log n) time, where p is the length of the longest common subsequence. The second algorithm requires time bounded by O(p(m + 1 - p) log n). In the common special case where p is close to m, this algorithm takes much less time than n^2. KEY WORDS AND PHRASES: subsequence, common subsequence, algorithm. CR CATEGORIES: 3.73, 3.79, 5.25, 5.39
An O(ND) Difference Algorithm and Its Variations
 Algorithmica
, 1986
Cited by 209 (4 self)
The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D^2) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N lg N + D^2) time variation.
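The edit-graph view can be sketched as a greedy search over diagonals that computes only D, the size of a shortest edit script using insertions and deletions. This is a minimal sketch of the basic O(ND) idea, not the linear-space refinement or the suffix-tree variation; the function name and the dictionary-based storage are assumptions made for illustration.

```python
def shortest_edit_script_length(a, b):
    """Return D, the minimum number of single-element insertions and
    deletions needed to turn sequence a into sequence b."""
    n, m = len(a), len(b)
    # v[k] = furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]      # move down: insert an element of b
            else:
                x = v[k - 1] + 1  # move right: delete an element of a
            y = x - k
            # Follow the diagonal (matching elements) for free.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    raise AssertionError("unreachable: d = n + m always suffices")


print(shortest_edit_script_length("ABCABBA", "CBABAC"))  # 5
```

Only the furthest-reaching point per diagonal is kept for each value of d, which is why the work grows with D rather than with the full N x N table.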
Recognition of Shapes by Editing Their Shock Graphs
 Proc. Int’l Conf. Computer Vision
, 2001
Cited by 204 (8 self)
This paper presents a novel framework for the recognition of objects based on their silhouettes. The main idea is to measure the distance between two shapes as the minimum extent of deformation necessary for one shape to match the other. Since the space of deformations is very high-dimensional, three steps are taken to make the search practical: 1) define an equivalence class for shapes based on shock-graph topology, 2) define an equivalence class for deformation paths based on shock-graph transitions, and 3) avoid complexity-increasing deformation paths by moving toward shock-graph degeneracy. Despite these steps, which tremendously reduce the search requirement, there still remain numerous deformation paths to consider. To that end, we employ an edit-distance algorithm for shock graphs that finds the optimal deformation path in polynomial time. The proposed approach gives intuitive correspondences for a variety of shapes and is robust in the presence of a wide range of visual transformations. The recognition rates on two distinct databases of 99 and 216 shapes each indicate highly successful within category matches (100 percent in top three matches), which render the framework potentially usable in a range of shape-based recognition applications. Index Terms: Shape deformation, shock graphs, graph matching, edit distance, shape matching, object recognition, dynamic programming.
A PATTERN MATCHING MODEL FOR MISUSE INTRUSION DETECTION
Cited by 191 (7 self)
This paper describes a generic model of matching that can be usefully applied to misuse intrusion detection. The model is based on Colored Petri Nets. Guards define the context in which signatures are matched. The notion of start and final states, and paths between them define the set of event sequences matched by the net. Partial order matching can also be specified in this model. The main benefits of the model are its generality, portability and flexibility.
Approximate string matching
 ACM Computing Surveys
, 1980
Cited by 158 (0 self)
Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword. The methods found are classified as either equivalence or similarity problems. Equivalence problems are seen to be readily solved using canonical forms. For similarity problems, difference measures are surveyed, with a full description of the well-established dynamic programming method relating this to the approach using probabilities and likelihoods. Searches for approximate matches in large sets using a difference function are seen to be an open problem still, though several promising ideas have been suggested. Approximate matching (error correction) during parsing is briefly reviewed.
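The remark that equivalence problems are readily solved using canonical forms can be illustrated as follows: map every keyword to a canonical key, then use exact lookup on that key. The particular canonical form below (accents stripped, case folded, non-alphanumerics dropped) is a hypothetical choice for illustration, not one taken from the survey, and the names `canonical` and `build_index` are assumptions.

```python
import re
import unicodedata
from collections import defaultdict


def canonical(key):
    """Hypothetical canonical form: strip accents, fold case, and
    drop every character that is not a-z or 0-9."""
    decomposed = unicodedata.normalize("NFKD", key)
    no_marks = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]", "", no_marks.casefold())


def build_index(items):
    """Group items by canonical key so equivalent spellings collide."""
    index = defaultdict(list)
    for item in items:
        index[canonical(item)].append(item)
    return index


index = build_index(["O'Brien", "Obrien", "Müller", "Smith"])
print(index.get(canonical("o brien"), []))  # ["O'Brien", 'Obrien']
print(index.get(canonical("MULLER"), []))   # ['Müller']
```

Two keywords are treated as equivalent exactly when their canonical keys are equal, so the search reduces to an ordinary dictionary lookup. Similarity problems, by contrast, need a difference measure such as edit distance.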