Results 1-9 of 9
Pattern Recognition of Strings With Substitutions, Insertions, Deletions and Generalized Transpositions
Pattern Recognition
Abstract
Cited by 13 (2 self)
We study the problem of recognizing a string Y which is the noisy version of some unknown string X* chosen from a finite dictionary, H. The traditional case, which has been extensively studied in the literature, is the one in which Y contains substitution, insertion and deletion (SID) errors. Although some work has been done to extend the traditional set of edit operations to include the straightforward transposition of adjacent characters [14], the problem is unsolved when the transposed characters are themselves subsequently substituted, as is typical in cursive and typewritten script, in molecular biology and in noisy chain-coded boundaries. In this paper we present the first reported solution to the analytic problem of editing one string, X, into another, Y, using these four edit operations. A scheme for obtaining the optimal edit operations has also been given. Both these solutions are optimal for the infinite-alphabet case. Using these algorithms we present a syntactic pattern rec...
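The paper's generalized transpositions (transposed characters that are later substituted) require a more elaborate recurrence than the reference [14] it builds on. As a baseline only, the simpler adjacent-transposition extension, the restricted Damerau-Levenshtein distance, can be sketched as follows (unit costs and the function name are illustrative, not the paper's algorithm):

```python
def edit_distance_with_transposition(x: str, y: str) -> int:
    """Restricted Damerau-Levenshtein distance: substitution, insertion,
    deletion, plus transposition of adjacent characters, all at unit cost."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete the whole prefix of x
    for j in range(n + 1):
        d[0][j] = j  # insert the whole prefix of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
            # transposition of two adjacent characters
            if i > 1 and j > 1 and x[i - 1] == y[j - 2] and x[i - 2] == y[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

For example, "ab" versus "ba" costs a single transposition here, whereas plain SID editing would need two operations.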
The Normalized String Editing Problem Revisited
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1996
Abstract
Cited by 11 (1 self)
Marzal and Vidal [8] recently considered the problem of computing the normalized edit distance between two strings, and reported experimental results which demonstrated the use of the measure to recognize handwritten characters. Their paper formulated the theoretical properties of the measure and developed two algorithms to compute it. In this short communication we shall demonstrate how this measure is related to an auxiliary measure already defined in the literature: the inter-string constrained edit distance [10,11,15]. Since the normalized edit distance can be computed efficiently using the latter, the analytic and experimental results reported in [8] can be obtained just as accurately, but more efficiently, using the strategies presented here. I. PROBLEM STATEMENT. In the comparison of text patterns, phonemes and biological macromolecules, a question that has attracted much interest is that of quantifying the dissimilarity between strings. A review of such distance measures and ...
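The connection between the two measures can be illustrated with a small, deliberately unoptimized sketch: compute D_k, the cheapest edit path that uses exactly k elementary operations (a constrained edit distance), then minimize D_k/k over k. Unit costs, function names and the cubic-space table are illustrative assumptions, not the algorithms of [8] or of this paper:

```python
def normalized_edit_distance(x: str, y: str, sub: float = 1.0, ind: float = 1.0) -> float:
    """Normalized edit distance via the auxiliary constrained measure:
    ned(x, y) = min_k D_k / k, where D_k is the cheapest edit path from
    x to y using exactly k operations.  Assumes x and y are not both empty."""
    m, n = len(x), len(y)
    INF = float("inf")
    kmax = m + n  # no useful path needs more operations than this
    # D[i][j][k]: cheapest cost of editing x[:i] into y[:j] with exactly k ops
    D = [[[INF] * (kmax + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    D[0][0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(kmax + 1):
                if D[i][j][k] == INF:
                    continue
                if i < m:  # delete x[i]
                    D[i + 1][j][k + 1] = min(D[i + 1][j][k + 1], D[i][j][k] + ind)
                if j < n:  # insert y[j]
                    D[i][j + 1][k + 1] = min(D[i][j + 1][k + 1], D[i][j][k] + ind)
                if i < m and j < n:  # substitution / match counts as one op
                    c = 0.0 if x[i] == y[j] else sub
                    D[i + 1][j + 1][k + 1] = min(D[i + 1][j + 1][k + 1], D[i][j][k] + c)
    return min(D[m][n][k] / k for k in range(1, kmax + 1) if D[m][n][k] < INF)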
A Formal Theory for Optimal and Information Theoretic Syntactic Pattern Recognition
Abstract
Cited by 5 (2 self)
In this paper we present a foundational basis for optimal and information theoretic syntactic pattern recognition. We do this by developing a rigorous model, M*, for channels which permit arbitrarily distributed substitution, deletion and insertion syntactic errors. More explicitly, if A is any finite alphabet and A* the set of words over A, we specify a stochastically consistent scheme by which a string U ∈ A* can be transformed into any Y ∈ A* by means of arbitrarily distributed substitution, deletion and insertion operations. The scheme is shown to be Functionally Complete and stochastically consistent. Apart from the synthesis aspects, we also deal with the analysis of such a model and derive a technique by which Pr[Y|U], the probability of receiving Y given that U was transmitted, can be computed in cubic time using dynamic programming. One of the salient features of this scheme is that it demonstrates how dynamic programming can be applied to evaluate quantities involv...
Noisy Subsequence Recognition Using Constrained String Editing Involving Substitutions, Insertions, Deletions and Generalized Transpositions
 In ICSC
, 1994
Abstract
Cited by 2 (0 self)
We consider a problem which can greatly enhance the areas of cursive script recognition and the recognition of printed character sequences. This problem involves recognizing words/strings by processing their noisy subsequences. Let X* be any unknown word from a finite dictionary H. Let U be any arbitrary subsequence of X*. We study the problem of estimating X* by processing Y, a noisy version of U. Y contains substitution, insertion, deletion and generalized transposition errors, the latter occurring when transposed characters are themselves subsequently substituted. We solve the noisy subsequence recognition problem by defining and using the constrained edit distance between X ∈ H and Y subject to any arbitrary edit constraint involving the number and type of edit operations to be performed. An algorithm to compute this constrained edit distance has been presented. Using these algorithms we present a syntactic Pattern Recognition (PR) scheme which corrects noisy tex...
G. Álvarez. A Method for Clustering Web Attacks Using Edit Distance. http://132.236.180.11/PS_cache/cs/pdf/0304/0304007.pdf (visited 29 Nov 2005)
Abstract
Cited by 1 (0 self)
Cluster analysis often serves as the initial step in the process of data classification. In this paper, the problem of clustering input data of different lengths is considered. The edit distance, defined as the minimum number of elementary edit operations needed to transform one vector into another, is used. A heuristic for clustering unequal-length vectors, analogous to the well-known k-means algorithm, is described and analyzed. This heuristic determines cluster centroids by expanding shorter vectors to the lengths of the longest ones in each cluster in a specific way. It is shown that the time and space complexities of the heuristic are linear in the number of input vectors. Experimental results on real data originating from a system for classification of Web attacks are given.
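The general idea of k-means-style clustering under edit distance can be sketched as follows. Note the simplification: instead of the paper's centroid construction by expanding shorter vectors, this sketch uses a plain medoid update (the cluster member minimizing total intra-cluster distance), so it is a k-medoids variant, not the authors' heuristic:

```python
import random

def levenshtein(a: str, b: str) -> int:
    """Plain unit-cost edit distance with a two-row table."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1,
                         prev[j - 1] + (a[i - 1] != b[j - 1]))
        prev = cur
    return prev[n]

def k_medoids_strings(strings, k, iters=20, seed=0):
    """k-means-style clustering under edit distance; the centroid is the
    medoid rather than the paper's expanded vector."""
    rng = random.Random(seed)
    medoids = rng.sample(strings, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assignment step: each string joins its nearest medoid
        clusters = [[] for _ in range(k)]
        for s in strings:
            clusters[min(range(k), key=lambda c: levenshtein(s, medoids[c]))].append(s)
        # update step: new medoid minimizes total distance within its cluster
        new = [min(cl, key=lambda cand: sum(levenshtein(cand, s) for s in cl))
               if cl else medoids[c]
               for c, cl in enumerate(clusters)]
        if new == medoids:
            break
        medoids = new
    return medoids, clusters
```

Unlike the paper's linear-time heuristic, the medoid update is quadratic in cluster size; it is shown only to make the clustering loop concrete.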
A New Two-Stage Search Procedure for Misuse Detection
Abstract
Cited by 1 (0 self)
A new two-stage index-less search procedure is presented that makes use of the constrained edit distance in IDS misuse-detection attack database search. The procedure consists of a pre-selection phase, in which the original data set is reduced, and an exhaustive search phase over the database records selected in the first phase. The maximum number of consecutive deletions represents the constraint. Besides eliminating the need for a finer exhaustive search in the attack database records in which the detected subsequence is too distorted, the new search procedure also enables better control over the search process in the case of deliberate distortion of the attack strings. Experimental results obtained on the SNORT signature files show that the proposed method reduces the search data set by more than 70% on average in typical cases, compared to the method that uses the unconstrained edit distance.
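The two-stage structure can be sketched generically: a cheap pre-selection filter discards records that cannot match, and only the survivors receive the full edit-distance computation. The filter below (length difference lower-bounds the edit distance) is a simple stand-in, not the paper's constrained-edit-distance pre-selection, and all names are illustrative:

```python
def levenshtein(a: str, b: str) -> int:
    """Plain unit-cost edit distance with a two-row table."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1,
                         prev[j - 1] + (a[i - 1] != b[j - 1]))
        prev = cur
    return prev[n]

def two_stage_search(query: str, database: list, threshold: int) -> list:
    """Two-stage sketch: phase 1 prunes by a cheap lower bound on the edit
    distance, phase 2 runs the exhaustive search on the reduced set."""
    # Phase 1: |len(s) - len(query)| lower-bounds levenshtein(query, s),
    # so records beyond the threshold can never match.
    candidates = [s for s in database if abs(len(s) - len(query)) <= threshold]
    # Phase 2: exhaustive edit-distance search over the survivors only.
    return [s for s in candidates if levenshtein(query, s) <= threshold]
```

The paper's filter, a constrained edit distance bounding consecutive deletions, prunes more aggressively than this length bound while remaining robust to deliberately distorted attack strings.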
Adjusting Fuzzy Automata for String Similarity Measuring
Abstract
In this paper, we introduce a fuzzy automaton for computing the similarity between pairs of strings and a genetic method for adjusting its parameters. The fuzzy automaton models the edit operations needed to transform any string into another one. The selection of appropriate fuzzy operations and fuzzy membership values for the transitions leads to improved system performance for a particular application.
Fuzzy lexical matching
, 2012
Abstract
Being able to automatically correct spelling errors is useful in cases where the set of documents is too vast to involve human interaction. In this bachelor's thesis, we investigate an implementation that attempts to perform such corrections using a lexicon and an edit distance measure. We compare the familiar Levenshtein and Damerau-Levenshtein distances to modifications where each edit operation is assigned an individual weight. We find that the primary benefit of using this form of edit distance over the original is not a higher rate of correction, but a lower susceptibility to false friends. However, deriving the correct weights for each edit operation turns out to be a harder problem than anticipated. While a weighted edit distance can theoretically be implemented effectively, a deeper analysis of the costs of edit operations is necessary to make such an approach practical.
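The weighted variant the thesis compares against can be sketched by letting each operation carry a caller-supplied cost function; the signatures and function name below are illustrative, not taken from the thesis:

```python
def weighted_edit_distance(x: str, y: str, w_sub, w_ins, w_del) -> float:
    """Levenshtein-style DP where each operation has its own weight.
    w_sub(a, b), w_ins(b) and w_del(a) are user-supplied cost functions;
    a match costs nothing."""
    m, n = len(x), len(y)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w_del(x[i - 1])  # delete prefix of x
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w_ins(y[j - 1])  # insert prefix of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + w_del(x[i - 1]),   # deletion
                d[i][j - 1] + w_ins(y[j - 1]),   # insertion
                d[i - 1][j - 1]
                + (0.0 if x[i - 1] == y[j - 1] else w_sub(x[i - 1], y[j - 1])),
            )
    return d[m][n]
```

With constant weights of 1 this reduces to the plain Levenshtein distance; making, say, substitutions between keyboard-adjacent letters cheap is the kind of tuning whose difficulty the thesis reports.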
String Alignment With Substitution, Insertion, Deletion, Squashing, and Expansion Operations
Abstract
Let X and Y be any two strings of finite length. The problem of transforming X to Y using the edit operations of substitution, deletion, and insertion has been extensively studied in the literature. The problem can be solved in quadratic time if the edit operations are extended to include the operation of transposition of adjacent characters, and is NP-complete if the characters can be edited repeatedly. In this paper we consider the problem of transforming X to Y when the set of edit operations is extended to include the squashing and expansion operations. Whereas in the squashing operation two (or more) contiguous characters of X can be transformed into a single character of Y, in the expansion operation a single character in X may be expanded into two or more contiguous characters of Y. These operations are typically found in the recognition of cursive script. A quadratic-time solution to the problem has been presented. This solution is optimal for the infinite-alphabet case. The strategy to compute the sequence of edit operations is also presented.
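Restricting squashing and expansion to the two-to-one and one-to-two cases, the quadratic-time recurrence can be sketched as follows; the unit costs, optional applicability predicates and function name are assumptions for illustration, not the paper's formulation:

```python
def edit_distance_squash_expand(x: str, y: str, c_sub=1, c_ind=1,
                                c_sq=1, c_ex=1, squash=None, expand=None) -> float:
    """SID edit distance extended with two-to-one squashing
    (x[i]x[i+1] -> y[j]) and one-to-two expansion (x[i] -> y[j]y[j+1]).
    `squash` and `expand` are optional predicates restricting which
    merges/splits are allowed; None permits all of them."""
    m, n = len(x), len(y)
    INF = float("inf")
    d = [[INF] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if d[i][j] == INF:
                continue
            if i < m:  # delete x[i]
                d[i + 1][j] = min(d[i + 1][j], d[i][j] + c_ind)
            if j < n:  # insert y[j]
                d[i][j + 1] = min(d[i][j + 1], d[i][j] + c_ind)
            if i < m and j < n:  # substitution / match
                c = 0 if x[i] == y[j] else c_sub
                d[i + 1][j + 1] = min(d[i + 1][j + 1], d[i][j] + c)
            if i + 1 < m and j < n and (squash is None or squash(x[i:i + 2], y[j])):
                # squash two contiguous characters of x into one of y
                d[i + 2][j + 1] = min(d[i + 2][j + 1], d[i][j] + c_sq)
            if i < m and j + 1 < n and (expand is None or expand(x[i], y[j:j + 2])):
                # expand one character of x into two contiguous characters of y
                d[i + 1][j + 2] = min(d[i + 1][j + 2], d[i][j] + c_ex)
    return d[m][n]
```

Each cell has a constant number of outgoing transitions, so the table is still filled in quadratic time. In a cursive-script setting, the predicates would encode which ligatures (e.g. "cl" read as "d") are plausible merges.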