Results 1 - 10 of 19
On the Editing Distance between Undirected Acyclic Graphs
1995
Cited by 91 (7 self)
We consider the problem of comparing CUAL graphs (Connected, Undirected, Acyclic graphs with nodes being Labeled). This problem is motivated by the study of information retrieval for biochemical and molecular databases. Suppose we define the distance between two CUAL graphs G_1 and G_2 to be the weighted number of edit operations (insert node, delete node and relabel node) needed to transform G_1 into G_2. By reduction from exact cover by 3-sets, one can show that finding the distance between two CUAL graphs is NP-complete. In view of the hardness of the problem, we propose a constrained distance metric, called the degree-2 distance, by requiring that any node to be inserted (deleted) have no more than 2 neighbors. With this metric, we present an efficient algorithm to solve the problem. The algorithm runs in time O(N_1 N_2 D²) for general weighting edit operations and in time O(N_1 N_2 D √D log D) for integral weighting edit operations, where N_i, i = 1, 2, is the number of nodes in G_i, D = min{d_1, d_2} and d_i is the maximum degree of G_i.
Large Vocabulary Recognition of Online Handwritten Cursive Words
1995
Cited by 26 (1 self)
A critical feature of any computer system is its interface with the user. This has led to the development of user interface technologies such as mouse, touchscreen and pen-based input devices. Since handwriting is one of the most familiar communication media, pen-based interfaces combined with automatic handwriting recognition offer a very easy and natural input method. Pen-based interfaces are also essential in mobile computing because they are scalable. Recent advances in pen-based hardware and wireless communication have been influential factors in the renewed interest in online recognition systems. Online handwriting recognition is fundamentally a pattern classification task; the objective is to take an input pattern, the handwritten signal collected online via a digitizing device, and classify it as one of a prespecified set of words (i.e., the system's lexicon). Because exact recognition is very difficult, a lexicon is used to constrain the recognition output to a known vocab...
A Computational Theory of Visual Word Recognition
1988
Cited by 16 (6 self)
A computational theory of the visual recognition of words of text is developed. The theory, based on previous studies of how people read, includes three stages: hypothesis generation, hypothesis testing, and global contextual analysis. Hypothesis generation uses gross visual features, such as those that could be extracted from the peripheral presentation of a word, to provide expectations about word identity. Hypothesis testing integrates the information determined by hypothesis generation with more detailed features that are extracted from the word image. Global contextual analysis provides syntactic and semantic information that influences hypothesis testing.
Algorithmic realization of the computational theory also consists of three stages. Hypothesis generation is implemented by extracting simple features from an input word and using those features to find a set of dictionary words with those features in common. Hypothesis testing uses this set of words to drive further selective image analysis that matches the input to one of the members of this set. This is done with a tree of feature tests that can be executed in several different ways to recognize an input word. Global contextual analysis is implemented with a process that uses knowledge of typical word-class transitions to improve the performance of the hypothesis testing stage. This is executable in parallel with hypothesis testing.
This methodology is in sharp contrast to conventional machine reading algorithms, which usually segment a word into characters and recognize the individual characters, so that a word decision is arrived at as a composite of character decisions. The algorithm presented here avoids the segmentation stage and does not require an exhaustive analysis of each character, and thus is not a character recognition algorithm.
Statistical projections show the viability of all three stages of the proposed approach. Experiments with images of text show that the methodology performs well in difficult situations, such as touching and overlapping characters.
Pattern Recognition of Strings With Substitutions, Insertions, Deletions and Generalized Transpositions
Pattern Recognition
Cited by 14 (2 self)
We study the problem of recognizing a string Y which is the noisy version of some unknown string X* chosen from a finite dictionary, H. The traditional case, which has been extensively studied in the literature, is the one in which Y contains substitution, insertion and deletion (SID) errors. Although some work has been done to extend the traditional set of edit operations to include the straightforward transposition of adjacent characters [14], the problem is unsolved when the transposed characters are themselves subsequently substituted, as is typical in cursive and typewritten script, in molecular biology and in noisy chain-coded boundaries. In this paper we present the first reported solution to the analytic problem of editing one string, X, to another, Y, using these four edit operations. A scheme for obtaining the optimal edit operations has also been given. Both these solutions are optimal for the infinite alphabet case. Using these algorithms we present a syntactic pattern rec...
Generalizing Edit Distance To Incorporate Domain Information: Handwritten text recognition as a case study
Pattern Recognition, 1996
Cited by 12 (0 self)
In this paper the Damerau-Levenshtein string difference metric is generalized in two ways to more accurately compensate for the types of errors that are present in the script recognition domain. First, the basic dynamic programming method for computing such a measure is extended to allow for merges, splits and two-letter substitutions. Second, edit operations are refined into categories according to the effect they have on the visual "appearance" of words. A set of recognizer-independent constraints is developed to reflect the severity of the information lost due to each operation. These constraints are solved to assign specific costs to the operations. Experimental results on 2,335 corrupted strings and a lexicon of 21,299 words show higher correcting rates than with the original form.
Keywords: string distance, string matching, spelling error correction, word recognition and correction, text editing, script recognition and postprocessing
1 INTRODUCTION Since the goal of text recog...
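For orientation, the baseline that this abstract generalizes can be sketched as follows. This is a minimal Python sketch of the restricted Damerau-Levenshtein (optimal string alignment) distance under an assumed unit cost for every operation. It does not reproduce the paper's merges, splits, two-letter substitutions or derived operation costs, and the function name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein distance with unit costs (an assumption).

    Allowed operations: insert, delete, substitute, and transposition of two
    adjacent characters. No substring is edited more than once.
    """
    m, n = len(a), len(b)
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # adjacent transposition: a ends in "xy" where b ends in "yx"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

Extending such a recurrence to merges and splits would add further cases that consume one character of one string and two of the other; the costs attached to them are the subject of the paper and are not guessed at here.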
The Normalized String Editing Problem Revisited
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996
Cited by 11 (1 self)
Marzal and Vidal [8] recently considered the problem of computing the normalized edit distance between two strings, and reported experimental results which demonstrated the use of the measure to recognize handwritten characters. Their paper formulated the theoretical properties of the measure and developed two algorithms to compute it. In this short communication we shall demonstrate how this measure is related to an auxiliary measure already defined in the literature, the inter-string constrained edit distance [10,11,15]. Since the normalized edit distance can be computed efficiently using the latter, the analytic and experimental results reported in [8] can be obtained just as accurately, but more efficiently, using the strategies presented here.
I. PROBLEM STATEMENT In the comparison of text patterns, phonemes and biological macromolecules a question that has attracted much interest is that of quantifying the dissimilarity between strings. A review of such distance measures and ...
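The abstract does not define the normalized edit distance, so the sketch below rests on an assumption: that it is the minimum, over all editing paths, of the path weight divided by the path length, where the length counts every elementary operation including zero-cost matches. Under that assumption, a straightforward dynamic program tracks the cheapest path for each possible length. The function name and the default unit costs are hypothetical, and this is the plain cubic-time formulation, not the more efficient strategy the paper describes.

```python
import math


def normalized_edit_distance(x: str, y: str,
                             sub_cost: float = 1.0,
                             indel_cost: float = 1.0) -> float:
    """min over editing paths P of weight(P) / length(P) (assumed definition)."""
    m, n = len(x), len(y)
    if m == 0 and n == 0:
        return 0.0
    max_len = m + n  # longest path: delete all of x, insert all of y
    inf = math.inf
    # w[i][j][k] = min weight of a path turning x[:i] into y[:j]
    #              using exactly k elementary operations
    w = [[[inf] * (max_len + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    w[0][0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(1, i + j + 1):
                best = inf
                if i > 0:  # delete x[i-1]
                    best = min(best, w[i - 1][j][k - 1] + indel_cost)
                if j > 0:  # insert y[j-1]
                    best = min(best, w[i][j - 1][k - 1] + indel_cost)
                if i > 0 and j > 0:  # match (cost 0) or substitute
                    c = 0.0 if x[i - 1] == y[j - 1] else sub_cost
                    best = min(best, w[i - 1][j - 1][k - 1] + c)
                w[i][j][k] = best
    # pick the path length that minimizes the weight-to-length ratio
    return min(w[m][n][k] / k
               for k in range(1, max_len + 1) if w[m][n][k] < inf)
```

Keeping the path length as an explicit third index is what makes the ratio minimizable: dividing the ordinary edit distance by a fixed length afterwards is a different quantity in general.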
String Taxonomy Using Learning Automata
IEEE Transactions on Systems, Man and Cybernetics, 1997
Cited by 7 (0 self)
A typical syntactic pattern recognition (PR) problem involves comparing a noisy string with every element of a dictionary, H. The problem of classification can be greatly simplified if the dictionary is partitioned into a set of subdictionaries. In this case, the classification can be hierarchical: the noisy string is first compared to a representative element of each subdictionary and the closest match within the subdictionary is subsequently located. Indeed, the entire problem of subdividing a set of strings into subsets where each subset contains "similar" strings has been referred to as the "String Taxonomy Problem". To our knowledge there is no reported solution to this problem (see footnote on Page 2). In this paper we shall present a learning-automaton-based solution to string taxonomy. The solution utilizes the Object Migrating Automaton (OMA) whose power in clustering objects and images [33,35] has been reported. The power of the scheme for string taxonomy has been demons...
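The two-stage lookup described in this abstract can be illustrated with a short Python sketch. It assumes the partition into subdictionaries is already given (producing it is the String Taxonomy Problem itself and is not attempted here) and uses the plain Levenshtein distance as the comparison measure, which is also an assumption. The names `levenshtein` and `classify_hierarchically`, and the mapping from representative to subdictionary, are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance with substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def classify_hierarchically(noisy: str, subdictionaries: dict) -> str:
    """Two-stage classification.

    subdictionaries maps a representative string to the list of words in
    its subdictionary (an assumed, pre-computed partition).
    """
    # Stage 1: compare the noisy string only to the representatives.
    rep = min(subdictionaries, key=lambda r: levenshtein(noisy, r))
    # Stage 2: search exhaustively inside the chosen subdictionary.
    return min(subdictionaries[rep], key=lambda w: levenshtein(noisy, w))


partition = {
    "cat": ["cat", "cart", "coat"],
    "house": ["house", "horse", "mouse"],
}
result = classify_hierarchically("hxrse", partition)
```

The saving comes from stage 1 touching one string per subdictionary instead of every dictionary word; the price is that a poor partition can route the noisy string to the wrong subdictionary.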
A Formal Theory for Optimal and Information Theoretic Syntactic Pattern Recognition
Cited by 5 (2 self)
In this paper we present a foundational basis for optimal and information theoretic syntactic pattern recognition. We do this by developing a rigorous model, M*, for channels which permit arbitrarily distributed substitution, deletion and insertion syntactic errors. More explicitly, if A is any finite alphabet and A* the set of words over A, we specify a stochastically consistent scheme by which a string U ∈ A* can be transformed into any Y ∈ A* by means of arbitrarily distributed substitution, deletion and insertion operations. The scheme is shown to be Functionally Complete and stochastically consistent. Apart from the synthesis aspects, we also deal with the analysis of such a model and derive a technique by which Pr[Y|U], the probability of receiving Y given that U was transmitted, can be computed in cubic time using dynamic programming. One of the salient features of this scheme is that it demonstrates how dynamic programming can be applied to evaluate quantities involv...
Noisy Subsequence Recognition Using Constrained String Editing Involving Substitutions, Insertions, Deletions and Generalized Transpositions
In ICSC, 1994
Cited by 2 (0 self)
We consider a problem which can greatly enhance the areas of cursive script recognition and the recognition of printed character sequences. This problem involves recognizing words/strings by processing their noisy subsequences. Let X* be any unknown word from a finite dictionary H. Let U be any arbitrary subsequence of X*. We study the problem of estimating X* by processing Y, a noisy version of U. Y contains substitution, insertion, deletion and generalized transposition errors, the latter occurring when transposed characters are themselves subsequently substituted. We solve the noisy subsequence recognition problem by defining and using the constrained edit distance between X ∈ H and Y subject to any arbitrary edit constraint involving the number and type of edit operations to be performed. An algorithm to compute this constrained edit distance has been presented. Using these algorithms we present a syntactic Pattern Recognition (PR) scheme which corrects noisy tex...
Optimal and Information Theoretic Syntactic Pattern Recognition for Traditional Errors
In Advances in Structural and Syntactic Pattern Recognition, 1996
Cited by 2 (0 self)
In this paper we present a foundational basis for optimal and information theoretic syntactic pattern recognition. We do this by developing a rigorous model, M*, for channels which permit arbitrarily distributed substitution, deletion and insertion syntactic errors. More explicitly, if A is any finite alphabet and A* the set of words over A, we specify a stochastically consistent scheme by which a string U ∈ A* can be transformed into any Y ∈ A* by means of arbitrarily distributed substitution, deletion and insertion operations. The scheme is shown to be Functionally Complete and stochastically consistent. Apart from the synthesis aspects, we also deal with the analysis of such a model and derive a technique by which Pr[Y|U], the probability of receiving Y given that U was transmitted, can be computed in cubic time using dynamic programming. Experimental results which involve dictionaries with strings of lengths between 7 and 14 with an overall average noise of 39.75% demons...