Results 1–10 of 14
Approximate string matching
 ACM Computing Surveys
, 1980
"... Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword. The methods found are classified as either equivalence or similarity problems. Equivalence problems are seen ..."
Abstract

Cited by 141 (0 self)
Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword. The methods found are classified as either equivalence or similarity problems. Equivalence problems are seen to be readily solved using canonical forms. For similarity problems difference measures are surveyed, with a full description of the well-established dynamic programming method, relating this to the approach using probabilities and likelihoods. Searches for approximate matches in large sets using a difference function are seen to be still an open problem, though several promising ideas have been suggested. Approximate matching (error correction) during parsing is briefly reviewed.
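The well-established dynamic programming method this survey describes (Wagner–Fischer) can be sketched in a few lines. This is a generic illustration, not code from the paper; the function name `edit_distance` is our own:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimal number of insertions, deletions, and substitutions
    needed to transform a into b (Wagner-Fischer DP, O(|a|*|b|))."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # substitute
        prev = cur
    return prev[-1]
```

Only two rows of the distance matrix are kept, so memory is linear in the shorter string.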
Fast String Correction with Levenshtein Automata
 INTERNATIONAL JOURNAL OF DOCUMENT ANALYSIS AND RECOGNITION
, 2002
"... The Levenshteindistance between two words is the minimal number of insertions, deletions or substitutions that are needed to transform one word into the other. Levenshteinautomata of degree n for a word W are defined as finite state automata that regognize the set of all words V where the Levensht ..."
Abstract

Cited by 28 (5 self)
The Levenshtein distance between two words is the minimal number of insertions, deletions or substitutions that are needed to transform one word into the other. Levenshtein automata of degree n for a word W are defined as finite-state automata that recognize the set of all words V where the Levenshtein distance between V and W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein automaton of degree n for W in time linear in the length of W. Given an electronic dictionary that is implemented in the form of a trie or a finite-state automaton, the Levenshtein automaton for W can be used to control search in the lexicon in such a way that exactly the lexical words V are generated where the Levenshtein distance between V and W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein automata and leads to even greater efficiency. We also describe how to extend both methods to variants of the Levenshtein distance where further primitive edit operations (transpositions, merges and splits) may be used.
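The dictionary-controlled search the abstract describes can be approximated without explicit Levenshtein automata by carrying a dynamic-programming row down a trie and pruning branches that can no longer reach the bound. The sketch below is our own simplification of that idea, not the paper's automaton construction; the names and dictionary encoding (`'$'` as end-of-word marker) are assumptions:

```python
def build_trie(words):
    """Store the dictionary as nested dicts; '$' marks end of a word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root

def trie_search(trie, word, n):
    """Return (dictionary word, distance) pairs for all entries within
    Levenshtein distance n of word. A branch is pruned as soon as every
    cell of its DP row exceeds n, so most of the trie is never visited."""
    results = []

    def recurse(node, prefix, prev_row):
        for ch, child in node.items():
            if ch == "$":
                continue
            row = [prev_row[0] + 1]
            for j, wc in enumerate(word, 1):
                row.append(min(row[j - 1] + 1,               # insert wc
                               prev_row[j] + 1,              # delete ch
                               prev_row[j - 1] + (ch != wc)))  # substitute
            if "$" in child and row[-1] <= n:
                results.append((prefix + ch, row[-1]))
            if min(row) <= n:      # bound still reachable: descend
                recurse(child, prefix + ch, row)

    recurse(trie, "", list(range(len(word) + 1)))
    return results
```

Exactly the lexical words within the bound are generated, as in the paper, though a true Levenshtein automaton avoids recomputing the DP row per trie edge.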
Decoding substitution ciphers by means of word matching with application to OCR
 IEEE Transactions on Pattern Analysis and Machine Intelligence 9
, 1987
"... the clinical results indicated that no change occurred between these two nights. Fig. 1 also shows that these two nights are close to each other. V. CONCLUSIONS AND DISCUSSION This study was performed to explore to what extend Markovian modeling of sleep patterns, coupled with pattern recognition te ..."
Abstract

Cited by 27 (8 self)
The clinical results indicated that no change occurred between these two nights. Fig. 1 also shows that these two nights are close to each other. V. CONCLUSIONS AND DISCUSSION This study was performed to explore to what extent Markovian modeling of sleep patterns, coupled with pattern recognition techniques, can be used to describe normal and abnormal sleep patterns, and to detect the sleep changes between different nights for the same abnormal subject. The latter may be indicative of the degree of improvement via a treatment procedure. The comparison of the transition probability matrices was done using a χ²-clustering and a correspondence analysis approach. Most of the normals fell into one cluster, whereas the abnormals were more dispersed. Particularly, the correspondence analysis not only indicated the distances between the normal and abnormal sleep patterns,
Approximate Text Searching
, 1998
"... This thesis focuses on the problem of text retrieval allowing errors, also called \approximate " string matching. The problem is to nd a pattern in a text, where the pattern and the text may have \errors". This problem has received a lot of attention in recent years because of its applicat ..."
Abstract

Cited by 22 (6 self)
This thesis focuses on the problem of text retrieval allowing errors, also called "approximate" string matching. The problem is to find a pattern in a text, where the pattern and the text may have "errors". This problem has received a lot of attention in recent years because of its applications in many areas, such as information retrieval, computational biology and signal processing, to name a few. The aim of this work is the development and analysis of novel algorithms to deal with the problem under various conditions, as well as a better understanding of the problem itself and its statistical behavior. Although our results are valid in many different areas, we focus our attention on typical text searching for information retrieval applications. This makes some ranges of values for the parameters of the problem more interesting than others. We have divided this presentation into two parts. The first one deals with online approximate string matching, i.e. when there is no time or space to preprocess the text. These algorithms are the core of offline algorithms as well. Online searching is the area of the problem where the best algorithms existed. We have obtained new bounds for the probability of an approximate match of a pattern in
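Online approximate string matching of the kind this thesis analyzes is classically done with Sellers' column-wise dynamic program, where the first DP row is all zeros so a match may begin at any text position. The sketch below is our own generic illustration, not code from the thesis:

```python
def approx_find(pattern: str, text: str, k: int):
    """Sellers' online algorithm: return the text positions where some
    substring ends that matches pattern within k edit operations."""
    m = len(pattern)
    col = list(range(m + 1))    # pattern prefixes vs. empty text prefix
    hits = []
    for pos, tc in enumerate(text):
        new = [0]               # row 0 stays 0: matches start anywhere
        for i, pc in enumerate(pattern, 1):
            new.append(min(new[i - 1] + 1,             # delete pc
                           col[i] + 1,                 # insert tc
                           col[i - 1] + (pc != tc)))   # substitute
        col = new
        if col[m] <= k:
            hits.append(pos)    # an approximate occurrence ends here
    return hits
```

Each text character costs O(m), giving O(mn) overall with no preprocessing of the text, which is the "online" setting described above.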
Fast Approximate Search in Large Dictionaries
 COMPUTATIONAL LINGUISTICS
, 2004
"... The need to correct garbled strings arises in many areas of natural language processing. If a dictionary is available that covers all possible input tokens, a natural set of candidates for correcting an erroneous input P is the set of all words in the dictionary for which the Levenshtein distance to ..."
Abstract

Cited by 14 (4 self)
The need to correct garbled strings arises in many areas of natural language processing. If a dictionary is available that covers all possible input tokens, a natural set of candidates for correcting an erroneous input P is the set of all words in the dictionary for which the Levenshtein distance to P does not exceed a given (small) bound k. In this article we describe methods for efficiently selecting such candidate sets. After introducing as a starting point a basic correction method based on the concept of a "universal Levenshtein automaton," we show how two filtering methods known from the field of approximate text search can be used to improve the basic procedure in a significant way. The first method, which uses standard dictionaries plus dictionaries with reversed words, leads to very short correction times for most classes of input strings. Our evaluation results demonstrate that correction times for fixed-distance bounds depend on the expected number of correction candidates, which decreases for longer input words. Similarly, the choice of an optimal filtering method depends on the length of the input words.
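As a baseline against which the paper's methods can be understood, the candidate set can be selected directly from its definition. The sketch below is our own: the cheap length pre-filter (the distance is at least the length difference) is only an illustration, not the universal Levenshtein automaton or the reversed-dictionary filter the article describes:

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Wagner-Fischer Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def candidates(dictionary, p: str, k: int):
    """All dictionary words within Levenshtein distance k of input p,
    skipping words whose length alone already rules them out."""
    out = []
    for w in dictionary:
        if abs(len(w) - len(p)) > k:
            continue  # distance >= length difference, cannot qualify
        if edit_distance(w, p) <= k:
            out.append(w)
    return out
```

This brute-force selection is linear in the dictionary size per query, which is exactly what the automaton- and filter-based methods above are designed to avoid for large dictionaries.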
Approximate personal name matching through finite-state graphs
 Journal of the American Society for Information Science and Technology
, 2007
"... ..."
(Show Context)
Multilingual String-to-String Correction in Grif, a structured editor
, 1992
"... : This paper describes the integration of a spelling corrector into the structured editor Grif. This corrector is based on the Levenshtein metric concept which is particularly efficient for string correction. This method can be implemented efficiently and can produce good results with short response ..."
Abstract

Cited by 5 (2 self)
This paper describes the integration of a spelling corrector into the structured editor Grif. This corrector is based on the Levenshtein metric concept, which is particularly efficient for string correction. This method can be implemented efficiently and can produce good results with short response time on a new RISC workstation, even with large dictionaries. The integration within Grif enables checking of the textual content of structured documents where large vocabularies are required. Thanks to an attribute language, the editor can automatically adapt the correction to the language and can apply a specific word recognition algorithm and dictionaries, thus allowing checking and correcting of multilingual documents. Keywords: spelling correction, integration, multilingualism, structured documents. Introduction Grif is an interactive system for the production of complex documents. It is essentially intended for handling structured documents [Furuta88] [André89] [Quint90]. Thus it is well...
Fast retrieval of electronic messages that contain mistyped words or spelling errors
 IEEE Transactions on Systems, Man and Cybernetics
, 1996
"... Abstract—This paper presents an index structure for retrieving electronic messages that contain mistyped words or spelling errors. Given a query string (e.g., a search key), we want to find those messages that approximately contain the query, i.e., certain inserts, deletes and mismatches are allowed ..."
Abstract

Cited by 1 (1 self)
Abstract—This paper presents an index structure for retrieving electronic messages that contain mistyped words or spelling errors. Given a query string (e.g., a search key), we want to find those messages that approximately contain the query, i.e., certain inserts, deletes and mismatches are allowed when matching the query with a word (or phrase) in the messages. Our approach is to store the messages sequentially in a database and hash their “fingerprints” into a number of “fingerprint files.” When the query is given, its fingerprints are also hashed into the files and a histogram of votes is constructed on the messages. We derive a lower bound, based on which one can prune a large number of non-qualifying messages (i.e., those whose votes are below the lower bound) during searching. The paper presents some experimental results, which demonstrate the effectiveness of the index structure and the lower bound.
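The vote-counting idea with a pruning lower bound can be illustrated with q-gram fingerprints. The bound used below is the standard q-gram count filter (a word within k edits of the query shares at least |query| − q + 1 − kq of its q-grams); this is our own simplified stand-in for the paper's hashed fingerprint files, not its exact method:

```python
def qgrams(s: str, q: int):
    """All overlapping substrings of length q."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def count_filter(query: str, messages, q: int = 3, k: int = 1):
    """Keep only messages whose q-gram vote count meets the lower
    bound; pruned messages provably contain no word within k edits
    of the query (when the bound is positive)."""
    threshold = len(query) - q + 1 - k * q
    survivors = []
    for msg in messages:
        grams = set()
        for word in msg.split():
            grams.update(qgrams(word, q))
        votes = sum(1 for g in qgrams(query, q) if g in grams)
        if votes >= threshold:
            survivors.append(msg)
    return survivors
```

Surviving messages still require full approximate matching; the filter only guarantees that nothing below the threshold can qualify. For short queries the threshold can drop to zero or below, in which case nothing is pruned.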
© 1988 by Kluwer Academic Publishers. Computerized Correction of Phonographic Errors
"... Abstract: When computers are confronted with text (C.A.I., ..."
© Pattern Recognition Society. THE USE OF CONTEXT IN PATTERN RECOGNITION
, 1977
"... Abstract The importance of contextual information, at various different levels, for the satisfactory solution of pattern recognition problems is illustrated by examples. A tutorial survey of techniques for using contextual information in pattern recognition is presented. Emphasis is placed on the p ..."
Abstract
Abstract The importance of contextual information, at various different levels, for the satisfactory solution of pattern recognition problems is illustrated by examples. A tutorial survey of techniques for using contextual information in pattern recognition is presented. Emphasis is placed on the problems of image classification and text recognition, where the text is in the form of machine- and hand-printed characters, cursive script, and speech. The related problems of scene analysis, natural language understanding, and error-correcting compilers are only lightly touched upon. Keywords: character recognition; speech recognition; pattern recognition; correction; spelling correction; image classification; language understanding; context; artificial intelligence