Results 1 - 9 of 9
Fast String Correction with Levenshtein-Automata
 INTERNATIONAL JOURNAL OF DOCUMENT ANALYSIS AND RECOGNITION
, 2002
Abstract

Cited by 28 (5 self)
The Levenshtein-distance between two words is the minimal number of insertions, deletions or substitutions that are needed to transform one word into the other. Levenshtein-automata of degree n for a word W are defined as finite state automata that recognize the set of all words V where the Levenshtein-distance between V and W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein-automaton of degree n for W in time linear in the length of W. Given an electronic dictionary that is implemented in the form of a trie or a finite state automaton, the Levenshtein-automaton for W can be used to control search in the lexicon in such a way that exactly the lexical words V are generated where the Levenshtein-distance between V and W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein-automata and leads to even improved efficiency. We also describe how to extend both methods to variants of the Levenshtein-distance where further primitive edit operations (transpositions, merges and splits) may be used.
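The idea of steering a dictionary search by an edit-distance bound can be illustrated with a minimal Python sketch. It does not build a Levenshtein automaton as the paper does; it swaps in the simpler, well-known technique of carrying one dynamic-programming row down each trie branch and pruning a branch once every entry of the row exceeds n. The names `levenshtein`, `build_trie`, and `search`, and the `"$"` end-of-word marker, are assumptions made for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def build_trie(words):
    """Nested-dict trie; "$" marks end of word (assumes "$" is not in the alphabet)."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def search(trie, word, n):
    """Return all dictionary words V with Levenshtein distance to `word` at most n."""
    results = []

    def walk(node, prefix, row):
        # row[j] = distance between the trie path `prefix` and word[:j]
        if "$" in node and row[-1] <= n:
            results.append(prefix)
        for ch, child in node.items():
            if ch == "$":
                continue
            new_row = [row[0] + 1]
            for j in range(1, len(word) + 1):
                new_row.append(min(row[j] + 1,
                                   new_row[j - 1] + 1,
                                   row[j - 1] + (word[j - 1] != ch)))
            if min(new_row) <= n:  # otherwise no extension can get back under n
                walk(child, prefix + ch, new_row)

    walk(trie, "", list(range(len(word) + 1)))
    return sorted(results)
```

For example, `search(build_trie(["hello", "help", "hell", "held", "world"]), "helo", 1)` visits only the branches that can still yield a word within distance 1 and never descends into the subtree for "world".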
Decoding substitution ciphers by means of word matching with application to OCR
 IEEE Transactions on Pattern Analysis and Machine Intelligence 9
, 1987
Abstract

Cited by 27 (8 self)
the clinical results indicated that no change occurred between these two nights. Fig. 1 also shows that these two nights are close to each other. V. CONCLUSIONS AND DISCUSSION This study was performed to explore to what extent Markovian modeling of sleep patterns, coupled with pattern recognition techniques, can be used to describe normal and abnormal sleep patterns, and to detect the sleep changes between different nights for the same abnormal subject. The latter may be indicative of the degree of improvement via a treatment procedure. The comparison of the transition probability matrices was done using a χ² clustering and a correspondence analysis approach. Most of the normals fell into one cluster, whereas the abnormals were more dispersed. Particularly, the correspondence analysis not only indicated the distances between the normal and abnormal sleep patterns,
Fast Approximate Search in Large Dictionaries
 COMPUTATIONAL LINGUISTICS
, 2004
Abstract

Cited by 14 (4 self)
The need to correct garbled strings arises in many areas of natural language processing. If a dictionary is available that covers all possible input tokens, a natural set of candidates for correcting an erroneous input P is the set of all words in the dictionary for which the Levenshtein distance to P does not exceed a given (small) bound k. In this article we describe methods for efficiently selecting such candidate sets. After introducing as a starting point a basic correction method based on the concept of a "universal Levenshtein automaton," we show how two filtering methods known from the field of approximate text search can be used to improve the basic procedure in a significant way. The first method, which uses standard dictionaries plus dictionaries with reversed words, leads to very short correction times for most classes of input strings. Our evaluation results demonstrate that correction times for fixed-distance bounds depend on the expected number of correction candidates, which decreases for longer input words. Similarly the choice of an optimal filtering method depends on the length of the input words.
Approximate personal name-matching through finite-state graphs
 Journal of the American Society for Information Science and Technology
, 2006
Abstract

Cited by 8 (1 self)
This article shows how finite-state methods can be employed in a new and different task: the conflation of personal name variants in standard forms. In bibliographic databases and citation index systems, variant forms create problems of inaccuracy that affect information retrieval, the quality of information from databases, and the citation statistics used for the evaluation of scientists’ work. A number of approximate string matching techniques have been developed to validate variant forms, based on similarity and equivalence relations. We classify the personal name variants as nonvalid and valid forms. In establishing an equivalence relation between valid variants and the standard form of its equivalence class, we defend the application of finite-state transducers. The process of variant identification requires the elaboration of: (a) binary matrices and (b) finite-state graphs. This procedure was tested on samples of author names from bibliographic records, selected from the Library and Information Science Abstracts and Science Citation Index Expanded databases. The evaluation involved calculating the measures of precision and recall, based on completeness and accuracy. The results demonstrate the usefulness of this approach, although it should be complemented with methods based on similarity relations for the recognition of spelling variants and misspellings.
Multilingual String-to-String Correction in Grif, a structured editor
, 1992
Abstract

Cited by 5 (2 self)
This paper describes the integration of a spelling corrector into the structured editor Grif. This corrector is based on the Levenshtein metric concept which is particularly efficient for string correction. This method can be implemented efficiently and can produce good results with short response time on a new RISC workstation even with large dictionaries. The integration within Grif enables checking of textual content of structured documents where large vocabularies are required. Thanks to an attribute language the editor can automatically adapt the correction to the language and can apply a specific word recognition algorithm and dictionaries, thus allowing checking and correcting of multilingual documents. Keywords: spelling correction, integration, multilingualism, structured documents. Introduction Grif is an interactive system for the production of complex documents. It is essentially intended for handling structured documents [Furuta88] [André89] [Quint90]. Thus it is well...
Fast retrieval of electronic messages that contain mistyped words or spelling errors
 IEEE Transactions on Systems, Man, and Cybernetics
, 1996
Abstract

Cited by 1 (1 self)
This paper presents an index structure for retrieving electronic messages that contain mistyped words or spelling errors. Given a query string (e.g., a search key), we want to find those messages that approximately contain the query, i.e., certain inserts, deletes and mismatches are allowed when matching the query with a word (or phrase) in the messages. Our approach is to store the messages sequentially in a database and hash their “fingerprints” into a number of “fingerprint files.” When the query is given, its fingerprints are also hashed into the files and a histogram of votes is constructed on the messages. We derive a lower bound, based on which one can prune a large number of nonqualifying messages (i.e., those whose votes are below the lower bound) during searching. The paper presents some experimental results, which demonstrate the effectiveness of the index structure and the lower bound.
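The vote-and-prune scheme described above can be sketched roughly as follows. The abstract does not say what a fingerprint is or which lower bound the authors derive, so this sketch makes two loud assumptions: fingerprints are taken to be overlapping character q-grams, and the pruning threshold is the standard q-gram lemma (a query of length m matched with at most k errors keeps at least m - q + 1 - k*q of its q-grams). The names `qgrams`, `build_index`, and `candidates` are hypothetical, and an in-memory dict stands in for the hashed fingerprint files.

```python
from collections import Counter, defaultdict

Q = 3  # assumed fingerprint length (q-gram size); not taken from the paper


def qgrams(s, q=Q):
    """All overlapping substrings of length q, in order, duplicates kept."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_index(messages, q=Q):
    """Map each q-gram to the set of message ids that contain it."""
    index = defaultdict(set)
    for msg_id, text in enumerate(messages):
        for g in qgrams(text.lower(), q):
            index[g].add(msg_id)
    return index


def candidates(index, num_messages, query, k, q=Q):
    """Ids of messages whose vote count reaches the q-gram lemma bound."""
    threshold = len(query) - q + 1 - k * q
    if threshold <= 0:
        # The bound gives no information, so nothing can be pruned safely.
        return list(range(num_messages))
    votes = Counter()
    for g in qgrams(query.lower(), q):
        for msg_id in index.get(g, ()):
            votes[msg_id] += 1  # one vote per query q-gram found in the message
    return sorted(m for m, v in votes.items() if v >= threshold)
```

The surviving candidates are only a superset of the true answers; each would still need to be verified with an exact approximate-matching routine before being reported.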
A Spelling Checker
Abstract
One of my concerns when conducting the assessment was that the carer was a single female parent and the child who was to be fostered was a teenage male. My concern was that the carer would be able, firstly to be able to protect the child and fulfill the needs of the young person
Approximate Text Searching
, 1998
Abstract
This thesis focuses on the problem of text retrieval allowing errors, also called "approximate" string matching. The problem is to find a pattern in a text, where the pattern and the text may have "errors". This problem has received a lot of attention in recent years because of its applications in many areas, such as information retrieval, computational biology and signal processing, to name a few. The aim of this work is the development and analysis of novel algorithms to deal with the problem under various conditions, as well as a better understanding of the problem itself and its statistical behavior. Although our results are valid in many different areas, we focus our attention on typical text searching for information retrieval applications. This makes some ranges of values for the parameters of the problem more interesting than others. We have divided this presentation into two parts. The first one deals with online approximate string matching, i.e. when there is no time or space ...
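As a concrete picture of the online problem stated above, here is a minimal Python sketch of the classical dynamic-programming baseline, usually attributed to Sellers. It is not one of the novel algorithms developed in the thesis. The function name `approx_find` and the choice to report end positions are assumptions made for this illustration, and "errors" are taken to be insertions, deletions, and substitutions.

```python
def approx_find(pattern, text, k):
    """Return every end position j (exclusive) such that some substring of
    text ending at j is within k edits of pattern."""
    m = len(pattern)
    # col[i] = fewest edits between pattern[:i] and any suffix of the text read so far
    col = list(range(m + 1))
    hits = []
    for j, tc in enumerate(text, start=1):
        new = [0]  # an occurrence may start anywhere in the text at no cost
        for i in range(1, m + 1):
            new.append(min(col[i] + 1,                          # extra text character
                           new[i - 1] + 1,                      # missing pattern character
                           col[i - 1] + (pattern[i - 1] != tc)))  # match or substitution
        col = new
        if col[m] <= k:
            hits.append(j)
    return hits
```

The scan takes time proportional to the pattern length times the text length and keeps only one column in memory, which is why it works without preprocessing the text.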
Computerized Correction of Phonographic Errors (© 1988 by Kluwer Academic Publishers)
"... Abstract: When computers are confronted with text (C.A.I., ..."