Results 1 - 7 of 7
Finding Approximate Matches in Large Lexicons
- Software - Practice and Experience, 1995
Cited by 27 (5 self)
Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n-grams and permuted lexicons, and several string matching techniques, including string similarity measures and phonetic coding. We propose methods for combining these techniques, and show experimentally that these combinations yield good retrieval effectiveness while keeping index size and retrieval time low. Our experiments also suggest that, in contrast to previous claims, phonetic codings are markedly inferior to string distance measures, which are demonstrated to be suitable for both spelling correction and personal name matching.
KEY WORDS: pattern matching; string indexing; approximate matching; compressed inverted files; Soundex
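The combination this abstract describes, a lexicon index used as a coarse filter followed by a string distance measure for fine ranking, can be illustrated with a minimal sketch. This is not the paper's implementation: the bigram padding character, the `min_shared` and `max_dist` thresholds, the use of plain Levenshtein distance, and the toy lexicon are all assumptions chosen for illustration.

```python
from collections import defaultdict


def ngrams(word, n=2):
    """Return the set of character n-grams of a word, padded with '#'."""
    padded = "#" + word + "#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(lexicon, n=2):
    """Map each n-gram to the set of lexicon words that contain it."""
    index = defaultdict(set)
    for word in lexicon:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def approximate_matches(query, index, n=2, min_shared=2, max_dist=2):
    """Coarse step: count shared n-grams. Fine step: rank by edit distance."""
    shared = defaultdict(int)
    for gram in ngrams(query, n):
        for word in index.get(gram, ()):
            shared[word] += 1
    candidates = [w for w, count in shared.items() if count >= min_shared]
    scored = [(edit_distance(query, w), w) for w in candidates]
    return sorted((d, w) for d, w in scored if d <= max_dist)


# Hypothetical toy lexicon; a real lexicon would hold many thousands of words.
LEXICON = ["matching", "marching", "watching", "machine", "lexicon"]
INDEX = build_index(LEXICON)
print(approximate_matches("matchng", INDEX))
```

Only words that share enough n-grams with the query reach the edit distance computation, which is the point of the index: the expensive comparison runs on a small candidate set rather than on the whole lexicon.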
Contextual Spelling Correction Using Latent Semantic Analysis
- In Proc. 5th Conference on Applied Natural Language Processing, 1997
Cited by 20 (0 self)
Contextual spelling errors are defined as the use of an incorrect, though valid, word in a particular sentence or context. Traditional spelling checkers flag misspelled words, but they do not typically attempt to identify words that are used incorrectly in a sentence. We explore the use of Latent Semantic Analysis for correcting these incorrectly used words and the results are compared to earlier work based on a Bayesian classifier.
A Natural Language Parser With Interleaved Spelling Correction Supporting Lexical Functional Grammar And Ill-Formed Input
- 1994
Cited by 6 (1 self)
I. INTRODUCTION
1.1 Introduction
1.2 Outline of This Thesis
1.3 Summary of the Research
1.4 Organization of the Thesis
II. REVIEW OF RELEVANT RESEARCH
2.1 Syntactic Ambiguity
2.2 The Parser
2.2.1 Bottom Up and Top Down Parsers
2.2.2 Ellipsis
2.2.3 Syntactic Error Recovery
2.2.4 Chart Parser
2...
Automatic Expansion of Abbreviations by using Context and Character Information
Cited by 1 (0 self)
Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read and process because they are often domain-specific. In this paper, we propose a method for automatic expansion of abbreviations by using context and character information. In previous studies, dictionaries were used to search for abbreviation expansion candidates (candidate words for the original form of an abbreviation). We instead use a corpus from the same field that contains few abbreviations. We calculate the adequacy of abbreviation expansion candidates based on the similarity between the context of the target abbreviation and that of its expansion candidate. The similarity is calculated using a vector space model in which each vector element consists of words surrounding the target abbreviation and those of its expansion candidate. Experiments using approximately 10,000 documents in the field of aviation showed that the accuracy of the proposed method is 10% higher than that of previously developed methods.
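The context-similarity scoring described in this abstract can be sketched roughly as follows. The abstract does not specify how character information is used, so the in-order subsequence test here is an assumption, as are the window size, the raw-count vectors, the function names, and the two toy token lists.

```python
import math
from collections import Counter


def is_subsequence(abbrev, word):
    """Character check (assumed): abbreviation letters occur in order in word."""
    letters = iter(word.lower())
    return all(ch in letters for ch in abbrev.lower())


def context_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of target."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            vec.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def rank_expansions(abbrev, abbrev_tokens, corpus_tokens, window=2):
    """Score each character-compatible corpus word by context similarity."""
    target_vec = context_vector(abbrev_tokens, abbrev, window)
    candidates = {
        w for w in corpus_tokens if w != abbrev and is_subsequence(abbrev, w)
    }
    scored = [
        (cosine(target_vec, context_vector(corpus_tokens, w, window)), w)
        for w in candidates
    ]
    return sorted(scored, reverse=True)


# Hypothetical data: a text containing the abbreviation and a reference
# corpus from the same field that contains few abbreviations.
ABBREV_TEXT = "check eng oil pressure before takeoff".split()
CORPUS = (
    "check engine oil pressure before each flight "
    "the engineer signed the landing log"
).split()
RANKED = rank_expansions("eng", ABBREV_TEXT, CORPUS)
print(RANKED)
```

In this toy data both "engine" and "engineer" pass the character check, but only "engine" occurs in a context that overlaps with the abbreviation's, so it ranks first.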
The New C Standard: Sentence 782
This is "sentence 782" extracted from the book "The New C Standard: An Economic and Cultural Commentary"
Statistics and Graphotactical Rules in Finding OCR-errors
- 2000
This thesis describes two experiments in finding errors in optically scanned Swedish without relying on a lexicon. First, statistics were used to find unexpectedly frequent trigrams, and correction rules for these cases were created. The rules were then tested and compared to a hand-corrected version of the test text. Secondly, Bengt Sigurd's model of Swedish phonotax was used to detect words with phonotactically illegal beginning or end.
A Parser For Real-Time Speech Synthesis Of Conversational Texts
- In Proceedings of the ACL Conference on Applied Natural Language Processing, 1992
In this paper, we concern ourselves with an application of text-to-speech for speech-impaired, deaf, and hard of hearing people. The application is unusual because it requires real-time synthesis of unedited, spontaneously generated conversational texts transmitted via a Telecommunications Device for the Deaf (TDD). We describe a parser that we have implemented as a front end for a version of the Bell Laboratories text-to-speech synthesizer (Olive and Liberman 1985). The parser prepares TDD texts for synthesis by (a) performing lexical regularization of abbreviations and some non-standard forms, and (b) identifying prosodic phrase boundaries. Rules for identifying phrase boundaries are derived from the prosodic phrase grammar described in Bachenko and Fitzpatrick (1990). Following the parent analysis, these rules use a mix of syntactic and phonological factors to identify phrase boundaries but, unlike the parent system, they forgo building any hierarchical structure in order to bypass the need for a stacking mechanism; this permits the system to operate in near real time. As a component of the text-to-speech system, the parser has undergone rigorous testing during a successful three-month field trial at an AT&T telecommunications center in California. In addition, laboratory evaluations indicate that the parser's performance compares favorably with human judgments about phrasing.

