Correcting Real-Word Spelling Errors by Restoring Lexical Cohesion (2001)

by Graeme Hirst, Alexander Budanitsky

Results 1 - 10 of 29

Distributional measures of concept-distance: A task-oriented evaluation

by Saif Mohammad, Graeme Hirst - In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2006), 2006
Cited by 14 (4 self)
We propose a framework to derive the distance between concepts from distributional measures of word co-occurrences. We use the categories in a published thesaurus as coarse-grained concepts, allowing all possible distance values to be stored in a concept–concept matrix roughly 0.01% the size of that created by existing measures. We show that the newly proposed concept-distance measures outperform traditional distributional word-distance measures in the tasks of (1) ranking word pairs in order of semantic distance, and (2) correcting real-word spelling errors. In the latter task, of all the WordNet-based measures, only that proposed by Jiang and Conrath outperforms the best distributional concept-distance measures.

Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model. http://ftp.cs.toronto.edu/pub/gh/WilcoxOHearn-etal-2006.pdf

by Graeme Hirst, Alexander Budanitsky, 2006
Cited by 10 (0 self)
The trigram-based noisy-channel model of real-word spelling-error correction that was presented by Mays, Damerau, and Mercer in 1991 has never been adequately evaluated or compared with other methods. We analyze the advantages and limitations of the method, and present a new evaluation that enables a meaningful comparison with the WordNet-based method of Hirst and Budanitsky. The trigram method is found to be superior, even on content words. We then improve the method further and experiment with a new variation that optimizes over fixed-length windows instead of over sentences.

Real-word spelling correction using Google Web 1T n-gram data set

by Aminul Islam, Diana Inkpen - In CIKM, 2009
Cited by 7 (0 self)
We present a method for detecting and correcting multiple real-word spelling errors using the Google Web 1T 3-gram data set and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Our method is focused mainly on how to improve the detection recall (the fraction of errors correctly detected) and the correction recall (the fraction of errors correctly amended), while keeping the respective precisions (the fraction of detections or amendments that are correct) as high as possible. Evaluation results on a standard data set show that our method outperforms two other methods on the same task.
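The recall and precision definitions quoted in this abstract can be made concrete with a small sketch. It is not taken from the paper: the function name `detection_scores` and the token positions are hypothetical, and errors are assumed to be identified by their position in the text.

```python
def detection_scores(true_errors, detected):
    """Return (recall, precision) for two collections of error positions.

    recall    = fraction of true errors that were detected
    precision = fraction of detections that are true errors
    """
    true_errors, detected = set(true_errors), set(detected)
    correct = true_errors & detected
    recall = len(correct) / len(true_errors) if true_errors else 0.0
    precision = len(correct) / len(detected) if detected else 0.0
    return recall, precision


# Hypothetical positions: four real errors, three flagged, two of them right.
recall, precision = detection_scores({3, 8, 15, 21}, {3, 8, 9})
print(recall, precision)
```

Correction recall and precision follow the same pattern if the sets hold (position, proposed word) pairs instead of bare positions.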

Measuring Semantic Relatedness Using People and WordNet

by Beata Beigman Klebanov
Cited by 6 (1 self)
In this paper, we (1) propose a new dataset for testing the degree of relatedness between pairs of words; (2) propose a new WordNet-based measure of relatedness, and evaluate it on the new dataset.

On Detection of Malapropisms by Multistage Collocation Testing

by Igor A. Bolshakov, Alexander Gelbukh - In: Proc. NLDB-2003, 8th International Workshop on Applications of Natural Language to Information Systems, June 23–25, 2003, Burg, Germany. Bonner Köllen Verlag, 2003
Cited by 6 (5 self)
Malapropism is a (real-word) error in a text consisting of the unintended replacement of one content word by another existing content word that is similar in sound but semantically incompatible with the context, thus destroying text cohesion, e.g., they travel around the word. We present an algorithm for malapropism detection and correction based on evaluating cohesion. As a measure of the semantic compatibility of words, we consider their ability to form syntactically linked and semantically admissible word combinations (collocations), e.g., travel (around the) world. With this, text cohesion at a content word is measured as the number of collocations it forms with the words in its immediate context. We detect malapropisms as words forming no collocations in the context. To test whether two words can form a collocation, we consider two types of resources: a collocation DB and an Internet search engine, e.g., Google. We illustrate the proposed method by classifying, tracing, and evaluating several English malapropisms.
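The detection rule described in this abstract (flag a content word that forms no collocation with its context) can be sketched roughly as follows. This is a simplification under stated assumptions, not the authors' algorithm: the tiny `COLLOCATIONS` set stands in for a collocation database or search engine, a fixed window of neighbouring content words stands in for syntactic links, and the function name is hypothetical.

```python
# Hypothetical stand-in for a collocation resource: unordered word pairs.
COLLOCATIONS = {
    frozenset(pair)
    for pair in [("tourists", "travel"), ("travel", "world")]
}


def find_malapropisms(content_words, collocations, window=2):
    """Return the content words that form no collocation within the window."""
    suspects = []
    for i, word in enumerate(content_words):
        lo = max(0, i - window)
        hi = min(len(content_words), i + window + 1)
        neighbours = content_words[lo:i] + content_words[i + 1:hi]
        # Cohesion at this word = number of collocations with its neighbours.
        cohesion = sum(
            1 for n in neighbours if frozenset((word, n)) in collocations
        )
        if cohesion == 0:
            suspects.append(word)
    return suspects


# Content words of "tourists travel around the word" (intended: "world").
print(find_malapropisms(["tourists", "travel", "word"], COLLOCATIONS))
```

With very short contexts the rule can also flag the neighbours of a malapropism, since the wrong word removes the collocation they would otherwise form; the extra context word in the example avoids that.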

Lexical Normalisation of Short Text Messages: Makn Sens a #twitter

by Bo Han, Timothy Baldwin
Cited by 3 (2 self)
Twitter provides access to large volumes of data in real time, but is notoriously noisy, hampering its utility for NLP. In this paper, we target out-of-vocabulary words in short text messages and propose a method for identifying and normalising ill-formed words. Our method uses a classifier to detect ill-formed words, and generates correction candidates based on morphophonemic similarity. Both word similarity and context are then exploited to select the most probable correction candidate for the word. The proposed method doesn’t require any annotations, and achieves state-of-the-art performance over an SMS corpus and a novel dataset based on Twitter.

Machine Learning Approach for Context-Sensitive Error Detection

by Hisham Al-mubaid, Shashanka Nagula - Proc. Int’l Conf. Intelligent Computing and Information Systems (ICICIS ’05), 2005
Cited by 1 (1 self)
Context-sensitive spelling errors are those errors resulting from mistyping or mispronouncing a word, where the resulting misspelled word is a valid language/dictionary word. For example, in “This building is bigger then our building”, the word ‘then’ is a context-sensitive spelling error and the intended word is ‘than’. This paper describes an effective approach for detecting context-sensitive spelling errors. Detecting and correcting context-sensitive spelling errors is a very difficult and important problem that needs careful consideration. Working with this problem will involve facing the very difficult problem of natural language semantics. The proposed approach is a machine-learning-based approach. The approach has been fully implemented and evaluated with a large number of experiments. The results reported in this paper are encouraging and show that the method is effective. Overall, the method is capable of detecting context-sensitive errors with an accuracy in the range of ~86% to ~95%. Keywords: Natural language processing, Computational Linguistics, Context-sensitive Errors, Machine Learning.

Language Independent Text Correction using Finite State Automata

by Ahmed Hassan, Sara Noeman, Hany Hassan
Cited by 1 (0 self)
Many natural language applications, like machine translation and information extraction, are required to operate on text with spelling errors. Those spelling mistakes have to be corrected automatically to avoid deteriorating the performance of such applications. In this work, we introduce a novel approach for automatic correction of spelling mistakes by deploying finite state automata to propose candidate corrections within a specified edit distance from the misspelled word. After choosing candidate corrections, a language model is used to assign scores to the candidate corrections and choose the best correction in the given context. The proposed approach is language independent and requires only a dictionary and text data for building a language model. The approach has been tested on both Arabic and English text and achieved an accuracy of 89%.
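The two-stage pipeline in this abstract (propose candidates within an edit distance, then rank them with a language model) can be illustrated with a minimal sketch. It deliberately swaps in simpler techniques: a dynamic-programming Levenshtein scan over the dictionary replaces the finite state automata, and raw bigram counts replace a real language model. All names and the toy data are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def correct(word, prev_word, dictionary, bigram_counts, max_dist=1):
    """Pick the in-range dictionary word seen most often after prev_word."""
    candidates = [w for w in dictionary if edit_distance(word, w) <= max_dist]
    if not candidates:
        return word  # nothing close enough; leave the word unchanged
    return max(candidates, key=lambda w: bigram_counts.get((prev_word, w), 0))


# Toy resources, invented for illustration only.
DICTIONARY = ["than", "then", "thin"]
BIGRAM_COUNTS = {("bigger", "than"): 50, ("bigger", "then"): 1}
print(correct("thn", "bigger", DICTIONARY, BIGRAM_COUNTS))
```

Scanning the whole dictionary costs one distance computation per entry, which is the cost the automaton-based candidate generation described in the abstract is meant to avoid.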

Improved Natural Language Learning via Variance-Regularization Support Vector Machines

by Shane Bergsma, Dekang Lin, Dale Schuurmans
Cited by 1 (1 self)
We present a simple technique for learning better SVMs using fewer training examples. Rather than using the standard SVM regularization, we regularize toward low weight-variance. Our new SVM objective remains a convex quadratic function of the weights, and is therefore computationally no harder to optimize than a standard SVM. Variance regularization is shown to enable dramatic improvements in the learning rates of SVMs on three lexical disambiguation tasks.

Using the Structure of a Conceptual Network in Computing Semantic Relatedness

by Iryna Gurevych - In Proceedings of IJCNLP’05, 2005
We present a new method for computing semantic relatedness of concepts.