## A mixed trigrams approach for context sensitive spell checking (2007)

Citations: 2 (1 self)

### BibTeX

```bibtex
@MISC{Fossati07amixed,
  author = {Davide Fossati and Barbara Di Eugenio},
  title = {A mixed trigrams approach for context sensitive spell checking},
  year = {2007}
}
```

### Abstract

This paper addresses the problem of real-word spell checking, i.e., the detection and correction of typos that result in real words of the target language. It proposes a methodology based on a mixed trigrams language model. The model has been implemented, trained, and tested with data from the Penn Treebank, and the approach has been evaluated in terms of hit rate, false positive rate, and coverage. The experiments show promising results with respect to the hit rates of both detection and correction, even though the false positive rate is still high.

### Citations

742 | Foundations of statistical natural language processing - Manning, Schütze - 1999

Citation Context: ...grams. The resulting formula is: argmax_E ∏_{i=1}^{n} P(w_i | e_i) P(e_i | e_{i−1} e_{i−2}). In the previous formula, the variables w_i are words, and the variables e_i are either words or POS tags. The Viterbi algorithm [17] can be used to efficiently compute the sequence E. Figure 1 provides an intuitive example of how the detection process works. 2.5 Conditional Probability Estimation for the Central Word In the previo...
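The quoted formula decodes the most likely mixed sequence E (words or POS tags) given the observed words, which is a second-order Viterbi search over state pairs. The sketch below illustrates that decoding under assumed probability tables `emis` and `trans` (hypothetical names; the paper's actual estimation and smoothing are not reproduced here):

```python
import math

def viterbi_mixed_trigrams(words, states, emis, trans):
    """Second-order Viterbi decoding for a mixed trigrams model:
    find the sequence E maximizing prod_i P(w_i|e_i) * P(e_i|e_{i-1} e_{i-2}).
    `emis[(w, e)]` and `trans[(e, e_prev, e_prev2)]` are probability
    lookup tables (hypothetical; a real system would smooth them)."""
    START = "<s>"
    # best[(e_prev2, e_prev)] = (log-prob, state path) of the best
    # sequence ending in that pair of states
    best = {(START, START): (0.0, [])}
    for w in words:
        new_best = {}
        for (e_pp, e_p), (lp, path) in best.items():
            for e in states:
                pe = emis.get((w, e), 0.0)
                pt = trans.get((e, e_p, e_pp), 0.0)
                if pe == 0.0 or pt == 0.0:
                    continue  # zero-probability path; prune it
                score = lp + math.log(pe) + math.log(pt)
                key = (e_p, e)
                if key not in new_best or score > new_best[key][0]:
                    new_best[key] = (score, path + [e])
        best = new_best
    # return the state path with the highest final log-probability
    return max(best.values())[1] if best else []
```

Keeping only the best path per `(e_prev2, e_prev)` pair is what makes the search efficient, as the snippet's reference to the Viterbi algorithm suggests.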

354 | Techniques for automatically correcting words in text - Kukich - 1992

Citation Context: ...o Kukich, the problem of spell checking can be classified in three categories of increasing difficulty: non-word error detection, isolated-word error correction, and context-dependent word correction [1]. The real-word error detection and correction task, the focus of this paper, belongs to the third category. Such errors are the most difficult to detect and correct, because they cannot be revealed just...

230 | A method for disambiguating word senses in a large corpus - Gale, Church, et al. - 1993

Citation Context: ...ach sentence and checking for grammatical anomalies. More recently, some statistical methods have been tried, including the usage of word n-gram models [3, 4], POS tagging [5–7], Bayesian classifiers [8], decision lists [9], Bayesian hybrid methods [10], a combination of POS and Bayesian methods [7], and Latent Semantic Analysis [11]. The main problem with word n-grams is data sparseness, even with a...

207 | A technique for computer detection and correction of spelling errors - Damerau - 1964

149 | Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French - Yarowsky - 1994

Citation Context: ...cking for grammatical anomalies. More recently, some statistical methods have been tried, including the usage of word n-gram models [3, 4], POS tagging [5–7], Bayesian classifiers [8], decision lists [9], Bayesian hybrid methods [10], a combination of POS and Bayesian methods [7], and Latent Semantic Analysis [11]. The main problem with word n-grams is data sparseness, even with a fairly large amount...

99 | The computational analysis of English. A corpus-based approach - Garside, Leech, et al. - 1987

63 | Automatic spelling correction in scientific and scholarly text - Pollock, Zamora - 1984

56 | A Bayesian hybrid method for context-sensitive spelling correction - Golding - 1995

Citation Context: ...es. More recently, some statistical methods have been tried, including the usage of word n-gram models [3, 4], POS tagging [5–7], Bayesian classifiers [8], decision lists [9], Bayesian hybrid methods [10], a combination of POS and Bayesian methods [7], and Latent Semantic Analysis [11]. The main problem with word n-grams is data sparseness, even with a fairly large amount of training data. In fact, a...

54 | Combining trigram-based and feature-based methods for context-sensitive spelling correction - Golding, Schabes - 1996

Citation Context: ...been tried, including the usage of word n-gram models [3, 4], POS tagging [5–7], Bayesian classifiers [8], decision lists [9], Bayesian hybrid methods [10], a combination of POS and Bayesian methods [7], and Latent Semantic Analysis [11]. The main problem with word n-grams is data sparseness, even with a fairly large amount of training data. In fact, a recent study [4] reported better performances u...

45 | Context based spelling correction - Mays, Damerau, et al. - 1991

Citation Context: ...approaches [2] try to detect errors by parsing each sentence and checking for grammatical anomalies. More recently, some statistical methods have been tried, including the usage of word n-gram models [3, 4], POS tagging [5–7], Bayesian classifiers [8], decision lists [9], Bayesian hybrid methods [10], a combination of POS and Bayesian methods [7], and Latent Semantic Analysis [11]. The main problem with...

29 | A statistical approach to automatic OCR error correction in context - Tong, Evans - 1996

Citation Context: ...ontextual spell checking have also been studied is Optical Character Recognition (OCR). For this application, Markov Model based approaches using letter n-grams have been shown to be quite successful [12]. 2 A Mixed Trigrams Approach This paper proposes a statistical method based on a language model that is a combination of the word-trigrams model and the POS-trigrams model, called mixed trigrams mode...
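The snippet above mentions letter n-gram Markov models for OCR correction. A minimal sketch of that idea, assuming a simple add-one-smoothed letter trigram model with `#` boundary padding (illustrative only, not the cited work's implementation):

```python
import math
from collections import defaultdict

def train_letter_trigrams(words):
    """Estimate P(c | c1 c2) over letters from a word list, with
    '#' boundary padding and add-one smoothing. A sketch of the
    letter n-gram Markov models cited for OCR correction."""
    tri, bi, alphabet = defaultdict(int), defaultdict(int), set()
    for w in words:
        padded = "##" + w + "#"
        alphabet.update(padded)
        for i in range(len(padded) - 2):
            tri[padded[i:i + 3]] += 1  # count trigram c1 c2 c
            bi[padded[i:i + 2]] += 1   # count history bigram c1 c2
    V = len(alphabet)
    def prob(c1, c2, c):
        return (tri[c1 + c2 + c] + 1) / (bi[c1 + c2] + V)
    return prob

def word_log_prob(word, prob):
    """Log-probability of a word as a sum of letter trigram log terms."""
    padded = "##" + word + "#"
    return sum(math.log(prob(padded[i], padded[i + 1], padded[i + 2]))
               for i in range(len(padded) - 2))
```

Under such a model, an OCR-garbled candidate with implausible letter sequences scores lower than a well-formed word, which is what makes letter n-grams useful for ranking correction candidates.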

23 | Contextual spelling correction using Latent Semantic Analysis - Jones, Martin - 1997

Citation Context: ...f word n-gram models [3, 4], POS tagging [5–7], Bayesian classifiers [8], decision lists [9], Bayesian hybrid methods [10], a combination of POS and Bayesian methods [7], and Latent Semantic Analysis [11]. The main problem with word n-grams is data sparseness, even with a fairly large amount of training data. In fact, a recent study [4] reported better performances using word bigrams rather than word...

19 | The EPISTLE text-critiquing system - Heidorn, Jensen, et al. - 1982

Citation Context: ...tionary lookup, but can be discovered only taking context into account. Different approaches to tackle the issue of real-word spell checking have been presented in the literature. Symbolic approaches [2] try to detect errors by parsing each sentence and checking for grammatical anomalies. More recently, some statistical methods have been tried, including the usage of word n-gram models [3, 4], POS ta...

5 | Spellchecking by computer - Mitton - 1996

2 | Context-based detection of 'real word' typographical errors using Markov models - Berlinsky-Schine - 2004

Citation Context: ...approaches [2] try to detect errors by parsing each sentence and checking for grammatical anomalies. More recently, some statistical methods have been tried, including the usage of word n-gram models [3, 4], POS tagging [5–7], Bayesian classifiers [8], decision lists [9], Bayesian hybrid methods [10], a combination of POS and Bayesian methods [7], and Latent Semantic Analysis [11]. The main problem with...