@MISC{Simard98automaticinsertion, author = {Michel Simard}, title = {Automatic Insertion of Accents in French Text}, year = {1998} }

Abstract

Automatic accent insertion (AAI) is the problem of re-inserting accents (diacritics) into a text where they are missing. Unaccented French texts are still quite common in electronic media, as a result of a long history of character encoding problems and the lack of well-established conventions for typing accented characters on computer keyboards. We present an AAI method for French, based on a stochastic language model. This method was implemented into a program and C library of functions, which are now commercially available. Our experiments show that French text processed with this program contains less than one accent error per 130 words. We also show how our AAI method can be used to do on-the-fly accent insertions within a word-processing environment, which makes it possible to write in French without having to type accents. A prototype of such a system was integrated into the Emacs editor, and is now available to all students and employees of the Université de Montréal's compu...
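
To make the idea of accent insertion driven by a stochastic language model concrete, here is a minimal Python sketch. It is not the method or the library described in the abstract: the lexicon, the bigram probabilities, and the names `LEXICON`, `BIGRAM_LOGP`, `candidates`, and `restore_accents` are all invented for illustration. It assumes each unaccented word maps to a small set of accented candidates, and that a word-bigram model scores sequences of candidates; a Viterbi-style search then picks the most probable sequence.

```python
import math
import unicodedata


def strip_accents(word: str) -> str:
    """Remove diacritics, e.g. 'côté' -> 'cote'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical toy lexicon and bigram probabilities (illustrative values only).
LEXICON = ["cote", "côte", "côté", "coté", "du", "la", "à", "a", "de"]
BIGRAM_LOGP = {
    ("<s>", "à"): math.log(0.3),
    ("<s>", "a"): math.log(0.1),
    ("à", "côté"): math.log(0.5),
    ("a", "coté"): math.log(0.05),
    ("côté", "de"): math.log(0.6),
    ("de", "la"): math.log(0.5),
    ("la", "côte"): math.log(0.4),
    ("la", "cote"): math.log(0.1),
}
DEFAULT_LOGP = math.log(1e-4)  # crude back-off for unseen bigrams (assumption)


def candidates(word: str) -> list[str]:
    """All lexicon forms that match the word once accents are stripped."""
    key = strip_accents(word)
    forms = [w for w in LEXICON if strip_accents(w) == key]
    return forms or [word]  # unknown words are passed through unchanged


def restore_accents(words: list[str]) -> list[str]:
    """Viterbi search over accent candidates under the toy bigram model."""
    # best[form] = (log-probability, best path ending in that form)
    best = {"<s>": (0.0, [])}
    for word in words:
        new_best = {}
        for cand in candidates(word):
            score, path = max(
                (
                    (s + BIGRAM_LOGP.get((prev, cand), DEFAULT_LOGP), p)
                    for prev, (s, p) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[cand] = (score, path + [cand])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


print(" ".join(restore_accents(["a", "cote", "de", "la", "cote"])))
```

With these invented probabilities, the two occurrences of `cote` receive different accentuations depending on their neighbours, which is the kind of ambiguity a context-free lookup table could not resolve.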

...alculation of this equation requires a number of calculations that is exponential in the length of the sequence. However, there exists an algorithm that computes the value of P(w) in polynomial time (Rabiner and Juang, 1986). To find the sequence of hypotheses that maximizes the probability of the text, each individual combination of hypotheses is examined. Because the number of possible combinations grows exponentially...
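
The polynomial-time computation of P(w) mentioned in this excerpt is usually done with the forward algorithm for hidden Markov models. The Python sketch below is a generic illustration under assumed inputs, not code from the cited work: the two-state model and the names `forward_probability` and `brute_force_probability` are hypothetical. It compares the forward recursion, which costs on the order of T·N² operations for T observations and N states, with the naive sum over all N^T state sequences.

```python
from itertools import product


def forward_probability(obs, states, start_p, trans_p, emit_p):
    """P(obs) by the forward recursion; obs must be non-empty."""
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


def brute_force_probability(obs, states, start_p, trans_p, emit_p):
    """P(obs) by summing over every state sequence (exponential cost)."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = start_p[path[0]] * emit_p[path[0]][obs[0]]
        for prev, cur, symbol in zip(path, path[1:], obs[1:]):
            p *= trans_p[prev][cur] * emit_p[cur][symbol]
        total += p
    return total


# Hypothetical two-state model with invented probabilities.
STATES = ["DET", "NOUN"]
START_P = {"DET": 0.7, "NOUN": 0.3}
TRANS_P = {
    "DET": {"DET": 0.1, "NOUN": 0.9},
    "NOUN": {"DET": 0.4, "NOUN": 0.6},
}
EMIT_P = {
    "DET": {"la": 0.8, "porte": 0.2},
    "NOUN": {"la": 0.1, "porte": 0.9},
}

observations = ["la", "porte", "la", "porte"]
fast = forward_probability(observations, STATES, START_P, TRANS_P, EMIT_P)
slow = brute_force_probability(observations, STATES, START_P, TRANS_P, EMIT_P)
print(abs(fast - slow) < 1e-12)
```

Both functions compute the same quantity; only the forward recursion stays tractable as the sequence grows. Replacing the inner `sum` with `max` (and keeping back-pointers) gives the Viterbi search for the single most probable state sequence.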

...e parameters of the HMM were first estimated by direct frequency counts on a 60,000-word, hand-tagged extract of the Canadian Hansard. The parameters were then refined, using Baum-Welch reestimation (Baum, 1972), on a 3 million word (untagged) corpus consisting of equal parts of Hansards, Canadian National Defense documents and French press reviews (Radio France International). 2.4 Performance Evaluation On...