Results 1 - 4 of 4
A Stochastic Finite-State Word-Segmentation Algorithm For Chinese
Computational Linguistics, 1996
Abstract - Cited by 99 (9 self)
... Chinese text into dictionary entries and productively derived words, and providing pronunciations for these words; the method incorporates a class-based model in its treatment of personal names. We also evaluate the system's performance, taking into account the fact that people often do not agree on a single segmentation.
Error-Driven Learning of Chinese Word Segmentation
1998
Abstract - Cited by 8 (0 self)
Palmer ([4]) demonstrated how Brill's Transformation-based Error-Driven Learning can be applied to word segmentation in various languages. We present experimental results which show that such algorithms can achieve satisfactory performance even with a very simple initial state annotator. We also present two preliminary studies, which suggest that even higher performance might be achieved if simple morphological information is available to the system, and that segmentation performance might actually be improved by combining segmentation with rudimentary part-of-speech tagging.
1 Introduction
Chinese word segmentation is an interesting, but difficult problem. The difficulties include the following:
- "word" is not a very well-defined concept in the context of Chinese: linguists do not have generally accepted guidelines, and in experiments native speakers show only about 75% agreement on the "correct" segmentation.
- Even if we have guidelines, the problem does not become trivial. The b...
Issues in Text-to-Speech Conversion for Mandarin
1996
Abstract - Cited by 6 (3 self)
Research on text-to-speech (TTS) conversion for Mandarin Chinese is a much younger enterprise than comparable research for English or other European languages. Nonetheless, impressive progress has been made over the last couple of decades, and Mandarin Chinese systems now exist which approach, or in some ways even surpass in quality available systems for English. This article has two goals. The first is to summarize the published literature on Mandarin synthesis, with a view to clarifying the similarities or differences among the various efforts. One property shared by a great many systems is the dependence on the syllable as the basic unit of synthesis. We shall argue that this property stems both from the accidental fact that Mandarin has a small number of syllable types, and from traditional Sinological views of the linguistic structure of Chinese. Despite the popularity of the syllable, though, there are problems with using it as the basic synthesis unit, as we shall show. The seco...
An Iterative Algorithm to Build Chinese Language Models
1996
Abstract
We present an iterative procedure to build a Chinese language model (LM). We segment Chinese text into words based on a word-based Chinese language model. However, the construction of a Chinese LM itself requires word boundaries. To get out of the chicken-and-egg problem, we propose an iterative procedure that alternates two operations: segmenting text into words and building an LM. Starting with an initial segmented corpus and an LM based upon it, we use a Viterbi-like algorithm to segment another set of data. Then, we build an LM based on the second set and use the resulting LM to segment again the first corpus. The alternating procedure provides a self-organized way for the segmenter to detect automatically unseen words and correct segmentation errors. Our preliminary experiment shows that the alternating procedure not only improves the accuracy of our segmentation, but discovers unseen words surprisingly well. The resulting word-based LM has a perplexity of 188 for a general Chinese corpus.
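The alternation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain unigram LM, a fixed penalty for unseen single characters, and Latin letters standing in for Chinese characters. The names `build_unigram_lm`, `viterbi_segment`, `alternate`, `max_word_len`, and `unknown_logprob` are all hypothetical. Unlike the procedure in the paper, this simplified model cannot propose new multi-character words; it only shows the segment-then-rebuild loop.

```python
import math
from collections import Counter


def build_unigram_lm(segmented_corpus):
    """Estimate unigram word probabilities from a list of segmented sentences."""
    counts = Counter(word for sentence in segmented_corpus for word in sentence)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def viterbi_segment(text, lm, max_word_len=4, unknown_logprob=-20.0):
    """Return the segmentation of `text` with the highest unigram log-probability.

    Assumption: an unseen single character is allowed at a fixed penalty,
    so every input string has at least one valid segmentation.
    """
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i] is the best score for text[:i]
    back = [0] * (n + 1)            # back[i] is the start of the last word
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in lm:
                score = math.log(lm[word])
            elif len(word) == 1:
                score = unknown_logprob
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow the back-pointers from the end of the string to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]


def alternate(first_segmented, second_raw, rounds=3):
    """Alternate between segmenting one corpus and rebuilding the LM on it."""
    first_raw = ["".join(sentence) for sentence in first_segmented]
    lm = build_unigram_lm(first_segmented)
    for _ in range(rounds):
        second_segmented = [viterbi_segment(s, lm) for s in second_raw]
        lm = build_unigram_lm(second_segmented)
        first_segmented = [viterbi_segment(s, lm) for s in first_raw]
        lm = build_unigram_lm(first_segmented)
    return lm, first_segmented


seed = [["ab", "c"], ["c", "ab"]]
lm, resegmented = alternate(seed, ["abcab", "cc"])
print(resegmented)
```

A realistic version would replace the unigram LM with a higher-order word model and add a mechanism for hypothesizing new words, which is where the unseen-word discovery reported in the abstract would come from.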

