Results 1 - 3 of 3
Mostly-Unsupervised Statistical Segmentation of Japanese: Applications to Kanji, 2000
Cited by 32 (1 self)
Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and grammar or on pre-segmented data. In contrast, we introduce a novel statistical method utilizing unsegmented training data, with performance on kanji sequences comparable to and sometimes surpassing that of morphological analyzers over a variety of error metrics.
Unsupervised statistical segmentation of Japanese Kanji strings
- Journal of Natural Language Engineering, 1999
Cited by 5 (1 self)
Word segmentation is an important issue in Japanese language processing because Japanese is written without space delimiters between words. We propose a simple dictionary-less method to segment Japanese kanji sequences into words based solely on character n-gram counts from an unannotated corpus. The performance was often better than that of rule-based morphological analyzers over a variety of both standard and novel error metrics.
Japanese LVCSR On The Spontaneous Scheduling Task With Janus-3, 1997
Cited by 1 (1 self)
This paper presents our findings during the development of the recognition engine for the Japanese part of the VERBMOBIL speech-to-speech translation project. We describe an efficient method to bootstrap a large-vocabulary speech recognizer for spontaneously spoken Japanese from a German recognizer, and show that this rapid cross-language bootstrapping technique reduces the effort needed to develop the system. The Japanese recognizer is integrated into the VERBMOBIL system and shows very promising results, achieving a 9.3% word error rate.
1. INTRODUCTION The overall goal of the first phase of the VERBMOBIL project is to build a speech-to-speech translation system from both German and Japanese spontaneously spoken input speech to English, German and Japanese output in an appointment scenario [1]. The Japanese recognizer described in this paper is being designed to be part of this translation system. Unlike Japanese dictation systems [2] there is no need fo...

