Joint processing and discriminative training for letter-to-phoneme conversion
In Proc. ACL, 2008
Cited by 10 (6 self)
We present a discriminative structure prediction model for the letter-to-phoneme task, a crucial step in text-to-speech processing. Our method encompasses three tasks that have previously been handled separately: input segmentation, phoneme prediction, and sequence modeling. The key idea is online discriminative training, which updates parameters according to a comparison of the current system output to the desired output, allowing us to train all of our components together. By folding the three steps of a pipeline approach into a unified dynamic programming framework, we are able to achieve substantial performance gains. Our results surpass the current state of the art on six publicly available data sets representing four different languages.
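The online update described in this abstract (compare the current system output to the desired output, then adjust the parameters) can be illustrated with a plain perceptron. The sketch below is not the paper's model: it assumes a one-to-one letter-to-phoneme mapping, predicts each phoneme independently instead of using joint segmentation and dynamic programming, and uses a hypothetical feature set (`features`) and toy data chosen only for illustration.

```python
from collections import defaultdict


def features(letters, i, phoneme):
    """Hypothetical features: current letter, and left neighbor plus current letter."""
    prev = letters[i - 1] if i > 0 else "<s>"
    return [("cur", letters[i], phoneme), ("prev+cur", prev, letters[i], phoneme)]


def predict(weights, letters, phoneme_set):
    """Choose the highest-scoring phoneme for each letter independently."""
    output = []
    for i in range(len(letters)):
        # sorted() makes tie-breaking deterministic.
        best = max(
            sorted(phoneme_set),
            key=lambda p: sum(weights[f] for f in features(letters, i, p)),
        )
        output.append(best)
    return output


def train(data, phoneme_set, epochs=5):
    """Online perceptron: update only where the output differs from the gold phoneme."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for letters, gold in data:
            guess = predict(weights, letters, phoneme_set)
            for i, (g, y) in enumerate(zip(gold, guess)):
                if g != y:
                    for f in features(letters, i, g):
                        weights[f] += 1.0  # reward features of the desired output
                    for f in features(letters, i, y):
                        weights[f] -= 1.0  # penalize features of the wrong output
    return weights


# Toy data (an assumption, not taken from the paper's data sets).
data = [(list("cat"), ["k", "ae", "t"]), (list("cab"), ["k", "ae", "b"])]
phoneme_set = {"k", "ae", "t", "b"}
weights = train(data, phoneme_set)
print(predict(weights, list("cat"), phoneme_set))
```

Because every update is driven by the difference between predicted and desired output, the same loop could in principle train any component that contributes features to the score, which is the property the abstract relies on to train all components together.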
Modeling Letter-to-Phoneme Conversion as a Phrase Based Statistical Machine Translation Problem with Minimum Error Rate Training
Cited by 1 (0 self)
Letter-to-phoneme conversion plays an important role in several applications. It can be a difficult task because the mapping from letters to phonemes can be many-to-many. We present a language-independent letter-to-phoneme conversion approach based on popular phrase-based Statistical Machine Translation techniques. The results of our experiments clearly demonstrate that such techniques can be used effectively for letter-to-phoneme conversion. Our results show an overall improvement of 5.8% over the baseline and are comparable to the state of the art. We also propose a measure to estimate the difficulty level of the L2P task for a language.
SST 2010 Modeling Pronunciation of OOV Words for Speech Recognition
This paper presents a technique for modeling pronunciation in automatic speech recognition using an approach based on statistical machine translation. The task of a pronunciation model in speech recognition is to convert a sequence of phonemes into proper words of the language. This task can be framed as machine translation, where the source language is a sequence of phonemes and the target language is a sequence of letters forming words. The model presented in this paper specifically targets out-of-vocabulary words or words with several different pronunciations. A dynamic string alignment algorithm is applied to learn the phoneme-to-letter alignment. Pronunciation segments are then extracted from these alignments and are used by the decoder to recognize words from an unknown sequence of phonemes. In contrast to usual statistical machine translation decoding, the decoder presented here uses a probabilistic finite-state model rather than the usual n-gram language models. A number of experiments were performed using the CMU pronunciation dictionary, and the results obtained are promising, even where only small amounts of training data are used.
Index Terms: speech recognition, pronunciation modeling
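The abstract above does not specify its dynamic string alignment algorithm, so the following is only a generic edit-distance alignment with backtrace, shown to illustrate how phoneme-to-letter pairs can be recovered by dynamic programming. The function name `align`, the `sub_cost` table, and the unit gap cost are assumptions for illustration, not the paper's method.

```python
def align(phonemes, letters, sub_cost, gap_cost=1.0):
    """Align two sequences by minimum edit cost; returns (phoneme, letter) pairs.

    None on either side of a pair marks a gap.
    """
    n, m = len(phonemes), len(letters)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(phonemes[i - 1], letters[j - 1]),
                cost[i - 1][j] + gap_cost,  # phoneme with no letter
                cost[i][j - 1] + gap_cost,  # letter with no phoneme
            )
    # Trace back from the final cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + sub_cost(phonemes[i - 1], letters[j - 1])):
            pairs.append((phonemes[i - 1], letters[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap_cost:
            pairs.append((phonemes[i - 1], None))
            i -= 1
        else:
            pairs.append((None, letters[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# Hypothetical compatibility table: zero cost for plausible phoneme/letter pairs.
COMPATIBLE = {("K", "c"), ("AE", "a"), ("T", "t"), ("N", "n"), ("AY", "i")}


def toy_cost(phoneme, letter):
    return 0.0 if (phoneme, letter) in COMPATIBLE else 1.0


print(align(["N", "AY", "T"], list("night"), toy_cost))
```

In a data-driven setting the substitution costs would be estimated from a pronunciation dictionary instead of a hand-written table; the fixed table here only keeps the example self-contained.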

