Results 1 - 10 of 43
Grapheme Based Speech Recognition
- in Proceedings of the EUROSPEECH
, 2003
"... Large vocabulary speech recognition systems traditionally represent words in terms of subword units, usually phonemes. This paper investigates the potential of graphemes acting as subunits. In order to develop context dependent grapheme based speech recognizers several decision tree based clustering ..."
Abstract - Cited by 41 (6 self)
Large vocabulary speech recognition systems traditionally represent words in terms of subword units, usually phonemes. This paper investigates the potential of graphemes acting as subunits. In order to develop context dependent grapheme based speech recognizers several decision tree based clustering procedures are performed and compared to each other. Grapheme based speech recognizers in three languages - English, German, and Spanish - are trained and compared to their phoneme based counterparts. The results show that for languages with a close grapheme-to-phoneme relation, grapheme based modeling is as good as the phoneme based one. Furthermore, multilingual grapheme based recognizers are designed to investigate whether grapheme based information can be successfully shared among languages. Finally, some bootstrapping experiments for Swedish were performed to test the potential for rapid language deployment.
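The grapheme-as-subunit idea above can be made concrete with a small sketch: a dictionary entry is built by simply spelling out each word. This is a minimal illustration, not code from the paper; the function name and the word list are made up for the example.

def build_grapheme_lexicon(words):
    # Each word maps to its own letter sequence, so no phonetic expertise is needed.
    lexicon = {}
    for word in words:
        lexicon[word] = " ".join(word.lower())
    return lexicon

if __name__ == "__main__":
    for word, units in build_grapheme_lexicon(["SPEECH", "GRAPHEME"]).items():
        print(word, "->", units)   # SPEECH -> s p e e c h, etc.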
Multilingual Acoustic Modeling Using Graphemes
- IN PROCEEDINGS OF EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY
, 2003
"... In this paper we combine grapheme-based sub-word units with multilingual acoustic modeling. We show that a global decision tree together with automatically generated grapheme questions eliminate manual effort completely. We also investigate the effects of additional language questions. We present ..."
Abstract - Cited by 13 (0 self)
In this paper we combine grapheme-based sub-word units with multilingual acoustic modeling. We show that a global decision tree together with automatically generated grapheme questions eliminates manual effort completely. We also investigate the effects of additional language questions. We present ...
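One plausible way to "automatically generate" the grapheme and language questions mentioned above is to emit a singleton question per grapheme and per language; the sketch below follows that assumption and is not the authors' actual tooling.

def generate_questions(graphemes, languages=()):
    # Singleton context questions for the left and right neighbours of a grapheme,
    # plus optional language-membership questions for a multilingual decision tree.
    questions = []
    for g in sorted(graphemes):
        questions.append(("LEFT-IS-" + g, frozenset([g])))
        questions.append(("RIGHT-IS-" + g, frozenset([g])))
    for lang in languages:
        questions.append(("LANG-IS-" + lang, frozenset([lang])))
    return questions

print(len(generate_questions("abc", languages=["EN", "DE"])))   # 8 questions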
A Grapheme Based Speech Recognition System for Russian
- SPECOM 2004
, 2004
"... With the increasing availability and deployment of speech recognition technology in real world environments fast and affordable adaptation of speech recognition systems to new languages and/or domains becomes more and more important. One of the most expensive components of a recognition system is th ..."
Abstract - Cited by 9 (3 self)
With the increasing availability and deployment of speech recognition technology in real-world environments, fast and affordable adaptation of speech recognition systems to new languages and/or domains becomes more and more important. One of the most expensive components of a recognition system is the pronunciation dictionary that maps the orthography of the words in the search vocabulary onto a sequence of sub-units. Often phonemes act as such sub-units. Human expert knowledge is usually required for crafting the pronunciation dictionary, making it an expensive and time-consuming task. Even automatic tools for creating such dictionaries often require hand-labeled training material and rely on manual revision. In order to address the problem of creating a dictionary in a time- and cost-efficient way, we have examined recognition systems at our lab that rely solely on graphemes rather than phonemes as subunits. The mapping in the dictionary thus becomes trivial, since every word is simply segmented into its letters, and no expert knowledge is needed anymore. Our experiments on different languages have shown that the quality of the resulting recognizer depends significantly on the grapheme-to-phoneme relation of the underlying language. Since Russian is a language with an alphabetic script and a fairly close grapheme-to-phoneme relation, it is well suited as a candidate for this approach. In this paper we present our results on creating a grapheme based Russian recognizer trained on the GlobalPhone corpus, which covers fifteen different languages. We compare the performance of the resulting system to a phoneme based recognition system that was trained in the course of the GlobalPhone project, and we compare the performance of two grapheme based systems whose context-dependent models were clustered with two different procedures.
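The context-dependent grapheme units clustered in the Russian system above can be sketched by enumerating tri-grapheme labels in the common left-center+right notation; the notation and the example word are illustrative assumptions, not taken from the paper.

def trigraphemes(word, boundary="#"):
    # Enumerate context-dependent units: each letter together with its left and right neighbour.
    letters = [boundary] + list(word.lower()) + [boundary]
    return ["{}-{}+{}".format(letters[i - 1], letters[i], letters[i + 1])
            for i in range(1, len(letters) - 1)]

print(trigraphemes("мир"))   # ['#-м+и', 'м-и+р', 'и-р+#']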
Improving Grapheme-based ASR by Probabilistic Lexical Modeling Approach
- in Proc. of Interspeech
, 2013
"... There is growing interest in using graphemes as subword units, especially in the context of the rapid development of hidden Markov model (HMM) based automatic speech recognition (ASR) system, as it eliminates the need to build a phoneme pronunciation lexicon. However, directly modeling the rela-tion ..."
Abstract - Cited by 8 (5 self)
There is growing interest in using graphemes as subword units, especially in the context of the rapid development of hidden Markov model (HMM) based automatic speech recognition (ASR) systems, as it eliminates the need to build a phoneme pronunciation lexicon. However, directly modeling the relationship between acoustic feature observations and grapheme states may not always be trivial. It usually depends upon the grapheme-to-phoneme relationship within the language. This paper builds upon our recent interpretation of Kullback-Leibler divergence based HMM (KL-HMM) as a probabilistic lexical modeling approach to propose a novel grapheme-based ASR approach where, first, a set of acoustic units is derived by modeling context-dependent graphemes in the framework of a conventional HMM/Gaussian mixture model (HMM/GMM) system, and then the probabilistic relationship between the derived acoustic units and the lexical units representing graphemes is modeled in the framework of KL-HMM. Through experimental studies on English, where the grapheme-to-phoneme relationship is irregular, we show that the proposed grapheme-based ASR approach (without using any phoneme information) can achieve performance comparable to a standard phoneme-based ASR approach.
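The KL-HMM idea described above scores a frame by comparing its posterior feature vector against a state's categorical distribution over acoustic units. The snippet below is a rough sketch of one such local score; the direction of the divergence and the toy numbers are assumptions, since the literature also considers reverse and symmetric variants.

import numpy as np

def kl_local_score(state_dist, posterior, eps=1e-10):
    # KL divergence between the lexical state's distribution over acoustic units
    # and the observed acoustic-unit posterior for one frame.
    y = np.clip(state_dist, eps, 1.0)
    z = np.clip(posterior, eps, 1.0)
    return float(np.sum(y * np.log(y / z)))

state = np.array([0.7, 0.2, 0.1])   # categorical distribution of one grapheme state
frame = np.array([0.6, 0.3, 0.1])   # posterior feature observed at one frame
print(kl_local_score(state, frame))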
Grapheme and Multilingual Posterior Features for Under-Resourced Speech Recognition: A Study on Scottish Gaelic
- in Proc. of ICASSP
, 2013
"... Standard automatic speech recognition (ASR) systems use phonemes as subword units. Thus, one of the primary resource required to build a good ASR system is a well developed phoneme pronun-ciation lexicon. However, under-resourced languages typically lack such lexical resources. In this paper, we inv ..."
Abstract - Cited by 7 (5 self)
Standard automatic speech recognition (ASR) systems use phonemes as subword units. Thus, one of the primary resources required to build a good ASR system is a well-developed phoneme pronunciation lexicon. However, under-resourced languages typically lack such lexical resources. In this paper, we investigate the recently proposed grapheme-based ASR in the framework of Kullback-Leibler divergence based hidden Markov model (KL-HMM) for under-resourced languages, particularly Scottish Gaelic, which has no lexical resources. More specifically, we study the use of grapheme and multilingual phoneme class conditional probabilities (posterior features) as feature observations in KL-HMM. ASR studies conducted show that the proposed approach yields a better system when compared to the conventional HMM/GMM approach using cepstral features. Furthermore, grapheme posterior features estimated using both auxiliary data and Gaelic data yield the best system. Index Terms — Automatic speech recognition, Kullback-Leibler divergence based hidden Markov model, grapheme, phoneme, posterior feature, under-resourced speech recognition, Scottish Gaelic.
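The "posterior features" used above are per-frame class posteriors taken from classifier outputs. A minimal sketch follows; the classifier dimensions (26 graphemes, 100 multilingual phoneme classes) and the concatenation of the two streams are illustrative assumptions, not the exact setup of the paper.

import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Stand-ins for the outputs of a grapheme classifier and a multilingual phoneme classifier.
grapheme_logits = np.random.randn(26)
mling_logits = np.random.randn(100)

# Per-frame posterior features; here the two streams are concatenated before
# being used as observations of a KL-HMM.
posterior_feature = np.concatenate([softmax(grapheme_logits), softmax(mling_logits)])
print(posterior_feature.shape)   # (126,)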
Towards Rapid Language Portability Of Speech Processing Systems
- CONFERENCE ON SPEECH AND LANGUAGE SYSTEMS FOR HUMAN COMMUNICATION
, 2004
"... In recent years, more and more speech processing products in several languages have been widely distributed all over the world. This fact reflects the general believe that speech technologies have a huge potential to let everyone participate in today's information revolution and to bridge the l ..."
Abstract - Cited by 7 (5 self)
In recent years, more and more speech processing products in several languages have been widely distributed all over the world. This fact reflects the general belief that speech technologies have a huge potential to let everyone participate in today's information revolution and to bridge language barriers. However, the development of speech processing systems still requires significant skills and resources. With some 4500-6000 languages in the world, the current cost and effort of building speech support is prohibitive to all but the top, most economically viable languages. In order to overcome these limitations, our research centers on the development of new algorithms and tools to rapidly port speech processing systems to new languages. This paper focuses on our approaches to creating acoustic models, pronunciation dictionaries, and language models in new languages with only limited or no data resources available in the language in question. For this purpose we developed language-independent and language-adaptive acoustic models, investigated pronunciation dictionaries that can be directly derived from the written form, and proposed cross-lingual language model adaptation. The approaches are evaluated on our multilingual text and speech database GlobalPhone, which covers more than 15 languages of the world.
A Morpho-Graphemic Approach for the Recognition of Spontaneous Speech in Agglutinative Languages like Hungarian
- in Proceedings of Interspeech
"... A coupled acoustic- and language-modeling approach is presented for the recognition of spontaneous speech primarily in agglutinative languages. The effectiveness of the approach in large vocabulary spontaneous speech recognition is demonstrated on the Hungarian MALACH corpus. The derivation of morph ..."
Abstract - Cited by 6 (1 self)
A coupled acoustic- and language-modeling approach is presented for the recognition of spontaneous speech, primarily in agglutinative languages. The effectiveness of the approach in large vocabulary spontaneous speech recognition is demonstrated on the Hungarian MALACH corpus. The derivation of morphs from word forms is based on a statistical morphological segmentation tool, while the mapping of morphs into graphemes is obtained trivially by splitting each morph into individual letters. Using morphs instead of words in language modeling gives significant WER reductions for both phoneme- and grapheme-based acoustic modeling. The improvements are larger after speaker adaptation of the acoustic models. In conclusion, the morpho-phonemic and the proposed morpho-graphemic ASR approaches yield the same best WERs, which are significantly lower than the word-based baselines, but essentially without language-dependent rules or pronunciation dictionaries in the latter case. Index Terms: spontaneous speech recognition, morphology.
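The morph-to-grapheme mapping described above is trivial once a morph segmentation exists: every morph is spelled out as its letters for the acoustic lexicon, while the language model operates on the morphs themselves. A small sketch follows; the Hungarian segmentation shown is a hand-made example, not output of the statistical segmenter used in the paper.

# Hypothetical morph segmentation for one Hungarian word form.
segmented = {"házakban": ["ház", "ak", "ban"]}

def morph_grapheme_lexicon(segmentations):
    # Each morph becomes a lexicon entry whose "pronunciation" is its letter sequence.
    return {m: " ".join(m) for morphs in segmentations.values() for m in morphs}

print(morph_grapheme_lexicon(segmented))
# {'ház': 'h á z', 'ak': 'a k', 'ban': 'b a n'}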
Probabilistic lexical modeling and grapheme-based automatic speech recognition
- Idiap Research Report Idiap-RR-15-2013, http://publications.idiap.ch/downloads/reports/2013/Rasipuram_Idiap-RR-15-2013.pdf
, 2013
"... Standard hidden Markov model (HMM) based automatic speech recogni-tion (ASR) systems use phonemes as subword units. Thus, development of ASR system for a new language or domain depends upon the availability of a phoneme lexicon in the target language. In this paper, we introduce the notion of probab ..."
Abstract - Cited by 5 (4 self)
Standard hidden Markov model (HMM) based automatic speech recognition (ASR) systems use phonemes as subword units. Thus, development of an ASR system for a new language or domain depends upon the availability of a phoneme lexicon in the target language. In this paper, we introduce the notion of probabilistic lexical modeling and present an ASR approach where a) first, the relationship between acoustics and phonemes is learned on available acoustic and lexical resources (not necessarily from the target language or domain), and then b) the probabilistic grapheme-to-phoneme relationship is learned using the acoustic data of the target language or domain. The resulting system is a grapheme-based ASR system. This brings in two potential advantages. First, development of a lexicon for the target language or domain becomes easy, i.e., the creation of a grapheme lexicon where each word is transcribed by its orthography. Second, the ASR system can exploit both acoustic and lexical resources of multiple languages and domains. We evaluate and show the potential of the proposed approach through a) an in-domain study, where acoustic and lexical resources of the target language or domain are used to build an ASR system, b) a monolingual cross-domain study, where acoustic and lexical resources of another domain are used to build an ASR system for a new domain, and c) a multilingual cross-domain study, where acoustic and lexical resources of multiple languages are used to build a multi-accent non-native speech recognition system. Keywords: Automatic speech recognition, Kullback-Leibler divergence based hidden Markov model.
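Step b) above, learning the probabilistic relationship between acoustic units and grapheme lexical units, can be approximated by averaging the frame-level acoustic-unit posteriors aligned to each grapheme state. The sketch below illustrates that idea with toy data; it is not the report's exact KL-HMM training criterion.

import numpy as np

def estimate_lexical_model(posteriors, alignment, n_states):
    # Accumulate acoustic-unit posteriors per grapheme state and normalize,
    # giving an estimate of P(acoustic unit | grapheme state).
    model = np.zeros((n_states, posteriors.shape[1]))
    counts = np.zeros(n_states)
    for z, s in zip(posteriors, alignment):
        model[s] += z
        counts[s] += 1
    counts[counts == 0] = 1.0
    return model / counts[:, None]

# Toy data: four frames of 3-dimensional posteriors aligned to two grapheme states.
Z = np.array([[0.8, 0.1, 0.1],
              [0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8],
              [0.2, 0.1, 0.7]])
print(estimate_lexical_model(Z, [0, 0, 1, 1], n_states=2))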
Revisiting Graphemes with Increasing Amounts of Data
"... Letter units, or graphemes, have been reported in the literature as a surprisingly effective substitute to the more traditional phoneme units, at least in languages that enjoy a strong correspondence between pronunciation and orthography. For English however, where letter symbols have less acoustic ..."
Abstract - Cited by 5 (0 self)
Letter units, or graphemes, have been reported in the literature as a surprisingly effective substitute for the more traditional phoneme units, at least in languages that enjoy a strong correspondence between pronunciation and orthography. For English, however, where letter symbols have less acoustic consistency, previously reported results fell short of systems using highly tuned pronunciation lexicons. Grapheme units simplify system design, but since graphemes map to a wider set of acoustic realizations than phonemes, we should expect grapheme-based acoustic models to require more training data to capture these variations. In this paper, we compare the rate of improvement of grapheme and phoneme systems trained with datasets ranging from 450 to 1200 hours of speech. We consider various grapheme unit configurations, including using letter-specific, onset, and coda units. We show that the grapheme systems improve faster and, depending on the lexicon, reach or surpass the phoneme baselines with the largest training set. Index Terms — Acoustic modeling, graphemes, directory assistance, speech recognition.
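The "letter-specific, onset, and coda units" mentioned above suggest position-dependent grapheme labels. One plausible reading is sketched below under that assumption; the tagging scheme is not necessarily the authors' exact configuration.

def tagged_graphemes(word):
    # Tag the first letter of a word as onset and the last as coda; inner letters stay plain.
    letters = list(word.lower())
    units = []
    for i, g in enumerate(letters):
        if i == 0:
            units.append(g + "_onset")
        elif i == len(letters) - 1:
            units.append(g + "_coda")
        else:
            units.append(g)
    return units

print(tagged_graphemes("speech"))   # ['s_onset', 'p', 'e', 'e', 'c', 'h_coda']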
Flexible Speech Translation Systems
- Special Issue on Speech Translation, IEEE Transactions on Speech and Audio Processing, accepted for publication
, 2006
"... Speech translation research has made significant progress over the years with many high-visibility efforts showing that translation of spontaneously spoken speech from and to diverse languages is possible and applicable in a variety of domains. As language and domains continue to expand, practical c ..."
Abstract - Cited by 4 (2 self)
Speech translation research has made significant progress over the years, with many high-visibility efforts showing that translation of spontaneously spoken speech from and to diverse languages is possible and applicable in a variety of domains. As languages and domains continue to expand, practical concerns such as portability and reconfigurability of speech systems come into play: system maintenance becomes a key issue, and data is never sufficient to cover the changing domains over varying languages. In this paper, we discuss strategies to overcome the limits of today's speech translation systems. In the first part, we describe our layered system architecture that allows for easy component integration, resource sharing across components, comparison of alternative approaches, and migration toward hybrid desktop/PDA or stand-alone PDA systems. In the second part, we show how flexibility and reconfigurability are implemented by relying more radically on learning approaches, using our English-Thai two-way speech translation system as a concrete example.