Context-dependent acoustic modeling using graphemes for large vocabulary speech recognition (2002)

by S Kanthak, H Ney
Venue: Proc. ICASSP

Results 1 - 10 of 43

Grapheme Based Speech Recognition

by Mirjam Killer, Sebastian Stüker, Tanja Schultz - in Proceedings of EUROSPEECH, 2003
Cited by 41 (6 self)
Large vocabulary speech recognition systems traditionally represent words in terms of subword units, usually phonemes. This paper investigates the potential of graphemes acting as subunits. In order to develop context dependent grapheme based speech recognizers several decision tree based clustering procedures are performed and compared to each other. Grapheme based speech recognizers in three languages - English, German, and Spanish - are trained and compared to their phoneme based counterparts. The results show that for languages with a close grapheme-to-phoneme relation, grapheme based modeling is as good as the phoneme based one. Furthermore, multilingual grapheme based recognizers are designed to investigate whether grapheme based information can be successfully shared among languages. Finally, some bootstrapping experiments for Swedish were performed to test the potential for rapid language deployment.

Citation Context

...onversion of the orthographic transcription to a phonetic one, using either rule based [1] or statistical approaches [2]. Only some of them have been investigated in the context of speech recognition [3, 4]. Kanthak [4] was one of the first who presented results in speech recognition based on the orthographic representation of words and the use of decision trees for context dependent modeling. Black et ...

Multilingual Acoustic Modeling Using Graphemes

by S. Kanthak, H. Ney - in Proceedings of European Conference on Speech Communication and Technology, 2003
Cited by 13 (0 self)
In this paper we combine grapheme-based sub-word units with multilingual acoustic modeling. We show that a global decision tree together with automatically generated grapheme questions eliminate manual effort completely. We also investigate the effects of additional language questions. We present

Citation Context

...ilingual acoustic modeling. However, finding a suitable common phoneme set may be challenging and requires phonetic expert knowledge. As shown in previous work [5] grapheme-based acoustic units in combination with decision tree state tying may reach the performance of phonemic ones at least on a couple of European languages. The approach is completely driven by ...

A Grapheme Based Speech Recognition System for Russian

by Sebastian Stüker, Tanja Schultz - SPECOM 2004, 2004
Cited by 9 (3 self)
With the increasing availability and deployment of speech recognition technology in real world environments fast and affordable adaptation of speech recognition systems to new languages and/or domains becomes more and more important. One of the most expensive components of a recognition system is the pronunciation dictionary that maps the orthography of the words in the search vocabulary onto a sequence of sub-units. Often phonemes act as such sub-units. Human expert knowledge is usually required for crafting the pronunciation dictionary, thus making it an expensive and time consuming task. Even automatic tools for creating such dictionaries often require hand labeled amounts of training material and rely on manual revision. In order to address the problem of creating a dictionary in a time and cost efficient way we have examined recognition systems at our lab that rely solely on graphemes rather than phonemes as subunits. The mapping in the dictionary thus becomes trivial, since now every word is simply segmented into its letters. Therefore no expert knowledge is needed anymore. Our experiments on different languages have shown that the quality of the resulting recognizer significantly depends on the grapheme-to-phoneme relation of the underlying language. Since Russian is a language with an alphabetic script with a fairly close grapheme-to-phoneme relation it is very well suited to be a candidate for this approach. In this paper we present our results on creating a grapheme based Russian recognizer trained on the GlobalPhone corpus that covers fifteen different languages. We compare the performance of the resulting system to a phoneme based recognition system that was trained in the course of the GlobalPhone project, and compare the performance of two grapheme based systems whose context-dependent models were clustered with two different procedures.

Citation Context

...n of the written form of a word to a phonetic transcription, either by applying rules [1] or by statistical approaches [2]. Only some of them have been investigated in the field of speech recognition [3, 4]. Recently, the use of graphemes as modeling units, instead of phonemes, has been increasingly studied. Graphemes have the advantage over phonemes that they make the creation of the pronunciation dict...

Improving Grapheme-based ASR by Probabilistic Lexical Modeling Approach

by Ramya Rasipuram, Mathew Magimai.-Doss - in Proc. of Interspeech, 2013
Cited by 8 (5 self)
There is growing interest in using graphemes as subword units, especially in the context of the rapid development of hidden Markov model (HMM) based automatic speech recognition (ASR) system, as it eliminates the need to build a phoneme pronunciation lexicon. However, directly modeling the relationship between acoustic feature observations and grapheme states may not be always trivial. It usually depends upon the grapheme-to-phoneme relationship within the language. This paper builds upon our recent interpretation of Kullback-Leibler divergence based HMM (KL-HMM) as a probabilistic lexical modeling approach to propose a novel grapheme-based ASR approach where, first a set of acoustic units are derived by modeling context-dependent graphemes in the framework of conventional HMM/Gaussian mixture model (HMM/GMM) system, and then the probabilistic relationship between the derived acoustic units and the lexical units representing graphemes is modeled in the framework of KL-HMM. Through experimental studies on English, where the grapheme-to-phoneme relationship is irregular, we show that the proposed grapheme-based ASR approach (without using any phoneme information) can achieve performance comparable to standard phoneme-based ASR approach.

Citation Context

...on. The development of phoneme lexicon requires some minimum phonetic expertise and is usually a semi-automatic process. An alternative to phonemes is graphemes, which makes lexicon development easy [1, 2, 3], [4, Chapter 4], [5, 6, 7, 8, 9]. However, modeling the relationship between graphemes and standard acoustic feature observations, such as PLP cepstral coefficients which capture phoneme related info...

Grapheme and Multilingual Posterior Features for Under-Resourced Speech Recognition: A Study on Scottish Gaelic

by Ramya Rasipuram, Peter Bell, Mathew Magimai.-Doss - Proc. of ICASSP, 2013
Cited by 7 (5 self)
Standard automatic speech recognition (ASR) systems use phonemes as subword units. Thus, one of the primary resources required to build a good ASR system is a well developed phoneme pronunciation lexicon. However, under-resourced languages typically lack such lexical resources. In this paper, we investigate recently proposed grapheme-based ASR in the framework of Kullback-Leibler divergence based hidden Markov model (KL-HMM) for under-resource languages, particularly Scottish Gaelic which has no lexical resources. More specifically, we study the use of grapheme and multilingual phoneme class conditional probabilities (posterior features) as feature observations in KL-HMM. ASR studies conducted show that the proposed approach yields better system when compared to conventional HMM/GMM approach using cepstral features. Furthermore, grapheme posterior features estimated using both auxiliary data and Gaelic data yield the best system. Index Terms — Automatic speech recognition, Kullback-Leibler divergence based hidden Markov model, grapheme, phoneme, posterior feature, under-resource speech recognition, Scottish Gaelic

Citation Context

...grapheme, phoneme, posterior feature, under-resource speech recognition, Scottish Gaelic 1. INTRODUCTION Recently, there is a growing interest to use graphemes as subword units for speech recognition [1, 2], [3, Chapter 4], [4, 5], especially for under-resourced languages where well developed phoneme sets and phoneme pronunciation dictionaries are usually not available [6, 7, 8, 9]. Under-resource langu...

Towards Rapid Language Portability Of Speech Processing Systems

by Tanja Schultz - Conference on Speech and Language Systems for Human Communication, 2004
Cited by 7 (5 self)
In recent years, more and more speech processing products in several languages have been widely distributed all over the world. This fact reflects the general belief that speech technologies have a huge potential to let everyone participate in today's information revolution and to bridge the language barriers. However, the development of speech processing systems still requires significant skills and resources to be carried out. With some 4500-6000 languages in the world, the current cost and effort in building speech support is prohibitive to all but the top, most economically viable languages. In order to overcome these limitations, our research centers around the development of new algorithms and tools to rapidly port speech processing systems to new languages. This paper focuses on our approaches to create acoustic models, pronunciation dictionaries, and language models in new languages with only limited or no data resources available in the language in question. For this purpose we developed language independent and language adaptive acoustic models, investigated pronunciation dictionaries which can be directly derived from the written form and propose cross-lingual language model adaptation. The approaches are evaluated on our multilingual text and speech database GlobalPhone which covers more than 15 languages of the world.

A Morpho-Graphemic Approach for the Recognition of Spontaneous Speech in Agglutinative Languages - like Hungarian

by Péter Mihajlik, Tibor Fegyó, Zoltán Tüske, Pavel Ircing - Proceedings of Interspeech
Cited by 6 (1 self)
A coupled acoustic- and language-modeling approach is presented for the recognition of spontaneous speech primarily in agglutinative languages. The effectiveness of the approach in large vocabulary spontaneous speech recognition is demonstrated on the Hungarian MALACH corpus. The derivation of morphs from word forms is based on a statistical morphological segmentation tool while the mapping of morphs into graphemes is obtained trivially by splitting each morph into individual letters. Using morphs instead of words in language modeling gives significant WER reductions in case of both phoneme- and grapheme-based acoustic modeling. The improvements are larger after speaker adaptation of the acoustic models. In conclusion, morpho-phonemic and the proposed morpho-graphemic ASR approaches yield the same best WERs, which are significantly lower than the word-based baselines but essentially without language dependent rules or pronunciation dictionaries in the latter case. Index Terms: spontaneous speech recognition, morphology.

Citation Context

...ion dictionaries in the latter case. Index Terms: spontaneous speech recognition, morphology. 1. Introduction Morphologically motivated language models [1–5] as well as grapheme-based acoustic models [5,6] have been successfully applied to various speech recognition tasks. The joint use of these technologies [5], which we call morpho-graphemic approach has been, however, not thoroughly investigated, es...

Probabilistic lexical modeling and grapheme-based automatic speech recognition

by Ramya Rasipuram, Mathew Magimai.-Doss - Idiap Research Report, Idiap-RR-15-2013, 2013, http://publications.idiap.ch/downloads/reports/2013/Rasipuram_Idiap-RR-15-2013.pdf
Cited by 5 (4 self)
Standard hidden Markov model (HMM) based automatic speech recognition (ASR) systems use phonemes as subword units. Thus, development of ASR system for a new language or domain depends upon the availability of a phoneme lexicon in the target language. In this paper, we introduce the notion of probabilistic lexical modeling and present an ASR approach where a) first, the relationship between acoustics and phonemes is learned on available acoustic and lexical resources (not necessarily from the target language or domain), and then b) probabilistic grapheme-to-phoneme relationship is learned using the acoustic data of targeted language or domain. The resulting system is a grapheme-based ASR system. This brings in two potential advantages. First, development of lexicon for target language or domain becomes easy i.e., creation of a grapheme lexicon where each word is transcribed by its orthography. Second, the ASR system can exploit both acoustic and lexical resources of multiple languages and domains. We evaluate and show the potential of the proposed approach through a) an in-domain study, where acoustic and lexical resources of target language or domain are used to build an ASR system, b) a monolingual cross-domain study, where acoustic and lexical resources of another domain are used to build an ASR system for a new domain, and c) a multilingual cross-domain study, where acoustic and lexical resources of multiple languages are used to build multi-accent non-native speech recognition system. Keywords: Automatic speech recognition, Kullback-Leibler divergence based hidden

Citation Context

...t also minimum phonetic expertise. In other words, phoneme lexicon development is a semi-automatic process. An alternative to phoneme subword units is graphemes, which makes lexicon development easy [2, 3, 4], [5, Chapter 4], [6, 7, 8, 9, 10, 11, 12, 13, 14]. However, modeling the relationship between graphemes and standard spectral-based feature observations, such as PLP cepstral coefficients which captu...

Revisiting Graphemes with Increasing Amounts of Data

by Yun-hsuan Sung, Thad Hughes, Françoise Beaufays, Brian Strope
Cited by 5 (0 self)
Letter units, or graphemes, have been reported in the literature as a surprisingly effective substitute to the more traditional phoneme units, at least in languages that enjoy a strong correspondence between pronunciation and orthography. For English however, where letter symbols have less acoustic consistency, previously reported results fell short of systems using highly-tuned pronunciation lexicons. Grapheme units simplify system design, but since graphemes map to a wider set of acoustic realizations than phonemes, we should expect grapheme-based acoustic models to require more training data to capture these variations. In this paper, we compare the rate of improvement of grapheme and phoneme systems trained with datasets ranging from 450 to 1200 hours of speech. We consider various grapheme unit configurations, including using letter-specific, onset, and coda units. We show that the grapheme systems improve faster and, depending on the lexicon, reach or surpass the phoneme baselines with the largest training set. Index Terms — Acoustic modeling, graphemes, directory assistance, speech recognition.

Citation Context

...tiple languages, researchers confronted with the bewildering task of maintaining not one but several lexicons asked the inevitable question “what if we just used letter units instead?” Kanthak et al. [7] and Killer et al. [8] observed experimentally that for some languages, grapheme systems performed roughly as well as phoneme systems, but that for others, such as English, there was a high error-rate...

Flexible Speech Translation Systems

by Tanja Schultz, Alan W. Black, Stephan Vogel, Monika Woszczyna - Special Issue in Speech Translation, IEEE Transactions of Speech and Audio Processing, Accepted for publication, 2006
Cited by 4 (2 self)
Speech translation research has made significant progress over the years with many high-visibility efforts showing that translation of spontaneously spoken speech from and to diverse languages is possible and applicable in a variety of domains. As language and domains continue to expand, practical concerns such as portability and reconfigurability of speech come into play: system maintenance becomes a key issue and data is never sufficient to cover the changing domains over varying languages. In this paper, we discuss strategies to overcome the limits of today's speech translation systems. In the first part, we describe our layered system architecture that allows for easy component integration, resource sharing across components, comparison of alternative approaches, and the migration toward hybrid desktop/PDA or stand-alone PDA systems. In the second part, we show how flexibility and reconfigurability is implemented by more radically relying on learning approaches and use our English-Thai two-way speech translation system as a concrete example.

Citation Context

... new languages, we cannot assume that pronunciations of a base vocabulary exist, nor that native experts are available for hand corrections. Recently, grapheme-based models for ASR have been proposed [27], which back up results indicating that pronunciation variants should not be explicitly modeled through phone string variations but implicitly by the use of single-pronunciation dictionaries [28] and ...
