Towards Universal Speech Recognition
- PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES (ICMI-2002), 2002
Cited by 4 (0 self)
The increasing interest in multilingual applications like speech-to-speech translation systems is accompanied by the need for speech recognition front-ends in many languages that can also handle multiple input languages at the same time. In this paper we describe a universal speech recognition system that fulfills such needs. It is trained by sharing speech and text data across languages and thus reduces the number of parameters and overhead significantly at the cost of only slight accuracy loss. The final recognizer eases the burden of maintaining several monolingual engines, makes dedicated language identification obsolete and allows for code-switching within an utterance. To achieve these goals we developed new methods for constructing multilingual acoustic models and multilingual n-gram language models.
Bilingual and Dialectal Adaptation and Retraining
- IN: PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING (ICSLP-1998), VOL.15, 30 NOVEMBER-4, 1999
Cited by 4 (1 self)
In this paper, we report our investigations on the use of adaptation and retraining in our bilingual (Italian, German) and multidialectal recognition system. Our approach to bilingual speech recognition is to treat the two languages as one, which is best suited for a task where Italian and German natives speak both languages, resulting in a variety of accents and dialects. We performed adaptation on single speakers and on speaker groups built from combinations of spoken and native language. Furthermore, we performed retraining on partitions of the adaptation or training data. Our experiments led to an error rate reduction in all cases: compared to the baseline system, we achieved overall improvements of 14%, 12-14% and 7% for speaker adaptation, speaker group adaptation and retraining, respectively. Furthermore, we found, among other things, that performance for Italian is rather stable between adaptation and retraining, while adaptation for German outperforms retraining by far.
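The speaker groups mentioned in this abstract are formed from combinations of spoken and native language. A minimal sketch of such a grouping is shown below; the record layout, the speaker IDs and the function name `group_speakers` are assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict


def group_speakers(speakers):
    """Group speaker IDs by the pair (spoken language, native language)."""
    groups = defaultdict(list)
    for speaker_id, spoken, native in speakers:
        groups[(spoken, native)].append(speaker_id)
    return dict(groups)


# Hypothetical records: (speaker id, spoken language, native language).
speakers = [
    ("spk01", "Italian", "Italian"),
    ("spk02", "German", "Italian"),
    ("spk03", "German", "German"),
    ("spk04", "Italian", "German"),
    ("spk05", "German", "Italian"),
]

groups = group_speakers(speakers)
for key, members in sorted(groups.items()):
    print(key, members)
```

With two spoken and two native languages this yields at most four groups, each of which could then serve as one adaptation set.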
Language-Adaptive Persian Speech Recognition
Cited by 3 (2 self)
Development of robust spoken language technology ideally relies on the availability of large amounts of data, preferably in the target domain and language. However, more often than not, speech developers need to cope with very little or no data, typically obtained from a different target domain. This paper focuses on developing techniques to address this challenge. Specifically, we consider the case of developing a Persian language speech recognizer with sparse amounts of data. For language modeling, there are several potential sources of text data, e.g., text available on the Internet, to help bootstrap initial models; however, acoustic data can be obtained only by tedious data collection efforts. The drawback of limited Persian acoustic data can be partially overcome by making use of acoustic data from languages that have vast resources, such as English (and other languages, if available). The phoneme sets, especially for diverse languages such as English and Persian, differ considerably. However, by incorporating knowledge-based as well as data-driven phoneme mappings, reliable Persian acoustic models can be trained using well-trained English models and small amounts of Persian re-training data. In our experiments, Persian models re-trained from seed models created by data-driven phoneme mappings of English models resulted in a phoneme error rate of 19.80%, as compared to a phoneme error rate of 20.35% when the Persian models were re-trained from seed models created from sparse Persian data.
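The abstract does not say how the data-driven phoneme mapping is computed. The sketch below assumes one common choice: each Persian phoneme is mapped to the English phoneme it is most often aligned with, and the English model parameters are copied as seeds. The phoneme labels, the alignment pairs, the parameter lists and all function names are hypothetical. Only the two error rates passed to `relative_reduction` come from the abstract.

```python
import copy
from collections import Counter, defaultdict


def data_driven_mapping(aligned_pairs):
    """Map each Persian phoneme to its most frequently aligned English phoneme."""
    counts = defaultdict(Counter)
    for persian_ph, english_ph in aligned_pairs:
        counts[persian_ph][english_ph] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}


def seed_models(mapping, english_models):
    """Copy English model parameters to initialise the Persian models."""
    return {p: copy.deepcopy(english_models[e]) for p, e in mapping.items()}


def relative_reduction(baseline, improved):
    """Relative error rate reduction of `improved` over `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical (Persian phoneme, English phoneme) alignment pairs.
aligned_pairs = [
    ("q", "k"), ("q", "k"), ("q", "g"),
    ("x", "hh"), ("x", "hh"), ("x", "k"),
    ("a", "aa"),
]
# Placeholder parameter vectors standing in for trained English models.
english_models = {"k": [0.1, 0.2], "g": [0.3, 0.4], "hh": [0.5, 0.6], "aa": [0.7, 0.8]}

mapping = data_driven_mapping(aligned_pairs)
persian_seeds = seed_models(mapping, english_models)
print(mapping)
print(f"{relative_reduction(20.35, 19.80):.1%}")
```

The last line puts the reported figures in relative terms: going from 20.35% to 19.80% phoneme error rate is a relative reduction of about 2.7%.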
Efficient Handling Of Multilingual Language Models
- IN PROC. ASRU, 2003
Cited by 3 (2 self)
In this paper we introduce techniques for building a multilingual speech recognizer. More specifically, we present a new language modeling method that allows several monolingual language models to be combined into one multilingual language model. Furthermore, we extend our techniques to the concept of grammars. All linguistic knowledge sources share one common interface to the search engine. As a consequence, new language model types can be easily integrated into our Ibis decoder. Based on a multilingual acoustic model, we compare multilingual statistical n-gram language models with multilingual grammars. Results are given in terms of recognition performance as well as resource requirements. They show
AUTOMATIC SPEECH RECOGNITION AND INTRINSIC SPEECH VARIATION
This paper briefly reviews the state of the art on sources of speech variability in automatic speech recognition systems. It focuses on some variations within the speech signal that make the ASR task difficult. The variations detailed in the paper are intrinsic to the speech and affect the different levels of the ASR processing chain. For different sources of speech variation, the paper summarizes the current knowledge and highlights specific feature extraction or modeling weaknesses and current trends.
CHINESE-ENGLISH BILINGUAL SPEECH RECOGNITION
In this paper, two methods of constructing a Chinese-English bilingual phone inventory are proposed and investigated. Our research focuses on a robust, suitable and compact phone combination for the two utterly different languages. The first method is to combine Chinese phonemes and English phonemes together. It can provide the required consistency with the western languages. The second method is to combine Chinese INITIALs and FINALs (IFs) with English phonemes in the bilingual acoustic modeling. Experimental results show that the first method is more compact and flexible in acoustic modeling than the second method, but its performance decreases significantly, by about 1.9% and 3.8% on the Chinese and English tests respectively. In contrast, the second method achieves higher word accuracy than the first. Its performance degrades by only 0.3% and 2.2% for the two languages, but with more parameters included in the acoustic model. Some issues of building this bilingual speech recognizer are also addressed.

