GlobalPhone: A Multilingual Speech and Text Database Developed at Karlsruhe University
- Proceedings of the ICSLP
, 2002
Cited by 27 (16 self)
This paper describes the design, collection, and current status of the multilingual database GlobalPhone, an ongoing project since 1995 at Karlsruhe University. GlobalPhone is a high-quality read speech and text database in a large variety of languages which is suitable for the development of large vocabulary speech recognition systems in many languages. It has already been successfully applied to language independent and language adaptive speech recognition. GlobalPhone currently covers 15 languages: Arabic, Chinese (Mandarin and Shanghai), Croatian, Czech, French, German, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Turkish. The corpus contains more than 300 hours of transcribed speech spoken by more than 1500 native, adult speakers and will soon be available from ELRA.
Grapheme Based Speech Recognition
- in Proceedings of the EUROSPEECH
, 2003
Cited by 15 (4 self)
Large vocabulary speech recognition systems traditionally represent words in terms of subword units, usually phonemes. This paper investigates the potential of graphemes acting as subunits. In order to develop context dependent grapheme based speech recognizers several decision tree based clustering procedures are performed and compared to each other. Grapheme based speech recognizers in three languages - English, German, and Spanish - are trained and compared to their phoneme based counterparts. The results show that for languages with a close grapheme-to-phoneme relation, grapheme based modeling is as good as the phoneme based one. Furthermore, multilingual grapheme based recognizers are designed to investigate whether grapheme based information can be successfully shared among languages. Finally, some bootstrapping experiments for Swedish were performed to test the potential for rapid language deployment.
Far-field Speaker Recognition
- International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
, 2006
Cited by 13 (6 self)
Automatic speaker recognition has become an increasingly important technology required by many speech-aided applications. The main challenge for automatic speaker recognition is to deal with the variability of the environments and channels from which the speech was obtained. In previous work, good results have been achieved for clean high-quality speech with matched training and test acoustic conditions, such as high accuracy of speaker identification and verification using clean wideband speech and Gaussian Mixture Models (GMM). However, under mismatched conditions and noisy environments, often expected in real-world conditions, the performance of GMM-based systems degrades significantly, falling far short of a satisfactory level. Therefore, robustness becomes a crucial research issue in the speaker recognition field. In this thesis, our main focus is to improve the robustness of speaker recognition systems on far-field distant microphones. We investigate approaches to improve robustness from two directions. First, we investigate approaches to improve robustness for traditional speaker recognition systems, which are based on low-level spectral information. We introduce a new reverberation compensation approach which, along with feature warping in the feature processing procedure, improves the system performance significantly. We propose four multiple channel combination approaches, which utilize information from multiple far-field microphones, to improve robustness under mismatched training-testing conditions. Secondly, we investigate approaches to use high-level speaker information to improve robustness. We propose new techniques to
Integrating multilingual articulatory features into speech recognition
- in Proc. Eurospeech
, 2003
Cited by 10 (1 self)
The use of articulatory features, such as place and manner of articulation, has been shown to reduce the word error rate of speech recognition systems under different conditions and in different settings. For example, recognition systems based on features are more robust to noise and reverberation. In earlier work we showed that articulatory features can compensate for inter-language variability and can be recognized across languages. In this paper we show that using cross- and multilingual detectors to support an HMM based speech recognition system significantly reduces the word error rate. By selecting and weighting the features in a discriminative way, we achieve an error rate reduction that lies in the same range as that seen when using language specific feature detectors. By combining feature detectors from many languages and training the weights discriminatively, we even outperform the case where only monolingual detectors are being used.
Multilingual Articulatory Features
- in Proc. ICASSP, Hong Kong
, 2003
Cited by 9 (0 self)
Speech recognition systems based on or aided by articulatory features, such as place and manner of articulation, have been shown to be useful under varying circumstances. Recognizers based on features better compensate for channel and noise variability. In this work we show that it is also possible to compensate for inter-language variability using articulatory feature detectors. We come to the conclusion that articulatory features can be recognized across languages and that using detectors from many languages can improve the classification accuracy of the feature detectors on a single language. We further demonstrate how those multilingual and crosslingual detectors can support an HMM based recognizer and thereby significantly reduce the word error rate by up to 12.3% relative. We expect that with the use of multilingual articulatory features it is possible to support the rapid deployment of recognition systems for new target languages.
Speaker Identification using Multilingual Phone Strings
- Proceedings of ICASSP
, 2002
Cited by 8 (5 self)
Far-field speaker identification is very challenging since varying recording conditions often result in mismatched training and test situations. Although the widely used Gaussian Mixture Models (GMM) approach achieves reasonably good results when training and testing conditions match, its performance degrades dramatically under non-matching conditions. In this paper we propose a new approach for far-field speaker identification: the usage of multilingual phone strings derived from recognizers in eight different languages. The experiments are carried out on a database of 30 speakers recorded with eight different microphone distances. The results show that the multilingual phone string approach is robust against non-matching conditions and significantly outperforms the GMMs. On 10-second test chunks, the average closed-set identification performance achieves 96.7% on variable distance data.
Towards Rapid Language Portability Of Speech Processing Systems
- CONFERENCE ON SPEECH AND LANGUAGE SYSTEMS FOR HUMAN COMMUNICATION
, 2004
Cited by 5 (3 self)
In recent years, more and more speech processing products in several languages have been widely distributed all over the world. This fact reflects the general belief that speech technologies have a huge potential to let everyone participate in today's information revolution and to bridge the language barriers. However, the development of speech processing systems still requires significant skills and resources to be carried out. With some 4500-6000 languages in the world, the current cost and effort in building speech support is prohibitive to all but the top, most economically viable languages. In order to overcome these limitations, our research centers around the development of new algorithms and tools to rapidly port speech processing systems to new languages. This paper focuses on our approaches to create acoustic models, pronunciation dictionaries, and language models in new languages with only limited or no data resources available in the language in question. For this purpose we developed language independent and language adaptive acoustic models, investigated pronunciation dictionaries which can be directly derived from the written form, and proposed cross-lingual language model adaptation. The approaches are evaluated on our multilingual text and speech database GlobalPhone which covers more than 15 languages of the world.
Acoustic-Phonetic Unit Similarities for Context-Dependent Acoustic Model Portability
- Proceeding on Acoustics, Speech, and Signal Processing (ICASSP-2006)
, 2006
Cited by 5 (3 self)
This paper addresses in particular the use of acoustic-phonetic unit similarities for portability of context dependent acoustic models to new languages. Since the IPA-based method is limited to a source/target phoneme mapping table construction, an estimation method of the similarity between two phonemes is proposed in this paper. Based on these phoneme similarities, some estimation methods for polyphone similarity and clustered polyphonic model similarity are investigated. For a new language, first a polyphonic decision tree is built with a small amount of speech data. Then, clustered models in the target language are duplicated from the nearest clustered models in the source language and adapted with limited data to the target language. Results obtained from the experiments demonstrate the feasibility of these methods.
A Database for the Analysis of Cross-Lingual Pronunciation Variants of European City Names
, 2002
Cited by 4 (4 self)
This paper reports on a speech database that includes non-native pronunciation variants of city names/town names from several European languages. The database is designed as a research tool for the study of pronunciation variants in this specific domain that occur in different groups of non-native speakers. The ongoing data collection currently comprises 20 to 27 native speakers of 3 languages each who pronounce material from 5 languages. The languages covered are English, German, French, Italian, and Dutch. All languages are examined as the source language (L1) and as the target language (L2). For the first stage of the data collection, the targeted status is a collection of 5 x 5 language directions with at least 20 speakers per native language.
Thai automatic speech recognition
- in Proc. ICASSP
, 2005
Cited by 4 (1 self)
We describe the development of a robust and flexible Thai Speech Recognizer as integrated into our English-Thai Speech-to-Speech translation system. We focus on the discussion of the rapid deployment of ASR for Thai under limited time and data resources, including rapid data collection issues, acoustic model bootstrap, and automatic generation of pronunciations. Issues relating to the translation and overall system will be reported elsewhere.

