
Cross-language Bootstrapping for Unsupervised Acoustic Model Training: Rapid Development of a Polish Speech Recognition System (2009)

by J Lööf, C Gollan, H Ney
Venue: In Interspeech

Results 1 - 5 of 5

Rapid Bootstrapping of five Eastern European Languages using the Rapid Language Adaptation Toolkit

by Ngoc Thang Vu, Franziska Kraus, Tanja Schultz, 2010
Cited by 2 (2 self)
This paper presents our latest efforts toward LVCSR systems for five Eastern European languages (Bulgarian, Croatian, Czech, Polish, and Russian) using our Rapid Language Adaptation Toolkit (RLAT) [1]. We investigated the possibility of crawling large quantities of text material from the Internet, which is very cheap but requires text post-processing steps due to the varying text quality. The goal of this study is to determine the best strategy for language model optimization on the given domain in a short time period with minimal human effort. Our results show that we can build an initial ASR system for these five languages in only twenty days using RLAT. On the multilingual GlobalPhone speech corpus [2], we achieved a word error rate (WER) of 16.9% for Bulgarian, 32.8% for

Cross-language bootstrapping based on completely unsupervised training using multilingual A-stabil

by Ngoc Thang Vu, Franziska Kraus, Tanja Schultz - In International Conference on Acoustics, Speech and Signal Processing, ICASSP 2011, 2011
Cited by 1 (1 self)
This paper presents our work on rapid language adaptation of acoustic models based on multilingual cross-language bootstrapping and unsupervised training. We used Automatic Speech Recognition (ASR) systems in English, French, German, and Spanish to build a Czech ASR system from scratch. System building was performed without using any transcribed audio data by applying three consecutive steps: cross-language transfer, unsupervised training based on the “multilingual A-stabil” confidence score [1], and bootstrapping. Based on the confidence score we selected 72% (16.6 hours) of the available audio data with a transcription WER of less than 14.5%. The cross-language bootstrap achieves a word error rate of 23.3% on the Czech development set and 22.4% on the evaluation set. These results are very promising, as the performance compares favorably to the Czech ASR system trained on 23 hours of manually transcribed data (21.8% on the development set and 21.3% on the evaluation set). Index Terms: rapid language adaptation of ASR, unsupervised training, multilingual A-stabil
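
The data selection step mentioned in this abstract (keeping only automatically transcribed audio whose confidence is high enough) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hypothesis` record, the per-utterance `confidence` field, the threshold value, and the toy utterances are all assumptions made for illustration, and the A-stabil score itself is not computed here.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One automatically transcribed utterance (hypothetical record layout)."""
    utt_id: str
    text: str
    confidence: float  # assumed score in [0, 1]; stands in for a real confidence measure
    duration_s: float


def select_for_training(hyps, threshold):
    """Keep utterances whose confidence meets or exceeds the threshold."""
    return [h for h in hyps if h.confidence >= threshold]


def selected_fraction(hyps, selected):
    """Fraction of the total audio duration that survived selection."""
    total = sum(h.duration_s for h in hyps)
    return sum(h.duration_s for h in selected) / total if total else 0.0


# Toy data, invented for this example only.
hyps = [
    Hypothesis("utt1", "dobry den", 0.92, 4.0),
    Hypothesis("utt2", "na shledanou", 0.41, 3.0),
    Hypothesis("utt3", "dekuji", 0.77, 3.0),
]

selected = select_for_training(hyps, threshold=0.7)
print([h.utt_id for h in selected])
print(selected_fraction(hyps, selected))
```

In an unsupervised training loop, the selected utterances would serve as training transcripts for the next acoustic model, and the threshold trades the amount of retained audio against transcription accuracy.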

Automatic Transcription of Courtroom Recordings in the JUMAS project

by Daniele Falavigna, Diego Giuliani, Roberto Gretter, Jonas Lööf, Ralf Schlüter, Hermann Ney
In this paper we present ongoing work on speech recognition for the judicial domain, performed in the European project JUMAS (Judicial management for digital library semantics). The specific challenges for courtroom speech recognition are discussed, and the development of speech recognition systems for Italian and Polish is described. The results achieved on the target domain are presented and discussed.

Unsupervised Arabic Dialect Adaptation with Self-Training

by Scott Novotney, Rich Schwartz, Sanjeev Khudanpur
Useful training data for automatic speech recognition systems of colloquial speech is usually limited to expensive in-domain transcription. Broadcast news is an appealing source of easily available data to bootstrap into a new dialect. However, some languages, like Arabic, have deep linguistic differences resulting in poor cross-domain performance. If no in-domain transcripts are available, but a large amount of in-domain audio is, self-training may be a suitable technique to bootstrap into the domain. In this work, we attempt to adapt Modern Standard Arabic (MSA) models to Levantine Arabic without any in-domain manual transcription. We contrast with varying amounts of in-domain transcription and show that 1) self-training is effective with only one hour of in-domain transcripts; 2) self-training is not a suitable solution to improve strong MSA models on Levantine; 3) two metrics that quantify model bias predict self-training success; 4) model bias explains the failure of self-training to adapt across strong domain mismatch. Index Terms: Arabic ASR, domain adaptation, self-training

Rapid building of an ASR system for Under-Resourced Languages based on Multilingual Unsupervised Training

by Ngoc Thang Vu, Franziska Kraus, Tanja Schultz
This paper presents our work on rapid language adaptation of acoustic models based on multilingual cross-language bootstrapping and unsupervised training. We used Automatic Speech Recognition (ASR) systems in the six source languages English, French, German, Spanish, Bulgarian and Polish to build from scratch an ASR system for Vietnamese, an under-resourced language. System building was performed without using any transcribed audio data by applying three consecutive steps: cross-language transfer, unsupervised training based on the “multilingual A-stabil” confidence score [1], and bootstrapping. We investigated the correlation between the performance of “multilingual A-stabil” and the number of source languages, and improved the performance of “multilingual A-stabil” by applying it at the syllable level. Furthermore, we showed that increasing the number of source language ASR systems in the multilingual framework results in better performance of the final ASR system in the target language Vietnamese. The final Vietnamese recognition system has a Syllable Error Rate (SyllER) of 16.8% on the development set and 16.1% on the evaluation set. Index Terms: rapid language adaptation of ASR, unsupervised training, multilingual A-stabil
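
The abstracts in this listing report word error rate (WER) and syllable error rate (SyllER). Both are conventionally defined as the token-level edit distance between reference and hypothesis divided by the number of reference tokens. The sketch below shows that standard definition; it is not taken from any of the listed papers, the function names are hypothetical, and the example tokens are invented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (substitutions, deletions, and insertions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length.
    Gives WER for word tokens and SyllER for syllable tokens."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Invented example: one reference token is missing from the hypothesis.
ref = "xin chao cac ban".split()
hyp = "xin chao ban".split()
rate = error_rate(ref, hyp)
print(rate)
```

Because the denominator is the reference length, the rate can exceed 1.0 when the hypothesis contains many insertions.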