Results 1 - 7 of 7
An architecture for rapid decoding of large vocabulary conversational speech
- in Eurospeech-2003, 2003
Cited by 12 (7 self)
This paper addresses the question of how to design a large vocabulary recognition system so that it can simultaneously handle a sophisticated language model, perform state-of-the-art speaker adaptation, and run in one times real time (1 RT). The architecture we propose is based on classical HMM Viterbi decoding, but uses an extremely fast initial speaker-independent decoding to estimate VTL warp factors and feature-space and model-space MLLR transformations that are used in a final speaker-adapted decoding. We present results on past Switchboard evaluation data that indicate that this strategy compares favorably to published unlimited-time systems (running in several hundred times real time). Coincidentally, this is the system that IBM fielded in the 2003 EARS Rich Transcription evaluation.
Fast search for large vocabulary speech recognition
- in Verbmobil: Foundations of Speech-to-Speech Translation, W. Wahlster, Ed., 2000
Cited by 11 (11 self)
In this article we describe methods for improving the RWTH German speech recognizer used within the VERBMOBIL project. In particular, we present acceleration methods for the search based on both within-word and across-word phoneme models. We also study incremental methods to reduce the response time of the online speech recognizer. Finally, we present experimental off-line results for the three VERBMOBIL scenarios. We report on word error rates and real-time factors for both speaker-independent and speaker-dependent recognition.
Recent Improvements Of The RWTH Large Vocabulary Speech Recognition System On Spontaneous Speech
- Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 2000
Cited by 8 (5 self)
This paper presents recent improvements of the RWTH large vocabulary continuous speech recognition system (LVCSR). In particular, we will report on the integration of across-word models into the first recognition pass, and describe better algorithms for fast vocal tract normalization (VTN). We will focus both on the improvements in word error rate and how to speed up the recognizer with only minimal loss in recognition accuracy. Implementation details and experimental results are given for the VerbMobil task, a German spontaneous speech corpus. The 25.0% word error rate (WER) of our within-word baseline system was reduced to 21.4% with VTN and across-word models. Decreasing the real-time factor (RTF) by up to 85% resulted in only a small degradation in recognition performance of 2% relative on average. 1. INTRODUCTION The RWTH LVCSR system is a continuous Gaussian mixture density speech recognition system, which has been described in detail in [6]. The baseline system is a trigram Vit...
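The figures quoted in this abstract mix absolute and relative quantities, so a short worked calculation may help when comparing entries. The sketch below only restates the abstract's numbers (25.0% and 21.4% WER, an RTF reduction of up to 85%); the function names are hypothetical helpers introduced for illustration and do not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


def speedup_from_rtf_reduction(reduction_percent: float) -> float:
    """Speed-up factor implied by lowering the real-time factor by the given percentage."""
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


# WER values quoted in the abstract: within-word baseline vs. VTN + across-word models.
wer_gain = relative_reduction(25.0, 21.4)
# An RTF reduction of 85% means the recognizer needs 15% of the original time.
speedup = speedup_from_rtf_reduction(85.0)

print(f"relative WER reduction: {wer_gain:.1f}%")
print(f"implied speed-up: {speedup:.2f}x")
```

Under these definitions, the 3.6-point absolute WER improvement corresponds to a relative reduction of about 14.4%, and an 85% lower RTF corresponds to running roughly 6.7 times faster.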
The RWTH Aachen University Open Source Speech Recognition System
Cited by 3 (1 self)
We announce the public availability of the RWTH Aachen University speech recognition toolkit. The toolkit includes state-of-the-art speech recognition technology for acoustic model training and decoding. Speaker adaptation, speaker adaptive training, unsupervised training, a finite state automata library, and an efficient tree search decoder are notable components. Comprehensive documentation, example setups for training and recognition, and a tutorial are provided to support newcomers. Index Terms: speech recognition, LVCSR, software
The RWTH Large Vocabulary Speech Recognition System For Spontaneous Speech
- In Proceedings of the Konvens 2000, 2000
Cited by 2 (0 self)
This paper presents details of the RWTH large vocabulary continuous speech recognition system used in the VERBMOBIL spontaneous speech translation system. In particular, we report on methods for accelerating the search and algorithms for fast vocal tract normalization (VTN). We focus both on the improvements in word error rate and how to speed up the recognizer with only minimal loss in recognition accuracy. Implementation details and experimental results are given for the VERBMOBIL German development corpus dev99. The 24.6% word error rate of the baseline system is reduced to 22.8% using VTN. Decreasing the real-time factor by a factor of 5 resulted in only a small degradation in recognition performance of 2% relative on average. Furthermore, we study incremental methods for reducing the response time of the online speech recognizer and an efficient method to reduce the density of word graphs. 1. Introduction This paper describes the RWTH large vocabulary continuous speech recogniti...
Within-Word vs. Across-Word Decoding for Online Speech Recognition
- in Proc. Automatic Speech Recognition Workshop, 2000
Cited by 1 (1 self)
In this paper we describe methods for improving the RWTH German speech recognizer used within the VERBMOBIL project. In particular, we present acceleration methods for the search based on both within-word and across-word phoneme models. The recognizer in the VERBMOBIL project is used in an online environment. We will discuss some incremental methods to reduce the response time of an online speech recognizer. We present experimental off-line results for the VERBMOBIL task, a German spontaneous speech corpus, and report on word error rates and real-time performance of the search for both within-word and across-word phoneme models. 1. INTRODUCTION The goal of the VERBMOBIL project is to develop a speaker-independent speech-to-speech translation system that performs close to real-time. In this system, speech recognition is followed by subsequent VERBMOBIL modules (like syntactic analysis and translation) which depend on the recognition result. Therefore, in this application it is partic...
Parallelization Strategies for a Dynamic Lexical Tree Decoder
© 2011, F. Metze
Increasingly, physical limitations lead to a shift from high-clocked single-core processors to CPUs with up to eight, or more, independent but slower processing cores, and multi-core or even multi-CPU computers. In order to retain performance gains in the future, the speech decoding process has to be re-organized to employ a certain amount of thread-level parallelism on those CPUs. In this work, we compare two common approaches for dynamic prefix tree decoders: Parallel Score Computation and Parallel Search, and a combination of both. Both have already been studied intensively; however, it is shown here that the latter suffers from hardware cache effects which limit absolute speed-ups and scalability in general. We propose a cache-efficient variation of the Parallel Score Computation which is more scalable and faster than any other parallel strategy we compared it with. Index Terms: speech recognition, parallel processing
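To make the idea of Parallel Score Computation concrete, here is a minimal sketch in which the acoustic scores of the active states for one frame are split into chunks and evaluated by a thread pool. Everything in it is an assumption for illustration: the diagonal-covariance Gaussian per state, the names `log_gaussian`, `score_chunk`, and `parallel_scores`, and the chunking scheme. It is not the cache-efficient variant the abstract proposes, and in CPython the global interpreter lock means this layout shows the structure of the approach without delivering a real speed-up.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def log_gaussian(frame, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at `frame`."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def score_chunk(frame, states):
    """Score one contiguous chunk of states (each a (mean, var) pair) for a frame."""
    return [log_gaussian(frame, mean, var) for mean, var in states]


def parallel_scores(frame, states, n_workers=4):
    """Split the active states into chunks and score the chunks concurrently.

    Results are returned in the same order as `states`.
    """
    chunk = max(1, math.ceil(len(states) / n_workers))
    chunks = [states[i:i + chunk] for i in range(0, len(states), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(score_chunk, [frame] * len(chunks), chunks)
        return [score for part in results for score in part]


# Hypothetical toy model: three states over two-dimensional features.
states = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([1.0, 1.0], [1.0, 1.0]),
    ([2.0, 0.0], [0.5, 2.0]),
]
frame = [0.0, 0.0]
scores = parallel_scores(frame, states, n_workers=2)
```

The search itself stays sequential in this layout; only the score evaluation is farmed out, which is the distinction the abstract draws between Parallel Score Computation and Parallel Search.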

