Results 11 - 18 of 18
Look-Ahead Techniques For Improved Beam Search
- In Proc. of the CRIM-FORWISS Workshop, 1996
Abstract - Cited by 3 (2 self)
This paper presents two look-ahead techniques for large vocabulary continuous speech recognition. These two techniques, which are referred to as language model look-ahead and phoneme look-ahead, are incorporated into the pruning process of the time-synchronous one-pass beam search algorithm. The search algorithm is based on a tree-organized pronunciation lexicon in connection with a bigram language model. Both look-ahead techniques have been tested on the 20 000-word NAB'94 task (ARPA North American Business Corpus). The recognition experiments show that the combination of bigram language model look-ahead and phoneme look-ahead reduces the size of the search space by a factor of about 27 without affecting the word recognition accuracy.
1 Introduction
In this paper, we describe two look-ahead techniques for improved beam search, namely language model look-ahead and phoneme look-ahead, for large vocabulary continuous speech recognition. The basic idea of the language model look-ahead is t...
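As a rough illustration of the language model look-ahead idea named in this abstract, the Python sketch below precomputes, for every node of a toy tree-organized lexicon, the best language model log-probability of any word still reachable from that node, and adds it to the acoustic score before beam pruning. The lexicon, the probabilities, and the names build_lookahead and prune are hypothetical, and a unigram table stands in for the bigram model used in the paper; this is a sketch under those assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict

def build_lookahead(lexicon, lm_logprob):
    """Map each phoneme prefix (a node of the lexical tree) to the best
    LM log-probability of any word reachable below that node."""
    best = defaultdict(lambda: -math.inf)
    for word, phones in lexicon.items():
        for i in range(len(phones) + 1):
            prefix = tuple(phones[:i])
            best[prefix] = max(best[prefix], lm_logprob[word])
    return dict(best)

def prune(hyps, lookahead, beam):
    """hyps is a list of (prefix, acoustic_log_score) pairs. Combine each
    acoustic score with the look-ahead score of its tree node and keep
    only hypotheses within `beam` of the best combined score."""
    scored = [(prefix, score + lookahead[prefix]) for prefix, score in hyps]
    top = max(score for _, score in scored)
    return [(prefix, score) for prefix, score in scored if score >= top - beam]

# Toy data (assumed for illustration only).
lexicon = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "dog": ["d", "ao", "g"]}
lm_logprob = {"cat": math.log(0.5), "cap": math.log(0.1), "dog": math.log(0.4)}
lookahead = build_lookahead(lexicon, lm_logprob)

hyps = [(("k", "ae", "t"), -1.0), (("k", "ae", "p"), -1.0), (("d",), -1.2)]
kept = prune(hyps, lookahead, beam=1.0)
```

In this toy run the hypothesis ending in "cap" has the same acoustic score as the one ending in "cat", but its low look-ahead probability pushes it outside the beam, so it is discarded before the word end is reached.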
Empirical properties of multilingual phone-to-word transduction
- 2007
Abstract - Cited by 2 (1 self)
This paper explores the error-robustness of phone-to-word transduction across a variety of languages. We implement a noisy channel model in which a phonetic input stream is corrupted by an error model, and then transduced back to words using the inverse error model and linguistic constraints. By controlling the error level, we are able to measure the sensitivity of different languages to degradation in the phonetic input stream. This analysis is carried further to measure the importance of each phone in each language individually. We study Arabic, Chinese, English, German and Spanish, and find that they behave similarly in this paradigm: in each case, a phone error produces about 1.4 word errors, and frequently incorrect phones matter slightly less than others. In the absence of phone errors, transduced word errors are still present, and we use the conditional entropy of words given phones to explain the observed behavior. Index Terms — Speech recognition, phonetic decoding, transduction, multilingual, ASR
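The conditional entropy of words given phones mentioned in this abstract can be illustrated with a small Python sketch. It estimates H(W | P) in bits from observed (phone string, word) pairs; the function name conditional_entropy and the toy data are assumptions made for illustration and do not come from the paper.

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """Estimate H(W | P) in bits from a list of (phone_string, word) pairs,
    using relative frequencies as probability estimates."""
    joint = Counter(pairs)
    phone_counts = Counter(phones for phones, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (phones, _word), count in joint.items():
        # p(w, p) * log2 p(w | p), summed with a negative sign
        h -= (count / n) * math.log2(count / phone_counts[phones])
    return h

# Toy data (assumed): "two" and "too" share a pronunciation, "cat" is unambiguous.
pairs = [("t uw", "two"), ("t uw", "too"), ("k ae t", "cat"), ("k ae t", "cat")]
h_bits = conditional_entropy(pairs)
```

A nonzero value here means that even an error-free phone string does not always determine the word, which is the kind of residual ambiguity the abstract attributes to transduction without phone errors.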
The RWTH Large Vocabulary Speech Recognition System For Spontaneous Speech
- In Proceedings of the Konvens 2000, 2000
Abstract - Cited by 2 (0 self)
This paper presents details of the RWTH large vocabulary continuous speech recognition system used in the VERBMOBIL spontaneous speech translation system. In particular, we report on methods for accelerating the search and algorithms for fast vocal tract normalization (VTN). We focus both on the improvements in word error rate and how to speed up the recognizer with only minimal loss in recognition accuracy. Implementation details and experimental results are given for the VERBMOBIL German development corpus dev99. The 24.6% word error rate of the baseline system is reduced to 22.8% using VTN. Decreasing the real-time factor by a factor of 5 resulted in only a small degradation in recognition performance of 2% relative on average. Furthermore, we study incremental methods for reducing the response time of the online speech recognizer and an efficient method to reduce the density of word graphs.
1. Introduction
This paper describes the RWTH large vocabulary continuous speech recogniti...
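The abstract quotes absolute word error rates for VTN but a relative figure for the speed-up trade-off. The short Python sketch below puts the VTN result on the same relative scale; the helper name relative_reduction is hypothetical, and the derived percentage is simple arithmetic on the two quoted numbers rather than a figure reported by the authors.

```python
def relative_reduction(baseline, improved):
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline - improved) / baseline

# Word error rates quoted in the abstract: 24.6% baseline, 22.8% with VTN.
wer_gain = relative_reduction(24.6, 22.8)
print(f"{100 * wer_gain:.1f}% relative")  # prints "7.3% relative"
```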
Within-Word vs. Across-Word Decoding for Online Speech Recognition
- In Proc. Automatic Speech Recognition Workshop, 2000
Abstract - Cited by 1 (1 self)
In this paper we describe methods for improving the RWTH German speech recognizer used within the VERBMOBIL project. In particular, we present acceleration methods for the search based on both within-word and across-word phoneme models. The recognizer in the VERBMOBIL project is used in an online environment. We discuss some incremental methods to reduce the response time of an online speech recognizer. We present experimental offline results for the VERBMOBIL task, a German spontaneous speech corpus, and report on word error rates and real-time performance of the search for both within-word and across-word phoneme models.
1. INTRODUCTION
The goal of the VERBMOBIL project is to develop a speaker-independent speech-to-speech translation system that performs close to real-time. In this system, speech recognition is followed by subsequent VERBMOBIL modules (like syntactic analysis and translation) which depend on the recognition result. Therefore, in this application it is partic...
Large vocabulary continuous
, 2002
Abstract
Automatic speech recognition of real-life broadcast news (BN) data (Hub-4) has become a challenging research topic in recent years. This paper summarizes our key efforts to build a large vocabulary continuous speech recognition system for the heterogeneous BN task without inducing undue complexity and computational resources. These key efforts include ...
SPEECH AND
, 2005
Abstract
This article was originally published in a journal published by Elsevier, and the attached copy is provided by Elsevier for the author’s benefit and for the benefit of the author’s institution, for non-commercial research and educational use including without limitation use in instruction at your institution, sending it to specific colleagues that you know, and providing a copy to your institution’s administrator. All other uses, reproduction and distribution, including without limitation commercial reprints, selling or licensing copies or access, or posting on open internet sites, your personal or institution’s website or repository, are prohibited. For exceptions, permission may be sought for such use through Elsevier’s permissions site at:
Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
Abstract
In this work we present two extensions to the well-known dynamic programming beam search in phrase-based statistical machine translation (SMT), aiming at increased efficiency of decoding by minimizing the number of language model computations and hypothesis expansions. Our results show that language model based pre-sorting yields a small improvement in translation quality and a speedup by a factor of 2. Two look-ahead methods are shown to further increase translation speed by a factor of 2 without changing the search space and a factor of 4 with the side-effect of some additional search errors. We compare our approach with Moses and observe the same performance, but a substantially better trade-off between translation quality and speed. At a speed of roughly 70 words per second, Moses reaches 17.2% BLEU, whereas our approach yields 20.0% with identical models.

