Dynamic Programming Search for Continuous Speech Recognition
, 1999
Cited by 30 (0 self)
Abstract
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small-vocabulary and large-vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead, and word graph generation.
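As a rough illustration of the combination the abstract describes (dynamic programming plus pruning), the following is a minimal Python sketch of time-synchronous Viterbi search with beam pruning over a generic HMM state graph. It is not taken from the paper: the state graph format, the `beam` threshold, and the function name `beam_viterbi` are hypothetical, and a real decoder would add a lexicon, word-boundary handling, and a language model.

```python
import math
from typing import Dict, List, Tuple

NEG_INF = -math.inf


def beam_viterbi(
    log_trans: Dict[int, List[Tuple[int, float]]],
    log_emit: List[Dict[int, float]],
    start_state: int,
    beam: float,
) -> Tuple[float, List[int]]:
    """Time-synchronous Viterbi search with beam pruning, in the log domain.

    log_trans[s]: list of (successor_state, log transition prob) arcs out of s.
    log_emit[t][s]: log emission score of state s at frame t (absent = cannot emit).
    Returns (best final score, best state sequence), or (-inf, []) if all paths die.
    """
    # Active hypotheses: state -> (accumulated score, state sequence so far).
    active: Dict[int, Tuple[float, List[int]]] = {start_state: (0.0, [])}
    for frame in log_emit:
        expanded: Dict[int, Tuple[float, List[int]]] = {}
        for state, (score, path) in active.items():
            for nxt, log_p in log_trans.get(state, []):
                if nxt not in frame:
                    continue  # this state has no emission score at this frame
                cand = score + log_p + frame[nxt]
                # DP recombination: keep only the best hypothesis per state.
                if nxt not in expanded or cand > expanded[nxt][0]:
                    expanded[nxt] = (cand, path + [nxt])
        if not expanded:
            return NEG_INF, []
        best = max(score for score, _ in expanded.values())
        # Beam pruning: drop hypotheses scoring worse than (best - beam).
        active = {s: h for s, h in expanded.items() if h[0] >= best - beam}
    return max(active.values(), key=lambda h: h[0])
```

A tighter `beam` discards more hypotheses per frame, trading a risk of search errors for speed; the recombination step is what keeps the number of active hypotheses bounded by the number of states.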
A One-Pass Decoder Based on Polymorphic Linguistic Context Assignment
, 2001
Cited by 29 (10 self)
Abstract
In this study, we examine how fast decoding of conversational speech with large vocabularies profits from efficient use of linguistic information, i.e. language models and grammars. Based on a re-entrant single pronunciation prefix tree, we use the concept of linguistic context polymorphism to allow an early incorporation of language model information. This approach allows us to use all available language model information in a one-pass decoder, using the same engine to decode with statistical n-gram language models or context-free grammars, or to rescore lattices, in an efficient way.
The Philips/RWTH System for Transcription of Broadcast News
, 1999
Cited by 7 (0 self)
Abstract
This paper contains a description of the Philips/RWTH 1998 HUB4 system, which was built in a joint effort of Philips Research Laboratories Aachen and Aachen University of Technology. We focus our discussion on recent improvements compared to the original 1997 HUB4 system and evaluate them on the HUB4'97 evaluation data. The paper deals with (1) a rough system overview, including feature extraction, acoustic training, audio stream segmentation, and decoding; (2) log-linear interpolation of distance-language models; and (3) the integration of various acoustic and language models via Discriminative Model Combination (DMC).
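The log-linear interpolation mentioned in the abstract above combines component models multiplicatively and then renormalizes: p(w | h) = prod_i p_i(w | h)^lambda_i / Z(h). The minimal Python sketch below assumes the component distributions for a single history h are already available as dictionaries over a shared vocabulary; the distance language models themselves and the estimation of the weights lambda_i are not shown, and the function name is hypothetical.

```python
import math
from typing import Dict, List


def log_linear_interpolate(
    components: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Log-linear interpolation of distributions p_i(w|h) for one fixed history h.

    p(w|h) = prod_i p_i(w|h) ** lambda_i / Z(h), with Z(h) summed over the vocabulary.
    All components must assign nonzero probability to every word (e.g. smoothed LMs).
    """
    vocab = list(components[0])
    log_scores = {
        w: sum(lam * math.log(p[w]) for p, lam in zip(components, weights))
        for w in vocab
    }
    # Normalize with log-sum-exp for numerical stability.
    m = max(log_scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in log_scores.values()))
    return {w: math.exp(s - log_z) for w, s in log_scores.items()}
```

Unlike linear interpolation, the normalization term Z(h) has to be computed over the whole vocabulary for each history, which is the main computational cost of the log-linear approach.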
Efficient language model lookahead through polymorphic linguistic context assignment
- Proc. of ICASSP
, 2002
Cited by 3 (0 self)
Abstract
In this study, we examine how fast decoding of conversational speech with large vocabularies profits from efficient use of linguistic information, i.e. language models and grammars. Based on a re-entrant single pronunciation prefix tree, we use the concept of linguistic context polymorphism to achieve an early incorporation of language model information. This approach allows us to use all available language model information in a one-pass decoder, using the same engine to decode with statistical n-gram language models or context-free grammars, or to rescore lattices, in an efficient way. We compare this approach to our previous decoder, which needed three passes to incorporate all available information. The results on a very large vocabulary task show that the search can be sped up by almost a factor of three without introducing additional search errors. On all examined tasks, we observed significant improvements from using an exact language model lookahead instead of the usual bigram lookahead strategies, even for very hard tasks with unmatched conditions, and without introducing extra memory overhead.
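The core idea of language model lookahead on a pronunciation prefix tree can be sketched as follows: since a word's identity is only known at the end of its path through the tree, each tree node is assigned the best language model probability of any word still reachable from it, so that an anticipated LM score can be combined with the acoustic score before the word end is reached. In this minimal Python sketch, `lm_prob` is assumed to hold exact n-gram probabilities p(w | h) for one history h (a bigram lookahead would fill it from a bigram model instead). The class and function names are hypothetical, and the re-entrant tree and polymorphic context assignment described in the abstract are not modeled.

```python
from typing import Dict, List, Tuple

Phones = Tuple[str, ...]


class TreeNode:
    """Node of a pronunciation prefix tree; words attach where their pronunciation ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TreeNode"] = {}
        self.words: List[str] = []


def build_prefix_tree(lexicon: Dict[str, Phones]) -> TreeNode:
    root = TreeNode()
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            # Words sharing a phone prefix share the same tree nodes.
            node = node.children.setdefault(ph, TreeNode())
        node.words.append(word)
    return root


def lm_lookahead(root: TreeNode, lm_prob: Dict[str, float]) -> Dict[Phones, float]:
    """Map each tree node (keyed by its phone prefix) to the best LM probability
    p(w|h) among all words reachable from that node, for one fixed history h."""
    table: Dict[Phones, float] = {}

    def visit(node: TreeNode, prefix: Phones) -> float:
        best = max((lm_prob.get(w, 0.0) for w in node.words), default=0.0)
        for ph, child in node.children.items():
            best = max(best, visit(child, prefix + (ph,)))
        table[prefix] = best
        return best

    visit(root, ())
    return table
```

In a decoder, such a table depends on the history h, so it would likely need to be cached or computed on demand per active history rather than rebuilt for every hypothesis.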
Towards Automatic Corpus Preparation For A German Broadcast News Transcription System
- Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing
, 2002
Cited by 2 (2 self)
Abstract
When setting up a speech recognition system for a new domain, much manual effort is spent on corpus preparation, i.e., data acquisition, cutting and segmentation of the audio material, generation of pronunciation lexica, and the definition of suitable training and test sets. In this paper we describe several methods that help to automate, and thus speed up, this procedure. For this purpose, we assume that only a preliminary, partially incorrect textual transcription is available. The effectiveness of the proposed methods is demonstrated through the development of a transcription system for German broadcast news.
The 1999 CMU 10x real time broadcast news transcription system
- Proc. DARPA workshop on Automatic Transcription of Broadcast News
, 2000
Cited by 2 (1 self)
Abstract
CMU's 10X real-time system is the HMM-based SPHINX-III system with a newly developed fast decoder. The fast decoder uses a subvector-clustered version of the acoustic models for Gaussian computation and a lexical tree search structure. It was developed in September 1999 and is currently a first-pass decoder capable of generating word lattices. It was designed to optimize speed and recognition accuracy as well as memory requirements. For the 1999 Hub 4 evaluation task, the system used two sets of acoustic models: full-bandwidth and narrow-bandwidth. The acoustic models were 6000-senone, 3-state HMMs with 32 Gaussians per state and no skips permitted across states. The system used a single 39-dimensional feature stream consisting of cepstra and cepstral differences. The generated lattices were rescored using a DAG algorithm, and the DAG-rescored hypotheses were designated as the output of the primary system. The contrastive system consisted of the output of the first-pass Viterbi search, with no DAG rescoring of lattices. A trigram language model with 57,000 unigrams, 10 million bigrams, and 14.9 million trigrams was used. No adaptation passes were performed. In this paper we describe the various components of the primary system. The first-pass word error rate of this system on the 1998 Hub 4 evaluation set was 20.4%. The overall word error rate scored by NIST on the 1999 Hub 4 evaluation set was 27.6%.
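To make the lattice rescoring step more concrete, the following minimal Python sketch finds the best path through a word lattice by dynamic programming over its nodes in topological order, adding each edge's acoustic score to a weighted language model score. It is a generic illustration, not the CMU implementation: the lattice format, the `lm_weight` parameter, and the function name are assumptions, and a bigram LM is used for brevity where the system described used a trigram (which would require keeping the two previous words in the DP state).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (source node, target node, word, acoustic log score)
Edge = Tuple[int, int, str, float]


def rescore_lattice(
    edges: List[Edge],
    start: int,
    end: int,
    lm_logprob: Callable[[str, str], float],
    lm_weight: float,
) -> Tuple[float, List[str]]:
    """Best path through a word lattice (a DAG) under acoustic + weighted bigram LM scores.

    Assumes node ids are topologically ordered (every edge goes from a smaller to a larger id).
    """
    outgoing: Dict[int, List[Edge]] = defaultdict(list)
    for e in edges:
        outgoing[e[0]].append(e)
    # hyps[node][previous word] = (best score, word sequence); the previous word
    # is part of the DP state because the bigram LM score depends on it.
    hyps: Dict[int, Dict[str, Tuple[float, List[str]]]] = defaultdict(dict)
    hyps[start]["<s>"] = (0.0, [])
    for node in sorted(outgoing):
        for prev, (score, words) in hyps[node].items():
            for _, dst, word, ac in outgoing[node]:
                cand = score + ac + lm_weight * lm_logprob(prev, word)
                if word not in hyps[dst] or cand > hyps[dst][word][0]:
                    hyps[dst][word] = (cand, words + [word])
    return max(hyps[end].values(), key=lambda h: h[0])
```

With `lm_weight` set to zero the search reduces to picking the acoustically best path; raising it lets the language model overrule acoustically close alternatives.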
Probabilistic Aspects in Spoken Document Retrieval
Cited by 1 (0 self)
Abstract
Accessing information in multimedia databases encompasses a wide range of applications in which spoken document retrieval (SDR) plays an important role. In SDR, a set of automatically transcribed speech documents constitutes the collection to be searched, to which a user may address a request in natural language. This article deals with two probabilistic aspects of SDR. The first part investigates the effect of recognition errors on retrieval performance and addresses the question of why recognition errors have only a small effect on retrieval performance. In the second part, we present a new probabilistic approach to SDR that is based on interpolations between document representations. Experiments performed on the TREC-7 and TREC-8 SDR tasks show results for the newly proposed method that are comparable to or better than those of other advanced heuristic and probabilistic retrieval metrics.
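The general notion of scoring a query against an interpolated document model can be illustrated, loosely, with a standard technique that is not the article's method: Jelinek-Mercer smoothed query likelihood, which linearly interpolates a document's term distribution with that of the whole collection. The minimal Python sketch below works under that assumption only; the representations, weights, and estimation used in the article are not given in this abstract, and the parameter `lam` and the function name are hypothetical.

```python
import math
from collections import Counter
from typing import List


def query_log_likelihood(
    query: List[str], doc: List[str], collection: List[str], lam: float
) -> float:
    """Score a non-empty document by log P(query | interpolated model), where each
    query term is drawn from lam * P(t | doc) + (1 - lam) * P(t | collection)."""
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    score = 0.0
    for term in query:
        p = lam * doc_tf[term] / len(doc) + (1 - lam) * coll_tf[term] / len(collection)
        if p == 0.0:
            return -math.inf  # term unseen anywhere in the collection
        score += math.log(p)
    return score
```

In this sketch, a query term that is missing from one transcript (for example because it was misrecognized) still receives nonzero probability through the collection component, so a single recognition error does not drive that document's score to zero.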

