Dynamic Programming Search for Continuous Speech Recognition
, 1999
Cited by 30 (0 self)
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small-vocabulary and large-vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation.
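The combination of dynamic programming with pruning mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names `beam_prune` and `viterbi_beam`, the dictionary-based model layout, and the log-score beam threshold are all assumptions chosen for illustration.

```python
import math


def beam_prune(hyps, beam_width):
    """Keep hypotheses whose log score lies within beam_width of the best one."""
    if not hyps:
        return {}
    best = max(hyps.values())
    return {s: score for s, score in hyps.items() if score >= best - beam_width}


def viterbi_beam(obs_logp, trans_logp, init_logp, beam_width):
    """Time-synchronous Viterbi recursion with beam pruning after every frame.

    obs_logp:   list over frames of {state: log emission score}
    trans_logp: {(prev_state, state): log transition score}
    init_logp:  {state: log initial score}
    (hypothetical data layout, chosen only for this sketch)
    """
    first = obs_logp[0]
    active = beam_prune(
        {s: lp + first[s] for s, lp in init_logp.items() if s in first},
        beam_width,
    )
    for frame in obs_logp[1:]:
        expanded = {}
        # Dynamic programming step: keep only the best predecessor per state.
        for (prev, nxt), t in trans_logp.items():
            if prev in active and nxt in frame:
                cand = active[prev] + t + frame[nxt]
                if cand > expanded.get(nxt, -math.inf):
                    expanded[nxt] = cand
        # Pruning step: drop hypotheses that fall outside the beam.
        active = beam_prune(expanded, beam_width)
    return active
```

Because pruned states are never expanded, the work per frame depends on the number of surviving hypotheses rather than on the size of the full search space.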
Language-Model Look-Ahead For Large Vocabulary Speech Recognition
- Proc. Int. Conf. on Spoken Language Processing
, 1996
Cited by 22 (9 self)
In this paper, we present an efficient look-ahead technique which incorporates the language model knowledge at the earliest possible stage during the search process. This so-called language model look-ahead is built into the time synchronous beam search algorithm using a tree-organized pronunciation lexicon for a bigram language model. The language model look-ahead technique exploits the full knowledge of the bigram language model by distributing the language model probabilities over the nodes of the lexical tree for each predecessor word. We present a method for handling the resulting memory requirements. The recognition experiments performed on the 20 000-word North American Business task (Nov.'96) demonstrate that in comparison with the unigram look-ahead a reduction by a factor of 5 in the acoustic search effort can be achieved without loss in recognition accuracy.
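The distribution of language model probabilities over tree nodes described in this abstract is commonly written as a maximization. The notation here is an assumption for illustration and does not appear in the abstract: $v$ is the predecessor word, $s$ a node of the lexical tree, $W(s)$ the set of words reachable from $s$, and $p(w \mid v)$ the bigram probability.

```latex
\pi_v(s) = \max_{w \in W(s)} p(w \mid v)
```

Under this reading, a hypothesis in node $s$ with predecessor $v$ can be scored using $\pi_v(s)$ before any word end is reached, which is what lets pruning use language model knowledge early.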
Look-Ahead Techniques For Fast Beam Search
, 1997
Cited by 20 (8 self)
In this paper, we present two efficient look-ahead pruning techniques in beam search for large vocabulary continuous speech recognition. Both techniques, the language model look-ahead and the phoneme look-ahead, are incorporated into the word conditioned search algorithm using a bigram language model and a lexical prefix tree [5]. The paper presents the following novel contributions: • We describe a method for language model (LM) look-ahead pruning which is similar to [1, 9]. We show special techniques to reduce the memory and computational requirements. These techniques are based on a compressed LM look-ahead tree. To compute the LM look-ahead tree probabilities in an efficient way, we present a backward dynamic programming scheme.
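A backward pass of the kind this abstract mentions might look like the following sketch. It works on a plain, uncompressed prefix tree for one fixed predecessor word, so it does not reproduce the paper's compressed look-ahead tree; the function name `lm_lookahead` and the dictionary layout are hypothetical.

```python
def lm_lookahead(children, word_at, bigram, root=0):
    """Return {node: best bigram probability of any word reachable from node}.

    children: {node: [child nodes]} describing the lexical prefix tree
    word_at:  {node: word} for nodes where a word ends
    bigram:   {word: p(word | predecessor)} for one fixed predecessor
    (hypothetical data layout, chosen only for this sketch)
    """
    # Collect nodes in pre-order; each child appears after its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # Backward pass: visit children before parents and propagate the maximum.
    pi = {}
    for node in reversed(order):
        best = bigram.get(word_at[node], 0.0) if node in word_at else 0.0
        for child in children.get(node, []):
            best = max(best, pi[child])
        pi[node] = best
    return pi
```

Each node is visited once, so the cost per predecessor word is linear in the number of tree nodes.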
A Comparison Of Time Conditioned And Word Conditioned Search Techniques For Large Vocabulary Speech Recognition
- Proc. Int. Conf. on Spoken Language Processing
, 1996
Cited by 19 (14 self)
In this paper, we compare the search effort of the word conditioned and the time conditioned tree search methods. Both methods are based on a time-synchronous, left-to-right beam search using a tree-organized lexicon. Whereas the word conditioned method is well known and widely used, the time conditioned method is novel in the context of 20 000-word vocabulary recognition. We extend both methods to handle trigram language models in a one-pass strategy. Both methods were tested on a train schedule inquiry task (1 850 words, telephone speech) and on the North American Business (Nov.'94) development corpus (20 000 words).
Language Model Representations For Beam-Search Decoding
- In Proceedings of the ICASSP'95
, 1995
Cited by 17 (1 self)
This paper presents an efficient way of representing a bigram language model for a beam-search based, continuous speech, large vocabulary HMM recognizer. The tree-based topology considered takes advantage of a factorization of the bigram probability derived from the bigram interpolation scheme, and of a tree organization of all the words that can follow a given one. Moreover, an optimization algorithm is used to considerably reduce the space requirements of the language model. Experimental results are provided for two 10,000-word dictation tasks: radiological reporting (perplexity 27) and newspaper dictation (perplexity 120). In the former domain 93% word accuracy is achieved with real-time response and 23 Mb process space. In the newspaper dictation domain, 88.1% word accuracy is achieved with 1:41 real-time response and 38 Mb process space. All recognition tests were performed on an HP-735 workstation.
Time Synchronous Chart Parsing of Speech Integrating Unification Grammars with Statistics
- Proceedings of Twente Workshop on Speech and Language Engineering
, 1994
Cited by 12 (6 self)
We present an active chart parser which parses left-connected word graphs in a strictly time-synchronous way. The parser performs a beam search on the possible paths through the word graph and on the possible derivations of the unification grammar simultaneously. A metric is given to assign scores to edges, taking into account the whole left context and thereby combining acoustic probabilities, n-gram probabilities and unification grammar probabilities. A specialized model for the derivation of typed unification grammars is introduced. Different ways of coupling the parser with an LR beam decoder in an online time-synchronous fashion are defined and several experimental results are presented. Two top-down methods and one bottom-up method are investigated. In bottom-up mode, the decoder sends word hypotheses as they are found from left to right, while the parser keeps step. In verify mode, the decoder is always a frame ahead, while the parser verifies received hypotheses, providing language informat...
From Word Hypotheses to Logical Form: An Efficient Interleaved Approach
, 1996
Cited by 10 (7 self)
This paper revisits word lattice search, whose task is to find a plausible semantic interpretation for a given utterance. Our approach of interleaved search and analysis is designed to break the frontier of "toy" applications. The framework is implemented in two interacting modules, running in parallel. Instead of simply parsing a word lattice, we perform tree decoding with a probabilistic approximation of a given grammar, employing a beam search strategy. Logical form is built up in tandem according to the decoded derivation histories, using a codescriptive HPSG grammar for dialog turns. The proposed architecture only uses the knowledge necessary in every processing step, the key aspect being an asynchronous coupling of the two specialized modules.
Improved Lexical Tree Search For Large Vocabulary Speech Recognition
- In Proceedings of the ICASSP
, 1998
Cited by 9 (3 self)
This paper describes some extensions to the language model (LM) look-ahead pruning approach which is integrated into the time-synchronous beam search algorithm. The search algorithm is based on a lexical prefix tree in combination with a word-conditioned dynamic search space organization for handling trigram language models in a one-pass strategy. In particular, we study several LM look-ahead pruning techniques. Further, we improve the efficiency of this look-ahead technique by exploiting subtree dominance. This method avoids the computation of redundant subtrees within the copies of the lexical prefix tree and thus reduces the memory requirements of the search algorithm. In addition, we present a pruning criterion depending on the state index. The experimental results on the 20 000-word NAB'94 task (ARPA North American Business Corpus) indicate that the computational effort can be reduced to 4 times real time on an ALPHA5000 PC without a significant loss in the recognition accuracy.
Language Modeling for Efficient Beam-Search
- Computer Speech and Language
, 1995
Cited by 5 (4 self)
This paper considers the problems of estimating bigram language models and of efficiently representing them by a finite state network, which can be employed by a hidden Markov model based, beam-search, continuous speech recognizer.
Extensions To The Word Graph Method For Large Vocabulary Continuous Speech Recognition
- Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing
, 1997
Cited by 3 (3 self)
This paper describes two methods for constructing word graphs for large vocabulary continuous speech recognition. Both word graph methods are based on a time-synchronous, left-to-right beam search strategy in connection with a tree-organized pronunciation lexicon. The first method is based on the so-called word pair approximation and fits directly into a word-conditioned search organization. In order to avoid the assumptions made in the word pair approximation, we design another word graph method. This method is based on a time conditioned factoring of the search space. For the case of a trigram language model, we give a detailed comparison of both word graph methods with an integrated search method. The experiments have been carried out on the North American Business (NAB'94) 20,000-word task.

