Results 1 - 10
of
17
Confidence Measures for Large Vocabulary Continuous Speech Recognition
- IEEE Transactions on Speech and Audio Processing
, 2001
"... In this paper, we present several confidence measures for large vocabulary continuous speech recognition. We propose to estimate the confidence of a hypothesized word directly as its posterior probability, given all acoustic observations of the utterance. These probabilities are computed on word gra ..."
Abstract
-
Cited by 70 (7 self)
- Add to MetaCart
In this paper, we present several confidence measures for large vocabulary continuous speech recognition. We propose to estimate the confidence of a hypothesized word directly as its posterior probability, given all acoustic observations of the utterance. These probabilities are computed on word graphs using a forward-backward algorithm. We also study the estimation of posterior probabilities on N-best lists instead of word graphs and compare both algorithms in detail. In addition, we compare the posterior probabilities with two alternative confidence measures, i.e., the acoustic stability and the hypothesis density. We present experimental results on five different corpora: the Dutch ARISE lk evaluation corpus, the German Verbmobil '98 7k evaluation corpus, the English North American Business '94 20k and 64k development corpora, and the English Broadcast News '96 65k evaluation corpus. We show that the posterior probabilities computed on word graphs outperform all other confidence measures. The relative reduction in confidence error rate ranges between 19% and 35% compared to the baseline confidence error rate.
Full Expansion Of Context-Dependent Networks In Large Vocabulary Speech Recognition
- Proceedings of ICASSP 98
, 1998
"... We combine our earlier approach to context-dependent network representation with our algorithm for determinizing weighted networks to build optimized networks for large-vocabulary speech recognition combining an n-gram language model, a pronunciation dictionary and context-dependency modeling. While ..."
Abstract
-
Cited by 32 (12 self)
- Add to MetaCart
We combine our earlier approach to context-dependent network representation with our algorithm for determinizing weighted networks to build optimized networks for large-vocabulary speech recognition combining an n-gram language model, a pronunciation dictionary and context-dependency modeling. While fullyexpanded networks have been used before in restrictive settings (medium vocabulary or no cross-word contexts), we demonstrate that our network determinization method makes it practical to use fully-expanded networks also in large-vocabulary recognition with full cross-word context modeling. For the DARPA North American Business News task (NAB), we give network sizes and recognition speeds and accuracies using bigram and trigram grammars with vocabulary sizes ranging from 10,000 to 160,000 words. With our construction, the fully-expanded NAB context-dependent networks contain only about twice as many arcs as the corresponding language models. Interestingly, we also find that, with these...
Dynamic Programming Search for Continuous Speech Recognition
, 1999
"... . Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with avery e#cient and practical pruning str ..."
Abstract
-
Cited by 30 (0 self)
- Add to MetaCart
. Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with avery e#cient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely #exible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small#vocabulary and large#vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation. 1 Introduction Search strategie...
Language-Model Look-Ahead For Large Vocabulary Speech Recognition
- Proc. Int. Conf. on Spoken Language Processing
, 1996
"... In this paper, we present an efficient look-ahead technique which incorporates the language model knowledge at the earliest possible stage during the search process. This so-called language model look-ahead is built into the time synchronous beam search algorithm using a tree-organized pronunciation ..."
Abstract
-
Cited by 22 (9 self)
- Add to MetaCart
In this paper, we present an efficient look-ahead technique which incorporates the language model knowledge at the earliest possible stage during the search process. This so-called language model look-ahead is built into the time synchronous beam search algorithm using a tree-organized pronunciation lexicon for a bigram language model. The language model look-ahead technique exploits the full knowledge of the bigram language model by distributing the language model probabilities over the nodes of the lexical tree for each predecessor word. We present a method for handling the resulting memory requirements. The recognition experiments performed on the 20 000-word North American Business task (Nov.'96) demonstrate that in comparison with the unigram look-ahead a reduction by a factor of 5 in the acoustic search effort can be achieved without loss in recognition accuracy.
A voice-controlled automatic telephone switchboard and directory information system
- Speech Communication
, 1997
"... The Philips automatic telephone switchboard and directory information system PADIS provides a natural-language user interface to a telephone directory database. Using speech recognition and language understanding technologies, the system offers phone numbers, fax numbers, email addresses, and room n ..."
Abstract
-
Cited by 17 (5 self)
- Add to MetaCart
The Philips automatic telephone switchboard and directory information system PADIS provides a natural-language user interface to a telephone directory database. Using speech recognition and language understanding technologies, the system offers phone numbers, fax numbers, email addresses, and room numbers as well as direct call completion to a desired party. In this paper, we present the underlying probabilistic framework, the system architecture, and the individual modules for speech recognition, language understanding, dialogue control, and speech output. In addition, we report results on performance and user behaviour obtained from a field test in our research lab with a 600-entry database. We derive a new maximum-a-posteriori decision rule which incorporates database knowledge and dialogue history as constraints in speech recognition and language understanding. It has improved speech understanding accuracy by 19 % (in terms of concept error rate), and reduced attribute substitution errors (e.g. recognition of a wrong name) by 38%. The decision rule is implemented in a multi-stage approach as a combination of state-of-the-art speech recognition, partial parsing with an attributed stochastic context-free grammar, and an N-best algorithm which is also described in this paper. The system conducts a flexible mixed-initiative dialogue rather than using a rigid form-filling scheme, and incorporates database knowledge to optimize the dialogue flow.
Network Optimizations for Large Vocabulary Speech Recognition
- Speech Communication
, 1998
"... The redundancy and the size of networks in large-vocabulary speech recognition systems can have a critical effect on their overall performance. We describe the use of two new algorithms: weighted determinization and minimization [12]. These algorithms transform recognition labeled networks into equi ..."
Abstract
-
Cited by 16 (7 self)
- Add to MetaCart
The redundancy and the size of networks in large-vocabulary speech recognition systems can have a critical effect on their overall performance. We describe the use of two new algorithms: weighted determinization and minimization [12]. These algorithms transform recognition labeled networks into equivalent ones that require much less time and space in large-vocabulary speech recognition. They are both optimal: weighted determinization eliminates the number of alternatives at each state to the minimum, and weighted minimization reduces the size of deterministic networks to the smallest possible number of states and transitions. These algorithms generalize classical automata determinization and minimization to deal properly with the probabilities of alternative hypotheses and with the relationships between units (distributions, phones, words) at different levels in the recognition system. We illustrate their use in several applications, and report the results of our experiments. Key words...
High Quality Word Graphs Using Forward-Backward Pruning
- In Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing
, 1999
"... This paper presents an efficient method for constructing high quality word graphs for large vocabulary continuous speech recognition. The word graphs are constructed in a two-pass strategy. In the first pass, a huge word graph is produced using the timesynchronous lexical tree search method. Then, i ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
This paper presents an efficient method for constructing high quality word graphs for large vocabulary continuous speech recognition. The word graphs are constructed in a two-pass strategy. In the first pass, a huge word graph is produced using the timesynchronous lexical tree search method. Then, in the second pass, this huge word graph is pruned by applying a modified forwardbackward algorithm. To analyze the characteristic properties of this word graph pruning method, we present a detailed comparison with the conventional time-synchronous forward pruning. The recognition experiments, carried out on the North American Business (NAB) 20 000-word task, demonstrate that, in comparison to the forward pruning, the new method leads to a significant reduction in the size of the word graph without an increase in the graph word error rate. 1. INTRODUCTION In this paper, we present a different approach to the word graph forward pruning technique [7]. This approach is based on the paradigm of...
Automatic Question Generation For Decision Tree Based State Tying
- Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing
, 1998
"... Decision tree based state tying uses so-called phonetic questions to assign triphone states to reasonable acoustic models. These phonetic questions are in fact phonetic categories such as vowels, plosives or fricatives. The assumption behind this is that context phonemes which belong to the same pho ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
Decision tree based state tying uses so-called phonetic questions to assign triphone states to reasonable acoustic models. These phonetic questions are in fact phonetic categories such as vowels, plosives or fricatives. The assumption behind this is that context phonemes which belong to the same phonetic class have a similar influence on the pronunciation of a phoneme. For a new phoneme set, which has to be used e.g. when switching to a different corpus, a phonetic expert is needed to define proper phonetic questions. In this paper a new method is presented which automatically defines good phonetic questions for a phoneme set. This method uses the intermediate clusters from a phoneme clustering algorithm which are reduced to an appropriate number afterwards. Recognition results on the Wall Street Journal data for within-word and acrossword phoneme models show competitive performance of the automatically generated questions with our best handcrafted question set.
Progress in Dynamic Programming Search for LVCSR
- Proceedings of the IEEE
, 1997
"... This paper gives an overview of the recent improvements in dynamic programming search for large vocabulary continuous speech recognition: search using lexical trees, time-conditioned search and word graph construction. ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
This paper gives an overview of the recent improvements in dynamic programming search for large vocabulary continuous speech recognition: search using lexical trees, time-conditioned search and word graph construction.
Improved Lexical Tree Search For Large Vocabulary Speech Recognition
- In Proceedings of the ICASSP
, 1998
"... This paper describes some extensions to the language model (LM) look-ahead pruning approach which is integrated into the time-synchronous beam search algorithm. The search algorithm is based on a lexical prefix tree in combination with a wordconditioned dynamic search space organization for handling ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
This paper describes some extensions to the language model (LM) look-ahead pruning approach which is integrated into the time-synchronous beam search algorithm. The search algorithm is based on a lexical prefix tree in combination with a wordconditioned dynamic search space organization for handling trigram language models in a one-pass strategy. In particular, we study several LM look-ahead pruning techniques. Further, we improve the efficiency of this look-ahead technique by exploiting subtree dominance. This method avoids the computation of redundant subtrees within the copies of the lexical prefix tree and thus reduces the memory requirements of the search algorithm. In addition, we present a pruning criterion depending on the state index. The experimental results on the 20 000-word NAB'94 task (ARPA North American Business Corpus) indicate that the computational effort can be reduced to 4 times real time on a ALPHA5000 PC without a significant loss in the recognition accuracy.

