Results 1 - 10
of
53
Confidence Measures for Large Vocabulary Continuous Speech Recognition
- IEEE Transactions on Speech and Audio Processing
, 2001
"... In this paper, we present several confidence measures for large vocabulary continuous speech recognition. We propose to estimate the confidence of a hypothesized word directly as its posterior probability, given all acoustic observations of the utterance. These probabilities are computed on word gra ..."
Abstract
-
Cited by 70 (7 self)
- Add to MetaCart
In this paper, we present several confidence measures for large vocabulary continuous speech recognition. We propose to estimate the confidence of a hypothesized word directly as its posterior probability, given all acoustic observations of the utterance. These probabilities are computed on word graphs using a forward-backward algorithm. We also study the estimation of posterior probabilities on N-best lists instead of word graphs and compare both algorithms in detail. In addition, we compare the posterior probabilities with two alternative confidence measures, i.e., the acoustic stability and the hypothesis density. We present experimental results on five different corpora: the Dutch ARISE lk evaluation corpus, the German Verbmobil '98 7k evaluation corpus, the English North American Business '94 20k and 64k development corpora, and the English Broadcast News '96 65k evaluation corpus. We show that the posterior probabilities computed on word graphs outperform all other confidence measures. The relative reduction in confidence error rate ranges between 19% and 35% compared to the baseline confidence error rate.
Algorithms For Bigram And Trigram Word Clustering
- Speech Communication
, 1995
"... . This paper presents and analyzes improved algorithms for clustering bigram and trigram word equivalence classes, and their respective results: 1) We give a detailed time complexity analysis of bigram clustering algorithms. 2) We present an improved implementation of bigram clustering so that large ..."
Abstract
-
Cited by 46 (0 self)
- Add to MetaCart
. This paper presents and analyzes improved algorithms for clustering bigram and trigram word equivalence classes, and their respective results: 1) We give a detailed time complexity analysis of bigram clustering algorithms. 2) We present an improved implementation of bigram clustering so that large corpora (38 million words and more) can be clustered within a small number of days or even hours. 3) We extend the clustering approach from bigrams to trigrams. 4) We present experimental results on a 38 million word training corpus. 1. INTRODUCTION Word equivalence classes are a method for improving undertrained word M--gram language models [1], [2], [4]. Words are grouped into classes, and each word belongs to only one such class. Thus, if a word pair is not seen in training, it is quite likely that the corresponding class pair is seen. For bigram and trigram class models, we have the equations p(wn jwn\Gamma1 ) = p0(wn jG(wn)) (1) \Deltap 1(G(wn)jG(wn\Gamma1)) p(wn jwn\Gamma2 ; wn\Gam...
Using Word Probabilities As Confidence Measures
- in Proc. ICASSP
, 1998
"... Estimates of confidence for the output of a speech recognition system can be used in many practical applications of speech recognition technology. They can be employed for detecting possible errors and can help to avoid undesirable verification turns in automatic inquiry systems. In this paper we pr ..."
Abstract
-
Cited by 38 (5 self)
- Add to MetaCart
Estimates of confidence for the output of a speech recognition system can be used in many practical applications of speech recognition technology. They can be employed for detecting possible errors and can help to avoid undesirable verification turns in automatic inquiry systems. In this paper we propose to estimate the confidence in a hypothesized word as its posterior probability, given all acoustic feature vectors of the speaker utterance. The basic idea of our approach is to estimate the posterior word probabilities as the sum of all word hypothesis probabilities which represent the occurrence of the same word in more or less the same segment of time. The word hypothesis probabilities are approximated by paths in a wordgraph and are computed using a simplified forward-backward algorithm. We present experimental results on the NORTH AMERICAN BUSINESS (NAB'94) and the German VERBMOBIL recognition task. 1. INTRODUCTION With the rising number of different application areas for speech ...
Comparison of Discriminative Training Criteria and Optimization Methods for Speech Recognition
, 2001
"... The aim of this work is to build up a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The u ..."
Abstract
-
Cited by 32 (6 self)
- Add to MetaCart
The aim of this work is to build up a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criterion together with the optimization methods gradient descent (GD) and extended Baum (EB) algorithm. A tree search-based restricted recognition method using word graphs is presented, so as to reduce the computational complexity of large vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition using the ARPA wall street journal (WSJ) corpus with a vocabulary of 5k words and for the recognition of continuously spoken digit strings using both the TI digit string corpus for American English digits, and the SieTill corpus for telephone line recorded German digits. For the MMI criterion, neither analytical nor experimental results do indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training. Best results were obtained using a unigram language model for MMI training. No significant co...
Dynamic Programming Search for Continuous Speech Recognition
, 1999
"... . Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with avery e#cient and practical pruning str ..."
Abstract
-
Cited by 30 (0 self)
- Add to MetaCart
. Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with avery e#cient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely #exible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small#vocabulary and large#vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation. 1 Introduction Search strategie...
Discriminative training of language models for speech recognition
- In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP
, 2002
"... In this paper we describe how discriminative training can be applied to language models for speech recognition. Language models are important to guide the speech recognizer, particularly in compensating for mistakes in acoustic decoding. A frequently used measure of the quality of language models is ..."
Abstract
-
Cited by 20 (2 self)
- Add to MetaCart
In this paper we describe how discriminative training can be applied to language models for speech recognition. Language models are important to guide the speech recognizer, particularly in compensating for mistakes in acoustic decoding. A frequently used measure of the quality of language models is the perplexity; however, what is more important for accurate decoding is not necessarily having the maximum likelihood, but rather the best separation of the correct string from the competing, acoustically confusible hypotheses. Discriminative training can help to improve language models for the purpose of speech recognition by improving the separation of the correct hypothesis from the competing hypotheses. We describe the algorithm and demonstrate modest improvements in word and sentence error rates on the DARPA Communicator task. 1.
A Comparison Of Word Graph And N-Best List Based Confidence Measures
- in Proc. EUROSPEECH
, 1999
"... In this paper we present and compare several confidence measures for large vocabulary continuous speech recognition. We show that posterior word probabilities computed on word graphs and N-best lists clearly outperform non-probabilistic confidence measures, e.g. the acoustic stability and the hypoth ..."
Abstract
-
Cited by 18 (4 self)
- Add to MetaCart
In this paper we present and compare several confidence measures for large vocabulary continuous speech recognition. We show that posterior word probabilities computed on word graphs and N-best lists clearly outperform non-probabilistic confidence measures, e.g. the acoustic stability and the hypothesis density. In addition, we prove that the estimation of posterior word probabilities on word graphs yields better results than their estimation on N-best lists and discuss both methods in detail. We present experimental results on three different corpora, the English NAB '94 20k development corpus, the German VERBMOBIL '96 evaluation corpus and a Dutch corpus, which has been recorded with a train timetable information system in the ARISE project. 1. INTRODUCTION In previous studies, the combination of several confidence features was investigated. These features were collected during the acoustic decoding process, e.g. [1] or were extracted from Nbest lists and word graphs, e.g. [2, 5]. ...
Efficient Lattice Representation and Generation
- In Proc. of ICSLP
, 1998
"... In large-vocabulary, multi-pass speech recognition systems, it is desirable to generate word lattices incorporating a large number of hypotheses while keeping the lattice sizes small. We describe two new techniques for reducing word lattice sizes without eliminating hypotheses. The first technique i ..."
Abstract
-
Cited by 18 (6 self)
- Add to MetaCart
In large-vocabulary, multi-pass speech recognition systems, it is desirable to generate word lattices incorporating a large number of hypotheses while keeping the lattice sizes small. We describe two new techniques for reducing word lattice sizes without eliminating hypotheses. The first technique is an algorithm to reduce the size of non-deterministic bigram word lattices. The algorithm iteratively combines lattice nodes and transitions if local properties show that this does not change the set of allowed hypotheses. On bigram word lattices generated from Hub4 Broadcast News speech, it reduces lattice sizes by half on average. It was also found to produce smaller lattices than the standard finite state automaton determinization and minimization algorithms. The second technique is an improved algorithm for expanding lattices with trigram language models. Instead of giving all nodes a unique trigram context, this algorithm only creates unique contexts for trigrams that are explicitly represented in the model. Backed-off trigram probabilities are encoded without node duplication by factoring the probabilities into bigram probabilities and backoff weights. Experiments on Broadcast News show that this method reduces trigram lattice sizes by a factor of 6, and reduces expansion time by more than a factor of 10. Compared to conventionally expanded lattices, recognition with the compactly expanded lattices was also found to be 40 % faster, without affecting recognition accuracy. 1 1.
A voice-controlled automatic telephone switchboard and directory information system
- Speech Communication
, 1997
"... The Philips automatic telephone switchboard and directory information system PADIS provides a natural-language user interface to a telephone directory database. Using speech recognition and language understanding technologies, the system offers phone numbers, fax numbers, email addresses, and room n ..."
Abstract
-
Cited by 17 (5 self)
- Add to MetaCart
The Philips automatic telephone switchboard and directory information system PADIS provides a natural-language user interface to a telephone directory database. Using speech recognition and language understanding technologies, the system offers phone numbers, fax numbers, email addresses, and room numbers as well as direct call completion to a desired party. In this paper, we present the underlying probabilistic framework, the system architecture, and the individual modules for speech recognition, language understanding, dialogue control, and speech output. In addition, we report results on performance and user behaviour obtained from a field test in our research lab with a 600-entry database. We derive a new maximum-a-posteriori decision rule which incorporates database knowledge and dialogue history as constraints in speech recognition and language understanding. It has improved speech understanding accuracy by 19 % (in terms of concept error rate), and reduced attribute substitution errors (e.g. recognition of a wrong name) by 38%. The decision rule is implemented in a multi-stage approach as a combination of state-of-the-art speech recognition, partial parsing with an attributed stochastic context-free grammar, and an N-best algorithm which is also described in this paper. The system conducts a flexible mixed-initiative dialogue rather than using a rigid form-filling scheme, and incorporates database knowledge to optimize the dialogue flow.
Start-synchronous search for large vocabulary continuous speech recognition
- IEEE Trans. Speech and Audio Processing
"... Abstract — In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) ..."
Abstract
-
Cited by 17 (9 self)
- Add to MetaCart
Abstract — In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) as a basis for phone deactivation pruning—a highly efficient method of reducing the required computation. The single-pass algorithm is naturally factored into the time-asynchronous processing of the word sequence and the time-synchronous processing of the hidden Markov model state sequence. This enables the search to be decoupled from the language model while still maintaining the computational benefits of time-synchronous processing. The incorporation of the language model in the search is discussed and computationally cheap approximations to the full language model are introduced. Experiments were performed on the North American Business News task using a 60 000 word vocabulary and a trigram language model. Results indicate that the computational cost of the search may be reduced by more than a factor of 40 with a relative search error of less than 2 % using the techniques discussed in the paper. Index Terms — Hidden Markov model, large vocabulary continuous speech recognition, phone deactivation pruning, search, stack decoding. I.

