Results 1 - 10
of
25
The Use of Context in Large Vocabulary Speech Recognition
, 1995
"... decide which contexts are similar and can share parameters. A key feature of this approach is that it allows the construction of models which are dependent upon contextual effects occurring across word boundaries. The use of cross word context dependent models presents problems for conventional dec ..."
Abstract
-
Cited by 93 (0 self)
- Add to MetaCart
decide which contexts are similar and can share parameters. A key feature of this approach is that it allows the construction of models which are dependent upon contextual effects occurring across word boundaries. The use of cross word context dependent models presents problems for conventional decoders. The second part of the thesis therefore presents a new decoder design which is capable of using these models efficiently. The decoder is suitable for use with very large vocabularies and long span language models. It is also capable of generating a lattice of word hypotheses with little computational overhead. These lattices can be used to constrain further decoding, allowing efficient use of complex acoustic and language models. The effectiveness of these techniques has been assessed on a variety of large vocabulary continuous speech recognition tasks and results are presented which analyse performance in terms of computational complexity and recognition accuracy. The experiments dem
Markovian Models for Sequential Data
, 1996
"... Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We firs ..."
Abstract
-
Cited by 69 (2 self)
- Add to MetaCart
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models and Markov switching state-space models. Finally, we discuss some of the challenges of future research in this very active area. 1 Introduction Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many applications in artificial intelligence, pattern recognition, speech recognition, and modeling of biological ...
Large Vocabulary Continuous Speech Recognition: a Review
- of INCIS Project, Schedule 6 in (Small
, 1996
"... This article will discuss the principles and architecture of current LVR systems and identify the key issues affecting their future deployment. To illustrate the various points raised, the Cambridge University HTK system will be described. This is a modern design giving state-of-the-art performance ..."
Abstract
-
Cited by 62 (1 self)
- Add to MetaCart
This article will discuss the principles and architecture of current LVR systems and identify the key issues affecting their future deployment. To illustrate the various points raised, the Cambridge University HTK system will be described. This is a modern design giving state-of-the-art performance and it is typical of the current generation of recognition systems. 2 System Overview
MMIE training of large vocabulary recognition systems
, 1997
"... This paper describes a framework for optimising the structure and parameters of a continuous density HMM-based large Z. vocabulary recognition system using the Maximum Mutual Information Estimation MMIE criterion. To reduce the computational complexity of the MMIE training algorithm, confusable seg ..."
Abstract
-
Cited by 32 (2 self)
- Add to MetaCart
This paper describes a framework for optimising the structure and parameters of a continuous density HMM-based large Z. vocabulary recognition system using the Maximum Mutual Information Estimation MMIE criterion. To reduce the computational complexity of the MMIE training algorithm, confusable segments of speech are identified and stored as word lattices of alternative utterance hypotheses. An iterative mixture splitting procedure is also employed to adjust the number of mixture components in each state during training such that the optimal balance between the number of parameters and the available training data is achieved. Experiments are presented on various test sets from the Wall Street Journal database using up to 66 hours of acoustic training data. These demonstrate that the use of lattices makes MMIE training practicable for very complex recognition systems and large training sets. Furthermore, the experimental results show that MMIE optimisation of system structure and param...
Dynamic Programming Search for Continuous Speech Recognition
, 1999
"... . Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with avery e#cient and practical pruning str ..."
Abstract
-
Cited by 30 (0 self)
- Add to MetaCart
. Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with avery e#cient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely #exible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small#vocabulary and large#vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation. 1 Introduction Search strategie...
Look-Ahead Techniques For Fast Beam Search
, 1997
"... this paper, we present two efficient look-ahead pruning techniques in beam search for large vocabulary continuous speech recognition. Both techniques, the language model look-ahead and the phoneme look-ahead, are incorporated into the word conditioned search algorithm using a bigram language model a ..."
Abstract
-
Cited by 20 (8 self)
- Add to MetaCart
this paper, we present two efficient look-ahead pruning techniques in beam search for large vocabulary continuous speech recognition. Both techniques, the language model look-ahead and the phoneme look-ahead, are incorporated into the word conditioned search algorithm using a bigram language model and a lexical prefix tree [5]. The paper present the following novel contributions: ffl We describe a method for language model (LM) look-ahead pruning which is similar to [1, 9]. We show special techniques to reduce the memory and computational requirements. These techniques are based on a compressed LM look-ahead tree. To compute the LM look-ahead tree probabilites in an efficient way, we present a backward dynamic programming scheme
Start-synchronous search for large vocabulary continuous speech recognition
- IEEE Trans. Speech and Audio Processing
"... Abstract — In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) ..."
Abstract
-
Cited by 17 (9 self)
- Add to MetaCart
Abstract — In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) as a basis for phone deactivation pruning—a highly efficient method of reducing the required computation. The single-pass algorithm is naturally factored into the time-asynchronous processing of the word sequence and the time-synchronous processing of the hidden Markov model state sequence. This enables the search to be decoupled from the language model while still maintaining the computational benefits of time-synchronous processing. The incorporation of the language model in the search is discussed and computationally cheap approximations to the full language model are introduced. Experiments were performed on the North American Business News task using a 60 000 word vocabulary and a trigram language model. Results indicate that the computational cost of the search may be reduced by more than a factor of 40 with a relative search error of less than 2 % using the techniques discussed in the paper. Index Terms — Hidden Markov model, large vocabulary continuous speech recognition, phone deactivation pruning, search, stack decoding. I.
On Supervised Learning From Sequential Data With Applications For Speech Recognition
, 1999
"... visualization of the problem to model human speech. A large number of example sequences of observation vectors (shown connected as continuous trajectories) depending on a given sequence of class labels, with each class representing for example a phoneme (here the name Keiko with given durations). In ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
visualization of the problem to model human speech. A large number of example sequences of observation vectors (shown connected as continuous trajectories) depending on a given sequence of class labels, with each class representing for example a phoneme (here the name Keiko with given durations). In this synthetic example, the one-dimensional target data would be represented poorly by a uni-modal Gaussian distribution with a constant variance (which corresponds to using the squared-error objective function), which would average the two separate branches, indicated by the fat lines as the mean and constant variance of the single Gaussian. Compare this figure with Figure 3.10, Figure 3.11 and Figure 3.12 to see a subsequent improvement of the model.
Time Synchronous Chart Parsing of Speech Integrating Unification Grammars with Statistics
- Proceedings of Twente Workshop on Speech and Language Engineering
, 1994
"... We present an active chart parser which parses left connected wordgraphs in a strictly time synchronous way. The parser performs a beam search on the possible paths through the word graph and on the possible derivations of the unification grammar simultaneously. A metric is given to assign scores to ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
We present an active chart parser which parses left connected wordgraphs in a strictly time synchronous way. The parser performs a beam search on the possible paths through the word graph and on the possible derivations of the unification grammar simultaneously. A metric is given to assign scores to edges, taking into account the whole left context thereby combining acoustic probabilities, n-gram probabilities and unification grammar probabilities. A specialized model for the derivation of typed unification grammars is introduced. Different ways of coupling the parser with an LR beam decoder in an online time synchronous fashion are defined and several experimental results are presented. Two top down and one bottom up method are investigated. In bottom up mode, the decoder sends word hypotheses as they are found from left to right, while the parser keeps step. In verify mode, the decoder is always a frame ahead, while the parser verifies received hypotheses, providing language informat...
Incremental Language Models For Speech Recognition Using Finite-State Transducers
- in Proc. IEEE Automatic Speech Recogntion and Understanding Workshop, Madonna di Campiglio
, 2001
"... to speech recognition, we investigate a novel decoding strategy to deal with very large n-gram language models often used in large-vocabulary systems. In particular, we present an alternative to full, static expansion and optimization of the finite-state transducer network. This alternative is usefu ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
to speech recognition, we investigate a novel decoding strategy to deal with very large n-gram language models often used in large-vocabulary systems. In particular, we present an alternative to full, static expansion and optimization of the finite-state transducer network. This alternative is useful when the individual knowledge sources, modeled as transducers, are too large to be composed and optimized. While the recognition decoder perceives a single, weighted finitestate transducer, we apply a divide-and-conquer technique to split the language model into two parts which add up exactly to the original language model. We investigate the merits of these `incremental language models' and present some initial results.

