Tree-Based State Tying for High Accuracy Acoustic Modelling
1994
Cited by 139 (15 self-citations)
Abstract
The key problem to be faced when building an HMM-based continuous speech recogniser is maintaining the balance between model complexity and available training data. For large-vocabulary systems requiring cross-word context-dependent modelling, this is particularly acute since many such contexts will never occur in the training data. This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree. This tree-based clustering is shown to give recognition performance similar to that of an earlier data-driven approach, with the additional advantage of providing a mapping for unseen triphones. State tying is also compared with traditional model-based tying and shown to be clearly superior. Experimental results are presented for both the Resource Management and Wall Street Journal tasks.
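The abstract's key claim is that a phonetic decision tree supplies a tied state even for triphones never seen in training. A minimal Python sketch of that lookup follows, assuming a hand-built tree: the phone classes `NASALS` and `VOWELS`, the tree `tree_ih_s2`, and the state labels are hypothetical. In the paper the tree is grown from training data, and that growing step is not shown here; the sketch only illustrates how any left/right context is routed through yes/no context questions to a leaf that names a shared state.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical phone classes used as yes/no context questions (illustrative only).
NASALS = frozenset({"m", "n", "ng"})
VOWELS = frozenset({"aa", "ae", "ah", "eh", "ih", "iy", "uw"})


@dataclass
class Node:
    # Internal node: asks whether the left ("L") or right ("R") context is in `phones`.
    side: Optional[str] = None
    phones: FrozenSet[str] = frozenset()
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    # Leaf node: identifier of the tied (shared) HMM state.
    state_id: Optional[str] = None


def tied_state(tree: Node, left: str, right: str) -> str:
    """Walk the tree for triphone left-centre+right and return its tied state."""
    node = tree
    while node.state_id is None:
        context = left if node.side == "L" else right
        node = node.yes if context in node.phones else node.no
    return node.state_id


# Hypothetical tree for one emitting state (state 2) of the centre phone "ih".
tree_ih_s2 = Node(
    side="R", phones=NASALS,
    yes=Node(state_id="ih_s2_1"),
    no=Node(
        side="L", phones=VOWELS,
        yes=Node(state_id="ih_s2_2"),
        no=Node(state_id="ih_s2_3"),
    ),
)
```

For example, `tied_state(tree_ih_s2, "k", "n")` follows the "right context is nasal" branch to `"ih_s2_1"`, whether or not `k-ih+n` ever appeared in the training data. Because every path ends in a leaf, no context is left without a model.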
Continuous Speech Recognition in the WAXHOLM Dialogue System
1996
Cited by 7 (0 self-citations)
Abstract
This paper presents the status of the continuous speech recognition engine of the WAXHOLM project. The engine is a software-only system written in portable C code. The design is flexible, and different modes for phonetic pattern matching are available. In particular, artificial neural networks and standard multiple Gaussian mixtures are implemented for phone probability estimation, and, for research purposes, a general mode also exists in which the input consists of a phone graph. A lexicon with multiple pronunciations for many words and a class bigram grammar are used. The lexicon and grammar constraints are represented by a lexical graph, optimised for efficient lexical decoding. Decoding is performed in a two-pass search: the first pass is a Viterbi beam search and the second is an A* stack-decoding search. Pruning strategies and memory management in the two passes are discussed in the report. Several output formats are available. Results can be reported at either the word or the phoneme level, with or without time-alignment information. Multiple hypotheses can be output either as standard N-best lists or in a more compact word-graph format. Continuous speech recognition can be performed in real time on a standard UNIX workstation with a lexicon of about 1000 words.
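The abstract names a Viterbi beam search as the first decoding pass but gives no implementation details. As a rough illustration only, here is a generic time-synchronous Viterbi search with beam pruning over a toy HMM. The function name `viterbi_beam`, its arguments, and the list-based log-probability layout are assumptions made for this sketch, not WAXHOLM's C code. It does not model the lexical graph, the class bigram, or the second-pass A* stack decoder.

```python
import math
from typing import Dict, List, Tuple


def viterbi_beam(
    log_init: List[float],
    log_trans: List[List[float]],
    log_emit: List[List[float]],
    beam: float,
) -> Tuple[float, List[int]]:
    """Time-synchronous Viterbi search with beam pruning (generic sketch).

    log_init[s]     : log prob of starting in state s (-math.inf if impossible)
    log_trans[p][s] : log prob of moving from state p to state s
    log_emit[t][s]  : log likelihood of frame t given state s
    beam            : hypotheses more than `beam` below the frame's best are pruned
    Returns (best log score, best state sequence).
    """
    n_states = len(log_init)
    # Active hypotheses: state -> (score, state sequence so far).
    active: Dict[int, Tuple[float, List[int]]] = {
        s: (log_init[s] + log_emit[0][s], [s])
        for s in range(n_states)
        if log_init[s] > -math.inf
    }
    for t in range(1, len(log_emit)):
        best = max(score for score, _ in active.values())
        # Beam pruning: drop hypotheses that fall outside the beam.
        active = {s: v for s, v in active.items() if v[0] >= best - beam}
        new_active: Dict[int, Tuple[float, List[int]]] = {}
        for prev, (score, path) in active.items():
            for s in range(n_states):
                if log_trans[prev][s] == -math.inf:
                    continue  # transition not allowed
                cand = score + log_trans[prev][s] + log_emit[t][s]
                # Viterbi recursion: keep only the best predecessor per state.
                if s not in new_active or cand > new_active[s][0]:
                    new_active[s] = (cand, path + [s])
        active = new_active
    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state]
```

A wider `beam` lowers the risk of pruning the eventual best path at the cost of more work per frame. With `beam=math.inf` nothing is pruned and the search reduces to exact Viterbi decoding.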

