On Supervised Learning From Sequential Data With Applications For Speech Recognition
, 1999
Cited by 12 (1 self)
visualization of the problem to model human speech. A large number of example sequences of observation vectors (shown connected as continuous trajectories) depending on a given sequence of class labels, with each class representing for example a phoneme (here the name Keiko with given durations). In this synthetic example, the one-dimensional target data would be represented poorly by a uni-modal Gaussian distribution with a constant variance (which corresponds to using the squared-error objective function), which would average the two separate branches, indicated by the fat lines as the mean and constant variance of the single Gaussian. Compare this figure with Figure 3.10, Figure 3.11 and Figure 3.12 to see a subsequent improvement of the model.
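The averaging effect described in this abstract can be reproduced with a small numerical sketch. The branch centres (-1.0 and +1.0), the spread, and the sample counts below are assumptions chosen only for illustration; they are not taken from the thesis.

```python
import random
import statistics

random.seed(0)

# Synthetic one-dimensional targets lying on two separate branches,
# centred at -1.0 and +1.0 (illustrative values, not from the source).
targets = [random.gauss(-1.0, 0.1) for _ in range(500)] + \
          [random.gauss(+1.0, 0.1) for _ in range(500)]

# The constant that minimises the squared error is the sample mean,
# which is also the maximum-likelihood mean of a single Gaussian.
single_mean = statistics.fmean(targets)
single_std = statistics.pstdev(targets)

# Distance from the fitted mean to the closest actual target value.
nearest_gap = min(abs(t - single_mean) for t in targets)

print(f"mean={single_mean:.3f} std={single_std:.3f} gap={nearest_gap:.3f}")
```

The fitted mean falls between the two branches, in a region that contains no data, while the fitted standard deviation has to stretch across both branches. This is the poor fit that a multi-modal model would avoid.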
A Word Graph Based N-Best Search in Continuous Speech Recognition
, 1996
Cited by 9 (2 self)
In this paper, we introduce an efficient algorithm for the exhaustive search of N best sentence hypotheses in a word graph. The search procedure is based on a two-pass algorithm. In the first pass, a word graph is constructed with standard time-synchronous beam search. The actual extraction of N best word sequences from the word graph takes place during the second pass.
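To make the second pass concrete, here is a minimal sketch of extracting N best word sequences from an already built word graph. It uses a plain best-first enumeration over an acyclic graph with non-negative edge costs; this is a generic technique swapped in for illustration, not the algorithm of the paper. The graph layout, the function name `n_best`, and the example costs are all hypothetical.

```python
import heapq

def n_best(graph, start, final, n):
    """Return up to n (cost, words) pairs, cheapest first.

    graph maps a node to a list of (next_node, word, cost) edges.
    Costs are assumed non-negative (e.g. negative log-probabilities)
    and the graph is assumed acyclic, with `final` as its sink node.
    """
    results = []
    # Heap entries: (accumulated cost, tie-breaker, node, words so far).
    heap = [(0.0, 0, start, ())]
    counter = 1
    while heap and len(results) < n:
        cost, _, node, words = heapq.heappop(heap)
        if node == final:
            # With non-negative costs, complete paths are popped in cost order.
            results.append((cost, list(words)))
            continue
        for nxt, word, edge_cost in graph.get(node, []):
            heapq.heappush(heap, (cost + edge_cost, counter, nxt, words + (word,)))
            counter += 1
    return results

# Hypothetical word graph: two competing words in each of two positions.
word_graph = {
    0: [(1, "a", 1.0), (1, "the", 2.0)],
    1: [(2, "cat", 1.0), (2, "cap", 3.0)],
}
best_three = n_best(word_graph, start=0, final=2, n=3)
```

Because every partial path is kept on the heap, the enumeration is exhaustive but can grow quickly on large graphs; a practical decoder would typically add a look-ahead estimate from the first pass to guide the search.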
Dynamic Search-Space Pruning For Time-Constrained Speech Recognition
- in: International Conference on Spoken Language Processing
, 2002
Cited by 4 (0 self)
In automatic speech recognition, complex state spaces are searched during the recognition process. Limiting these search spaces reduces the computation time, but unfortunately the recognition rate usually decreases as well. However, search-space pruning is necessary, especially for time-critical recognition tasks. Therefore, we developed a dynamic mechanism to optimize the pruning parameters for time-constrained recognition tasks, e.g. speech recognition for robotic systems, with respect to word accuracy and computation time. With this mechanism an automatic speech recognition system can process speech signals at an approximately constant processing rate. Compared to a system without such a dynamic mechanism and with the same time available for computation, the variance of the processing rate is decreased greatly without a significant loss of word accuracy. Furthermore, the extended system can be sped up to real-time processing, if desired or necessary.
A Baseline For The Transcription Of Italian Broadcast News
- IN PROC. OF ICASSP
Cited by 4 (3 self)
This paper presents the first achievements in the development of a broadcast news transcription system to be applied for the processing of huge audio archives. In particular, the Italian broadcast news corpus under collection is introduced, and the first implemented baseline system is outlined. The baseline system consists of an audio segmentation module and a speech recognizer featuring a recursive Viterbi beam search, a 64K-word lexicon, a tree-based trigram LM representation, and MLLR adaptation. The word error rate of the baseline was 20.9% on planned studio speech and 28.8% on the whole test set.
Juicer: A Weighted Finite-State Transducer speech decoder
Cited by 3 (2 self)
A major component in the development of any speech recognition system is the decoder. As task complexities and, consequently, system complexities have continued to increase, the decoding problem has become an increasingly significant component in the overall speech recognition system development effort, with efficient decoder design contributing significantly to improving the trade-off between decoding time and search errors. In this paper we present the "Juicer" (from transducer) large vocabulary continuous speech recognition (LVCSR) decoder based on weighted finite-state transducers (WFSTs). We begin with a discussion of the need for open source, state-of-the-art decoding software in LVCSR research and how this led to the development of Juicer, followed by a brief overview of decoding techniques and major issues in decoder design. We present Juicer and its major features, emphasising its potential not only as a critical component in the development of LVCSR systems, but also as an important research tool in itself, being based around the flexible WFST paradigm. We also provide results of benchmarking tests that have been carried out to date, demonstrating that in many respects Juicer, while still in its early development, is already achieving state-of-the-art performance. These benchmarking tests not only demonstrate the utility of Juicer in its present state but are also being used to guide future development; hence, we conclude with a brief discussion of some of the extensions that are currently under way or being considered for Juicer.
A Fast Version Of The ATROS System
, 1999
Cited by 1 (1 self)
Atros is an automatic speech recognition/understanding/translation system whose knowledge sources (acoustic models, lexical models, syntactic language models, semantic models and translation models) can be learnt automatically from training data by using similar techniques. The search process in Atros is performed through a Synchronous Beam Search technique. In this paper, a faster version of Atros is presented and evaluated. This version supports improved acoustic and syntactical models. It also incorporates improved search algorithms to reduce the computational requirements for decoding: Fast Phoneme Look-Ahead and Histogram Pruning. The system has been tested on a Spanish task of queries to a geographical database (with a vocabulary of 1,264 words). The best result achieved (in real time) was a word error rate of 7.10%.
1 System overview
Optimal speech decoding based on a search process in an integrated network of different knowledge sources is a hard computational problem [1]. ...
Acoustic And Syntactical Modeling in the ATROS System
, 1999
Cited by 1 (1 self)
Current speech technology allows us to build efficient speech recognition systems. However, model learning of knowledge sources in a speech recognition system is not a closed problem. In addition, lower computational requirements are crucial to building real-time systems.
An Adaptive-Beam Pruning Technique For Continuous Speech Recognition
Cited by 1 (0 self)
Pruning is an essential paradigm to build HMM-based large vocabulary speech recognisers that use reasonable computing resources. Unlikely sentence, word or subword hypotheses are removed from the search space when their likelihood falls outside a beam relative to the best scoring hypothesis. A method for automatically steering this beam such that the search space attains a predefined size is presented.
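The beam criterion in this abstract can be sketched in a few lines. `beam_prune` follows the description directly: a hypothesis survives only if its log-likelihood lies within a fixed beam of the best score. `steer_beam` is a hypothetical fixed-step controller added purely for illustration; the abstract does not say how the paper steers the beam, so this should not be read as the authors' method. All names, the step size, and the example scores are assumptions.

```python
def beam_prune(hypotheses, beam):
    """Keep hypotheses whose log-likelihood is within `beam` of the best.

    hypotheses maps a hypothesis label to its log-likelihood score.
    """
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    return {h: s for h, s in hypotheses.items() if s >= best - beam}

def steer_beam(hypotheses, target_size, beam, step=0.5, min_beam=0.5):
    """Hypothetical controller (not from the paper): prune with the current
    beam, then nudge the beam width toward the target number of survivors."""
    survivors = beam_prune(hypotheses, beam)
    if len(survivors) > target_size:
        beam = max(min_beam, beam - step)   # too many survivors: narrow
    elif len(survivors) < target_size:
        beam += step                        # too few survivors: widen
    return survivors, beam

# Illustrative scores for three competing hypotheses.
scores = {"a": -10.0, "b": -12.0, "c": -20.0}
kept = beam_prune(scores, beam=5.0)
_, narrower = steer_beam(scores, target_size=1, beam=5.0)
_, wider = steer_beam(scores, target_size=3, beam=5.0)
```

Calling `steer_beam` once per time frame and feeding the returned beam into the next frame would keep the number of active hypotheses near the target, at the cost of a one-frame lag in the adjustment.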
The Juicer LVCSR Decoder - User Manual for Juicer version 0.5.0
, 2005
Juicer is a decoder for HMM-based large vocabulary speech recognition that uses a weighted finite state transducer (WFST) representation of the search space. The package consists of a number of command line utilities: the Juicer decoder itself, along with a number of tools and scripts that are used
Fast Phoneme Look-Ahead in the ATROS system
- Accepted in VIII Spanish Symposium on Pattern Recognition and Image Analysis
, 1999
Current speech recognition systems require a lot of computational resources to decode an input utterance. Many efforts have been made to reduce these requirements. One of the techniques being explored is fast phoneme look-ahead. The idea is to quickly compute approximate scores in order to prune unpromising hypotheses. These scores are computed by using simple phone-like units and analysing a look-ahead acoustic segment.

