An Application of Recurrent Nets to Phone Probability Estimation
IEEE Transactions on Neural Networks, 1994
Cited by 165 (8 self)
This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed
Transcription Of Broadcast Television And Radio News: The 1996 Abbot System
In DARPA Speech Recognition Workshop, 1997
Cited by 18 (4 self)
ABBOT is a hybrid connectionist-HMM large vocabulary continuous speech recognition system developed at the Cambridge University Engineering Department. This uses a recurrent neural network acoustic model to map acoustic features into posterior phone probabilities. These posterior probabilities are then converted to scaled likelihoods and used as observation likelihoods for phone HMMs [1, 2]. This paper describes the development of the CU-CON system which participated in the 1996 ARPA Hub 4 Evaluations. The system is based on ABBOT. The Hub 4 Evaluation task involves the transcription of broadcast television and radio news programmes. This is an extremely demanding task for state-of-the-art speech recognition systems. Typical programmes include a wide variety of speaking styles and acoustic conditions. These range from read speech recorded in the studio to extemporaneous speech recorded over telephone channels. Results are presented for the system at various stages of development, as we...
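The posterior-to-scaled-likelihood step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the ABBOT implementation: it assumes the conversion is the usual division of each posterior P(q|x) by the class prior P(q), which by Bayes' rule gives p(x|q)/p(x). The function name, the `floor` parameter, and the example numbers are hypothetical.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-8):
    """Return log(P(q|x) / P(q)) for each phone class q.

    By Bayes' rule P(q|x) / P(q) = p(x|q) / p(x), so the result is the
    log-likelihood of the frame up to a term that is shared by all classes.
    `floor` is an assumed guard against log(0), not part of the source.
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    return [
        math.log(max(post, floor)) - math.log(max(prior, floor))
        for post, prior in zip(posteriors, priors)
    ]


# Hypothetical frame: three phone classes with network outputs and
# class priors (for example, relative frequencies in the training data).
frame_posteriors = [0.7, 0.2, 0.1]
phone_priors = [0.5, 0.3, 0.2]
frame_scores = scaled_log_likelihoods(frame_posteriors, phone_priors)
```

Working in the log domain is a design choice for this sketch: an HMM decoder typically sums log scores along a path, so the division becomes a subtraction.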
The 1997 Abbot System For The Transcription Of Broadcast News
In Proceedings of the 1998 Broadcast News Transcription and Understanding Workshop, 1998
Cited by 13 (1 self)
This paper describes the development of a connectionist-hidden Markov model (HMM) system for the 1997 DARPA Hub-4E CSR evaluations. We describe both system development and the enhancements designed to improve performance on broadcast news data. Both multilayer perceptron (MLP) and recurrent neural network acoustic models have been investigated. We assess the effect of using gender-dependent acoustic models, and the impact on performance of varying both the number of parameters and the amount of training data used for acoustic modelling. The use of context-dependent phone models is described, and the effect of the number of context classes is investigated. We also describe a method for incorporating syllable boundary information during search. Results are reported on the 1997 DARPA Hub-4E development test set. We then describe the CU-CON evaluation system and report results on the 1997 Hub-4E test set.
Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
In Proceedings of the International Conference on Machine Learning, ICML 2006, 2006
Cited by 13 (9 self)
Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
On Supervised Learning From Sequential Data With Applications For Speech Recognition
1999
Cited by 12 (1 self)
visualization of the problem to model human speech. A large number of example sequences of observation vectors (shown connected as continuous trajectories) depending on a given sequence of class labels, with each class representing for example a phoneme (here the name Keiko with given durations). In this synthetic example, the one-dimensional target data would be represented poorly by a uni-modal Gaussian distribution with a constant variance (which corresponds to using the squared-error objective function), which would average the two separate branches, indicated by the fat lines as the mean and constant variance of the single Gaussian. Compare this figure with Figure 3.10, Figure 3.11 and Figure 3.12 to see a subsequent improvement of the model.
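The averaging effect described in this excerpt can be checked with a small sketch. Minimizing squared error is equivalent to maximizing the likelihood of a Gaussian with fixed variance, and the constant prediction that minimizes squared error is the mean of the targets. The two-branch data below is made up for illustration and is not the figure's data; the function names are hypothetical.

```python
def best_constant_under_squared_error(targets):
    """The constant that minimizes the sum of squared errors is the mean."""
    return sum(targets) / len(targets)


def sum_squared_error(targets, prediction):
    """Sum of squared differences between the targets and one prediction."""
    return sum((t - prediction) ** 2 for t in targets)


# Assumed bimodal targets: half the examples follow a branch at -1.0,
# the other half follow a branch at +1.0.
branch_targets = [-1.0] * 50 + [1.0] * 50

mean_prediction = best_constant_under_squared_error(branch_targets)
error_at_mean = sum_squared_error(branch_targets, mean_prediction)
error_on_branch = sum_squared_error(branch_targets, 1.0)
```

Here the squared-error optimum falls midway between the two branches, a value that no training example takes, which is the poor representation the excerpt attributes to a uni-modal Gaussian with constant variance.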
Phonetic Context-Dependency In a Hybrid ANN/HMM Speech Recognition System
1997
Cited by 8 (0 self)
This report uses a bark scale, which has been replaced here with a mel-scale. ...
The 1995 Abbot LVCSR System For Multiple Unknown Microphones
In Int. Conf. on Spoken Language Processing, 1996
Cited by 6 (3 self)
ABBOT is the hybrid connectionist-hidden Markov model large-vocabulary speech recognition system developed at Cambridge University. In this system, a recurrent network maps each acoustic vector to an estimate of the posterior probabilities of the phone classes, which are used as observation probabilities within an HMM. This paper describes the system which participated in the November 1995 ARPA Hub-3 Multiple Unknown Microphones (MUM) evaluation of continuous speech recognition systems, under the guise of the CU-CON system. The emphasis of the paper is on the changes made to the 1994 ABBOT system, specifically to accommodate the H3 task. This includes improved acoustic modelling using limited word-internal context-dependent models, training on the Wall Street Journal secondary channel database, and using the linear input network for speaker and environmental adaptation. Experimental results are reported for various test and development sets from the November 1994 and 1995 ARPA benchmark tests.
The 1995 Abbot Hybrid Connectionist-HMM Large-Vocabulary Recognition System
Cited by 5 (0 self)
Abbot is the hybrid connectionist-hidden Markov model large-vocabulary speech recognition system developed at Cambridge University. In this system, a recurrent network maps each acoustic vector to an estimate of the posterior probabilities of the phone classes. This paper describes the system which participated in the November 1995 ARPA H3 Multiple Unknown Microphones (MUM) evaluation of continuous speech recognition systems, under the guise of the CU-CON system. The emphasis of the paper is on the changes made to the 1994 Abbot system, specifically to accommodate the H3 task. This includes improved acoustic modelling using limited word-internal context-dependent models, training on the Wall Street Journal secondary channel database, the linear input network for speaker and environmental adaptation and the continued development of a realtime single-pass decoder well suited to the hybrid approach. Experimental results are reported for various test and development sets from the November 1...
The 1994 Abbot Hybrid Connectionist-HMM Large-Vocabulary Recognition System
1995
Cited by 5 (1 self)
ABBOT is the hybrid connectionist-hidden Markov model large-vocabulary speech recognition system developed at Cambridge University. In this system, a recurrent network maps each acoustic vector to an estimate of the posterior probabilities of the phone classes. The maximum likelihood word string is then extracted using Markov models. As in traditional hidden Markov models, the Markov process is used to model the lexical and language model constraints. This paper describes the system which participated in the November 1994 ARPA evaluation of continuous speech recognition systems. The emphasis of the paper is on the differences between the 1993 and 1994 versions of the ABBOT system. This includes the utilization of a larger training corpus (SI284 versus SI84), the extension of the lexicon from 5,000 words to 65,000 words, the application of a trigram language model, and the development of a near-realtime single-pass decoder well suited for the hybrid approach. Experimental results are rep...
Acoustic Model Building Based On Non-Uniform Segments And Bidirectional Recurrent Neural Networks
ICASSP 97, Muenchen, 1996
Cited by 2 (1 self)
In this paper a new framework for acoustic model building is presented. It is based on non-uniform segment models, which are learned and scored with a time bidirectional recurrent neural network. While usually neural networks in speech recognition systems are used to estimate posterior "frame to phoneme" probabilities, they are used here to estimate directly "segment to phoneme" probabilities, which results in an improved duration model. The special MAP approach allows not only incorporation of long term dependencies on the acoustic side, but also on the phone (output) side, which results automatically in parameter efficient context dependent models. While the use of neural networks as frame or phoneme classifiers always results in discriminative training for the acoustic information, the MAP approach presented here also incorporates discriminative training for the internally learned phoneme language model. Classification tests for the TIMIT phoneme database gave promising results of 7...

