Transcription Of Broadcast Television And Radio News: The 1996 Abbot System
In DARPA Speech Recognition Workshop, 1997
Cited by 18 (4 self)
ABBOT is a hybrid connectionist-HMM large vocabulary continuous speech recognition system developed at the Cambridge University Engineering Department. This uses a recurrent neural network acoustic model to map acoustic features into posterior phone probabilities. These posterior probabilities are then converted to scaled likelihoods and used as observation likelihoods for phone HMMs [1, 2]. This paper describes the development of the CU-CON system which participated in the 1996 ARPA Hub 4 Evaluations. The system is based on ABBOT. The Hub 4 Evaluation task involves the transcription of broadcast television and radio news programmes. This is an extremely demanding task for state-of-the-art speech recognition systems. Typical programmes include a wide variety of speaking styles and acoustic conditions. These range from read speech recorded in the studio to extemporaneous speech recorded over telephone channels. Results are presented for the system at various stages of development, as we...
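The conversion from posterior phone probabilities to scaled likelihoods mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual Bayes'-rule relation p(x|q)/p(x) = P(q|x)/P(q), so each posterior is divided by the phone's prior. The function name, the flooring constant, and the three-phone values are hypothetical and are not taken from the ABBOT system.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-8):
    """Convert one frame of phone posteriors P(q|x) to scaled log-likelihoods.

    By Bayes' rule, p(x|q) / p(x) = P(q|x) / P(q). The factor p(x) is the
    same for every phone in a given frame, so posterior / prior can stand in
    for the HMM observation likelihood during decoding.
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    # Floor both terms (an assumption made here) to avoid log(0).
    return [
        math.log(max(post, floor)) - math.log(max(prior, floor))
        for post, prior in zip(posteriors, priors)
    ]


# Hypothetical three-phone frame; the numbers are illustrative only.
frame_posteriors = [0.7, 0.2, 0.1]
phone_priors = [0.5, 0.3, 0.2]
scores = scaled_log_likelihoods(frame_posteriors, phone_priors)
```

Working in the log domain is a common choice because a decoder typically sums log scores along a path; in practice the priors would be estimated from phone frequencies in the training alignments.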
The 1997 Abbot System For The Transcription Of Broadcast News
In Proceedings of the 1998 Broadcast News Transcription and Understanding Workshop, 1998
Cited by 13 (1 self)
This paper describes the development of a connectionist-hidden Markov model (HMM) system for the 1997 DARPA Hub-4E CSR evaluations. We describe both system development and the enhancements designed to improve performance on broadcast news data. Both multilayer perceptron (MLP) and recurrent neural network acoustic models have been investigated. We assess the effect of using gender-dependent acoustic models, and the impact on performance of varying both the number of parameters and the amount of training data used for acoustic modelling. The use of context-dependent phone models is described, and the effect of the number of context classes is investigated. We also describe a method for incorporating syllable boundary information during search. Results are reported on the 1997 DARPA Hub-4E development test set. We then describe the CU-CON evaluation system and report results on the 1997 Hub-4E test set.
Boosting The Performance Of Connectionist Large Vocabulary Speech Recognition
1996
Cited by 10 (2 self)
Hybrid connectionist-hidden Markov model large vocabulary speech recognition has, in recent years, been shown to be competitive with more traditional HMM systems [4]. Connectionist acoustic models generally use considerably fewer parameters than HMMs, allowing real-time operation without significant degradation of performance. However, the small number of parameters in connectionist acoustic models also poses a problem: how do we make the best use of large amounts of training data? This paper proposes a solution to this problem in which a "smart" procedure makes selective use of training data to increase performance.
Phonetic Context-Dependency In a Hybrid ANN/HMM Speech Recognition System
1997
Cited by 8 (0 self)
This report uses a bark scale, which has been replaced here with a mel-scale.
The 1995 Abbot Hybrid Connectionist-HMM Large-Vocabulary Recognition System
Cited by 5 (0 self)
Abbot is the hybrid connectionist-hidden Markov model large-vocabulary speech recognition system developed at Cambridge University. In this system, a recurrent network maps each acoustic vector to an estimate of the posterior probabilities of the phone classes. This paper describes the system which participated in the November 1995 ARPA H3 Multiple Unknown Microphones (MUM) evaluation of continuous speech recognition systems, under the guise of the CU-CON system. The emphasis of the paper is on the changes made to the 1994 Abbot system, specifically to accommodate the H3 task. This includes improved acoustic modelling using limited word-internal context-dependent models, training on the Wall Street Journal secondary channel database, the linear input network for speaker and environmental adaptation and the continued development of a realtime single-pass decoder well suited to the hybrid approach. Experimental results are reported for various test and development sets from the November 1...
Ensemble Methods for Connectionist Acoustic Modelling
In Eurospeech, 1997
Cited by 5 (0 self)
In this paper we investigate a number of ensemble methods for improving the performance of connectionist acoustic models for large vocabulary continuous speech recognition. We discuss boosting, a data selection technique which results in an ensemble of models, and mixtures-of-experts. These techniques have been applied to multilayer perceptron acoustic models used to build a hybrid connectionist-HMM speech recognition system. We present results on a number of ARPA benchmark tasks, and show that the ensemble methods lead to considerable improvements in recognition accuracy. 1. INTRODUCTION When developing a classification or prediction system it is common practice to train a number of different models, and to retain the model which exhibits the best performance on a cross-validation data set. However, reports in the statistics and neural network literature suggest that improved performance can be achieved by combining the estimates of all the available models [1, 2, 3, 4]. Systems that...
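The idea of combining the estimates of all available models, rather than keeping only the best one, can be sketched as follows. This is a minimal illustration that assumes a plain arithmetic mean of per-class posterior estimates; the abstract does not state which combination rule the paper uses, and the function name and numbers below are hypothetical.

```python
def average_posteriors(model_outputs):
    """Combine class-posterior estimates from several models by arithmetic mean.

    model_outputs is a list with one entry per model; each entry is a list of
    posterior estimates over the same classes for a single input frame.
    """
    if not model_outputs:
        raise ValueError("need at least one model output")
    n_classes = len(model_outputs[0])
    if any(len(out) != n_classes for out in model_outputs):
        raise ValueError("all models must score the same set of classes")
    n_models = len(model_outputs)
    return [
        sum(out[k] for out in model_outputs) / n_models
        for k in range(n_classes)
    ]


# Three hypothetical models scoring one frame over three phone classes.
outputs = [
    [0.6, 0.3, 0.1],
    [0.4, 0.4, 0.2],
    [0.5, 0.2, 0.3],
]
combined = average_posteriors(outputs)
best_class = max(range(len(combined)), key=combined.__getitem__)
```

An unweighted mean keeps the combined values a valid probability distribution whenever each input is one; weighted or log-domain averages are alternative choices that this sketch does not cover.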
The 1994 Abbot Hybrid Connectionist-HMM Large-Vocabulary Recognition System
1995
Cited by 5 (1 self)
ABBOT is the hybrid connectionist-hidden Markov model large-vocabulary speech recognition system developed at Cambridge University. In this system, a recurrent network maps each acoustic vector to an estimate of the posterior probabilities of the phone classes. The maximum likelihood word string is then extracted using Markov models. As in traditional hidden Markov models, the Markov process is used to model the lexical and language model constraints. This paper describes the system which participated in the November 1994 ARPA evaluation of continuous speech recognition systems. The emphasis of the paper is on the differences between the 1993 and 1994 versions of the ABBOT system. This includes the utilization of a larger training corpus (SI284 versus SI84), the extension of the lexicon from 5,000 words to 65,000 words, the application of a trigram language model, and the development of a near-realtime single-pass decoder well suited for the hybrid approach. Experimental results are rep...
Speech recognition via phonetically-featured syllables
Institute of Phonetics, University of the Saarland, 2000
Cited by 4 (0 self)
We describe recent work on two new automatic speech recognition systems. The first part of this paper describes the components of a system based on phonological features (which we call Espresso-P) in which the values of these features are estimated from the speech signal before being used as the basis for recognition. In the second part of the paper, another system (which we call Espresso-A) is described in which articulatory parameters are used instead of phonological features and a linear dynamical system model is used to perform recognition from automatically estimated values of these articulatory parameters. 1. Phonological feature-based system: Espresso-P The first 5 sections of this paper report work on the components of a two stage recognition architecture based on phonological features rather than phones. While phonological features have been proposed before as the basis of a speech recognition system (see section 1.2 for a review), the use of features has been out of favour until recently because there had been little success in extracting them from speech waveforms, and a lack of suitable models with
Data Selection and Model Combination in Connectionist Speech Recognition
1997
Cited by 3 (0 self)
nts of training data. Boosting is a method which makes selective use of training data, and produces an ensemble with each model trained on data drawn from a different distribution. Results on the optical character recognition task suggest that boosting can provide considerable gains in classification performance. The application of boosting to acoustic modelling has been investigated, and a modified boosting procedure developed. The boosting algorithms have been applied to multilayer perceptron acoustic models, and performance of the models assessed on a number of ARPA benchmark tasks. The results show that boosting consistently provides a 14-19% reduction in word error rate. The standard boosting techniques are not suitable for use with recurrent network acoustic models, and three new boosting algorithms have been developed for use with connectionist models with internal memory. These new boosting algorithms have also been evaluated on a number of ARPA benchmark tasks, and have been

