Decoder Technology for Connectionist Large Vocabulary Speech Recognition (1995)

by S Renals, M M Hochberg

Results 11 - 19 of 19

Ensemble Methods for Connectionist Acoustic Modelling

by G. D. Cook, S. R. Waterhouse, A. J. Robinson - In Eurospeech, 1997
Cited by 5 (0 self)
In this paper we investigate a number of ensemble methods for improving the performance of connectionist acoustic models for large vocabulary continuous speech recognition. We discuss boosting, a data selection technique which results in an ensemble of models, and mixtures-of-experts. These techniques have been applied to multilayer perceptron acoustic models used to build a hybrid connectionist-HMM speech recognition system. We present results on a number of ARPA benchmark tasks, and show that the ensemble methods lead to considerable improvements in recognition accuracy.
1. INTRODUCTION
When developing a classification or prediction system it is common practice to train a number of different models, and to retain the model which exhibits the best performance on a cross-validation data set. However, reports in the statistics and neural network literature suggest that improved performance can be achieved by combining the estimates of all the available models [1, 2, 3, 4]. Systems that...

Language Modeling Based on Neural Clustering of Words

by Vesa Siivola, 2000
Cited by 3 (1 self)
This document describes a neural method for clustering words and its use in language modeling for speech recognizers. The method is based on clustering words that appear in similar local contexts and estimating the parameters needed for language modeling from these clusters. The language model used is similar to traditional n-grams.
Contents
1 Introduction
1.1 Acknowledgments
2 Theory
2.1 Motivation
2.2 Clustering the Words
2.3 Language Modeling
2.3.1 Basics
2.3.2 N-grams
2.4 Word Cluster...

Data Selection and Model Combination in Connectionist Speech Recognition

by Gary David Cook, 1997
Cited by 3 (0 self)
nts of training data. Boosting is a method which makes selective use of training data, and produces an ensemble with each model trained on data drawn from a different distribution. Results on the optical character recognition task suggest that boosting can provide considerable gains in classification performance. The application of boosting to acoustic modelling has been investigated, and a modified boosting procedure developed. The boosting algorithms have been applied to multilayer perceptron acoustic models, and performance of the models assessed on a number of ARPA benchmark tasks. The results show that boosting consistently provides a 14–19% reduction in word error rate. The standard boosting techniques are not suitable for use with recurrent network acoustic models, and three new boosting algorithms have been developed for use with connectionist models with internal memory. These new boosting algorithms have also been evaluated on a number of ARPA benchmark tasks, and have been

Transcribing Broadcast News With The 1997 Abbot System

by Gary Cook, Tony Robinson - In ICASSP 98. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1998
Cited by 2 (1 self)
Recent DARPA CSR evaluations have focused on the transcription of broadcast news from both television and radio programmes [17]. This is a challenging task because the data includes a variety of speaking styles and channel conditions. This paper describes the development of a connectionist-hidden Markov model (HMM) system, and the enhancements designed to improve performance on broadcast news data. Both multilayer perceptron (MLP) and recurrent neural network acoustic models have been investigated. We assess the effect of using gender-dependent acoustic models, and the impact on performance of varying both the number of parameters and the amount of training data used for acoustic modelling. The use of context-dependent phone models is described, and the effect of the number of context classes is investigated. We also describe a method for incorporating syllable boundary information during search. Results are reported on the 1997 DARPA Hub-4 development test set.
1. INTRODUCTION
Televis...

Nozomi - A Fast, Memory-Efficient Stack Decoder for LVCSR

by Mike Schuster - In 5th International Conference on Spoken Language Processing (ICSLP), 1996
Cited by 1 (0 self)
This paper describes some of the implementation details of the "Nozomi" stack decoder for LVCSR. The decoder was tested on a Japanese Newspaper Dictation Task using a 5000 word vocabulary. Using continuous density acoustic models with 2000 and 3000 states trained on the JNAS/ASJ corpora and a 3-gram LM trained on the RWC text corpus, both models provided by the IPA group [7], it was possible to reach more than 95% word accuracy on the standard test set. With computationally cheap acoustic models we could achieve around 89% accuracy in nearly realtime on a 300 MHz Pentium II. Using a disk-based LM the memory usage could be optimized to 4 MB in total.
1. INTRODUCTION
LVCSR is currently limited to workstations and fast high-end laptops with a lot of memory. To make LVCSR work on PDAs, cellular phones, user-interfaces, wrist watches etc., it is necessary to find time- and memory-efficient algorithms. The goal for implementation of any search engine must be to minimize time and memory requ...

Divide and Conquer: Pattern Recognition using Mixtures of Experts

by Steve Waterhouse, 1997
Cited by 1 (0 self)
speech recognition task. The mixture of experts is shown to be a superior method for speaker adaptation of connectionist models to new conditions. In addition, the significant improvement of the performance of an ensemble of classifiers via the mixture framework is demonstrated. In addition to these applications, a number of theoretical extensions of the mixture of experts have been made in this thesis. The link between hierarchical mixtures of experts (HME) and other tree based models is described and used to motivate a new training algorithm for the HME, known as tree growing. Tree growing is a constructive algorithm which results in faster training and a more efficient use of parameters than standard training methods. The second extension described is path pruning which is a fast training and evaluation algorithm for deep hierarchies in which paths through the tree which have low probability are ignored. A stabilising method for the algorithm based on weight decay regularisation is

「のぞみ」 (Nozomi) - A Fast, Memory Efficient One-Pass Stack Decoder

by Mike Schuster, 1996
Cited by 1 (1 self)
This paper describes features and implementation details of the のぞみ (Nozomi) decoder, a fast, memory efficient one-pass stack decoder designed for large vocabulary speech recognition with dictionaries 65536 words. The stack decoder design made it possible to use arbitrary backoff N-gram language models in the first pass. A new on-demand N-gram LM-lookahead for the tree lexicon is introduced. Decoding time without search errors for a 6200 word test task using a trigram LM is about 0.1 x realtime on a 300 MHz Pentium II, with a total memory use of about 4 MB.
1. INTRODUCTION
A major part of any speech recognition system is the search engine combining the available information from the acoustic models, the language model and possibly other information, to search for the most probable word sequence. A search engine useful for large vocabulary speech recognition like dictation or broadcast news transcription has to fulfill a number of requirements:
• large vocabulary (also ?65536 words)
• ...

Neural Networks for Speech Processing

by Mike Schuster, John Wiley, 1998
this article. Currently (1998), successful use of NNs for speech processing is mainly limited to

A Study of the Use and Evaluation of Confidence . . .

by Gethin Williams, 1998
Confidence measures have been found to be useful for a number of tasks within the field of Automatic Speech Recognition (ASR). For example, the use of confidence measures has been reported in the utterance verification, keyword spotting and Out-of-Vocabulary (OOV) word spotting literature. In this report, it is shown that so-called 'hybrid Artificial Neural Network/Hidden Markov Model' (HMM/ANN) systems are well suited to the task of generating confidence measures, due to their ability to provide local phone class posterior probability estimates which may be used to generate confidence measures in a computationally efficient manner. A number of evaluation metrics are also described, and the performance of five confidence measures derived from the ABBOT hybrid HMM/ANN system on the tasks of utterance verification and OOV word spotting is evaluated using these metrics. Besides the tasks described above, confidence measures may also be used for tasks such as filtering the acoustics for a nu...