Support vector machines for speech recognition
- Proceedings of the International Conference on Spoken Language Processing
, 1998
Cited by 47 (2 self)
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
Progressive Bayes: A New Framework for Nonlinear State Estimation
, 2003
Cited by 26 (20 self)
This paper is concerned with recursively estimating the internal state of a nonlinear dynamic system by processing noisy measurements and the known system input. In the case of continuous states, an exact analytic representation of the probability density characterizing the estimate is generally too complex for recursive estimation or even impossible to obtain. Hence, it is replaced by a convenient type of approximate density characterized by a finite set of parameters. Of course, parameters are desired that systematically minimize a given measure of deviation between the (often unknown) exact density and its approximation, which in general leads to a complicated optimization problem. Here, a new framework for state estimation based on progressive processing is proposed. Rather than trying to solve the original problem, it is exactly converted into a corresponding system of explicit ordinary first-order differential equations. Solving this system over a finite “time” interval yields the desired optimal density parameters.
The development of SRI’s 1997 Broadcast News transcription system
- In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop
Cited by 9 (5 self)
This paper describes SRI’s 1997 broadcast news transcription system used for the 1997 DARPA H4 evaluations. Our system had several novel components. These include automatic segmentation of entire broadcast shows, word-internal and cross-word acoustic models robustly estimated with a new Gaussian Merging-Splitting (GMS) algorithm, the use of trigram language models (LMs) in lattices instead of for rescoring N-best lists, and an LM pruning algorithm that allows efficient representation of high-order (like 4- or 5-gram) LMs. We briefly describe these features and give comparative experimental results. We achieved an 18.7% relative improvement in performance on our 1996 H4 partitioned evaluation (PE) development test set as compared to our 1996 H4 PE evaluation system.
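For readers unfamiliar with the term, a relative improvement is the error reduction expressed as a fraction of the baseline error. The sketch below assumes performance is measured as an error rate such as word error rate, which the abstract does not state; the function name and the input numbers are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a percentage of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical error rates (percent), chosen only for illustration.
    baseline = 50.0
    improved = 40.0
    print(f"{relative_improvement(baseline, improved):.2f}% relative improvement")
```

With these illustrative inputs, an absolute drop of 10 points from a baseline of 50 corresponds to a 20% relative improvement.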
Dynaspeak: SRI’s scalable speech recognizer for embedded and mobile systems
- in Proceedings of HLT
, 2002
Cited by 6 (3 self)
We introduce SRI’s new speech recognition engine, DynaSpeak™, which is characterized by its scalability and flexibility, high recognition accuracy, memory and speed efficiency, adaptation capability, efficient grammar optimization, support for natural language parsing functionality, and operation based on integer arithmetic. These features are designed to address the needs of the fast-developing and changing domain of embedded and mobile computing platforms.
Learning Structured Models for Phone Recognition
Cited by 6 (0 self)
We present a maximally streamlined approach to learning HMM-based acoustic models for automatic speech recognition. In our approach, an initial monophone HMM is iteratively refined using a split-merge EM procedure which makes no assumptions about subphone structure or context-dependent structure, and which uses only a single Gaussian per HMM state. Despite the much simplified training process, our acoustic model achieves state-of-the-art results on phone classification (where it outperforms almost all other methods) and competitive performance on phone recognition (where it outperforms standard CD triphone / subphone / GMM approaches). We also present an analysis of what is and is not learned by our system.
Improved Modeling and Efficiency for Automatic Transcription of Broadcast News
, 2000
Cited by 5 (0 self)
Over the last few years, the DARPA-sponsored Hub4 continuous speech recognition evaluations have pushed speech recognition technology for the very interesting and difficult task of automatically transcribing broadcast news. In this paper, we report on our research and progress on this problem. We focus on individual techniques we developed, rather than on descriptions of our evaluation systems. We provide comparative experimental results showing the improvements obtained with the novel approaches we developed.

1 Introduction

In recent years there has been increasing interest in developing large-vocabulary continuous speech recognition (LVCSR) systems for speech found in real sources. Broadcast news, in particular, has been the testbed for the DARPA-sponsored Hub4 continuous speech recognition (CSR) evaluations over the last few years, and represents a significant challenge to speech recognition researchers. Many interesting problems are associated with the automatic recognition of b...

