Learning discriminative temporal patterns in speech: Development of novel TRAPS-like classifiers (2003)

by B Chen, S Chang, S Sivadas
Venue: in Proceedings of Eurospeech

Results 1 - 5 of 5

Using MLP features in SRI’s conversational speech recognition system

by Qifeng Zhu, Andreas Stolcke, Barry Y. Chen, Nelson Morgan - in Proc. Interspeech , 2005
Cited by 21 (4 self)
We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLP). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators, one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features. This paper focuses on the challenges arising when incorporating these nonstandard features into a full-scale speech-to-text (STT) system, as used by SRI in the Fall 2004 DARPA STT evaluations. First, we developed a series of time-saving techniques for training feature MLPs on 1800 hours of speech. Second, we investigated which components of a multipass, multi-front-end recognition system are most profitably augmented with MLP features for best overall performance. The final system achieved a 2% absolute (10% relative) WER reduction over a comparable baseline system that did not include Tandem/HATs MLP features.

Incorporating tandem/HATs MLP features into SRI’s conversational speech recognition system

by Qifeng Zhu, Andreas Stolcke, Barry Y. Chen, Nelson Morgan - in Proc. DARPA RT Workshop , 2004
Cited by 5 (1 self)
We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators, one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features. These features had previously been shown to give significant accuracy improvements for CTS recognition when used with modest amounts of training data and relatively simple recognition architectures. This paper focuses on the challenges arising when incorporating these nonstandard features into a full-scale speech-to-text (STT) system, as used by SRI in the Fall 2004 DARPA STT evaluations. First, we developed a series of time-saving techniques for training feature MLPs on 1500 hours of speech. Second, we investigated which components of a multipass, multi-front-end recognition system are most profitably augmented with MLP features for best overall performance. The final system achieved a 2% absolute (10% relative) WER reduction over a comparable baseline system that did not include Tandem/HATs MLP features.

A Comparative Large Scale Study of MLP Features for Mandarin ASR

by Fabio Valente, Mathew Magimai Doss, Christian Plahl, Suman Ravuri, Wen Wang - INTERSPEECH 2010 , 2010
Cited by 1 (0 self)
MLP-based front-ends have shown significant complementary properties to conventional spectral features. As part of the DARPA GALE program, different MLP features were developed for Mandarin ASR. In this paper, all the proposed front-ends are compared in a systematic manner, and we extensively investigate the scalability of these features in terms of the amount of training data (from 100 hours to 1600 hours) and system complexity (maximum likelihood training, SAT, lattice-level combination, and discriminative training). Results on 5 hours of evaluation data from the GALE project reveal that the MLP features consistently produce relative improvements in the range of 15% to 23% at the different steps of a multipass system when compared to conventional short-term spectral features such as MFCC and PLP. The largest improvement is obtained using a hierarchical MLP approach.

INCORPORATING TANDEM/HATS MLP FEATURES INTO SRI’S CONVERSATIONAL SPEECH RECOGNITION SYSTEM

by unknown authors
We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators, one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features. These features had previously been shown to give significant accuracy improvements for CTS recognition when used with modest amounts of training data and relatively simple recognition architectures. This paper focuses on the challenges arising when incorporating these nonstandard features into a full-scale speech-to-text (STT) system, as used by SRI in the Fall 2004 DARPA STT evaluations. First, we developed a series of time-saving techniques for training feature MLPs on 1500 hours of speech. Second, we investigated which components of a multipass, multi-front-end recognition system are most profitably augmented with MLP features for best overall performance. The final system achieved a 2% absolute (10% relative) WER reduction over a comparable baseline system that did not include Tandem/HATs MLP features.

Robust Speech Recognition based on Spectro-Temporal Features

by Bernd Meyer (Diplom thesis in physics; first reviewer: Prof. Dr. Dr. Birger Kollmeier; second reviewer: Prof. Dr.-Ing. Alfred Mertins)
Table of contents excerpt: 1.1 Automatic Speech Recognition (ASR); 1.2 Robustness of ASR systems; 1.3 Scope of this thesis