The use of a linguistically motivated language model in conversational speech recognition (2004)

by W Wang, A Stolcke, M P Harper
Venue: In Proc. ICASSP’04

Results 1 - 10 of 13

Structured speech modeling

by Li Deng, Dong Yu, Alex Acero - IEEE Transactions on Audio, Speech and Language Processing (Special Issue on Rich Transcription), 2006
Cited by 19 (11 self)
Abstract—Modeling dynamic structure of speech is a novel paradigm in speech recognition research within the generative modeling framework, and it offers a potential to overcome limitations of the current hidden Markov modeling approach. Analogous to structured language models where syntactic structure is exploited to represent long-distance relationships among words [5], the structured speech model described in this paper makes use of the dynamic structure in the hidden vocal tract resonance space to characterize long-span contextual influence among phonetic units. A general overview is provided first on hierarchically classified types of dynamic speech models in the literature. A detailed account is then given for a specific model type called the hidden trajectory model, and we describe detailed steps of model construction and the parameter estimation algorithms. We show how the use of resonance target parameters and their temporal filtering enables joint modeling of long-span coarticulation and phonetic reduction effects. Experiments on phonetic recognition evaluation demonstrate superior recognizer performance over a modern hidden Markov model-based system. Error analysis shows that the greatest performance gain occurs within the sonorant speech class. Index Terms—Hidden dynamics, hidden trajectory, long span modeling, maximum-likelihood, nonlinear prediction, parameter learning, structured modeling, vocal tract resonance.

LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 Johns Hopkins Summer Workshop

by Mark Hasegawa-Johnson, James Baker, Steven Greenberg, Katrin Kirchhoff, Jennifer Muller, Kemal Sönmez, Sarah Borys, Ken Chen, Amit Juneja, Karen Livescu, Srividya Mohan, Emily Coogan, Tianyu Wang, 2005
Cited by 14 (1 self)
Abstract not found

Discriminative syntactic language modeling for speech recognition

by Michael Collins, Brian Roark, Murat Saraclar - In Proc. of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-05), 2005
Cited by 13 (0 self)
We describe a method for discriminative training of a language model that makes use of syntactic features. We follow a reranking approach, where a baseline recogniser is used to produce 1000-best output for each acoustic input, and a second “reranking” model is then used to choose an utterance from these 1000-best lists. The reranking model makes use of syntactic features together with a parameter estimation method that is based on the perceptron algorithm. We describe experiments on the Switchboard speech recognition task. The syntactic features provide an additional 0.3% reduction in test-set error rate beyond the model of (Roark et al., 2004a; Roark et al., 2004b) (significant at p < 0.001), which makes use of a discriminatively trained n-gram model, giving a total reduction of 1.2% over the baseline Switchboard system.

Structural Event Detection for Rich Transcription of Speech

by Yang Liu, 2004
Cited by 12 (5 self)
Abstract not found

Recent innovations in speech-to-text transcription at sri-icsi-uw

by Andreas Stolcke, Barry Chen, Horacio Franco, Venkata Ramana Rao, Martin Graciarena, Mei-yuh Hwang, Katrin Kirchhoff, Nelson Morgan, Xin Lei, Tim Ng, Mari Ostendorf - IEEE Transactions on Audio, Speech & Language Processing, 2006
Cited by 10 (2 self)
Abstract — We summarize recent progress in automatic speech-to-text

Shrinking exponential language models

by Stanley F. Chen - In Proc. of HLT-NAACL, 2009
Cited by 8 (2 self)
In (Chen, 2009), we show that for a variety of language models belonging to the exponential family, the test set cross-entropy of a model can be accurately predicted from its training set cross-entropy and its parameter values. In this work, we show how this relationship can be used to motivate two heuristics for “shrinking” the size of a language model to improve its performance. We use the first heuristic to develop a novel class-based language model that outperforms a baseline word trigram model by 28% in perplexity and 1.9% absolute in speech recognition word-error rate on Wall Street Journal data. We use the second heuristic to motivate a regularized version of minimum discrimination information models and show that this method outperforms other techniques for domain adaptation.

Incorporating tandem/HATs MLP features into SRI’s conversational speech recognition system

by Qifeng Zhu, Andreas Stolcke, Barry Y. Chen, Nelson Morgan - In Proc. DARPA RT Workshop, 2004
Cited by 5 (1 self)
We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators, one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features. These features had previously been shown to give significant accuracy improvements for CTS recognition when used with modest amounts of training data and relatively simple recognition architectures. This paper focuses on the challenges arising when incorporating these nonstandard features into a full-scale speech-to-text (STT) system, as used by SRI in the Fall 2004 DARPA STT evaluations. First, we developed a series of time-saving techniques for training feature MLPs on 1500 hours of speech. Second, we investigated which components of a multipass, multi-front-end recognition system are most profitably augmented with MLP features for best overall performance. The final system achieved a 2% absolute (10% relative) WER reduction over a comparable baseline system that did not include Tandem/HATs MLP features.

Performance Prediction for Exponential Language Models

by Stanley F. Chen
Cited by 5 (3 self)
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.

Lexical Syntax for Statistical Machine Translation

by Hany Hassan, 2009
Cited by 1 (0 self)
Abstract not found

Index Terms — Natural languages, Language Modeling,

by Hany Hassan, Andy Way
Syntactically enriched language models (parsers) constitute a promising component in applications such as machine translation and speech recognition. To maintain a useful level of accuracy, existing parsers are non-incremental and must span a combinatorially growing space of possible structures as every input word is processed. This prohibits their incorporation into standard linear-time decoders. In this paper, we present an incremental, linear-time dependency parser based on Combinatory Categorial Grammar (CCG) and classification techniques. We devise a deterministic transform of CCGbank canonical derivations into incremental ones, and train our parser on this data. We discover that a cascaded, incremental version provides an appealing balance between efficiency and accuracy.