Blame assignment for errors made by large vocabulary speech recognizers (1997)

by L Chase
Venue: Proceedings of Eurospeech ’97

Results 1 - 8 of 8

Analyzing and Predicting Language Model Improvements

by R. Iyer, M. Ostendorf, M. Meteer, 1997
Cited by 10 (1 self)
In this paper, we study alternatives to perplexity for predicting language model performance, including other global features as well as a new approach that predicts, with a high correlation (0.96), the performance differences associated with localized changes to a language model for a given recognition system. Experiments focus on the problem of augmenting in-domain Switchboard text with out-of-domain text from the Wall Street Journal and Broadcast News, which differ from the in-domain data in both style and content.
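Perplexity, the global measure the paper looks beyond, is the exponentiated average negative log-probability a language model assigns to test text. The sketch below computes it for a hypothetical add-one smoothed unigram model on made-up tokens; the names `train_unigram` and `perplexity` are illustrative assumptions and are not the models or the prediction method used in the paper.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one (Laplace) smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def perplexity(model, tokens):
    """Exponentiated mean negative log-probability of the test tokens."""
    nll = -sum(math.log(model[w]) for w in tokens) / len(tokens)
    return math.exp(nll)


# Hypothetical toy data, not the Switchboard / WSJ / Broadcast News corpora.
vocab = {"a", "b", "c"}
model = train_unigram(["a", "a", "b"], vocab)
print(round(perplexity(model, ["a", "b", "c"]), 3))
```

Because perplexity averages over the whole test set, two models with similar perplexity can still differ in recognition accuracy, which is why the abstract's measure focuses on localized changes.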

Automated speech and audio analysis for semantic access to multimedia

by Franciska De Jong, Marijn Huijbregts - Proceedings of the First International Conference on Semantic and Digital Media Technologies, SAMT 2006, 2006
Cited by 6 (3 self)
The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper gives an overview of the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques are presented, including the alignment of speech and text resources, large vocabulary speech recognition, keyword spotting and speaker classification. The applicability of the techniques is discussed from a media-crossing perspective. The added value of the techniques and their potential contribution to the content value chain are illustrated by the description of two complementary demonstrators for browsing broadcast news archives.

Adaptation of Statistical Language Models for Automatic Speech Recognition

by Philip R. Clarkson, 1999
Cited by 5 (0 self)
Statistical language models encode linguistic information in a form that is useful to systems which process human language, such as optical character recognition and machine translation systems. Currently, however, the most common application of language modelling is automatic speech recognition, and it is this that forms the focus of this thesis. Most current speech recognition systems are dedicated to one specific task (for example, the recognition of broadcast news), and thus use a language model trained on text appropriate to that task. If, however, one wants to perform recognition on more general language, then creating an appropriate language model is far from straightforward. A task-specific language model will often perform very badly on language from a different domain, whereas a model trained on text from many diverse styles of language might perform better in general, but will not be especially well suited to any particular domain...

Dependencies between Student State and Speech Recognition Problems in Spoken Tutoring Dialogues

by Mihai Rotaru, 2006
Cited by 3 (2 self)
Speech recognition problems are a reality in current spoken dialogue systems. In order to better understand these phenomena, we study dependencies between speech recognition problems and several higher-level dialogue factors that define our notion of student state: frustration/anger, certainty and correctness. We apply chi-square (χ²) analysis to a corpus of speech-based computer tutoring dialogues to discover these dependencies both within and across turns. Significant dependencies are combined to yield insights into speech recognition problems and to propose new strategies for handling them. We also find that tutoring, as a new domain for speech applications, exhibits interesting tradeoffs and new factors to consider in spoken dialogue design.
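A chi-square analysis of this kind tests whether two categorical variables, for example a student-state label and whether a turn contains a recognition problem, are independent. The sketch below computes the standard Pearson statistic on a contingency table; the function name `chi_square` and the counts are hypothetical, not the paper's corpus, and the exact test setup used in the paper is not given in the abstract.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = student frustrated / not frustrated,
# columns = turn with / without a recognition problem.
counts = [[30, 10],
          [20, 40]]
stat, dof = chi_square(counts)
print(round(stat, 2), dof)
```

With one degree of freedom the 5% critical value is about 3.84, so these made-up counts would indicate a significant dependency. No continuity correction is applied here; some libraries (such as SciPy's `chi2_contingency`) apply Yates' correction to 2x2 tables by default.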

Measuring the Quality of Pronunciation Dictionaries

by Matthias Wolff, Matthias Eichner, Rüdiger Hoffmann - Proc. PMLA
Cited by 2 (0 self)
In this paper we investigate measures for the evaluation of pronunciation dictionaries that can be used independently of the type of lexicon, the language, the specific recognizer and the way the dictionary was generated. We describe statistical measures, measures based on information theory and performance measures, and give examples of how these measures can be applied in practice to supervise data-driven dictionary training, to select pronunciation variants and to evaluate the consistency of different dictionaries. Although the introduced measures are independent of the type of dictionary, we only report results obtained with data-driven dictionary generation and do not address measures specific to rule-based approaches.

In memory of my brother,

by Ingrid Ahmer, Thor Christopher Ahmer, 1955
This thesis addresses the application of automatic speech recognition to the task of offline closed-captioning of television programs, and describes the collection of corpora to support such research and an exploration of the issues to be addressed. The use of automatic speech recognition (ASR) for transcription of broadcast speech and as an aid to captioning is reviewed. As background to the task, the methodology for large vocabulary continuous speech recognition (LVCSR) is presented, with particular attention to large vocabulary language modelling and to the acoustic complexity of broadcast material. A speech corpus of segmented and transcribed speech utterances from ten program episodes was developed for a typical genre of television programming (travelogues) to which offline closed-captions are applied. The development of this corpus demonstrates the feasibility of using existing closed-caption sources to generate labelled acoustic data suitable for speech recognition research. The speech corpus exhibits far greater acoustic complexity and much lower signal-to-noise ratios than broadcast news data (which has been systematically evaluated in ASR research). Noise-tolerant speech recognisers were developed and effectively...

Modeling Pronunciation Variation . . .

by Gopala Krishna Anumanchipalli, 2008
Abstract not found

Fast N-Gram Language Model Look-Ahead for Decoders With Static Pronunciation Prefix Trees

by Marijn Huijbregts, Franciska De Jong
Decoders that use token passing restrict their search space through various types of token pruning. Using the Language Model Look-Ahead (LMLA) technique, it is possible to increase the number of tokens that can be pruned without loss of decoding precision. Unfortunately, for token-passing decoders that use a single static pronunciation prefix tree, full n-gram LMLA considerably increases the number of language model probability calculations required. In this paper, a method for applying full n-gram LMLA in a decoder with a single static pronunciation tree is introduced. The experiments show that this method improves the speed of the decoder without increasing the number of search errors.
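In a static pronunciation prefix tree, a word's identity is only known at the node where its pronunciation ends, so its language model score would otherwise arrive too late to help pruning. LMLA assigns each tree node the best LM probability among the words still reachable from it. The sketch below computes that look-ahead table for a single n-gram history, assuming a hypothetical toy lexicon and made-up probabilities; it is the straightforward per-history computation whose cost the abstract describes, not the paper's fast method, and the names `Node`, `build_prefix_tree` and `lookahead_table` are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # phone -> child Node
    words: list = field(default_factory=list)     # words whose pronunciation ends here


def build_prefix_tree(lexicon):
    """Build a pronunciation prefix tree from {word: tuple_of_phones}."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def lookahead_table(node, word_prob, table=None):
    """Map id(node) -> max LM probability of any word reachable from node.

    word_prob holds P(word | history) for ONE n-gram history, so full n-gram
    LMLA needs such a table for every active history.
    """
    if table is None:
        table = {}
    best = max((word_prob.get(w, 0.0) for w in node.words), default=0.0)
    for child in node.children.values():
        lookahead_table(child, word_prob, table)
        best = max(best, table[id(child)])
    table[id(node)] = best
    return table


# Hypothetical lexicon and hypothetical P(word | history) values.
lexicon = {"cat": ("k", "ae", "t"), "cap": ("k", "ae", "p"), "dog": ("d", "ao", "g")}
root = build_prefix_tree(lexicon)
table = lookahead_table(root, {"cat": 0.2, "cap": 0.05, "dog": 0.1})
print(table[id(root.children["k"])], table[id(root.children["d"])])
```

A token entering the "k" branch can thus be scored with 0.2 before the word end is reached. Decoders commonly work with log probabilities instead and apply the change in look-ahead score as a token moves from node to node.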

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University