A Probabilistic Framework For Segment-Based Speech Recognition
, 2003
Cited by 108 (33 self)
Most current speech recognizers use an observation space based on a temporal sequence of measurements extracted from fixed-length "frames" (e.g., Mel-cepstra). Given a hypothetical word or sub-word sequence, the acoustic likelihood computation always involves all observation frames, though the mapping between individual frames and internal recognizer states will depend on the hypothesized segmentation. There is another type of recognizer whose observation space is better represented as a network, or graph, where each arc in the graph corresponds to a hypothesized variable-length segment that is represented by a fixed-dimensional "feature". In such feature-based recognizers, each hypothesized segmentation will correspond to a segment sequence, or path, through the overall segment graph that is associated with a subset of all possible feature vectors in the total observation space. In this work we examine a maximum a posteriori decoding strategy for feature-based recognizers and develop a normalization criterion useful for a segment-based Viterbi or A* search. Experiments are reported for both phonetic and word recognition tasks.
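As a rough illustration of the segment-graph search mentioned in this abstract, the sketch below runs a Viterbi-style best-path search over a small graph of hypothesized segments. The function name `best_segment_path`, the unit labels, and the arc scores are all hypothetical. The scores stand in for log scores that have already been normalized, and the sketch does not reproduce the paper's normalization criterion.

```python
def best_segment_path(arcs, start, end):
    """Return (score, labels) for the highest-scoring path from start to end.

    arcs: list of (from_time, to_time, label, log_score) with from_time < to_time.
    Sorting arcs by from_time visits the graph in topological order, so the
    best score at a boundary is final before any arc leaving it is processed.
    Returns None if end is not reachable from start.
    """
    best = {start: (0.0, [])}  # boundary time -> (best score so far, labels)
    for src, dst, label, score in sorted(arcs, key=lambda a: a[0]):
        if src not in best:
            continue  # this boundary cannot be reached from the start
        cand = best[src][0] + score
        if dst not in best or cand > best[dst][0]:
            best[dst] = (cand, best[src][1] + [label])
    return best.get(end)


# Hypothetical segment graph: times are boundary indices, scores are log scores.
arcs = [
    (0, 2, "a", -1.0),
    (2, 5, "b", -2.0),
    (0, 3, "c", -1.5),
    (3, 5, "d", -1.0),
    (2, 3, "e", -0.25),
]
print(best_segment_path(arcs, 0, 5))
```

Each path through the graph covers the utterance with a different number of segments, which is why a frame-based score cannot be compared across paths directly and some normalization of the arc scores is needed before a search like this one is meaningful.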
Confidence Estimation for Machine Translation
- IN M. ROLLINS (ED.), MENTAL IMAGERY
, 2004
Modeling Out-Of-Vocabulary Words For Robust Speech Recognition
, 2000
Cited by 43 (5 self)
This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similarly sounding word from its vocabulary. Furthermore, a recognition error due to an OOV word tends to spread errors into neighboring words, dramatically degrading overall recognition performance.
Speech and Sketching for Multimodal Design
- In Proceedings of the 9th International Conference on Intelligent User Interfaces
, 2004
Cited by 21 (7 self)
While sketches are commonly and effectively used in the early stages of design, some information is far more easily conveyed verbally than by sketching. In response, we have combined sketching with speech, enabling a more natural form of communication. We studied the behavior of people sketching and speaking, and from this derived a set of rules for segmenting and aligning the signals from both modalities. Once the inputs are aligned, we use both modalities in interpretation. The result is a more natural interface to our system.
Learning Units for Domain-Independent Out-of-Vocabulary Word Modelling
, 2001
Cited by 14 (3 self)
This paper describes our recent work on detecting and recognizing out-of-vocabulary (OOV) words for robust speech recognition and understanding. To allow for OOV recognition within a word-based recognizer, the in-vocabulary (IV) word network is augmented with an OOV word model so that OOV words are considered simultaneously with IV words during recognition. We explore several configurations for the OOV model, the best of which utilizes a set of domain-independent, automatically derived, variable-length units. The units are created using an iterative bottom-up procedure where, at each iteration, the unit pairs with maximum mutual information are merged. When evaluating this method on a weather information domain, the false alarm rate of our baseline OOV model [1] is reduced by over 60%. For example, with an OOV detection rate of 70%, the OOV false alarm rate is reduced from 8.5% to 3.2%. At these settings the addition of the OOV model degrades the word error rate on IV data by only 0.3% absolute (3% relative).
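The iterative bottom-up procedure in this abstract can be sketched as a single merge step, repeated until enough units exist. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `merge_step`, the toy phone sequences, and the scoring by weighted pointwise mutual information over adjacent pairs are all illustrative choices, and only one pair is merged per iteration.

```python
import math
from collections import Counter


def merge_step(sequences):
    """Merge the adjacent unit pair with the highest weighted mutual information.

    sequences: list of lists of unit labels (strings).
    Returns (new_sequences, merged_pair); merged_pair is None if no pair exists.
    """
    unigrams = Counter(u for seq in sequences for u in seq)
    bigrams = Counter(p for seq in sequences for p in zip(seq, seq[1:]))
    if not bigrams:
        return sequences, None
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def score(pair):
        # Assumed criterion: p(a,b) * log(p(a,b) / (p(a) * p(b)))
        p_ab = bigrams[pair] / n_bi
        p_a = unigrams[pair[0]] / n_uni
        p_b = unigrams[pair[1]] / n_uni
        return p_ab * math.log(p_ab / (p_a * p_b))

    best = max(bigrams, key=score)
    merged = []
    for seq in sequences:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                out.append(seq[i] + "_" + seq[i + 1])  # new, longer unit
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged, best


# Hypothetical phone sequences; each call creates one new variable-length unit.
corpus = [["s", "t", "aa", "p"], ["s", "t", "ow", "p"], ["k", "aa", "t"]]
corpus, pair = merge_step(corpus)
print(pair, corpus)
```

Calling `merge_step` repeatedly on its own output grows progressively longer units, which is the sense in which the procedure is bottom-up.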
Evaluating the Effect of Predicting Oral Reading Miscues
- Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003)
, 2003
Cited by 14 (7 self)
This paper extends and evaluates previously published methods for predicting likely miscues in children’s oral reading in a Reading Tutor that listens. The goal is to improve the speech recognizer’s ability to detect miscues but limit the number of “false alarms” (correctly read words misclassified as incorrect). The “rote” method listens for specific miscues from a training corpus. The “extrapolative” method generalizes to predict other miscues on other words. We construct and evaluate a scheme that combines our rote and extrapolative models. This combined approach reduced false alarms by 0.52% absolute (12% relative) while simultaneously improving miscue detection by 1.04% absolute (4.2% relative) over our existing miscue prediction scheme.
Spoken conversational interaction for language learning
- In Proc. INSTIL/CALL
, 2004
Cited by 12 (8 self)
This paper describes our efforts towards utilizing multilingual spoken dialogue systems as an aid to second language acquisition. We argue that it is important for language students to have the opportunity to practice communication in a non-threatening environment, something that a computer can naturally provide. We envision a three-stage interaction focused around a specific topic of a lesson plan. The first stage would familiarize the student with the vocabulary and syntax by presenting simulated dialogues at a Web page. The second stage would involve spoken dialogue interaction with the computer, either at a workstation or on the telephone. The third stage would provide feedback to the user on the quality of the utterances they recorded during the dialogue exchange. We have thus far concentrated on Mandarin and English as the two languages, where the system can be configured reversibly to support either learning English or Mandarin. We have begun to develop dialogue interaction capabilities for a number of domains centered around the scenario of a traveler to a foreign city.
Intelligent Barge-In in Conversational Systems
, 2000
Cited by 11 (0 self)
In this paper we present novel solutions to problems related to barge-in in telephony-based conversational systems. In particular, we address recovery from falsely detected barge-in events and a method for signaling to the user that barge-in is disallowed at a particular dialogue state. The mechanisms and signals used to manage turn taking are similar to those in human-human conversation, which makes them easy for users to understand without explanation or prior training.
1. INTRODUCTION
In telephony-based spoken language systems, it is desirable to let users interrupt system output at any time, in particular if the output is based on erroneous understanding or contains superfluous information. Thus, enabling barge-in, i.e., the ability for the user to start speaking before system output has ended, can significantly enhance the user experience. However, users' new freedom also poses new challenges. One challenge is sorting out true user barge-in from background noise and nonspeech soun...
Maximum Entropy Confidence Estimation For Speech Recognition
, 2007
Cited by 9 (2 self)
For many automatic speech recognition (ASR) applications, it is useful to predict the likelihood that the recognized string contains an error. This paper explores two modifications of a classic design. First, it replaces the standard maximum likelihood classifier with a maximum entropy classifier. The maximum entropy framework carries the dual advantages of discriminative training and reasonable generalization. Second, it includes a number of alternative features. Our ASR system is heavily pruned, and often produces recognition lattices with only a single path. These alternative features are meant to serve as a surrogate for the typical features that can be computed from a rich lattice. We show that the maximum entropy classifier easily outperforms the standard baseline system, and the alternative features provide consistent gains for all of our test sets.
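For a two-class decision (hypothesis correct or in error), a maximum entropy classifier takes the same form as logistic regression, so the idea can be sketched in a few lines. The sketch below is only illustrative: the function names, the two features (a normalized acoustic score and an n-best agreement ratio), the toy data, and the training settings are assumptions and are not taken from the paper.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(features, labels, epochs=500, lr=0.5):
    """Fit p(correct | x) = sigmoid(w . x + b) by stochastic gradient ascent
    on the conditional log-likelihood (discriminative training)."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y - p  # per-example gradient of the log-likelihood
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


def confidence(w, b, x):
    """Estimated probability that a hypothesis with features x is correct."""
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical features: [normalized acoustic score, n-best agreement ratio].
X = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.3, 0.2]]
y = [1, 1, 0, 0]  # 1 = recognized correctly, 0 = contains an error
w, b = train_maxent(X, y)
print(confidence(w, b, [0.85, 0.85]), confidence(w, b, [0.2, 0.2]))
```

Thresholding the returned confidence gives an accept/reject decision. A real system would add regularization and train on far more data than this toy set.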
A self-transcribing speech corpus: collecting continuous speech with an online educational game
- the Speech and Language Technology in Education (SLaTE) Workshop
, 2009
Cited by 8 (3 self)
We describe a novel approach to collecting orthographically transcribed continuous speech data through the use of an online educational game called Voice Scatter, in which players study flashcards by using speech to match terms with their definitions. We analyze a corpus of 30,938 utterances, totaling 27.63 hours of speech, collected during the first 22 days that Voice Scatter was publicly available. Though each individual game covers only a small vocabulary, in aggregate speech recognition hypotheses in the corpus contain 21,758 distinct words. We show that Amazon Mechanical Turk can be used to orthographically transcribe utterances in the corpus quickly and cheaply, with near-expert accuracy. Moreover, we present a filtering technique that automatically identifies a sub-corpus of 39% of the data for which recognition hypotheses can be considered human-quality transcripts. We demonstrate the usefulness of such self-transcribed data for acoustic model adaptation.

