Modeling the prosody of hidden events for improved word recognition
In Proc. EUROSPEECH, 1999
Cited by 27 (4 self)
We investigate a new approach for using speech prosody as a knowledge source for speech recognition. The idea is to penalize word hypotheses that are inconsistent with prosodic features such as duration and pitch. To model the interaction between words and prosody, we modify the language model to represent hidden events such as sentence boundaries and various forms of disfluency, and combine it with decision trees that predict such events from prosodic features. N-best rescoring experiments on the Switchboard corpus show a small but consistent reduction in word error as a result of this modeling. We conclude with a preliminary analysis of the types of errors that are corrected by the prosodically informed model.
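The rescoring step can be illustrated with a log-linear combination of per-hypothesis scores. This is a minimal sketch, not the authors' model: it assumes each N-best hypothesis already carries a prosodic log-likelihood (in the paper this comes from a hidden-event language model combined with prosodic decision trees), and the names `Hypothesis`, `rescore`, `lm_weight`, and `prosody_weight` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    # Hypothetical container for one N-best entry; all scores are natural-log values.
    words: list
    acoustic_logp: float   # log P(A | W) from the recognizer
    lm_logp: float         # log P(W) from the language model
    prosody_logp: float    # log P(F | W), assumed precomputed by some prosodic model


def rescore(nbest, lm_weight=1.0, prosody_weight=1.0):
    """Sort hypotheses best-first by a weighted sum of log scores (weights are assumed tunables)."""
    def total(h):
        return h.acoustic_logp + lm_weight * h.lm_logp + prosody_weight * h.prosody_logp
    return sorted(nbest, key=total, reverse=True)


# Toy usage: the prosodic score promotes a hypothesis the acoustic model slightly disfavors.
h1 = Hypothesis(["i", "know", "that"], acoustic_logp=-10.0, lm_logp=-5.0, prosody_logp=-8.0)
h2 = Hypothesis(["i", "know", "it"], acoustic_logp=-11.0, lm_logp=-5.0, prosody_logp=-2.0)
best = rescore([h1, h2])[0]
```

Setting `prosody_weight=0.0` recovers ordinary acoustic-plus-LM rescoring, which makes the contribution of the prosodic term easy to compare in isolation.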
A Robust Loose Coupling for Speech Recognition and Natural Language Understanding
IEEE, Bob O'Hara and Al, 1995
Cited by 4 (0 self)
The focus of this thesis proposal is to improve the ability of a computational system to understand spoken utterances in a dialogue with a human. Available computational methods for word recognition do not perform as well on spontaneous speech as we would hope. Even a state-of-the-art recognizer achieves slightly worse than 70% word accuracy on (nearly) spontaneous speech in a conversation about a specific problem. To address this problem, I will explore novel methods for post-processing the output of a speech recognizer in order to correct errors. I adopt statistical techniques for modeling the noisy channel from the speaker to the listener in order to correct some of the errors introduced there. The statistical model accounts for frequent errors such as simple word/word confusions and short phrasal problems (one-to-many word substitutions and many-to-one word concatenations). To use the model, a search algorithm is required to find the most likely correction of a given word sequence ...
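The noisy-channel correction search can be sketched as a Viterbi pass that picks the intended sequence C maximizing P(C) · P(observed | C). This is an illustrative sketch under stated assumptions, not the proposal's model: it handles only one-to-one word/word confusions (not the one-to-many or many-to-one cases mentioned above), and the tables `CHANNEL` and `BIGRAM`, the function `correct`, and the `floor` probability are hypothetical toy values.

```python
import math

# Hypothetical channel model P(observed | intended) and bigram LM; toy numbers only.
CHANNEL = {
    "to": {"to": 0.7, "two": 0.2, "too": 0.1},
    "two": {"two": 0.8, "to": 0.2},
}
BIGRAM = {("<s>", "fly"): 0.1, ("fly", "to"): 0.3, ("to", "boston"): 0.2}


def correct(observed, channel, bigram, floor=1e-6):
    """Viterbi search for argmax_C P(C) * P(observed | C), one-to-one substitutions only."""
    best = {"<s>": (0.0, [])}  # last intended word -> (log score, best path ending in it)
    for o in observed:
        # Intended words that could have produced o; fall back to o itself if none.
        cands = [i for i, outs in channel.items() if o in outs] or [o]
        new = {}
        for i in cands:
            # Channel log-probability; words absent from the model pass through unchanged.
            emit = math.log(channel.get(i, {}).get(o, 1.0))
            new[i] = max(
                ((s + math.log(bigram.get((prev, i), floor)) + emit, path + [i])
                 for prev, (s, path) in best.items()),
                key=lambda t: t[0],
            )
        best = new
    return max(best.values(), key=lambda t: t[0])[1]


fixed = correct(["fly", "two", "boston"], CHANNEL, BIGRAM)
```

Here the language model overrules the recognizer's "two" because "fly to boston" is far more probable than "fly two boston", even though the channel model alone slightly favors the literal reading.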
Speech Interface Exploiting Intentionally-Controlled Nonverbal Speech Information
2005
Cited by 2 (0 self)
This paper describes our research on speech interfaces that use nonverbal speech information. Although speech conveys both verbal and nonverbal information, most speech-recognition research has used only verbal information such as words and sentences. Among nonverbal information, we have focused on hesitation (filled pauses) and prosody (voice pitch) to create four speech-interface functions: Speech Completion, Speech Shift, Speech Starter, and Speech Spotter. Hesitation, for example, can be used as a trigger to complete an uttered fragment, and changing the pitch can be used to enter a word with different functions. By having users intentionally utter nonverbal information according to simple rules, we have achieved interfaces that can exploit the potential of speech in various forms. ACM Classification: H5.2 [Information interfaces and presentation]:
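A hesitation trigger can be approximated by watching for a sustained voiced run whose pitch stays nearly flat, since a filled pause such as "uh..." is typically a drawn-out vowel. This is a rough heuristic sketch, not the detector used in the paper: it assumes a per-frame pitch track in Hz (0 for unvoiced frames), and the function `detect_filled_pause` and its thresholds (`frame_ms`, `min_ms`, `max_dev_semitones`) are hypothetical.

```python
import math


def detect_filled_pause(f0, frame_ms=10, min_ms=300, max_dev_semitones=1.0):
    """Return the frame index at which a flat-pitch voiced run reaches min_ms, else None.

    f0: per-frame pitch in Hz, with 0 (or less) marking unvoiced frames.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    min_frames = max(1, min_ms // frame_ms)
    run_start, ref_hz = None, 0.0
    for t, hz in enumerate(f0):
        if hz <= 0:  # an unvoiced frame breaks the run
            run_start = None
            continue
        # Start a new run if none is active or pitch drifted too far from the run's reference.
        if run_start is None or abs(12 * math.log2(hz / ref_hz)) > max_dev_semitones:
            run_start, ref_hz = t, hz
        if t - run_start + 1 >= min_frames:
            return t  # hesitation detected: e.g. trigger completion here
    return None
```

Measuring drift in semitones rather than Hz keeps the same threshold meaningful for low- and high-pitched voices.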
Word Class Driven Synthesis of Prosodic Annotations
Prosody is an important aspect of speech that current text-to-speech synthesis systems fail to mimic in a convincing or natural way [1, 2, 3, 4]. This paper describes research on a partial system for prosodic synthesis using easily derived, low-level syntactic information.
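One simple way to turn low-level syntactic information into prosodic annotations is to accent content words and leave function words unaccented. This is a common baseline shown only as an assumed illustration, not the paper's system: it assumes Penn Treebank part-of-speech tags, and `CONTENT_TAG_PREFIXES` and `annotate_accents` are hypothetical names.

```python
# Penn Treebank tag prefixes treated as content words (an assumed, crude word-class split).
CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")  # nouns, verbs, adjectives, adverbs


def annotate_accents(tagged):
    """Map (word, tag) pairs to (word, accented) pairs: content words accented, others not."""
    return [(word, tag.startswith(CONTENT_TAG_PREFIXES)) for word, tag in tagged]


marks = annotate_accents([("the", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN")])
```

The split is deliberately coarse; auxiliaries such as "is" (tagged VBZ) would be accented here, which a real system would need to handle.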
Proceedings of the 8th European Conference on Speech Communication and Technology (Eurospeech 2003)
Proc. of Eurospeech 2003
This paper describes a speech-input interface function, called speech shift, that enables a user to specify a speech-input mode by simply changing (shifting) voice pitch. While current speech-input interfaces use only verbal information, we aimed at building a more user-friendly speech interface by making use of nonverbal information: the voice pitch. By intentionally controlling the pitch, a user can enter the same word with different meanings (functions) without explicitly changing the speech-input mode. Our speech-shift function, implemented on a voice-enabled word processor, for example, can distinguish an utterance with a high pitch from one with a normal (low) pitch, and treat the former as voice-command-mode input (such as file-menu and edit-menu commands) and the latter as regular dictation-mode text input. Our experimental results from twenty subjects showed that the speech-shift function is an effective, easy-to-use, and labor-saving input method.
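The mode decision described above can be sketched as comparing an utterance's typical pitch against the user's normal pitch. This is a minimal sketch under assumptions, not the paper's classifier: it assumes a per-frame pitch track in Hz (0 for unvoiced frames) and a known per-user baseline, and the function `classify_mode` and its `threshold_semitones` value are hypothetical.

```python
import math
import statistics


def classify_mode(f0, baseline_hz, threshold_semitones=3.0):
    """Label an utterance 'command' if its median voiced pitch is well above the baseline."""
    voiced = [hz for hz in f0 if hz > 0]  # drop unvoiced frames (0 Hz)
    if not voiced:
        return "dictation"  # no pitch evidence: default to regular text input
    # Pitch shift relative to the user's normal voice, in semitones.
    shift = 12 * math.log2(statistics.median(voiced) / baseline_hz)
    return "command" if shift >= threshold_semitones else "dictation"


mode = classify_mode([0, 150, 155, 160, 0], baseline_hz=120)
```

Using the median rather than the mean makes the decision less sensitive to occasional pitch-tracking errors such as octave jumps.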

