Results 1 - 4 of 4
Error-responsive feedback mechanisms for speech recognizers, 1997
Cited by 37 (4 self)
This thesis is about modeling, analyzing, and predicting errorful behavior in large vocabulary continuous speech recognition systems. Because today's state-of-the-art recognizers are not designed to be situated naturally in an error feedback loop, they are ill-positioned for inclusion in multi-modal interfaces, multi-media databases, and other interesting applications. I make improvements to the current approach to predicting and analyzing error behaviors, which is currently based only on the measurement of word error rate. The speech recognizer's functionality is extended to include confidence annotations, which are "meta-level" markings that indicate how certain the recognizer is that it has decoded its input correctly. This is accomplished by feeding externally defined error conditions back to the recognizer. Error feedback enables the construction of statistical models that map measurements of the recognizer's internal states and behaviors to externally defined error conditions.
Exploring benefits of non-linear time compression - ACM Multimedia, 2001
Cited by 6 (1 self)
In comparison to text, audio-video content is much more challenging to browse. Time-compression has been suggested as a key technology that can support browsing – time compression speeds up the playback of audio-video content without causing the pitch to change. Simple forms of time-compression are starting to appear in commercial streaming-media products from Microsoft and Real Networks. In this paper we explore the potential benefits of more recent and advanced types of time compression, called non-linear time compression. The most advanced of these algorithms exploit fine-grain structure of human speech (e.g., phonemes) to differentially speed up segments of speech, so that the overall speedup can be higher. In this paper we explore what are the actual gains achieved by end-users from these advanced algorithms. Our results indicate that the gains are actually quite small in common cases and come with significant system complexity and some audio/video synchronization issues.
Pronunciation Modeling in Speech Synthesis, 1998
Cited by 4 (0 self)
ACKNOWLEDGMENTS: I am very pleased to have had the encouragement and support of a committee of three linguists for whom I have the greatest respect and admiration: Mark Liberman, William Labov and Eugene Buckley. Each of them made my transition back to Penn pleasant after what seemed like a long absence. It was a great pleasure to have Mark Randolph both as an external reader and as a colleague at Motorola. Mark’s work at MIT a decade ago has served as an inspiration to me. Orhan Karaali made this dissertation possible in this millennium. As my manager for over two years at Motorola, Orhan insisted on making my dissertation a priority at work. Harry Bliss provided his voice to this project and our whole group is very grateful for his patience and cooperation. My colleagues at Motorola listened to my ideas and provided technical and theoretical assistance at every turn: Noel
User Benefits of Non-Linear Time Compression, 2000
Cited by 2 (0 self)
In comparison to text, audio-video content is much more challenging to browse. Time-compression has been suggested as a key technology that can support browsing -- time compression speeds up the playback of audio-video content without causing the pitch to change. Simple forms of time-compression are starting to appear in commercial streaming-media products from Microsoft and Real Networks. In this paper we explore the potential benefits of more recent and advanced types of time compression, called non-linear time compression. The most advanced of these algorithms exploit fine-grain structure of human speech (e.g., phonemes) to differentially speed up segments of speech, so that the overall speed-up can be higher. In this paper we explore what are the actual gains achieved by end-users from these advanced algorithms, and whether the gains are worth the additional systems complexity. Our results indicate that the gains today are actually quite small and may not be worth the additional c...

