Results 1 - 10 of 34
How to find trouble in communication
, 2003
Cited by 85 (14 self)
Automatic dialogue systems used, for instance, in call centers, should be able to determine in a critical phase of the dialogue, indicated by the customer's vocal expression of anger/irritation, when it is better to pass over to a human operator. At first glance, this does not seem to be a complicated task: it is reported in the literature that emotions can be told apart quite reliably on the basis of prosodic features. However, these results are achieved most of the time in a laboratory setting, with experienced speakers (actors), and with elicited, controlled speech. We compare classification results obtained with the same feature set for elicited speech and for a Wizard-of-Oz scenario, where users believe that they are really communicating with an automatic dialogue system. It turns out that the closer we get to a realistic scenario, the less reliable prosody is as an indicator of the speakers' emotional state. As a consequence, we propose to change the target: we cease looking for traces of particular emotions in the users' speech and instead look for indicators of TROUBLE IN COMMUNICATION. For this reason, we propose the module Monitoring of User State [especially of] Emotion (MOUSE), in which a prosodic classifier is combined with other knowledge sources, such as conversationally peculiar linguistic behavior, for example, the use of repetitions. For this module, preliminary experimental results are reported showing a more adequate modelling of TROUBLE IN COMMUNICATION.
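The combination of a prosodic classifier with other knowledge sources such as repetitions can be sketched as follows. This is a minimal illustration, not the MOUSE module itself: the weights, the escalation threshold, and the verbatim-repetition heuristic are all assumptions made for the example.

```python
# Hypothetical sketch of a MOUSE-style trouble monitor: a prosodic
# anger/irritation probability is combined with a simple linguistic
# cue (verbatim repetitions across user turns). All weights and
# thresholds are illustrative assumptions.

def repetition_rate(turns):
    """Fraction of user turns that verbatim-repeat the previous turn."""
    if len(turns) < 2:
        return 0.0
    repeats = sum(1 for a, b in zip(turns, turns[1:])
                  if a.strip().lower() == b.strip().lower())
    return repeats / (len(turns) - 1)

def trouble_score(prosodic_anger_prob, turns, w_prosody=0.6, w_repeat=0.4):
    """Weighted combination of the two knowledge sources."""
    return w_prosody * prosodic_anger_prob + w_repeat * repetition_rate(turns)

def should_escalate(prosodic_anger_prob, turns, threshold=0.5):
    """Decide whether to pass the call over to a human operator."""
    return trouble_score(prosodic_anger_prob, turns) >= threshold
```

The point of the combination is that neither source alone is reliable in realistic data; a repeated utterance with rising irritation scores pushes the combined score over the escalation threshold even when each cue on its own would not.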
From First Contact to Close Encounters: A Developmentally Deep Perceptual System for a Humanoid Robot
, 2003
Cited by 49 (8 self)
This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply `pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively.
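The poke-and-watch idea, segmenting whatever moves when the arm makes contact, can be illustrated with simple frame differencing. The grids and the intensity threshold below are toy stand-ins for real camera frames, not the thesis's actual vision pipeline.

```python
# Toy figure/ground separation by motion: difference two frames taken
# around the moment the arm contacts an object and keep the pixels
# whose intensity changed noticeably. The threshold is an assumption.

def segment_by_motion(before, after, threshold=10):
    """Return a binary mask marking pixels that moved between frames."""
    return [[1 if abs(a - b) > threshold else 0
             for a, b in zip(row_after, row_before)]
            for row_after, row_before in zip(after, before)]
```

Once such masks yield reliable segmented views, the segmented patches can be used as labeled training data, which is the step that lets the robot later recognize the same objects without further contact.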
Error Detection in Spoken Human-Machine Interaction
, 1989
Cited by 32 (1 self)
Given the state of the art of current language and speech technology, errors are unavoidable in present-day spoken dialogue systems. Therefore, one of the main concerns in dialogue design is how to decide whether or not the system has understood the user correctly. In human-human communication, dialogue participants are continuously sending and receiving signals on the status of the information being exchanged. We claim that if spoken dialogue systems were able to detect such cues and change their strategy accordingly, the interaction between user and system would improve. The goals of the present study are therefore twofold: (i) to find out which positive and negative cues people actually use in human-machine interaction in response to explicit and implicit verification questions and how informative these signals are, and (ii) to explore the possibilities of spotting errors automatically and on-line. To reach these goals, we first perform a descriptive analysis, followed...
Using natural language processing and discourse features to identify understanding errors in a spoken dialogue system
- In Proc. ICML
, 2000
Cited by 31 (1 self)
While it has recently become possible to build spoken dialogue systems that interact with users in real-time in a range of domains, systems that support conversational natural language are still subject to a large number of spoken language understanding (SLU) errors. Endowing such systems with the ability to reliably distinguish SLU errors from correctly understood utterances might allow them to correct some errors automatically or to interact with users to repair them, thereby improving the system’s overall performance. We report experiments on learning to automatically distinguish SLU errors in 11,787 spoken utterances collected in a field trial of AT&T’s How May I Help You system interacting with live customer traffic. We apply the automatic classifier RIPPER (Cohen 96) to train an SLU classifier using features that are automatically obtainable in real-time. The classifier achieves 86% accuracy on this task, an improvement of 23% over the majority class baseline. We show that the most important features are those that the natural language understanding module can compute, suggesting that integrating the trained classifier into the NLU module of the How May I Help You system should be straightforward.
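The experimental setup, training a classifier on automatically obtainable features and reporting its gain over the majority-class baseline, can be sketched in miniature. RIPPER is a rule learner; the single-threshold rule below is a deliberately simplified stand-in for it, and the data and the use of ASR confidence as the feature are invented for illustration.

```python
# Simplified stand-in for the RIPPER experiment: learn a one-threshold
# rule on a single automatically obtainable feature (here, a made-up
# ASR confidence score) and compare its accuracy to the majority-class
# baseline, mirroring how the improvement figure is reported.

def majority_baseline(labels):
    """The label a majority-class classifier would always predict."""
    return max(set(labels), key=labels.count)

def best_threshold_rule(confidences, labels):
    """Pick the confidence cutoff that best separates SLU errors."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(confidences)):
        preds = ["error" if c < t else "ok" for c in confidences]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

A real rule learner induces conjunctions over many features at once; the toy version just makes the baseline-versus-learned-rule comparison concrete.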
Automatically Training a Problematic Dialogue Predictor for a Spoken Dialogue System
- Journal of Artificial Intelligence Research
, 2002
Cited by 23 (1 self)
... sources and services from any phone. However, current spoken dialogue systems are deficient in their strategies for preventing, identifying and repairing problems that arise in the conversation. This paper reports results on automatically training a Problematic Dialogue Predictor to predict problematic human-computer dialogues using a corpus of 4692 dialogues collected with the How May I Help You spoken dialogue system. The Problematic Dialogue Predictor can be immediately applied to the system's decision of whether to transfer the call to a human customer care agent, or be used as a cue to the system's dialogue manager to modify its behavior to repair problems, and even perhaps, to prevent them. We show that a Problematic Dialogue Predictor using automatically obtainable features from the first two exchanges in the dialogue can predict problematic dialogues 13.2% more accurately than the baseline.
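The key design constraint, predicting from only the first two exchanges so the system can still act on the prediction, can be made concrete with a small sketch. The feature names, the confidence floor, and the simple decision rule are assumptions for illustration, not the published model.

```python
# Illustrative early-warning predictor: extract features from only the
# first two exchanges of a dialogue and flag it as likely problematic.
# Feature names ("asr_conf", "reprompt") and thresholds are invented.

def early_features(exchanges):
    """Summarize the first two exchanges of a dialogue."""
    first_two = exchanges[:2]
    return {
        "mean_asr_conf": sum(e["asr_conf"] for e in first_two) / len(first_two),
        "any_reprompt": any(e.get("reprompt", False) for e in first_two),
    }

def predict_problematic(exchanges, conf_floor=0.5):
    """Flag dialogues worth transferring to a human agent early."""
    f = early_features(exchanges)
    return f["mean_asr_conf"] < conf_floor or f["any_reprompt"]
```

Because the prediction is available after two exchanges, the dialogue manager still has time to transfer the call or change strategy, which is exactly the use case the abstract describes.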
Generalizing Prosodic Prediction Of Speech Recognition Errors
- In Proceedings of the 6th International Conference of Spoken Language Processing (ICSLP-2000
, 2000
Cited by 22 (6 self)
Since users of spoken dialogue systems have difficulty correcting system misconceptions, it is important for automatic speech recognition (ASR) systems to know when their best hypothesis is incorrect. We compare results of previous experiments which showed that prosody improves the detection of ASR errors to experiments with a new system and new domain, the W99 conference registration system. Our new results again show that prosodic features can improve prediction of ASR misrecognitions over the use of other standard techniques for ASR rejection.
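The prosodic features typically extracted per utterance for this kind of misrecognition prediction (pitch statistics, duration) can be illustrated with a toy extractor. The exact feature set is an assumption based on the abstract, not the paper's published list; unvoiced frames are conventionally coded as zero f0.

```python
# Toy prosodic feature extraction for ASR-error prediction: compute
# f0 statistics over voiced frames plus utterance duration. Frames
# with f0 == 0 are treated as unvoiced and skipped.

def prosodic_features(f0_contour, duration_s):
    """Return a small dict of prosodic features for one utterance."""
    voiced = [v for v in f0_contour if v > 0]
    return {
        "f0_max": max(voiced),
        "f0_range": max(voiced) - min(voiced),
        "f0_mean": sum(voiced) / len(voiced),
        "duration": duration_s,
    }
```

Such features are then fed to a classifier alongside the standard confidence-based rejection features, which is the comparison the experiments report.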
Characterizing and predicting corrections in spoken dialogue systems
- Comput. Linguist
, 2006
Cited by 21 (1 self)
This article focuses on the analysis and prediction of corrections, defined as turns where a user tries to correct a prior error made by a spoken dialogue system. We describe our labeling procedure for various correction types and statistical analyses of their features in a corpus collected from a train information spoken dialogue system. We then present results of machine learning experiments designed to identify user corrections of speech recognition errors. We investigate the predictive power of features automatically computable from the prosody of the turn, the speech recognition process, experimental conditions, and the dialogue history. Our best-performing features reduce classification error from baselines of 25.70–28.99% to 15.72%.
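A correction detector drawing on the feature sources the article names, turn prosody and dialogue history, might look like the sketch below. The decision rule, thresholds, and field names are all invented for illustration; the published classifiers are learned from data, not hand-written.

```python
# Hypothetical correction detector combining two of the article's
# feature sources: prosody of the current turn (hyperarticulation
# cues) and dialogue history (was the previous system turn a
# verification prompt?). All thresholds and keys are assumptions.

def is_correction(turn, prev_system_turn):
    """Guess whether a user turn is correcting a system error."""
    hyperarticulated = (turn["f0_range"] > 80
                        or turn["duration_s"] > 1.5 * turn["expected_duration_s"])
    after_verification = prev_system_turn.get("verification", False)
    repeats_content = bool(turn["words"] & prev_system_turn.get("rejected_words", set()))
    return after_verification and (hyperarticulated or repeats_content)
```

The intuition matches the corpus findings: corrections tend to follow system verification turns and to carry exaggerated prosody, so combining the two cues beats either alone.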
Acoustic Models for Hyperarticulated Speech
, 2000
Cited by 15 (0 self)
In spoken dialogue systems, hyperarticulation occurs as an attempt to recover from previous recognition errors. It is commonly observed that real users in particular apply recovery strategies similar to those used in human-human interactions. Previous studies have shown that current speech recognizers cannot handle hyperarticulated speech. As an effect of the higher word error rates on hyperarticulated speech, humans tend to reinforce this speaking style, which results in even more recognition errors. In this paper, we present approaches to build robust acoustic models for hyperarticulated speech. One key point is that the change of acoustic features under hyperarticulation is a phone-dependent effect. The idea is to use the likelihood criterion to decide which phones should be treated separately. This can be done by incorporating dynamic questions about hyperarticulation into the clustering stage. Based on such a phonetic decision tree, we can generate appropriate acoustic models. With this method, we achieved a wo...
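The likelihood criterion for deciding whether a phone needs a separate hyperarticulated model can be shown in miniature: split only when modeling the normal and hyperarticulated samples separately improves log-likelihood over a pooled model by some margin. Single-feature Gaussian models and the margin value are simplifying assumptions; real systems apply this to HMM state distributions over full acoustic feature vectors.

```python
import math

# Likelihood-criterion sketch for phone-dependent splitting: compare
# the log-likelihood of one pooled Gaussian against two per-condition
# Gaussians. One scalar feature per sample keeps the example small.

def gaussian_loglik(xs):
    """Log-likelihood of samples under their own ML Gaussian fit."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n or 1e-6  # guard zero variance
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
               for x in xs)

def should_split(normal, hyper, margin=1.0):
    """Answer the 'hyperarticulated?' question for one phone."""
    pooled = gaussian_loglik(normal + hyper)
    split = gaussian_loglik(normal) + gaussian_loglik(hyper)
    return split - pooled > margin
```

Phones whose acoustics barely shift under hyperarticulation fail the test and keep a shared model, which is exactly the phone-dependent treatment the abstract argues for.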
Audiovisual cues to uncertainty
- Proceedings of the ISCA Workshop on Error Handling in Spoken Dialogue Systems, Chateau-D'Oex
, 2003
Cited by 11 (2 self)
This paper presents research on the use of audiovisual prosody to signal a speaker’s level of uncertainty. The first study consists of an experiment in which subjects are asked factual questions in a conversational setting while they are being filmed. Statistical analyses bring to light that the speakers’ Feeling-of-Knowing (FOK) correlates significantly with a number of visual and verbal properties. Interestingly, it appears that answers tend to have a higher number of marked feature settings (i.e., divergences from the neutral audiovisual expression) when the FOK score is low, while the reverse is true for non-answers. The second study is a perception experiment in which a selection of the utterances from the first study is presented to subjects in one of three conditions: vision only, sound only, or vision+sound. Results reveal that human observers can reliably distinguish HighFOK responses from LowFOK responses in all three conditions, although answers are easier than non-answers and a bimodal presentation of the stimuli is easier than its unimodal counterparts. Results of these two experiments are potentially relevant for improving the communication style in human-machine interaction.
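The marked-feature analysis, counting divergences from a neutral audiovisual setting per utterance, can be sketched as follows. The feature inventory (gaze, eyebrows, fillers, response delay) and the delay cutoff are invented for the example; the paper's actual annotation scheme may differ.

```python
# Toy marked-feature counter: how many audiovisual features of an
# utterance diverge from an assumed neutral setting. The inventory
# and the 0.5 s delay cutoff are illustrative assumptions.

NEUTRAL = {"gaze": "to_listener", "eyebrows": "neutral", "fillers": 0}

def marked_features(obs):
    """Count divergences from the neutral audiovisual expression."""
    count = 0
    if obs["gaze"] != NEUTRAL["gaze"]:
        count += 1
    if obs["eyebrows"] != NEUTRAL["eyebrows"]:
        count += 1
    if obs["fillers"] > NEUTRAL["fillers"]:
        count += 1
    if obs["delay_s"] > 0.5:  # long response delay counts as marked
        count += 1
    return count
```

Under the paper's finding, low-FOK answers would score high on such a counter while high-FOK answers would score near zero, giving a machine-readable proxy for the speaker's uncertainty.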