Results 1 - 10 of 35
How can I help you? Comparing engagement classification strategies for a robot bartender, 2013
"... A robot agent existing in the physical world must be able to under-stand the social states of the human users it interacts with in order to respond appropriately. We compared two implemented methods for estimating the engagement state of customers for a robot bartender based on low-level sensor data ..."
Cited by 6 (6 self)
A robot agent existing in the physical world must be able to understand the social states of the human users it interacts with in order to respond appropriately. We compared two implemented methods for estimating the engagement state of customers for a robot bartender based on low-level sensor data: a rule-based version derived from the analysis of human behaviour in real bars, and a trained version using supervised learning on a labelled multimodal corpus. We first compared the two implementations using cross-validation on real sensor data and found that nearly all classifier types significantly outperformed the rule-based classifier. We also carried out feature selection to see which sensor features were the most informative for the classification task, and found that the positions of the head and hands were relevant, but that the torso orientation was not. Finally, we performed a user study comparing the ability of the two classifiers to detect the intended engagement of actual customers of the robot bartender; this study found that the trained classifier was faster at detecting initial intended user engagement, but that the rule-based classifier was more stable.
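The comparison described in this abstract (a hand-crafted rule versus classifiers trained on sensor features, evaluated by cross-validation) can be illustrated with a minimal sketch. The feature layout, the rule, and the synthetic data below are assumptions for illustration, not the paper's actual sensors, rules, or models.

```python
# Minimal sketch: rule-based vs. trained engagement classifiers, assuming a
# hypothetical feature layout [head_x, head_y, hand_x, hand_y, torso_angle].
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in for a labelled multimodal corpus: rows are frames, label 1 means
# the customer is seeking engagement (synthetic, for illustration only).
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

def rule_based(X):
    # Hypothetical hand-crafted rule: head position past a fixed threshold
    # is treated as seeking engagement.
    return (X[:, 0] > 0).astype(int)

print(f"rule-based accuracy: {(rule_based(X) == y).mean():.2f}")

for clf in (RandomForestClassifier(random_state=0), SVC()):
    scores = cross_val_score(clf, X, y, cv=5)
    print(type(clf).__name__, f"CV accuracy: {scores.mean():.2f}")
```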
The Face Speaks: Contextual and Temporal Sensitivity To Backchannel Responses.
"... Abstract. It is often assumed that one person in a conversation is active (the speaker) and the rest passive (the listeners). Conversational analysis has shown, however, that listeners take an active part in the conversation, providing feedback signals that can control conversational flow. The face ..."
Cited by 3 (3 self)
It is often assumed that one person in a conversation is active (the speaker) and the rest passive (the listeners). Conversational analysis has shown, however, that listeners take an active part in the conversation, providing feedback signals that can control conversational flow. The face plays a vital role in these backchannel responses. A deeper understanding of facial backchannel signals is crucial for many applications in social signal processing, including automatic modeling and analysis of conversations, or the development of life-like, effective conversational agents. Here, we present results from two experiments testing sensitivity to the context and the timing of backchannel responses. We utilised sequences from a newly recorded database of 5-minute, two-person conversations. Experiment 1 tested how well participants could match backchannel sequences to their corresponding speaker sequence. On average, participants performed well above chance. Experiment 2 tested how sensitive participants would be to temporal misalignments of the backchannel sequence. Interestingly, participants were able to estimate the correct temporal alignment for the sequence pairs. Taken together, our results show that human conversational skills are highly tuned both towards context and temporal alignment, showing the need for accurate modeling of conversations in social signal processing.
Look at Who’s Talking: Voice Activity Detection by Automated Gesture Analysis
"... Abstract. This paper proposes an approach for Voice Activity Detection (VAD) based on the automatic measurement of gesturing. The main motivation of the work is that gestures have been shown to be tightly correlated with speech, hence they can be considered a reliable evidence that a person is talki ..."
Cited by 2 (2 self)
This paper proposes an approach for Voice Activity Detection (VAD) based on the automatic measurement of gesturing. The main motivation of the work is that gestures have been shown to be tightly correlated with speech, hence they can be considered reliable evidence that a person is talking. The use of gestures rather than speech for performing VAD can be helpful in many situations (e.g., surveillance and monitoring in public spaces) where speech cannot be obtained for technical, legal, or ethical reasons. The results show that the gesturing measurement approach proposed in this work achieves, on a frame-by-frame basis, an accuracy of 71 percent in distinguishing between speech and non-speech.
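A minimal sketch of the general idea (frame-by-frame speech/non-speech labelling from a gesturing measure), assuming gesturing can be approximated by inter-frame motion energy; the paper's actual gesturing measurement is not reproduced here.

```python
# Sketch: gesture-based VAD via thresholded inter-frame motion energy.
import numpy as np

def motion_energy(frames: np.ndarray) -> np.ndarray:
    """Mean absolute inter-frame difference; frames has shape (T, H, W)."""
    diffs = np.abs(np.diff(frames.astype(float), axis=0))
    return diffs.mean(axis=(1, 2))

def gesture_vad(frames: np.ndarray, threshold: float = 5.0) -> np.ndarray:
    """Label each frame transition as speech (1) or non-speech (0) by motion."""
    return (motion_energy(frames) > threshold).astype(int)

# Usage with synthetic video: 100 frames of 64x64 noise, extra motion mid-clip.
rng = np.random.default_rng(1)
frames = rng.integers(0, 10, size=(100, 64, 64))
frames[40:60] += rng.integers(0, 40, size=(20, 64, 64))
print(gesture_vad(frames).sum(), "of 99 transitions labelled as speech")
```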
From Talking and Listening Robots to Intelligent Communicative Machines
"... It is a popular view that the future will be inhabited by intelligent talking and listening robots with whom we shall converse using the full palette of linguistic expression available to us as human beings. Of course, recent technical and engineering developments such as Siri would appear to sugges ..."
Cited by 2 (1 self)
It is a popular view that the future will be inhabited by intelligent talking and listening robots with whom we shall converse using the full palette of linguistic expression available to us as human beings. Of course, recent technical and engineering developments such as Siri would appear to suggest that important steps are being made in that direction – and indeed they are. However, it is argued here that we need to go far beyond our current capabilities and understanding towards a more integrated perspective; simply interfacing state-of-the-art speech technology with a state-of-the-art robot is very unlikely to lead to effective human-robot interaction. We need to move from developing robots that simply talk and listen to evolving intelligent communicative machines that are capable of truly understanding human behavior, and this means that we need to look beyond speech, beyond words, beyond meaning, beyond communication, beyond dialog and beyond one-off interactions.
Body Movements for Affective Expression: A Survey of Automatic Recognition and Generation
"... Abstract—Body movements communicate affective expressions and, in recent years, computational models have been developed to recognize affective expressions from body movements or to generate movements for virtual agents or robots which convey affective expressions. This survey summarizes the state o ..."
Cited by 2 (1 self)
Body movements communicate affective expressions and, in recent years, computational models have been developed to recognize affective expressions from body movements or to generate movements for virtual agents or robots which convey affective expressions. This survey summarizes the state of the art on automatic recognition and generation of such movements. For both automatic recognition and generation, important aspects such as the movements analyzed, the affective state representation used, and the use of notation systems are discussed. The survey concludes with an outline of open problems and directions for future work.
Twente debate corpus - a multimodal corpus for head movement analysis
in International Conference on Language Resources and Evaluation (LREC 2014), 2014
"... Abstract This paper introduces a multimodal discussion corpus for the study into head movement and turn-taking patterns in debates. Given that participants either acted alone or in a pair, cooperation and competition and their nonverbal correlates can be analyzed. In addition to the video and audio ..."
Cited by 1 (0 self)
This paper introduces a multimodal discussion corpus for the study of head movement and turn-taking patterns in debates. Given that participants acted either alone or in a pair, cooperation and competition and their nonverbal correlates can be analyzed. In addition to the video and audio of the recordings, the corpus contains automatically estimated head movements, and manual annotations of who is speaking and who is looking where. The corpus consists of over 2 hours of debates, in 6 groups with 18 participants in total. We describe the recording setup and present initial analyses of the recorded data. We found that the person who acted as the single debater spoke more and also received more attention than the other debaters, even when corrected for speaking time. We also found that a single debater was more likely to speak after a team debater. Future work will be aimed at further analysis of the relation between speaking and looking patterns, the outcome of the debate, and the perceived dominance of the debaters.
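The correction mentioned above (attention received, normalised by speaking time) could be computed from the corpus's speaker and gaze annotations roughly as below; the frame-level annotation format here is a hypothetical stand-in, not the corpus's actual schema.

```python
# Sketch: attention received per participant, normalised by speaking time,
# from hypothetical frame-level speaker and gaze-target annotations.
from collections import Counter

speaking = ["A", "A", "B", "A", "C", "C", "A", "B"]       # speaker per frame
gaze = [{"B": "A", "C": "A"}, {"B": "A", "C": "B"},       # looker -> target
        {"A": "B", "C": "B"}, {"B": "A", "C": "A"},
        {"A": "C", "B": "C"}, {"A": "C", "B": "A"},
        {"B": "A", "C": "A"}, {"A": "B", "C": "B"}]

speak_time = Counter(speaking)
attention = Counter(t for frame in gaze for t in frame.values())
for p in sorted(speak_time):
    print(p, "attention per speaking frame:", attention[p] / speak_time[p])
```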
Learning a Sparse Codebook of Facial and Body Microexpressions for Emotion Recognition
"... Obtaining a compact and discriminative representation of facial and body expressions is a difficult problem in emotion recognition. Part of the difficulty is capturing microexpressions, i.e., short, invol-untary expressions that last for only a fraction of a second: at a micro-temporal scale, there ..."
Cited by 1 (1 self)
Obtaining a compact and discriminative representation of facial and body expressions is a difficult problem in emotion recognition. Part of the difficulty is capturing microexpressions, i.e., short, involuntary expressions that last for only a fraction of a second: at a micro-temporal scale, there are many other subtle face and body movements that do not convey semantically meaningful information. We present a novel approach to this problem by exploiting the sparsity of the frequent micro-temporal motion patterns. Local space-time features are extracted over the face and body region for a very short time period, e.g., a few milliseconds. A codebook of microexpressions is learned from the data and used to encode the features in a sparse manner. This allows us to obtain a representation that captures the most salient motion patterns of the face and body at a micro-temporal scale. Experiments performed on the AVEC 2012 dataset show our approach achieving the best published performance on the expectation dimension based solely on visual features. We also report experimental results on audio-visual emotion recognition, comparing early and late data fusion techniques.
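The codebook-and-sparse-coding step can be sketched with off-the-shelf dictionary learning; the random descriptors and parameter choices below are stand-ins, not the paper's actual space-time features or training setup.

```python
# Sketch: learn a sparse codebook over local descriptors and pool the codes.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(2)
descriptors = rng.normal(size=(300, 32))  # stand-in local space-time features

# Learn a codebook of 16 atoms with sparse (L1-regularised) coefficients.
dl = DictionaryLearning(n_components=16, transform_algorithm="lasso_lars",
                        transform_alpha=0.5, random_state=0)
codes = dl.fit(descriptors).transform(descriptors)  # (300, 16), mostly zeros

# Max-pool the codes over one clip's windows (here: the first 50 rows)
# to get a single representation per clip.
clip_repr = np.abs(codes[:50]).max(axis=0)
print("nonzero coefficients:", np.count_nonzero(codes), "of", codes.size)
```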
Taking Things at Face Value: How Stance Informs Politeness of Virtual Agents
"... Abstract. In this paper, we contend that interpersonal circumplex theories and politeness strategies may be combined to inform the generation of social behaviours for virtual agents. We show how stances from the interpersonal circumplex correspond to certain politeness strategies and present the res ..."
Cited by 1 (1 self)
In this paper, we contend that interpersonal circumplex theories and politeness strategies may be combined to inform the generation of social behaviours for virtual agents. We show how stances from the interpersonal circumplex correspond to certain politeness strategies and present the results of a small pilot study that partially supports our approach. Our goal is to implement this model in a serious game for police training.
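One way to read the proposed correspondence is as a lookup from circumplex stance (dominance and affiliation) to a Brown and Levinson-style politeness strategy. The mapping and surface templates below are entirely hypothetical illustrations, not the paper's model.

```python
# Sketch: a hypothetical stance-to-politeness lookup for a virtual agent.
STANCE_TO_POLITENESS = {
    ("dominant", "hostile"): "bald on-record",        # direct, no face redress
    ("dominant", "friendly"): "positive politeness",  # claim common ground
    ("submissive", "friendly"): "negative politeness",  # hedge the imposition
    ("submissive", "hostile"): "off-record",          # indirect hints
}

def realise_request(stance: tuple[str, str], request: str) -> str:
    """Render a request using the politeness strategy mapped to the stance."""
    templates = {
        "bald on-record": request + ".",
        "positive politeness": f"Come on, {request.lower()} for me.",
        "negative politeness": f"Could you possibly {request.lower()}?",
        "off-record": f"It would be nice if someone would {request.lower()}.",
    }
    return templates[STANCE_TO_POLITENESS[stance]]

print(realise_request(("submissive", "friendly"), "Show me your ID"))
```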
What Is at Play? Meta-techniques in Serious Games and Their Effects on Social Believability and Learning
in Proceedings of the Social Believability in Games, 2013
"... Abstract. We discuss several examples of meta-techniques, used in Live Action Role Play to communicate information outside the story world, and suggest that they may be used to make non-player characters more socially believable by providing players with insight into what is at play in characters ’ ..."
Cited by 1 (1 self)
We discuss several examples of meta-techniques, used in Live Action Role Play to communicate information outside the story world, and suggest that they may be used to make non-player characters more socially believable by providing players with insight into what is at play in characters' minds. We discuss how the use of these techniques could influence player immersion and how this may impact the learning effects of serious games.
Who’s afraid of job interviews? Definitely a question for user modelling
in Proc. Conference on User Modeling, Adaptation and Personalization, 2014
"... Abstract. We define job interviews as a domain of interaction that can be modelled automatically in a serious game for job interview skills train-ing. We present four types of studies: (1) field-based human-to-human job interviews, (2) field-based computer-mediated human-to-human in-terviews, (3) la ..."
Cited by 1 (0 self)
We define job interviews as a domain of interaction that can be modelled automatically in a serious game for job interview skills training. We present four types of studies: (1) field-based human-to-human job interviews, (2) field-based computer-mediated human-to-human interviews, (3) lab-based Wizard of Oz studies, and (4) field-based human-to-agent studies. Together, these highlight pertinent questions for the user modelling field as it expands its scope to applications for social inclusion. The results of the studies show that interviewees suppress their emotional behaviours, and although our system automatically recognises a subset of those behaviours, the modelling of complex mental states in real-world contexts poses a challenge for state-of-the-art user modelling technologies. This calls for a re-examination of both the implementation of the models and their usage in the target contexts.