BIRON - The Bielefeld Robot Companion
- in Proc. Int. Workshop on Advances in Service Robotics, 2004
Cited by 13 (2 self)
In the recent past, service robots that are able to interact with humans in a natural way have become increasingly popular. Service robots designed for personal use at home form a special class, the so-called robot companions. They are expected to communicate with nonexpert users in a natural and intuitive way. For such natural interactions with humans, the robot has to detect communication partners and focus its attention on them. Moreover, the companion has to be able to understand the speech and gestures of a user and to carry out dialogs in order to get instructed, i.e., introduced to its environment. We address these problems by presenting the current state of our mobile robot BIRON, the Bielefeld Robot Companion.
An Integrated System for Cooperative Man-Machine Interaction
- 2001
Cited by 10 (5 self)
To establish robotic applications in human environments such as offices or private homes, the robotic systems must be instructable by ordinary users in a natural way. In interpersonal communication, humans usually draw on different kinds of sensory information and are capable of integrating all perceptual cues quickly and consistently. Additionally, knowledge acquired during the communication process is directly used to resolve ambiguities. As a step towards realizing similar capabilities in automatic devices, this paper presents an integrated system combining automatic speech processing and image understanding. The system is intended to be an intelligent interface of a robot which manipulates objects in its surroundings according to the instructions of a human. The enhanced capabilities necessary for carrying out a multimodal man-machine dialog are realized by combining statistical and declarative methods for inference and knowledge representation. The effectiveness of this approach is demonstrated using an exemplary dialog from our construction task domain.
Integrated Recognition and Interpretation of Speech for a Construction Task Domain
- 1999
Cited by 6 (5 self)
this paper a technique that tightly couples the speech recogniser and the speech understanding. We first give an overview of the system architecture and then we focus on the language modelling and the linguistic analysis respectively. Finally evaluation results are presented.
Evaluating Integrated Speech- and Image Understanding
- In Proceedings of the IEEE International Conference on Multimodal Interfaces (ICMI), 2002
Cited by 6 (0 self)
The capability to coordinate and interrelate speech and vision is a virtual prerequisite for adaptive, cooperative, and flexible interaction among people. It is therefore reasonable to assume that human-machine interaction, too, would benefit from intelligent interfaces for integrated speech and image processing. In this paper, we first sketch an interactive system that integrates automatic speech processing with image understanding. Then, we concentrate on performance assessment, which we believe is an emerging key issue in multimodal interaction. We explain the benefit of time scale analysis and usability studies and evaluate our system accordingly.
Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks
- Computer Vision Systems: First Int. Conf., 1999
Cited by 5 (3 self)
The interaction of image and speech processing is a crucial property of multimedia systems. Classical systems that draw inferences on purely qualitative high-level descriptions miss a lot of information when confronted with erroneous, vague, or incomplete data. We propose a new architecture that integrates various levels of processing by using multiple representations of the visually observed scene. These representations are vertically connected by Bayesian networks in order to find the most plausible interpretation of the scene.
Combining Speech and Haptics for Intuitive and Efficient Navigation through Image Databases
- Proc. International Conference on Multimodal Interfaces, 2003
Cited by 4 (1 self)
Given the size of today's professional image databases, the standard approach to object- or theme-related image retrieval is to interactively navigate through the content. But as most users of such databases are designers or artists who do not have a technical background, navigation interfaces must be intuitive to use and easy to learn. This paper reports on efforts towards this goal. We present a system for intuitive image retrieval that features different modalities for interaction. Apart from conventional input devices like mouse or keyboard, it is also possible to use speech or haptic gestures to indicate what kind of images one is looking for. Seeing a selection of images on the screen, the user provides relevance feedback to narrow the choice of motifs presented next. This is done either by scoring whole images or by choosing certain image regions. In order to derive consistent reactions from multimodal user input, asynchronous integration of modalities and probabilistic reasoning based on Bayesian networks are applied. After addressing technical details, we discuss a series of usability experiments, which we conducted to examine the impact of multimodal input facilities on interactive image retrieval. The results indicate that users appreciate multimodality. While we observed little decrease in task performance, measures of contentment exceeded those for conventional input devices.
Spontaneous Speech Understanding for Robust Multi-Modal Human-Robot Communication
Cited by 2 (0 self)
This paper presents a speech understanding component for enabling robust situated human-robot communication. The aim is to gain semantic interpretations of utterances that serve as a basis for multi-modal dialog management even in cases where the recognized word stream is not grammatically correct. For the understanding process, we designed semantically processable units, which are adapted to the domain of situated communication. Our framework supports the specific characteristics of spontaneous speech used in combination with gestures in a real-world scenario. It also provides information about the dialog acts. Finally, we present a processing mechanism using these concept structures to generate the most likely semantic interpretation of the utterances and to evaluate the interpretation with respect to semantic coherence.
Using Speech in Visual Object Recognition
- Mustererkennung 2000, 22. DAGM-Symposium Kiel, Informatik Aktuell, 2000
Cited by 2 (0 self)
Automatic understanding of multi-modal input is the central topic in modern human-computer interfaces. But the basic question of how the interpretations provided by different modalities can be connected in a universal and robust manner remains an open problem.
Bayesian Networks for Speech . . .
- In Proc. of 18th National Conf. on Artificial Intelligence, 2002
Cited by 1 (1 self)
The realization of natural human-computer interfaces suffers from a wide range of restrictions concerning noisy data, vague meanings, and context dependence. An essential aspect of everyday communication is the ability of humans to ground verbal interpretations in visual perception. Thus, the system has to be able to solve the correspondence problem of relating verbal and visual descriptions of the same object.
Modality Integration and Dialog Management for a Robotic Assistant
Cited by 1 (0 self)
Communication with robotic assistants or companions is a challenging new domain for the use of dialog systems. In contrast to classical spoken language interfaces, users interact with mobile robots mostly in a multi-modal way. In this paper we present the integration of several modalities in the dialog system of BIRON, the Bielefeld Robot Companion. Besides speech as the main modality, the system integrates deictic gestures and visual scene information in order to resolve object references in a task-oriented dialog. We present example interactions with BIRON and first qualitative results from the "home-tour" scenario defined within the COGNIRON project.

