Results 1 - 10 of 101
QuickSet: Multimodal Interaction for Distributed Applications
, 1997
Abstract - Cited by 289 (35 self)
This paper presents an emerging application of multimodal interface research to distributed applications. We have developed the QuickSet prototype, a pen/voice system running on a hand-held PC, communicating via wireless LAN through an agent architecture to a number of systems, including NRaD's LeatherNet system, a distributed interactive training simulator built for the US Marine Corps. The paper describes the overall system architecture, a novel multimodal integration strategy offering mutual compensation among modalities, and provides examples of multimodal simulation setup. Finally, we discuss our applications experience and evaluation.
Integration and synchronization of input modes during multimodal human-computer interaction
, 1997
Abstract - Cited by 177 (20 self)
Our ability to develop robust multimodal systems will depend on knowledge of the natural integration patterns that typify people’s combined use of different input modes. To provide a foundation for theory and design, the present research analyzed multimodal interaction while people spoke and wrote to a simulated dynamic map system. Task analysis revealed that multimodal interaction occurred most frequently during spatial location commands, and with intermediate frequency during selection commands. In addition, microanalysis of input signals identified sequential, simultaneous, point-and-speak, and compound integration patterns, as well as data on the temporal precedence of modes and on inter-modal lags. In synchronizing input streams, the temporal precedence of writing over speech was a major theme, with pen input conveying location information first in a sentence. Linguistic analysis also revealed that the spoken and written modes consistently supplied complementary semantic information rather than redundant information. One long-term goal of this research is the development of predictive models of natural modality integration to guide the design of emerging multimodal architectures. Keywords: multimodal interaction, integration and synchronization, speech and pen input, dynamic interactive maps, spatial location information, predictive modeling
Conversational Interfaces: Advances and Challenges
, 2000
Abstract - Cited by 87 (6 self)
The last decade has witnessed the emergence of a new breed of human-computer interfaces that combine several human language technologies to enable information access and transactional processing using spoken dialogue. In this paper, I discuss my view on the research issues involved in the development of such interfaces, describe the recent work done in this area at the MIT Laboratory for Computer Science, and outline some of the unmet research challenges, including the need to work in real domains, spoken language generation, and portability across domains and languages.
Unification-based Multimodal Parsing
- In COLING/ACL
, 1998
Abstract - Cited by 84 (4 self)
In order to realize their full potential, multimodal systems need to support not just input from multiple modes, but also synchronized integration of modes. Johnston et al. (1997) model this integration using a unification operation over typed feature structures. This is an effective solution for a broad class of systems, but limits multimodal utterances to combinations of a single spoken phrase with a single gesture. We show how the unification-based approach can be scaled up to provide a full multimodal grammar formalism. In conjunction with a multidimensional chart parser, this approach supports integration of multiple elements distributed across the spatial, temporal, and acoustic dimensions of multimodal interaction. Integration strategies are stated in a high-level unification-based rule formalism supporting rapid prototyping and iterative development of multimodal systems. Multimodal interfaces enable more natural and efficient interaction between humans and mach...
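To make the unification step concrete, the following is a minimal, hypothetical Python sketch (not the paper's formalism or its chart parser) of unifying two typed feature structures, one contributed by speech and one by a gesture; the feature names and values are invented for illustration:

# Illustrative only: unification of two feature structures represented as
# nested dicts. Atomic values must match exactly; dicts unify recursively.
def unify(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None      # atomic value clash -> failure
    result = dict(a)
    for key, b_val in b.items():
        if key in result:
            merged = unify(result[key], b_val)
            if merged is None:
                return None               # incompatible feature values
            result[key] = merged
        else:
            result[key] = b_val           # feature present only in b
    return result

# Hypothetical inputs: speech supplies the unit type, the gesture supplies
# the location; unification yields the complete command.
speech = {"type": "create_unit", "object": {"unit": "platoon", "location": {}}}
gesture = {"type": "create_unit", "object": {"location": {"x": 42.1, "y": 17.3}}}
print(unify(speech, gesture))

In the approach described above, such combinations would be licensed by grammar rules and explored by the multidimensional chart parser; this sketch shows only the core unification operation.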
Multimodal Integration - A Statistical View
- IEEE Transactions on Multimedia
, 1999
Abstract - Cited by 60 (11 self)
This paper presents a statistical approach to developing multimodal recognition systems and, in particular, to integrating the posterior probabilities of parallel input signals involved in the multimodal system. We first identify the primary factors that influence multimodal recognition performance by evaluating the multimodal recognition probabilities. We then develop two techniques, an estimate approach and a learning approach, which are designed to optimize accurate recognition during the multimodal integration process. We evaluate these methods using Quickset, a speech/gesture multimodal system, and report evaluation results based on an empirical corpus collected with Quickset. From an architectural perspective, the integration technique presented here offers enhanced robustness. It also is premised on more realistic assumptions than previous multimodal systems using semantic fusion. From a methodological standpoint, the evaluation techniques that we describe provide a valuable too...
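As a rough illustration of posterior integration (a hypothetical sketch, not the estimate or learning techniques evaluated in the paper), the following ranks joint speech/gesture interpretations by a weighted log-linear combination of each recognizer's posterior; the hypotheses, weights, and compatibility table are all invented:

import math
from itertools import product

# Invented posterior probabilities from two parallel recognizers.
speech_posteriors = {"create platoon": 0.6, "create tank": 0.3, "delete unit": 0.1}
gesture_posteriors = {"point": 0.7, "area": 0.3}

# Assumed semantic compatibility between speech and gesture interpretations.
compatible = {("create platoon", "point"), ("create tank", "point"),
              ("delete unit", "point"), ("create platoon", "area")}

def fuse(w_speech=0.5, w_gesture=0.5):
    scored = []
    for (s, p_s), (g, p_g) in product(speech_posteriors.items(),
                                      gesture_posteriors.items()):
        if (s, g) not in compatible:
            continue                       # skip semantically impossible pairs
        score = w_speech * math.log(p_s) + w_gesture * math.log(p_g)
        scored.append(((s, g), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)

for (s, g), score in fuse():
    print(f"{s} + {g}: {score:.3f}")

The weights here are fixed by hand; the paper's contribution is precisely how such factors can be estimated or learned from an empirical corpus.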
Designing a Human-Centered, Multimodal GIS Interface to Support Emergency Management
, 2002
Abstract - Cited by 55 (8 self)
Geospatial information is critical to effective, collaborative decision-making during emergency management situations; however, conventional GIS are not suited for multi-user access and high-level abstract queries. Currently, decision makers do not always have the real-time information they need; GIS analysts produce maps at the request of individual decision makers, often leading to overlapping requests with slow delivery times. In order to overcome these limitations, a paradigm shift in interface design for GIS is needed. The research reported upon here attempts to overcome analyst-driven, menu-controlled, keyboard-and-mouse-operated GIS by designing a multimodal, multi-user GIS interface that puts geospatial data directly in the hands of decision makers. A large screen display is used for data visualization, and collaborative, multi-user interactions in emergency management are supported through voice and gesture recognition. Speech and gesture recognition is coupled with a knowledge-based dialogue management system for storing and retrieving geospatial data. This paper describes the first prototype and the insights gained for human-centered multimodal GIS interface design.
Towards an Information Visualization Workspace: Combining Multiple Means of Expression
- Human-Computer Interaction Journal
, 1997
Abstract - Cited by 55 (8 self)
New user interface challenges are arising because people need to explore and perform many diverse tasks involving large quantities of abstract information. Visualizing information is one approach to these challenges. But visualization must involve much more than just enabling people to "see" information. People must also manipulate it to focus on what is relevant and reorganize it to create new information. They must also communicate and share information in collaborative settings and act directly to perform their tasks based on this information. These goals suggest the need for information visualization workspaces with new interaction approaches. We present several systems - Visage, SAGE and SDM - that comprise such a workspace and a suite of user interface techniques for creating and manipulating integrative visualizations. Our work in this area revealed the need for interfaces that enable people to communicate with systems in multiple complementary ways. We discuss four dimensions f...
Multimodal user interfaces in the open agent architecture
- Proceedings of the 1997 International Conference on Intelligent User Interfaces
, 1997
Abstract - Cited by 39 (8 self)
The design and development of the Open Agent Architecture (OAA) system has focused on providing access to agent-based applications through intelligent, cooperative, distributed, and multimodal agent-based user interfaces. The current multimodal interface supports a mix of spoken language, handwriting and gesture, and is adaptable to the user's preferences, resources and environment. Only the primary user interface agents need run on the local computer, thereby simplifying the task of using a range of applications from a variety of platforms, especially low-powered computers such as Personal Digital Assistants (PDAs). An important consideration in the design of the OAA was to facilitate mix-and-match: to facilitate the reuse of agents in new and unanticipated applications, and to support rapid prototyping by facilitating the replacement of agents by better versions. The utility of the agents and tools developed as part of this ongoing research project has been demonstrated by their use as infrastructure in unrelated projects. Keywords: agent architecture, multimodal, speech, gesture, handwriting, natural language. A major component of our research on multiagent systems is in the user interface to large communities of agents. We have developed agent-based multimodal user interfaces using the same agent architecture used to build the back ends of these applications. We describe these interfaces and the larger architecture, and outline some of the applications that have been built using this architecture and interface agents.
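The following is a hypothetical Python sketch of facilitator-style routing, loosely in the spirit of delegating requests to agents by declared capability; it is not the OAA's Interagent Communication Language or its actual API, and all names are illustrative:

# Illustrative only: a facilitator keeps a registry of capabilities and
# forwards each request to every agent that declared it can handle it.
class Facilitator:
    def __init__(self):
        self._providers = {}                      # capability -> handlers

    def register(self, capability, handler):
        self._providers.setdefault(capability, []).append(handler)

    def request(self, capability, **params):
        handlers = self._providers.get(capability, [])
        return [handler(**params) for handler in handlers]

facilitator = Facilitator()

# Two hypothetical interface agents offering the same capability through
# different input modes; in practice either could run on a remote machine.
facilitator.register("resolve_reference",
                     lambda text: f"speech agent resolved '{text}'")
facilitator.register("resolve_reference",
                     lambda text: f"gesture agent resolved '{text}'")

print(facilitator.request("resolve_reference", text="that building"))

A registry of this kind is what makes the mix-and-match goal described above cheap: replacing an agent only requires registering a better handler for the same capability.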
A Real-Time Framework for Natural Multimodal Interaction with Large Screen Displays
- In Fourth IEEE International Conference on Multimodal Interfaces (ICMI 2002)
, 2002
Abstract - Cited by 32 (4 self)
This paper presents a framework for designing a natural multimodal human-computer interaction (HCI) system. The core of the proposed framework is a principled method for combining information derived from audio and visual cues. To achieve natural interaction, both audio and visual modalities are fused along with feedback through a large screen display. Careful design, along with due consideration of possible aspects of a system's interaction cycle and integration, has resulted in a successful system. The performance of the proposed framework has been validated through the development of several prototype systems as well as commercial applications for the retail and entertainment industries. To assess the impact of these multimodal systems (MMS), informal studies have been conducted. It was found that the system performed according to its specifications in 95% of the cases and that users showed ad hoc proficiency, indicating natural acceptance of such systems.