Mutual Disambiguation of Recognition Errors in a Multimodal Architecture
1999
Cited by 104 (11 self)
As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, and then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, and also higher levels of MD for the more challenging accented users. As a result, although speech recognition as a stand-alone performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform in a...
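Below is a minimal sketch of mutual disambiguation. It assumes that each recognizer returns an n-best list of scored hypotheses and that a simple type check decides which speech/gesture pairs can combine. The `Hypothesis` type, the `compatible` rule, the product-of-scores ranking and the example labels are all illustrative assumptions, not the architecture evaluated in the paper. The sketch shows how a lower-ranked speech hypothesis can win once it is paired with a compatible gesture.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Hypothesis:
    label: str    # recognized content (hypothetical example labels below)
    kind: str     # semantic type used for the compatibility check
    score: float  # recognizer confidence, treated as a probability


def compatible(speech: Hypothesis, gesture: Hypothesis) -> bool:
    # Hypothetical rule: a spoken command only combines with a gesture
    # whose shape matches the object type the command expects.
    return speech.kind == gesture.kind


def fuse(speech_nbest, gesture_nbest):
    """Return joint interpretations, best first, scored by the product of
    the unimodal scores; semantically incompatible pairs are discarded."""
    joint = [
        (s.score * g.score, s.label, g.label)
        for s, g in product(speech_nbest, gesture_nbest)
        if compatible(s, g)
    ]
    return sorted(joint, reverse=True)


# The speech recognizer's top choice ("flag") is wrong; the drawn line is right.
speech_nbest = [Hypothesis("flag", "point", 0.6), Hypothesis("fence", "line", 0.4)]
gesture_nbest = [Hypothesis("line stroke", "line", 0.7), Hypothesis("tap", "point", 0.3)]

for score, spoken, drawn in fuse(speech_nbest, gesture_nbest):
    print(f"{score:.2f}  {spoken} + {drawn}")
```

Running it prints the `fence + line stroke` reading first (0.28 against 0.18). The second-ranked speech hypothesis wins because its pairing with the highly rated line gesture outscores the only pairing available to "flag".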
Multimodal Interfaces That Process What Comes Naturally
Communications of the ACM, 2000
Cited by 66 (2 self)
In this article, we summarize the nature of new multimodal systems and how they work, with a focus on multimodal speech and pen-based systems. The primary reasons for building multimodal systems are outlined, including expansion of the accessibility of computing for diverse users, support for new forms of computing not available in the past, enhancement of performance stability and robustness, and improved expressive...
Finite-state multimodal parsing and understanding
In Proceedings of COLING 2000, 2000
Cited by 54 (12 self)
Multimodal interfaces require effective parsing and understanding of utterances whose content is distributed across multiple input modes. Johnston (1998) presents an approach in which strategies for multimodal integration are stated declaratively using a unification-based grammar that is used by a multidimensional chart parser to compose inputs. This approach is highly expressive and supports a broad class of interfaces, but offers only limited potential for mutual compensation among the input modes, is subject to significant concerns in terms of computational complexity, and complicates selection among alternative multimodal interpretations of the input. In this paper, we present an alternative approach in which multimodal parsing and understanding are achieved using a weighted finite-state device that takes speech and gesture streams as inputs and outputs their joint interpretation. This approach is significantly more efficient, enables tight coupling of multimodal understanding with speech recognition, and provides a general probabilistic framework for multimodal ambiguity resolution.
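The abstract describes a weighted finite-state device that reads speech and gesture streams and writes their joint interpretation. The sketch below is a toy illustration of that idea under stated assumptions: a three-tape automaton (speech in, gesture in, meaning out) with additive non-negative costs, searched with Dijkstra's algorithm. The grammar, the symbols (`Gp`, `e1`, `ctx`), the weights and the `Arc`/`best_parse` names are hypothetical. This is not the authors' grammar or implementation.

```python
import heapq
from typing import NamedTuple

EPS = ""  # epsilon: the arc reads or writes nothing on that tape


class Arc(NamedTuple):
    speech: str    # symbol read from the speech tape (or EPS)
    gesture: str   # symbol read from the gesture tape (or EPS)
    meaning: str   # symbol written to the meaning tape (or EPS)
    weight: float  # non-negative cost, e.g. a negative log probability
    target: int    # destination state


def best_parse(arcs, start, finals, speech, gesture):
    """Cheapest path that consumes both input tapes completely.

    Costs add along a path and the minimum over paths wins (tropical
    semiring). The search is Dijkstra's algorithm over (state, i, j), where
    i and j are the positions reached in the speech and gesture tapes."""
    frontier = [(0.0, start, 0, 0, ())]
    settled = set()
    while frontier:
        cost, state, i, j, out = heapq.heappop(frontier)
        if (state, i, j) in settled:
            continue
        settled.add((state, i, j))
        if state in finals and i == len(speech) and j == len(gesture):
            return cost, "".join(out)
        for arc in arcs.get(state, []):
            ni, nj = i, j
            if arc.speech != EPS:
                if i >= len(speech) or speech[i] != arc.speech:
                    continue
                ni += 1
            if arc.gesture != EPS:
                if j >= len(gesture) or gesture[j] != arc.gesture:
                    continue
                nj += 1
            heapq.heappush(
                frontier,
                (cost + arc.weight, arc.target, ni, nj, out + (arc.meaning,)),
            )
    return None  # no joint interpretation exists


# Toy grammar for "call this person" (all symbols hypothetical):
# Gp = gesture classified as selecting a person; e1/e2 = selected entity ids.
ARCS = {
    0: [Arc("call", EPS, "call(", 0.0, 1)],
    1: [Arc("this", "Gp", EPS, 0.0, 2),    # deictic word + pointing gesture
        Arc("this", EPS, "ctx", 2.0, 4)],  # no gesture: costlier context reading
    2: [Arc("person", EPS, EPS, 0.0, 3)],
    3: [Arc(EPS, "e1", "e1", 0.0, 5), Arc(EPS, "e2", "e2", 0.0, 5)],
    4: [Arc("person", EPS, EPS, 0.0, 5)],
    5: [Arc(EPS, EPS, ")", 0.0, 6)],
}
FINALS = {6}

print(best_parse(ARCS, 0, FINALS, ["call", "this", "person"], ["Gp", "e1"]))
print(best_parse(ARCS, 0, FINALS, ["call", "this", "person"], []))
print(best_parse(ARCS, 0, FINALS, ["call", "this", "person"], ["Go", "e1"]))
```

With this toy grammar, a pointing gesture resolves "this person" to `e1` at zero cost. With no gesture, the costlier context reading wins. A gesture of the wrong type (`Go`) yields no parse at all, which is how constraints between the modes prune interpretations in this sketch.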
Deictic believability: Coordinating gesture, locomotion, and speech in lifelike pedagogical agents
Applied Artificial Intelligence, 1999
Cited by 42 (3 self)
Lifelike animated agents for knowledge-based learning environments can provide timely, customized advice to support students' problem solving. Because of their strong visual presence, they hold significant promise for substantially increasing students' enjoyment of their learning experiences. A key problem posed by lifelike agents that inhabit artificial worlds is deictic believability. In the same manner that humans refer to objects in their environment through judicious combinations of speech, locomotion, and gesture, animated agents should be able to move through their environment, and point to and refer to objects appropriately as they provide problem-solving advice. In this paper we describe a framework for achieving deictic believability in animated agents. A deictic behavior planner exploits a world model and the evolving explanation plan as it selects and coordinates locomotive, gestural, and speech behaviors. The resulting behaviors and utterances are believable, and the references exhibit a lack of ambiguity. This approach to spatial deixis has been implemented in a lifelike animated agent, Cosmo, who inhabits a learning environment for the domain of Internet packet routing. Cosmo provides real-time advice to students as they escort packets through a virtual world of interconnected routers. Results of an informal focus group study with the Cosmo agent suggest that the spatial deixis framework produces clear explanatory animated behaviors.
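The sketch below gives a rough illustration of how a planner might use a world model to coordinate locomotion, gesture and speech for a single reference. The distance threshold, the rule of walking over when a distractor is close, and the this/that choice are hypothetical assumptions made for illustration. The abstract does not specify Cosmo's actual decision rules, and the `WorldObject`/`plan_reference` names are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class WorldObject:
    name: str
    x: float
    y: float


def plan_reference(agent_xy, referent, others, radius=1.5):
    """Return an ordered list of (behavior, argument) steps for referring to
    `referent`. Hypothetical heuristic: if another object lies within
    `radius` of the referent, pointing from a distance could be ambiguous,
    so the agent first walks to the referent. The demonstrative then
    reflects proximity ("this" when near, "that" when far)."""
    target = (referent.x, referent.y)
    crowded = any(math.dist((o.x, o.y), target) < radius for o in others)
    near = math.dist(agent_xy, target) < radius
    steps = []
    if crowded and not near:
        steps.append(("locomote", target))
        near = True
    steps.append(("point", referent.name))
    steps.append(("say", f"{'this' if near else 'that'} {referent.name}"))
    return steps


router = WorldObject("router", 5.0, 5.0)
print(plan_reference((0.0, 0.0), router, [WorldObject("hub", 5.5, 5.0)]))
print(plan_reference((0.0, 0.0), router, [WorldObject("hub", 9.0, 1.0)]))
```

When a distractor sits next to the router, the plan walks over before pointing and says "this router". With no nearby distractor, the agent points from where it stands and says "that router".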
Something from nothing: Augmenting a paper-based work practice via multimodal interaction
In Proceedings of the ACM Designing Augmented Reality Environments, 2000
Cited by 30 (4 self)
In this paper, we describe Rasa: an environment designed to augment, rather than replace, the work habits of its users. These work habits include drawing on Post-it notes using a symbolic language. Rasa observes and understands this language, assigning meaning simultaneously to objects in both the physical and virtual worlds. With Rasa, users roll out a paper map, register it, and move the augmented objects from one place to another on it. Once an object is augmented, users can modify the meaning represented by it, ask questions about that representation, view it in virtual reality, or give directions to it, all with speech and gestures. We examine the way Rasa uses language to augment objects, and compare it with prior methods, arguing that language is a more visible, flexible, and comprehensible method for creating augmentations than other approaches. Keywords: Phicons, ubiquitous computing, augmented reality, mixed reality, multimodal interfaces, tangible interfaces, invisible inter...
Creating tangible interfaces by augmenting physical objects with multimodal language
Proc. ACM Conf. Intelligent User Interfaces
Cited by 29 (0 self)
Rasa is a tangible augmented reality environment that digitally enhances the existing paper-based command and control capability in a military command post. By observing and understanding the users' speech, pen, and touch-based multimodal language, Rasa computationally augments the physical objects on a command post map, linking these items to digital representations of the same (for example, linking a paper map to the world and Post-it™ notes to military units). Herein, we give a thorough account of Rasa's underlying multiagent framework, and its recognition, understanding, and multimodal integration components. Moreover, we examine five properties of language (generativity, comprehensibility, compositionality, referentiality, and, at times, persistence) that render it suitable as an augmentation approach, contrasting these properties to those of other augmentation methods. It is these properties of language that allow users of Rasa to augment physical objects, transforming them into tangible interfaces.
Recognizing Time Pressure and Cognitive Load on the Basis of Speech: An Experimental Study
2001
Cited by 26 (9 self)
In an experimental environment, we simulated the situation of a user who gives speech input to a system while walking through an airport. The time pressure on the subjects and the requirement to navigate while speaking were manipulated orthogonally. Each of the 32 subjects generated 80 utterances, which were coded semi-automatically with respect to a wide range of features, such as filled pauses. The experiment yielded new results concerning the effects of time pressure and cognitive load on speech. To see whether a system can automatically identify these conditions on the basis of speech input, we had this task performed for each subject by a Bayesian network that had been learned on the basis of the experimental data for the other subjects. The results shed light on the conditions that determine the accuracy of such recognition.
1 Background and Issues
This paper is an experimental follow-up to the UM99 paper by Berthold and Jameson ([2]). Those authors argued the follo...
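The abstract's evaluation classifies each subject's utterances with a Bayesian network learned from the other subjects' data, which is a leave-one-subject-out scheme. The sketch below shows that loop under simplifying assumptions. It substitutes naive Bayes (a particularly simple Bayesian network structure in which every feature depends only on the class) for whatever network the authors learned, uses discrete features, and runs on made-up toy data. The function names, feature names, values and labels are hypothetical, and the toy data says nothing about the direction of the paper's actual effects.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows):
    """rows: list of (features: dict[str, str], label: str)."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    seen_values = defaultdict(set)       # feature -> values seen in training
    for feats, label in rows:
        for f, v in feats.items():
            value_counts[(label, f)][v] += 1
            seen_values[f].add(v)
    return class_counts, value_counts, seen_values


def predict_nb(model, feats):
    """Most probable label under the naive independence assumption."""
    class_counts, value_counts, seen_values = model
    total = sum(class_counts.values())
    best_label, best_lp = None, -math.inf
    for label, n in class_counts.items():
        lp = math.log(n / total)
        for f, v in feats.items():
            # Add-one smoothing; the extra +1 in k reserves probability
            # mass for a value never seen in training.
            k = len(seen_values[f]) + 1
            lp += math.log((value_counts[(label, f)][v] + 1) / (n + k))
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label


def leave_one_subject_out(data):
    """data: subject id -> list of (features, label). Each subject is
    classified by a model trained only on the other subjects."""
    accuracy = {}
    for subject, rows in data.items():
        train = [r for s, rs in data.items() if s != subject for r in rs]
        model = train_nb(train)
        hits = sum(predict_nb(model, feats) == label for feats, label in rows)
        accuracy[subject] = hits / len(rows)
    return accuracy


# Made-up toy utterances; feature values and labels are placeholders.
a = ({"filled_pause": "no", "rate": "high"}, "condition_A")
b = ({"filled_pause": "yes", "rate": "low"}, "condition_B")
data = {s: [a, b] for s in ("s1", "s2", "s3")}
print(leave_one_subject_out(data))
```

Holding out whole subjects, rather than random utterances, tests whether the classifier generalizes to a speaker it has never heard. That is the question the abstract poses.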
When do we interact multimodally? Cognitive load and multimodal communication patterns
In Proc. of International Conference on Multimodal Interfaces, 2004
Cited by 26 (1 self)
Mobile usage patterns often entail high and fluctuating levels of difficulty as well as dual tasking. One major theme explored in this research is whether a flexible multimodal interface supports users in managing cognitive load. Findings from this study reveal that multimodal interface users spontaneously respond to dynamic changes in their own cognitive load by shifting to multimodal communication as load increases with task difficulty and communicative complexity. Given a flexible multimodal interface, users' ratio of multimodal (versus unimodal) interaction increased substantially from 18.6% when referring to established dialogue context to 77.1% when required to establish a new context, a +315% relative increase. Likewise, the ratio of users' multimodal interaction increased significantly as the tasks became more difficult, from 59.2% during low-difficulty tasks, to 65.5%...
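The +315% figure is a relative change, not a difference in percentage points. Using the two ratios quoted above:

```latex
\frac{77.1\% - 18.6\%}{18.6\%} = \frac{58.5}{18.6} \approx 3.145
\quad\Longrightarrow\quad \approx +315\%\ \text{(relative increase)}
```

The corresponding absolute change is 58.5 percentage points.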
The Efficiency of Multimodal Interaction: A Case Study
1998
Cited by 24 (4 self)
This paper reports on a case study comparison of a direct-manipulation-based graphical user interface (GUI) with the QuickSet pen/voice multimodal interface for supporting the task of military force "laydown." In this task, a user places military units and "control measures," such as various types of lines, obstacles, objectives, etc., on a map. A military expert designed his own scenario and entered it via both interfaces. Usage of QuickSet led to a speed improvement of 3.2- to 8.7-fold, depending on the kind of object being created. These results suggest that there may be substantial efficiency advantages to multimodal interaction over GUIs for map-based tasks.

