Results 1 - 9 of 9
Pro-active Meeting Assistants: Attention Please
- AI & Society, The Journal of Human-Centred Systems. Springer-Verlag London Ltd
"... This paper gives an overview of pro-active meeting assistants, what they are and when they can be useful. We explain how to develop such assistants with respect to requirement definitions and elaborate on a set of Wizard of Oz experiments, aiming to find out in which form a meeting assistant should ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
(Show Context)
This paper gives an overview of pro-active meeting assistants: what they are and when they can be useful. We explain how to develop such assistants with respect to requirement definitions and elaborate on a set of Wizard of Oz experiments aimed at finding out in which form a meeting assistant should operate to be accepted by participants, and whether meeting effectiveness and efficiency can be improved by an assistant at all.
Human-Centered Collaborative Interaction
, 2006
"... Recent years have witnessed an increasing shift in interest from single user multimedia/multimodal interfaces towards support for interaction among groups of people working closely together, e.g. during meetings or problem solving sessions. However, the introduction of technology to support collabor ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
(Show Context)
Recent years have witnessed an increasing shift in interest from single-user multimedia/multimodal interfaces towards support for interaction among groups of people working closely together, e.g. during meetings or problem solving sessions. However, the introduction of technology to support collaborative practices has not been devoid of problems. It is not uncommon for technology meant to support collaboration to introduce disruptions and reduce group effectiveness. Human-centered multimedia and multimodal approaches hold the promise of providing substantially enhanced user experiences by focusing attention on human perceptual and motor capabilities, and on actual user practices. In this paper we examine the problem of providing effective support for collaboration, focusing on the role of human-centered approaches that take advantage of multimodality and multimedia. We show illustrative examples that demonstrate human-centered multimodal and multimedia solutions that provide mechanisms for dealing with the intrinsic complexity of human-human interaction support.
Collaborative multimodal photo annotation over digital paper
- In Proc. of ICMI ’06
, 2006
"... The availability of metadata annotations over media content such as photos is known to enhance retrieval and organization, particularly for large data sets. The greatest challenge for obtaining annotations remains getting users to perform the large amount of tedious man-ual work that is required. In ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
(Show Context)
The availability of metadata annotations over media content such as photos is known to enhance retrieval and organization, particularly for large data sets. The greatest challenge for obtaining annotations remains getting users to perform the large amount of tedious manual work that is required. In this paper we introduce an approach for semi-automated labeling based on extraction of metadata from naturally occurring conversations of groups of people discussing pictures among themselves. As the burden for structuring and extracting metadata is shifted from users to the system, new recognition challenges arise. We explore how multimodal language can help in 1) detecting a concise set of meaningful labels to be associated with each photo, 2) achieving robust recognition of these key semantic terms, and 3) facilitating label propagation via multimodal shortcuts. Analysis of the data of a preliminary pilot collection suggests that handwritten labels may be highly indicative of the semantics of each photo, as indicated by the correlation of handwritten terms with high frequency spoken ones. We point to initial directions exploring a multimodal fusion technique to recover robust spelling and pronunciation of these high-value terms from redundant speech and handwriting.
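As a rough illustration of the cross-modal matching idea sketched in this abstract (not the authors' actual pipeline), the snippet below proposes photo labels by keeping handwriting hypotheses that are also spoken often while a photo is discussed; the function name, tokenization, and frequency threshold are assumptions made for illustration only.

```python
from collections import Counter

def propose_photo_labels(handwritten_terms, spoken_transcript, min_spoken_count=2):
    """Illustrative heuristic: keep handwritten terms that are echoed often
    enough in the accompanying speech. A real system would fuse recognizer
    lattices rather than plain strings; the threshold here is hypothetical."""
    spoken_counts = Counter(w.lower().strip(".,!?") for w in spoken_transcript.split())
    return [t for t in handwritten_terms if spoken_counts[t.lower()] >= min_spoken_count]

# Toy example: jotting "Venice" on a photo while also saying it twice.
print(propose_photo_labels(
    ["Venice", "gondola"],
    "this is Venice right the Venice canal with a gondola near the bridge"))
```

The heuristic mirrors the finding above: a word that is both handwritten and frequently spoken is a strong candidate for a semantically meaningful label.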
Edge-splitting in a cumulative multimodal system, for a no-wait temporal threshold on information fusion, combined with an under-specified display
- In Ninth International Conference on Spoken Language Processing (Interspeech 2006 - ICSLP), Pittsburgh, PA. Available: http://www.isca-speech.org/archive/interspeech 2006/i06 2016.html
, 2007
"... Predicting the end of user input turns in a multimodal system can be complex. User interactions vary across a spectrum from single, unimodal inputs to multimodal combinations delivered either simultaneously or sequentially. Early multimodal systems used a fixed duration temporal threshold to determi ..."
Abstract
-
Cited by 5 (5 self)
- Add to MetaCart
(Show Context)
Predicting the end of user input turns in a multimodal system can be complex. User interactions vary across a spectrum from single, unimodal inputs to multimodal combinations delivered either simultaneously or sequentially. Early multimodal systems used a fixed-duration temporal threshold to determine how long to wait for the next input before processing and integration. Several recent studies have proposed using dynamic or adaptive temporal thresholds to predict turn segmentation and thus achieve faster system response times. We introduce an approach that requires no temporal threshold. First we contrast current multimodal command interfaces to a new class of cumulative-observant multimodal systems that we introduce. Within that new system class we show how our technique of edge-splitting combined with our strategy for under-specified, no-wait, visual feedback resolves parsing problems that underlie turn segmentation errors. Test results show a significant 46.2% reduction in multimodal recognition errors compared to not using these techniques.
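For contrast with the no-wait design described above, the following sketch shows the fixed-threshold baseline behavior the abstract argues against; the window length, event format, and queue-based delivery are assumptions for illustration, not details from the paper.

```python
import queue
import time

FUSION_WINDOW_S = 1.5  # assumed fixed temporal threshold, not the paper's value

def collect_turn(events, window=FUSION_WINDOW_S):
    """Baseline behavior: after the first input, wait up to `window` seconds
    for an input from another modality. If one arrives it is fused into the
    same turn; otherwise the turn closes and the input is handled unimodally."""
    turn = [events.get()]                    # first input (e.g. speech) opens the turn
    deadline = time.monotonic() + window
    while (remaining := deadline - time.monotonic()) > 0:
        try:
            turn.append(events.get(timeout=remaining))  # fuse a late pen or gesture input
        except queue.Empty:
            break                            # threshold expired with no further input
    return turn

# Example: a spoken command and a pen gesture fuse into one turn, but the
# system still waits out the remainder of the window before responding.
q = queue.Queue()
q.put(("speech", "move this over here"))
q.put(("pen", "circle at (120, 40)"))
print(collect_turn(q))
```

The latency of that final wait is exactly what the edge-splitting, no-threshold approach in the paper is meant to eliminate.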
Multimodal redundancy across handwriting and speech during computer mediated human-human interactions
- In CHI ’07
, 2007
"... Lecturers, presenters and meeting participants often say what they publicly handwrite. In this paper, we report on three empirical explorations of such multimodal redundancy — during whiteboard presentations, during a spontaneous brainstorming meeting, and during the informal annotation and discussi ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
Lecturers, presenters and meeting participants often say what they publicly handwrite. In this paper, we report on three empirical explorations of such multimodal redundancy — during whiteboard presentations, during a spontaneous brainstorming meeting, and during the informal annotation and discussion of photographs. We show that redundantly presented words, compared to other words used during a presentation or meeting, tend to be topic specific and thus are likely to be out-of-vocabulary. We also show that they have significantly higher tf-idf (term frequency–inverse document frequency) weights than other words, which we argue supports the hypothesis that they are dialogue-critical words. We frame the import of these empirical findings by describing SHACER, our recently introduced Speech and HAndwriting reCognizER, which can combine information from instances of redundant handwriting and speech to dynamically learn new vocabulary.
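The tf-idf weighting used in the analysis above can be made concrete with a small, generic sketch (standard tf-idf over transcripts; the tokenization and toy data are assumptions, not the paper's corpus or preprocessing).

```python
import math
from collections import Counter

def tfidf(term, doc_tokens, all_docs):
    """Standard tf-idf: term frequency within one transcript times the log
    inverse document frequency across a collection of transcripts."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for d in all_docs if term in d)   # transcripts containing the term
    return tf * math.log(len(all_docs) / df) if df else 0.0

# Toy example: a topic-specific, redundantly presented word ("shacer") scores
# higher than a common function word ("the") in the transcript where it occurs.
docs = [
    "the shacer recognizer aligns the handwriting with the speech".split(),
    "the meeting starts with the agenda and the minutes".split(),
    "the budget was discussed before the break".split(),
]
print(tfidf("shacer", docs[0], docs), tfidf("the", docs[0], docs))
```

A higher weight for the redundantly presented, topic-specific word is what the abstract reports for words that are both spoken and handwritten.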
Multimodal Play Back of Collaborative Multiparty Corpora
"... A basic capability driving the development of systems is the ability to execute multiple versions of a system against the same set of data. That is essential for instance to verify that coding problems were corrected, to provide guarantees that changes to the code did not introduce errors (regressio ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
(Show Context)
A basic capability driving the development of systems is the ability to execute multiple versions of a system against the same set of data. That is essential, for instance, to verify that coding problems were corrected, to provide guarantees that changes to the code did not introduce errors (regression testing), and to compare the performance of different algorithmic solutions. Multimodal systems process a combination of inputs, and their results are susceptible to timing issues, which determine, for instance, whether a pair of inputs from different modalities combine or not. When a multimodal system is executed against corpus data, it becomes necessary to synchronize processing of the various input streams and to handle time-related information so as to emulate real-time execution. This is particularly complex when multiple components of the various input stream processors take longer than real time. In this paper we describe a multimodal playback mechanism that addresses these problems. We report on two data collection and playback software and hardware environments: one used for system development and evaluation, and the other used for supporting manual annotation of multimodal data.
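As a minimal sketch of the replay idea (the event format and pacing logic are assumptions; this is not the playback mechanism described in the paper), timestamped events from several recorded streams can be merged into one global order and re-delivered either as fast as possible or paced to emulate the original real-time recording.

```python
import heapq
import time

def play_back(streams, realtime=False):
    """Merge per-modality lists of (t_seconds, modality, payload) events,
    each already sorted by time, and yield them in global time order. With
    realtime=True, sleep between events to reproduce the recorded gaps."""
    prev_t = None
    for t, modality, payload in heapq.merge(*streams, key=lambda ev: ev[0]):
        if realtime and prev_t is not None:
            time.sleep(t - prev_t)       # emulate the original inter-event timing
        prev_t = t
        yield modality, payload          # hand the event to the downstream recognizers

# Example: a speech stream and a pen stream recorded during the same session.
speech = [(0.0, "speech", "so the deadline is"), (2.1, "speech", "next Friday")]
pen = [(1.4, "pen", "underline stroke near 'Friday'")]
for event in play_back([speech, pen]):
    print(event)
```

A real replay environment also has to cope with components that run slower than real time, which is why the paper treats this as more than a simple merge; the sketch only shows the ordering and pacing skeleton.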
Managing extrinsic costs via multimodal natural interaction systems
- In CHI’06 Workshop: What is the Next Generation of Human-Computer Interaction
, 2006
"... Modern day interactions, whether between remote humans or humans and computers, involve extrinsic costs to the participants. Extrinsic costs are activities that, although unrelated to a person’s primary task, must be accomplished to complete the primary task. In this paper we provide a framework for ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Modern day interactions, whether between remote humans or between humans and computers, involve extrinsic costs to the participants. Extrinsic costs are activities that, although unrelated to a person’s primary task, must be accomplished to complete the primary task. In this paper we provide a framework for discussing certain extrinsic costs by describing those we term over-specification, repetition, and interruption. Natural interaction systems seek to reduce or eliminate these costs by leveraging people’s innate communication abilities. However, in conceiving these interfaces, it is critical to acknowledge that humans are naturally multimodal communicators, using speech, gesture, body position, gaze, writing, etc., to share information and intent. By recognizing and taking advantage of humans’ innate ability to communicate multimodally, extrinsic interaction costs can be reduced or eliminated. In this paper we review previous and ongoing work that demonstrates how multimodal interfaces can reduce extrinsic costs.
Author Keywords: Natural interaction system, multimodal interfaces, mutual
Leveraging Multimodal Redundancy for Dynamic Learning, with SHACER, a Speech and Handwriting Recognizer
- Ph.D. Thesis, Computer Science and Electrical Engineering, 2007
"... Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number.
Presentation Trainer, your Public Speaking Multimodal Coach
"... ABSTRACT The Presentation Trainer is a multimodal tool designed to support the practice of public speaking skills, by giving the user real-time feedback about different aspects of her nonverbal communication. It tracks the user's voice and body to interpret her current performance. Based on th ..."
Abstract
- Add to MetaCart
(Show Context)
The Presentation Trainer is a multimodal tool designed to support the practice of public speaking skills by giving the user real-time feedback about different aspects of her nonverbal communication. It tracks the user's voice and body to interpret her current performance. Based on this performance, the Presentation Trainer selects the type of intervention that will be presented as feedback to the user. This feedback mechanism has been designed taking into consideration the results of previous studies that show how difficult it is for learners to perceive and correctly interpret real-time feedback while practicing their speeches. In this paper we present the user experience evaluation of participants who used the Presentation Trainer to practice for an elevator pitch, showing that the feedback provided by the Presentation Trainer has a significant influence on learning.
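To make the intervention-selection step concrete, here is an illustrative rule-based sketch; the feature names, thresholds, and single-message policy are hypothetical and are not the Presentation Trainer's actual rules.

```python
def select_intervention(features):
    """Pick at most one piece of real-time feedback per cycle so the speaker
    is not overloaded, reflecting the design concern described above.
    `features` is a dict of hypothetical tracker outputs."""
    rules = [
        ("volume_db", lambda v: v < 50, "Speak louder"),
        ("words_per_minute", lambda v: v > 170, "Slow down"),
        ("gaze_away_seconds", lambda v: v > 4, "Look at your audience"),
        ("hands_still_seconds", lambda v: v > 5, "Use gestures to support your point"),
    ]
    for key, triggered, message in rules:    # first matching rule wins this cycle
        if key in features and triggered(features[key]):
            return message
    return None                              # no intervention this cycle

# Example cycle: speaking fast but loudly enough, so only one cue is shown.
print(select_intervention({"volume_db": 62, "words_per_minute": 185, "gaze_away_seconds": 1}))
```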