Automatic Analysis of Multimodal Group Actions in Meetings
, 2003
Cited by 90 (26 self)
This paper investigates the recognition of group actions in meetings. A framework is employed in which group actions result from the interactions of the individual participants. The group actions are modelled using different HMM-based approaches, where the observations are provided by a set of audio-visual features monitoring the actions of individuals. Experiments demonstrate the importance of taking interactions into account in modelling the group actions. It is also shown that the visual modality contains useful information, even for predominantly audio-based events, motivating a multimodal approach to meeting analysis.
Head Pose Estimation in Computer Vision: A Survey
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2008
Cited by 40 (6 self)
The capacity to estimate the head pose of another person is a common human ability that presents a unique challenge for computer vision systems. Compared to face detection and recognition, which have been the primary foci of face-related vision research, identity-invariant head pose estimation has fewer rigorously evaluated systems or generic solutions. In this paper, we discuss the inherent difficulties in head pose estimation and present an organized survey describing the evolution of the field. Our discussion focuses on the advantages and disadvantages of each approach and spans 90 of the most innovative and characteristic papers that have been published on this topic. We compare these systems by focusing on their ability to estimate coarse and fine head pose, highlighting approaches that are well suited for unconstrained environments.
Memory cues for meeting video retrieval
- In CARPE’04: Proceedings of the 1st ACM workshop on Continuous archival and retrieval of personal experiences
, 2004
Cited by 22 (1 self)
We advocate a new approach to meeting video retrieval based on the use of memory cues. First we present a new survey involving 519 people in which we investigate the types of items people use to review meeting contents (e.g., minutes, video, etc.). Then we present a novel memory study involving 15 subjects in which we investigate what people remember about past meetings (e.g., seating position, etc.). Based on these studies and related research we propose a novel framework for meeting video retrieval based on memory cues. Our proposed system graphically represents important memory retrieval cues such as room layout, participants’ faces and sitting positions, etc. Queries are formulated dynamically: as the user graphically manipulates the cues, the query results are shown. Our system (1) helps users easily express the cues they recall about a particular meeting, and (2) helps users remember new cues for meeting video retrieval. Finally, we present our approach to automatic indexing of meeting videos, present experiments, and discuss research issues in automatic indexing for retrieval using memory cues.
Speech Enhancement and Recognition in Meetings With an Audio–Visual Sensor Array
Cited by 8 (0 self)
An overview of technologies for e-meeting and e-lecture
- in Proc. IEEE Int. Conf. on Multimedia and Expo (ICME)
, 2005
Cited by 5 (0 self)
Over the past few years, with the rapid adoption of broadband communication and advances in multimedia content capture and delivery, web-based meetings and lectures, also referred to as e-meeting and e-lecture, have become popular among businesses and academic institutions because of their cost savings and capabilities in providing self-paced education and convenient content access and retrieval. In fact, the technological achievements in capture, analysis, access, and delivery of e-meeting and e-lecture media have already resulted in several working systems that are currently in regular use. This paper gives an overview of existing work as well as the state of the art in these two research areas, which are bound to affect the way we teach, learn, and collaborate.
3D TRACKING AND DYNAMIC ANALYSIS OF HUMAN HEAD MOVEMENTS AND ATTENTIONAL TARGETS
Cited by 4 (3 self)
We present a new system-level framework for the automatic detection and tracking of multiple persons’ heads in intelligent meeting rooms. We implement this approach with a distributed array of cameras that detect the meeting participants and continuously estimate their head orientation and head movements in 6 degrees-of-freedom with fine precision. The initial position of each person is obtained with a set of face detectors coupled with a new iterative approach to resolve the 3D ambiguities from overlapping epipolar lines. The head pose is obtained from a hybrid head pose estimation and tracking scheme that combines support vector regressors with a new multi-view 3D model-based tracking system. The purpose of this system is to facilitate the automatic semantic analysis of group meetings. As an example application, we evaluate the ability of the system to estimate the person that receives the most visual attention in the form of head direction.
Semantic Multi-modal Analysis, Structuring, and Visualization for Candid Personal Interaction Videos
Videos are rich in multimedia content and semantics, which should be used by video browsers to better present the audio-visual information to the viewer. Ubiquitous video players allow for content to be scanned linearly, rarely providing summaries or methods for searching. Through analysis of audio and video tracks, it is possible to extract text transcripts from audio, displayed text from video, and higher-level semantics through speaker identification and scene analysis. External data sources, when available, can be used to cross-reference the video content and impose a structure for organization. Various research tools have addressed video summarization and browsing using one or more of these modalities; however, most of them assume edited videos as input. We focus our research on genres in personal interaction videos and collections of such videos in their unedited form. We present and verify formal models for their structure, and develop methods for their automatic analysis, summarization and indexing. We specify the characteristic semantic components of three related genres of candidly captured videos: formal instructions or lectures, student team project presentations, and discussions. For each genre, we design and
GMs in On-Line Handwritten Whiteboard Note Recognition: The Influence of Implementation and Modeling
- 2009 10th International Conference on Document Analysis and Recognition
We present a comparison of two state-of-the-art toolboxes for implementing Graphical Models (GMs), namely the HTK and the GMTK, and their use for discrete on-line handwritten whiteboard note recognition. We then motivate a GM that is capable of modeling the statistical dependencies between the pen’s pressure information and the remaining features after vector quantization. Since the number of variable parameters rises when more codebook entries are used for quantization, the proposed model outperforms standard HMMs for low numbers of codebook entries.

