Results 11 - 20 of 79
Costume: A New Feature for Automatic Video Content Indexing
In Proc. RIAO, 2004
Cited by 10 (2 self)
This paper introduces costume as a new feature for automatic video content indexing. We present an application of person recognition using costumes to show the relevance of costume for indexing.
Multimedia event-based video indexing using time intervals
IEEE Trans. Multimedia, 2005
Cited by 8 (4 self)
We propose the time interval multimedia event (TIME) framework as a robust approach for classification of semantic events in multimodal video documents. The representation used in TIME extends the Allen temporal interval relations and allows for proper inclusion of context and synchronization of the heterogeneous information sources involved in multimodal video analysis. To demonstrate the viability of our approach, it was evaluated on the domains of soccer and news broadcasts. For automatic classification of semantic events, we compare three different machine learning techniques, namely the C4.5 decision tree, maximum entropy, and the support vector machine. The results show that semantic video indexing benefits significantly from using the TIME framework.
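The abstract above builds on the Allen temporal interval relations. As a rough illustration only, the following Python sketch classifies the basic Allen relation between two intervals. It covers the thirteen standard relations and does not reproduce the extension that TIME applies to them; the function name `allen_relation` and the sample intervals are hypothetical and do not come from the paper.

```python
from typing import Tuple

# An interval is (start, end) with start < end, e.g. in seconds of video time.
Interval = Tuple[float, float]


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the basic Allen relation that holds from interval a to interval b."""
    (a0, a1), (b0, b1) = a, b
    # Disjoint intervals, with or without a gap between them.
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    # Intervals sharing one or both endpoints.
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    # Strict containment in either direction.
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Only partial overlap remains.
    return "overlaps" if a0 < b0 else "overlapped-by"


# Hypothetical events from two modalities of the same broadcast.
whistle_audio = (12.0, 13.5)
close_up_shot = (13.5, 20.0)
print(allen_relation(whistle_audio, close_up_shot))
```

Relations of this kind give a symbolic description of how events from different modalities are ordered in time, which a classifier can then use as features.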
Time Interval Maximum Entropy Based Event Indexing In Soccer Video
In IEEE International Conference on Multimedia & Expo, 2003
Cited by 6 (3 self)
Multimodal indexing of events in video documents poses problems with respect to representation, inclusion of contextual information, and synchronization of the heterogeneous information sources involved. In this paper we present the Time Interval Maximum Entropy (TIME) framework, which tackles the aforementioned problems. To demonstrate the viability of TIME for event classification in multimodal video, an evaluation was performed on the domain of soccer broadcasts. It was found that by applying TIME, the amount of video a user has to watch in order to see almost all highlights can be reduced considerably.
Visual Signatures in Video Visualization
In Proc. IEEE Visualization, 2006
Cited by 6 (4 self)
Video visualization is a computation process that extracts meaningful information from original video data sets and conveys the extracted information to users in appropriate visual representations. This paper presents a broad treatment of the subject, following a typical research pipeline involving concept formulation, system development, a path-finding user study, and a field trial with real application data. In particular, we have conducted a fundamental study on the visualization of motion events in videos. We have, for the first time, deployed flow visualization techniques in video visualization. We have compared the effectiveness of different abstract visual representations of videos. We have conducted a user study to examine whether users are able to learn to recognize visual signatures of motions, and to assist in the evaluation of different visualization techniques. We have applied our understanding and the developed techniques to a set of application video clips. Our study has demonstrated that video visualization is both technically feasible and cost-effective. It has provided the first set of evidence confirming that ordinary users can be accustomed to the visual features depicted in video visualizations, and can learn to recognize visual signatures of a variety of motion events. Index Terms: Video visualization, volume visualization, flow visualization, human factors, user study, visual signatures, video processing, optical flow, GPU rendering.
A Comparison of Text and Shape Matching for Retrieval of Online 3D Models
In Proc. European Conference on Digital Libraries, 2004
Cited by 6 (1 self)
Because of recent advances in graphics hard- and software, both the production and use of 3D models are increasing at a rapid pace. As a result, a large number of 3D models have become available on the web, and new research is being done on 3D model retrieval methods. Query and retrieval can be done solely based on associated text, as in image retrieval, for example (e.g. Google Image Search [1] and [2, 3]). Other research focuses on shape-based retrieval, based on methods that measure shape similarity between 3D models (e.g., [4]).
I2T: Image Parsing to Text Description
Cited by 6 (0 self)
In this paper, we present an image parsing to text generation (I2T) framework that generates natural language descriptions from image and video content. This framework converts the harder content-based image and video retrieval problem into an easier text search problem with potential applications in Internet search and visual data mining. The proposed I2T framework follows three steps. 1) Input images or video frames are decomposed into their constituent visual patterns through an image parsing engine, which outputs a scene as a parse graph representation, in a spirit similar to parsing sentences in speech and natural language. 2) The parse graphs are converted into a semantic representation using the Web Ontology Language (OWL) format, which is a formal and unambiguous knowledge representation. 3) A text generation engine converts the semantic representation into a semantically meaningful, human-readable and queryable text report. Success of the above framework relies on two knowledge bases. The first one is a visual knowledge base that provides top-down hypotheses for image parsing and serves as an image ontology for translating parse graphs into semantic representations. The core of the visual knowledge base is an And-Or graph representation. It entails vocabularies of visual elements including pixels, primitives, parts, objects and scenes, and a stochastic image grammar specifying compositional, spatial, temporal and functional relations between visual elements. We developed a large-scale ground-truth image database and interactive image annotation software to build the And-Or graph from real-world image instances. The second knowledge base is a general knowledge base that interconnects several domain-specific ontologies in the form of the Semantic Web. This knowledge base further enriches the semantic representation of visual content with domain-specific information.
Finally, we demonstrate a case study in video surveillance, an end-to-end system that automatically infers video events and generates natural language descriptions of video scenes. Experiments with maritime and urban scenes indicate the feasibility of the proposed approach.
A Framework For Aligning And Indexing Movies With Their Script
2003
Cited by 5 (0 self)
A continuity script carefully describes the content of a movie, shot by shot. This paper introduces a framework for extracting structural units such as shots, scenes, actions and dialogs from the script, and aligning them to the movie based on the longest matching subsequence between them. We present experimental results and applications of the framework with a full-length movie and discuss its applicability to large-scale film repositories.
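The alignment idea in the abstract above can be illustrated with a standard longest common subsequence (LCS) computation. The Python sketch below assumes, purely for illustration, that each script unit and each detected movie shot has already been reduced to a comparable label; the paper's actual features and matching criterion are not given in this listing, and the function name `lcs_alignment` and the labels are hypothetical.

```python
from typing import List, Sequence, Tuple


def lcs_alignment(script: Sequence[str], movie: Sequence[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where script[i] is matched to movie[j] along one LCS."""
    n, m = len(script), len(movie)
    # table[i][j] holds the LCS length of script[i:] and movie[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if script[i] == movie[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk forward through the table to recover one optimal set of matches.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if script[i] == movie[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # skip a script unit with no counterpart
        else:
            j += 1  # skip a movie shot with no counterpart
    return pairs


# Hypothetical labels: the second script unit has no matching shot in the movie.
script_units = ["kitchen", "street", "office", "car"]
movie_shots = ["kitchen", "unknown", "office", "car"]
print(lcs_alignment(script_units, movie_shots))  # [(0, 0), (2, 2), (3, 3)]
```

Unmatched units on either side are simply skipped, so the alignment preserves order while tolerating shots that are missing from the script or from the detected shot list.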
A Review on Multimodal Video Indexing
Proceedings of the 8th Annual Conference of the Advanced School for Computing and Imaging, 2002
Cited by 5 (0 self)
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is infeasible for large video collections. Efficient, single-modality video indexing methods have appeared in the literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in a collaborative fashion. In this paper we present a framework for multimodal video indexing, which views a video document from the perspective of its author. The framework serves as a blueprint for a generic and flexible multimodal video indexing system, and generalizes different state-of-the-art video indexing methods. It furthermore forms the basis for categorizing these different methods.
Event Mining in Multimedia Streams
2008
Cited by 5 (0 self)
Events are real-world occurrences that unfold over space and time. Event mining from multimedia streams improves the access and reuse of large media collections, and it has been an active area of research with notable recent progress. This paper contains a survey on the problems and solutions in event mining, approached from three aspects: event description, event-modeling components, and current event mining systems. We present a general characterization of multimedia events, motivated by the maxim of five "W"s and one "H" for reporting real-world events in journalism: when, where, who, what, why, and how. We discuss the causes for semantic variability in real-world descriptions, including multilevel
Semantic shot classification in sports video
In Proceedings of the International Conference on Electronic Imaging: Storage and Retrieval for Media Databases, 2003
Cited by 4 (3 self)
In this paper, we present a unified framework for semantic shot classification in sports videos. Unlike previous approaches, which focus on clustering by aggregating shots with similar low-level features, the proposed scheme makes use of domain knowledge of a specific sport to perform top-down video shot classification, including identification of video shot classes for each sport, and supervised learning and classification of a given sports video with low-level and middle-level features extracted from that video. We observe that for each sport we can predefine a small number of semantic shot classes, 5 to 10, which cover 90 to 95% of sports broadcasting video. With a supervised learning method, we can map the low-level features to middle-level semantic video shot attributes such as dominant object motion (a player), camera motion patterns, and court shape. On the basis of the appropriate fusion of those middle-level shot classes, we classify video shots into the predefined video shot classes, each of which has a clear semantic meaning. The proposed method has been tested on 4 types of sports videos: tennis, basketball, volleyball and soccer. Good classification accuracy of 85 to 95% has been achieved. With correctly classified sports video shots, further structural and temporal analysis will be greatly facilitated.

