Results 1 - 10 of 17
How many high-level concepts will fill the semantic gap in video retrieval
- In International Conference on Image and Video Retrieval (CIVR), 2007
A Reranking Approach for Context-based Concept Fusion in Video Indexing and Retrieval
- In Conference on Image and Video Retrieval, 2007
Cited by 11 (2 self)
We propose to incorporate hundreds of pre-trained concept detectors to provide contextual information for improving the performance of multimodal video search. The approach takes initial search results from established video search methods (which typically are conservative in usage of concept detectors) and mines these results to discover and leverage co-occurrence patterns with detection results for hundreds of other concepts, thereby refining and reranking the initial video search result. We test the method on TRECVID 2005 and 2006 automatic video search tasks and find improvements in mean average precision (MAP) of 15%-30%. We also find that the method is adept at discovering contextual relationships that are unique to news stories occurring in the search set, which would be difficult or impossible to discover even if external training data were available.
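The reranking idea in this abstract can be illustrated with a toy sketch. It is not the authors' method: the pseudo-labelling by top fraction, the mean-difference concept weights, and the names `rerank`, `minmax`, `top_frac`, and `alpha` are all assumptions made for illustration. The sketch treats the top initial results as pseudo-relevant, weights each concept by how much more strongly it fires on those shots, and blends the resulting context score with the initial score.

```python
def minmax(xs):
    """Scale a list of numbers to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def rerank(initial_scores, concept_scores, top_frac=0.2, alpha=0.5):
    """Blend initial search scores with a context score mined from the results.

    initial_scores: one score per shot from the initial search.
    concept_scores: one row per shot, one detector output per concept.
    top_frac and alpha are hypothetical tuning parameters.
    """
    n = len(initial_scores)
    if n < 2:
        return list(initial_scores)
    # Treat the top-ranked shots as pseudo-relevant, the rest as pseudo-irrelevant.
    k = max(1, min(n - 1, round(top_frac * n)))
    order = sorted(range(n), key=lambda i: -initial_scores[i])
    pos, neg = order[:k], order[k:]
    # Weight each concept by how much more strongly it fires on pseudo-relevant shots.
    weights = []
    for c in range(len(concept_scores[0])):
        mean_pos = sum(concept_scores[i][c] for i in pos) / len(pos)
        mean_neg = sum(concept_scores[i][c] for i in neg) / len(neg)
        weights.append(mean_pos - mean_neg)
    context = [sum(w * s for w, s in zip(weights, row)) for row in concept_scores]
    # Combine the normalised initial score with the normalised context score.
    return [alpha * a + (1 - alpha) * b
            for a, b in zip(minmax(initial_scores), minmax(context))]
```

In this sketch, a shot with a low initial score can move up the list when it shares strongly weighted concepts with the top results.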
Multimodal fusion for multimedia analysis: a survey
- 2010
Cited by 4 (1 self)
This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish various multimedia analysis tasks. The existing literature on multimodal fusion research is presented through several classifications based on the fusion methodology and the level of fusion (feature, decision, and hybrid). The fusion methods are described from the perspective of the basic concept, advantages, weaknesses, and their usage in various analysis tasks as reported in the literature. Moreover, several distinctive issues that influence a multimodal fusion process, such as the use of correlation and independence, confidence level, contextual information, synchronization between different modalities, and optimal modality selection, are also highlighted. Finally, we present the open issues for further research in the area of multimodal fusion.
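The difference between feature-level and decision-level fusion can be shown with a minimal sketch. The function names and the weighted-average rule are assumptions for illustration and are not taken from the survey: feature-level (early) fusion merges modality features before any classifier sees them, while decision-level (late) fusion merges the scores that per-modality classifiers have already produced.

```python
def early_fusion(visual_features, audio_features):
    """Feature-level fusion: concatenate per-modality feature vectors
    so that a single classifier can be trained on the joint vector."""
    return list(visual_features) + list(audio_features)


def late_fusion(scores, weights):
    """Decision-level fusion: weighted average of per-modality scores.
    The weights are hypothetical modality confidences."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total
```

A hybrid scheme would use both, for example by feeding the late-fused score back in as one more feature.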
Building Detectors to Support Searches on Combined . . .
Cited by 3 (3 self)
Bridging the semantic gap is one of the big challenges in multimedia information retrieval. The gap exists between the low-level features extracted from a video and its conceptual content. A common approach to understanding the conceptual content of a video is to build concept detectors. A problem with this approach is that the number of detectors needed is impossible to determine. This paper presents a set of 8 methods for combining two existing concepts into a new one, which occurs when both concepts appear at the same time. The scores for each shot of a video for the combined concept are computed from the output of the underlying detectors. The findings are evaluated on the basis of the output of the 101 detectors, including a comparison to the theoretical possibility of training a classifier on each combined concept. The precision gains are significant, especially for methods that also consider the chronological surroundings of a shot.
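As a rough illustration of computing a combined-concept score from two detector outputs, the sketch below shows a few generic combination rules and a simple neighbour-averaging step. These are not the paper's 8 methods, which the abstract does not list. The names `COMBINERS`, `combine`, and `smooth_over_neighbours` are hypothetical.

```python
# Generic ways to merge two per-shot detector scores in [0, 1] (illustrative only).
COMBINERS = {
    "product": lambda a, b: a * b,
    "minimum": min,
    "average": lambda a, b: (a + b) / 2,
    "geometric_mean": lambda a, b: (a * b) ** 0.5,
}


def combine(scores_a, scores_b, method="product"):
    """Score each shot for the combined concept 'A and B'."""
    rule = COMBINERS[method]
    return [rule(a, b) for a, b in zip(scores_a, scores_b)]


def smooth_over_neighbours(scores, window=1):
    """Average each shot's score with up to `window` shots before and after it,
    a simple stand-in for using the chronological surroundings of a shot."""
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - window), min(len(scores), i + window + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed
```

Smoothing can be applied either to the individual detector outputs before combining or to the combined score afterwards; which works better is an empirical question.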
Extracting Semantics from Multimedia Content: Challenges and Solutions
Cited by 2 (1 self)
Multimedia content accounts for over 60% of traffic in the current internet [74]. With many users willing to spend their leisure time watching videos on YouTube or browsing photos through Flickr, sifting through large multimedia collections for useful information, especially those outside of the open web, is still an open problem. The lack of effective indexes to describe the content of multimedia data is a main hurdle to multimedia search, and extracting semantics from multimedia content is the bottleneck for multimedia indexing. In this chapter, we present a review on extracting semantics from a large amount of multimedia data as a statistical learning problem. Our goal is to present the current challenges and solutions from a few different perspectives and cover a sample of related work. We start with a system overview with the five major components that extract and use semantic metadata: data annotation, multimedia ontology, feature representation, model learning and retrieval systems. We then present challenges for each of the five components along with their existing solutions: designing multimedia lexicons and using them for concept detection, handling multiple media sources and resolving correspondence across modalities, learning structured (generative) models to account for natural data dependency or model hidden topics, handling rare classes, leveraging unlabeled data, scaling to large amounts of training data, and finally leveraging media semantics in retrieval systems.
Examining User Interactions with Video Retrieval Systems
Cited by 1 (0 self)
The Informedia group at Carnegie Mellon University has since 1994 been developing and evaluating surrogates, summary interfaces, and visualizations for accessing digital video collections containing thousands of documents, millions of shots, and terabytes of data. This paper reports on TRECVID 2005 and 2006 interactive search tasks conducted with the Informedia system by users having no knowledge of Informedia or other video retrieval interfaces, but being experts in analyst activities. Think-aloud protocols, questionnaires, and interviews were also conducted with this user group to assess the contributions of various video summarization and browsing techniques with respect to broadcast news test corpora. Lessons learned from these user interactions are reported, with recommendations on both interface improvements for video retrieval systems and enhancing the ecological validity of video retrieval interface evaluations.
Learning to Adapt Across Multimedia Domains
- 2007
Cited by 1 (0 self)
In multimedia, machine learning techniques are often applied to build models to map low-level feature vectors into semantic labels. As data such as images and videos come from a variety of domains (e.g., genres, sources) with different distributions, there is a benefit of adapting models trained from one domain to other domains in terms of improving performance and reducing computational and human cost. In this thesis, we focus on a generic adaptation setting in multimedia, where supervised classifiers trained from one or more auxiliary domains are adapted to a new classifier that works well on a target domain with limited labeled examples. Our main contribution is a discriminative framework for function-level classifier adaptation based on regularized loss minimization, which adapts classifiers of any type by modifying their decision functions in an efficient and principled way. Two adaptation algorithms derived from this general framework, adaptive support vector machines (aSVM) and adaptive kernel logistic regression (aKLR), are discussed in detail. We further extend this framework by integrating domain analysis approaches that measure and weight the utility of auxiliary
IBM Research TRECVID-2007 Video Retrieval System
Cited by 1 (1 self)
In this paper, we describe the IBM Research system for indexing, analysis, and retrieval of video as applied to the TREC-2007 video retrieval benchmark. This year, the focus of system improvement was on cross-domain learning, automation, scalability, and interactive search. Keywords: Multimedia indexing, content-based retrieval,
A Probabilistic Ranking Framework using Unobservable Binary Events for Video Search
Cited by 1 (0 self)
Recent content-based video retrieval systems combine output of concept detectors (also known as high-level features) with text obtained through automatic speech recognition. This paper concerns the problem of search using the noisy concept detector output only. Unlike term occurrence in text documents, the event of the occurrence of an audiovisual concept is only indirectly observable. We develop a probabilistic ranking framework for unobservable binary events to search in videos, called PR-FUBE. The framework explicitly models the probability of relevance of a video shot through the presence and absence of concepts. From our framework, we derive a ranking formula and show its relationship to previously proposed formulas. We evaluate our framework against two other retrieval approaches using the TRECVID 2005 and 2007 datasets. In particular, using large numbers of concepts in retrieval yields good performance. We attribute the observed robustness against the noise introduced by less related concepts to the effective combination of concept presence and absence in our method. The experiments show that an accurate estimate for the probability of occurrence of a particular concept in relevant shots is crucial to obtain effective retrieval results.
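One way to picture ranking over concept events that cannot be observed directly is to marginalise over presence and absence, using the detector output as the probability that the concept is present. The sketch below does this under a concept-independence assumption. It is an illustration only and is not claimed to be the PR-FUBE ranking formula. The function name and its three inputs are hypothetical.

```python
import math


def shot_score(p_present, p_given_relevant, prior):
    """Log-odds-style relevance score for one shot.

    p_present:        detector-based probability that each concept is present in the shot.
    p_given_relevant: assumed probability that each concept occurs in relevant shots.
    prior:            assumed probability that each concept occurs in any shot,
                      strictly between 0 and 1.
    """
    score = 0.0
    for p, p_rel, p_any in zip(p_present, p_given_relevant, prior):
        # Evidence if the concept is present, weighted by how likely presence is.
        present_term = p * p_rel / p_any
        # Evidence if the concept is absent, weighted by how likely absence is.
        absent_term = (1.0 - p) * (1.0 - p_rel) / (1.0 - p_any)
        score += math.log(present_term + absent_term)
    return score
```

A concept whose occurrence rate in relevant shots equals its overall rate contributes nothing to the score, so only concepts that discriminate between relevant and irrelevant shots affect the ranking in this sketch.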
Exploring Concept Selection Strategies for Interactive Video Search
Ranked shot lists from 39 automated LSCOM-Lite concept classifiers are investigated with respect to 24 TRECVID 2006 topics. Selecting the best fitting concept or pair of concepts produces the shot set with greatest utility, rather than drawing fewer shots from a larger set of concepts. Mean average precision measures show concept-based shot sets have great utility for topics when perfectly traversed by a user. Using empirical data, however, shows that realistic ability to separate relevant shots from irrelevant ones and recall all the relevant ones is topic-dependent and far from perfect. Concept-based strategies including user-driven selection strategies not using idealized oracle prioritization are also discussed, with implications for query-by-concept in interactive video retrieval as concept spaces grow from tens to thousands.
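To make the average precision measure and oracle-style concept selection concrete, here is a minimal sketch. It uses the standard non-interpolated average precision and picks the concept whose ranked shot list scores highest against known relevance judgments. The function names and the data layout (a dict from concept name to ranked shot IDs) are assumptions and do not describe the paper's experimental setup.

```python
def average_precision(ranked_shots, relevant):
    """Non-interpolated average precision of one ranked list.

    ranked_shots: shot IDs in ranked order.
    relevant:     set of shot IDs judged relevant for the topic.
    """
    hits, total = 0, 0.0
    for rank, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant shot
    return total / len(relevant) if relevant else 0.0


def best_concept(concept_rankings, relevant):
    """Oracle selection: the concept whose ranked list has the highest AP.
    This needs the relevance judgments, so it is an upper bound, not a
    strategy a real user could follow."""
    return max(concept_rankings,
               key=lambda c: average_precision(concept_rankings[c], relevant))
```

Mean average precision is then the mean of `average_precision` over all topics.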

