Results 1 - 10 of 22
Bridging the gap: Query by semantic example
- IEEE Trans. Multimedia, 2007
Abstract - Cited by 73 (8 self)
A combination of query-by-visual-example (QBVE) and semantic retrieval (SR), denoted as query-by-semantic-example (QBSE), is proposed. Images are labeled with respect to a vocabulary of visual concepts, as is usual in SR. Each image is then represented by a vector, referred to as a semantic multinomial, of posterior concept probabilities. Retrieval is based on the query-by-example paradigm: the user provides a query image, for which 1) a semantic multinomial is computed and 2) matched to those in the database. QBSE is shown to have two main properties of interest, one mostly practical and the other philosophical. From a practical standpoint, because it inherits the generalization ability of SR inside the space of known visual concepts (referred to as the semantic space) but performs much better outside of it, QBSE produces retrieval systems that are more accurate than what was previously possible. Philosophically, because it allows a direct comparison of visual and semantic representations under a common query paradigm, QBSE enables the design of experiments that explicitly test the value of semantic representations for image retrieval. An implementation of QBSE under the minimum probability of error (MPE) retrieval framework, previously applied with success to both QBVE and SR, is proposed, and used to demonstrate the two properties. In particular, an extensive objective comparison of QBSE with QBVE is presented, showing that the former significantly outperforms the latter both inside and outside the semantic space. By carefully controlling the structure of the semantic space, it is also shown that this improvement can only be attributed to the semantic nature of the representation on which QBSE is based.
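The QBSE matching step can be sketched in a few lines: each image becomes a multinomial over a concept vocabulary, and retrieval ranks database images by divergence of their multinomial from the query's. The concept names, probability values, and the use of KL divergence as the matching function below are illustrative assumptions, not the paper's exact MPE formulation.

```python
import numpy as np

# Hypothetical semantic multinomials: posterior probabilities over a small
# visual-concept vocabulary.  In the paper these come from trained concept
# detectors; here they are illustrative values.
concepts = ["sky", "water", "building", "person"]
query = np.array([0.55, 0.30, 0.10, 0.05])
database = {
    "beach.jpg":  np.array([0.50, 0.35, 0.05, 0.10]),
    "street.jpg": np.array([0.10, 0.05, 0.60, 0.25]),
}

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two multinomials, smoothed to avoid log(0)."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

# Query-by-example: rank database images by how close their semantic
# multinomial is to the query's.
ranked = sorted(database, key=lambda name: kl_divergence(query, database[name]))
print(ranked[0])  # beach.jpg: the closest semantic multinomial
```

Any multinomial distance would slot into `kl_divergence`'s place; the point is that matching happens in the semantic space rather than on raw visual features.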
To search or to label?: predicting the performance of search-based automatic image classifiers
- Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, 2006
Abstract - Cited by 59 (6 self)
In this work we explore the trade-offs in acquiring training data for image classification models through automated web search as opposed to human annotation. Automated web search comes at no cost in human labor, but sometimes leads to decreased classification performance, while human annotations come at great expense in human labor but result in better performance. The primary contribution of this work is a system for predicting which visual concepts will show the greatest increase in performance from investing human effort in obtaining annotations. We propose to build this system as an estimation of the absolute gain in average precision (AP) experienced from using human annotations instead of web search. To estimate the AP gain, we rely on statistical classifiers built on top of a number of quality prediction features. We employ a feature selection algorithm to compare the quality of each of the predictors and find that cross-domain image similarity and cross-domain model generalization metrics are strong predictors, while concept frequency and within-domain model quality are weak predictors. In a test application, we find that the prediction scheme can result in a savings in annotation effort of up to 75%, while only incurring marginal damage (10% relative decrease in mean average precision) to the overall performance of the concept models.
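The prediction idea can be roughly sketched as a regression from per-concept quality features to AP gain, followed by spending annotation effort only where the predicted gain is largest. The features and gains below are synthetic placeholders, and plain least squares stands in for the paper's statistical classifiers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_concepts = 50
# Feature columns (one row per concept): cross-domain image similarity,
# cross-domain model generalization, concept frequency, within-domain
# model quality.  Values are synthetic.
X = rng.random((n_concepts, 4))
# Echoing the paper's finding, only the first two features carry signal here.
true_w = np.array([0.30, 0.25, 0.02, 0.01])
ap_gain = X @ true_w + rng.normal(0, 0.01, n_concepts)

# Fit the gain predictor with ordinary least squares (bias column appended).
A = np.c_[X, np.ones(n_concepts)]
w, *_ = np.linalg.lstsq(A, ap_gain, rcond=None)

# Spend human annotation effort only on the top quarter of concepts by
# predicted AP gain; the rest keep their web-search training data.
predicted = A @ w
to_annotate = np.argsort(predicted)[::-1][: n_concepts // 4]
print(len(to_annotate))  # 12 of 50 concepts annotated
```

Any regressor could replace the least-squares fit; the useful part is the ranking of concepts by predicted gain, which converts the prediction into an annotation budget.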
Techniques Used and Open Challenges to the Analysis, Indexing and Retrieval of Digital Video
- Information Systems
Abstract - Cited by 28 (6 self)
NOTICE: this is the author’s version of a work that was accepted for publication in Information Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Systems, [VOL#, ISSUE#, (DATE), DOI#] (to be added).

Video in digital format is now commonplace and widespread, both in professional use and in domestic consumer products from camcorders to mobile phones. Video content is growing in volume, and while we can capture, compress, store, transmit and display video with great facility, editing videos and manipulating them based on their content is still a non-trivial activity. In this paper we give a brief review of the state of the art of video analysis, indexing and retrieval, and we point to research directions which we think are promising and could make searching and browsing of video archives based on video content as easy as searching and browsing (text) web pages. We conclude the paper with a list of grand challenges for researchers working in the area.
A Reranking Approach for Context-based Concept Fusion in Video Indexing and Retrieval
- In Conference on Image and Video Retrieval, 2007
Abstract - Cited by 28 (3 self)
We propose to incorporate hundreds of pre-trained concept detectors to provide contextual information for improving the performance of multimodal video search. The approach takes initial search results from established video search methods (which typically are conservative in usage of concept detectors) and mines these results to discover and leverage co-occurrence patterns with detection results for hundreds of other concepts, thereby refining and reranking the initial video search result. We test the method on TRECVID 2005 and 2006 automatic video search tasks and find improvements in mean average precision (MAP) of 15%-30%. We also find that the method is adept at discovering contextual relationships that are unique to news stories occurring in the search set, which would be difficult or impossible to discover even if external training data were available.
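A minimal sketch of the reranking idea: mine a concept profile from the pseudo-relevant top of the initial search result, then boost shots whose detector outputs agree with that profile. The scores, the linear score combination, and the profile construction below are illustrative assumptions, not the authors' exact model.

```python
import numpy as np

# Initial search scores for 4 shots, from some established text/visual search.
initial_scores = np.array([0.9, 0.7, 0.4, 0.2])
# Pre-trained concept detector outputs per shot (3 concepts, values made up).
detector_scores = np.array([
    [0.8, 0.1, 0.9],
    [0.7, 0.2, 0.8],
    [0.1, 0.9, 0.2],
    [0.9, 0.1, 0.8],
])

# Mine a contextual concept profile from the top-k of the initial result.
top_k = np.argsort(initial_scores)[::-1][:2]
context = detector_scores[top_k].mean(axis=0)

# Rerank: blend each shot's initial score with its agreement to the profile.
alpha = 0.5
agreement = detector_scores @ context / np.linalg.norm(context)
reranked = alpha * initial_scores + (1 - alpha) * agreement / agreement.max()
print(np.argsort(reranked)[::-1])  # shot 3 rises above shot 2 via context
```

Here shot 3 starts with the lowest initial score but overtakes shot 2 after reranking, because its detector outputs co-occur with those of the top-ranked shots: the contextual relationships are mined from the search set itself, with no external training data.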
Merging Storyboard Strategies and Automatic Retrieval for Improving Interactive Video Search
- In Proc. Image and Video Retrieval (CIVR), 2007
Abstract - Cited by 17 (4 self)
The Carnegie Mellon University Informedia group has enjoyed consistent success with TRECVID interactive search using traditional storyboard interfaces for shot-based retrieval. For TRECVID 2006 the output of automatic search was included for the first time with storyboards, both as an option for an interactive user and in a different run as the sole means of access. The automatic search makes use of relevance-based probabilistic retrieval models to determine weights for combining retrieval sources when addressing a given topic. Storyboard-based access using automatic search output outperformed extreme video retrieval interfaces of manual browsing with resizable pages and rapid serial visualization of keyframes that used the same output. Further, the full Informedia interface with automatic search results as an option along with other query mechanisms scored significantly better than all other TRECVID 2006 interactive search systems. Attributes of the automatic search and interactive search systems are discussed to further optimize shot-based retrieval from news corpora.
Avrithis - "A Region Thesaurus Approach for High-Level Concept Detection in the Natural Disaster Domain"
- 2nd International Conference on Semantics And Digital Media Technologies (SAMT), 2007
Abstract - Cited by 9 (9 self)
This paper presents an approach to high-level feature detection using a region thesaurus. MPEG-7 features are locally extracted from segmented regions for a large set of images. A hierarchical clustering approach is applied and a relatively small number of region types is selected. This set of region types defines the region thesaurus. Using this thesaurus, low-level features are mapped to high-level concepts as model vectors. This representation is then used to train support vector machine-based feature detectors. As a next step, latent semantic analysis is applied on the model vectors to further improve the analysis performance. The high-level concepts detected derive from the natural disaster domain.
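The thesaurus-construction and model-vector steps above can be illustrated with synthetic region features: cluster pooled region descriptors into a small set of region types, then describe an image by its distances to those types. Plain k-means stands in for the paper's hierarchical clustering, and no actual MPEG-7 extraction is performed.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 5-dimensional descriptors for 60 segmented regions, drawn around
# three underlying region types (stand-ins for MPEG-7 features).
regions = np.vstack([rng.normal(m, 0.1, (20, 5)) for m in (0.0, 0.5, 1.0)])

# Cluster the pooled region features into k region types (the thesaurus).
k = 3
centroids = regions[rng.choice(len(regions), k, replace=False)]
for _ in range(10):
    labels = np.argmin(np.linalg.norm(regions[:, None] - centroids, axis=2), axis=1)
    centroids = np.array([regions[labels == j].mean(axis=0)
                          if np.any(labels == j) else centroids[j]
                          for j in range(k)])

def model_vector(image_regions):
    """Per region type, the distance to the closest region of the image."""
    d = np.linalg.norm(image_regions[:, None] - centroids, axis=2)
    return d.min(axis=0)

# One hypothetical image made of two segmented regions.
mv = model_vector(np.vstack([rng.normal(0.0, 0.1, 5), rng.normal(1.0, 0.1, 5)]))
print(mv.shape)  # one entry per region type; this vector feeds an SVM detector
```

The resulting model vector is a fixed-length, region-type-level description of a variably segmented image, which is what makes a standard SVM applicable per concept.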
Establishing the Utility of Non-Text Search for News Video Retrieval with Real World Users
- In Proc. ACM Multimedia, 2007
Abstract - Cited by 9 (3 self)
TRECVID participants have enjoyed consistent success using storyboard interfaces for shot-based retrieval, as measured by TRECVID interactive search mean average precision (MAP). However, much is lost by only looking at MAP, and especially by neglecting to bring in representatives of the target user communities to conduct such tasks. This paper reports on the use of within-subjects experiments to reduce subject variability and emphasize the examination of specific video search interface features for their effectiveness in interactive retrieval and user satisfaction. A series of experiments is surveyed to illustrate the gradual realization of getting non-experts to utilize non-textual query features through interface adjustments. Notably, the paper explores the use of the search system by government intelligence analysts, concluding that a variety of search methods are useful.
Examining User Interactions with Video Retrieval Systems
- In SPIE, 2007
Abstract - Cited by 3 (0 self)
The Informedia group at Carnegie Mellon University has since 1994 been developing and evaluating surrogates, summary interfaces, and visualizations for accessing digital video collections containing thousands of documents, millions of shots, and terabytes of data. This paper reports on TRECVID 2005 and 2006 interactive search tasks conducted with the Informedia system by users having no knowledge of Informedia or other video retrieval interfaces, but being experts in analyst activities. Think-aloud protocols, questionnaires, and interviews were also conducted with this user group to assess the contributions of various video summarization and browsing techniques with respect to broadcast news test corpora. Lessons learned from these user interactions are reported, with recommendations on both interface improvements for video retrieval systems and enhancing the ecological validity of video retrieval interface evaluations.
High-level concept detection based on mid-level semantic information and contextual adaptation
- Proceedings of the Second International Workshop on Semantic Media Adaptation and Personalization
Abstract - Cited by 3 (1 self)
In this paper we propose the use of enhanced mid-level information, such as information obtained from the application of supervised or unsupervised learning methodologies on low-level characteristics, in order to improve semantic multimedia analysis. High-level, a priori contextual knowledge about the semantic meaning of objects and their low-level visual descriptions are combined in an integrated approach that handles the gap between semantics and low-level features in a uniform way. Prior work on low-level feature extraction is extended and a region thesaurus containing all mid-level features is constructed using a hierarchical clustering method. A model vector that contains the distances from each mid-level element is formed and a neural network-based detector is trained for each semantic concept. Contextual adaptation improves the quality of the produced results by utilizing fuzzy algebra, fuzzy sets and relations. The novelty of the presented work is the context-driven mid-level manipulation of region types, utilizing a domain-independent ontology infrastructure to handle the knowledge. Early experimental results are presented using data derived from the beach domain.
USING REGION SEMANTICS AND VISUAL CONTEXT FOR SCENE CLASSIFICATION
Abstract - Cited by 2 (0 self)
In this paper we focus on scene classification and detection of high-level concepts within multimedia documents, by introducing an intermediate contextual approach as a means of exploiting the visual context of images. More specifically, we introduce and model a novel relational knowledge representation, founded on topological and semantic relations between the concepts of an image. We further develop an algorithm to address computationally efficient handling of visual context and extraction of mid-level region characteristics. Based on the proposed knowledge model, we combine the notion of visual context with region semantics, in order to exploit their efficacy in dealing with scene classification problems. Finally, initial experimental results are presented, in order to demonstrate possible applications of the proposed methodology. Index Terms — scene classification, concept detection, visual context, region semantics