Successful approaches in the TREC video retrieval evaluations
In Proc. ACM Multimedia, 2004
Cited by 30 (5 self)
This paper reviews successful approaches in evaluations of video retrieval over the last three years. The task involves the search and retrieval of shots from MPEG digitized video recordings using a combination of automatic speech, image and video analysis and information retrieval technologies. The search evaluations are grouped into interactive (with a human in the loop) and non-interactive (where the human merely enters the query into the system) submissions. Most non-interactive search approaches have relied extensively on text retrieval, and only recently have image-based features contributed reliably to improved search performance. Interactive approaches have substantially outperformed all non-interactive approaches, with most systems relying heavily on the user’s ability to refine queries and reject spurious answers. We will examine both the successful automatic search approaches and the user interface techniques that have enabled high-performance video retrieval.
Bridging the gap: Query by semantic example
IEEE Trans. Multimedia, 2007
Cited by 23 (4 self)
A combination of query-by-visual-example (QBVE) and semantic retrieval (SR), denoted as query-by-semantic-example (QBSE), is proposed. Images are labeled with respect to a vocabulary of visual concepts, as is usual in SR. Each image is then represented by a vector, referred to as a semantic multinomial, of posterior concept probabilities. Retrieval is based on the query-by-example paradigm: the user provides a query image, for which a semantic multinomial is 1) computed and 2) matched to those in the database. QBSE is shown to have two main properties of interest, one mostly practical and the other philosophical. From a practical standpoint, because it inherits the generalization ability of SR inside the space of known visual concepts (referred to as the semantic space) but performs much better outside of it, QBSE produces retrieval systems that are more accurate than what was previously possible. Philosophically, because it allows a direct comparison of visual and semantic representations under a common query paradigm, QBSE enables the design of experiments that explicitly test the value of semantic representations for image retrieval. An implementation of QBSE under the minimum probability of error (MPE) retrieval framework, previously applied with success to both QBVE and SR, is proposed and used to demonstrate the two properties. In particular, an extensive objective comparison of QBSE with QBVE is presented, showing that the former significantly outperforms the latter both inside and outside the semantic space. By carefully controlling the structure of the semantic space, it is also shown that this improvement can only be attributed to the semantic nature of the representation on which QBSE is based.
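The matching step described in this abstract can be illustrated with a minimal sketch. It assumes each image already has a vector of per-concept scores, turns that vector into a semantic multinomial by normalization, and ranks database images by Kullback-Leibler divergence from the query. The choice of KL divergence, the smoothing constant `eps`, and all function names are assumptions made for illustration; the abstract does not state which similarity function the authors use.

```python
import math
from typing import Dict, List


def normalize(scores: List[float], eps: float = 1e-6) -> List[float]:
    """Turn non-negative concept scores into a smoothed probability vector.

    The smoothing constant eps is an assumption; it only keeps every
    entry strictly positive so the logarithm below is defined.
    """
    smoothed = [s + eps for s in scores]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two strictly positive probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def rank_by_semantic_example(query: List[float],
                             database: Dict[str, List[float]]) -> List[str]:
    """Return database image names ordered from most to least similar."""
    q = normalize(query)
    dist = {name: kl_divergence(q, normalize(v)) for name, v in database.items()}
    return sorted(dist, key=dist.get)
```

For example, with hypothetical three-concept vectors, `rank_by_semantic_example([0.6, 0.3, 0.1], {"beach": [0.7, 0.2, 0.1], "forest": [0.1, 0.1, 0.8]})` places `"beach"` ahead of `"forest"`, because the query's concept distribution is closer to the first image's.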
A probabilistic semantic model for image annotation and multi-modal image retrieval
In Proc. Int’l Conf. Computer Vision, 2005
Cited by 11 (1 self)
This paper addresses the automatic image annotation problem and its application to multi-modal image retrieval. The contribution of our work is three-fold. (1) We propose a probabilistic semantic model in which the visual features and the textual words are connected via a hidden layer which constitutes the semantic concepts to be discovered to explicitly exploit the synergy among the modalities. (2) The association of visual features and textual words is determined in a Bayesian framework such that the confidence of the association can be provided. (3) Extensive evaluation on a large-scale, visually and semantically diverse image collection crawled from the Web is reported to evaluate the prototype system based on the model. In the proposed probabilistic model, a hidden concept layer which connects the visual feature and the word layer is discovered by fitting a generative model to the training images and annotation words through an Expectation-Maximization (EM) based iterative learning procedure. The evaluation of the prototype system on 17,000 images and 7,736 automatically extracted annotation words from crawled Web pages for multi-modal image retrieval has indicated that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.
A Comparative Study of Evidence Combination Strategies
In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), 2004
Cited by 8 (4 self)
This paper reports on experimental results obtained from a performance comparison of feature combination strategies in content-based image retrieval. The use of Support Vector Machines is compared to CombMIN, CombMAX, CombSUM and BordaFuse combination strategies, all of which are evaluated on a carefully compiled set of Corel images and the TRECVID 2003 search task collection.
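As a rough illustration of the rank-fusion baselines named in this abstract, the sketch below implements CombMIN, CombMAX, CombSUM, and a Borda count using their commonly cited definitions. It assumes every feature scores the same set of documents on a comparable scale; the function names, the tie-breaking rule, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Scores = Dict[str, float]  # document id -> similarity score from one feature


def comb(runs: List[Scores], how: str) -> Scores:
    """Fuse per-feature scores with CombMIN ("min"), CombMAX ("max"), or CombSUM ("sum").

    Assumes every run scores exactly the same documents.
    """
    reducers = {"min": min, "max": max, "sum": sum}
    reduce_fn = reducers[how]
    return {doc: reduce_fn([run[doc] for run in runs]) for doc in runs[0]}


def borda(runs: List[Scores]) -> Scores:
    """Borda count: in each run the best of n documents earns n points, the worst 1."""
    points: Scores = {}
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)
        n = len(ranked)
        for rank, doc in enumerate(ranked):
            points[doc] = points.get(doc, 0) + (n - rank)
    return points


def ranking(fused: Scores) -> List[str]:
    """Order documents by fused score, breaking ties by document id (an assumption)."""
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

With two hypothetical feature runs, `{"a": 0.9, "b": 0.4, "c": 0.1}` and `{"a": 0.2, "b": 0.5, "c": 0.6}`, CombMIN favors `b` (the document no feature scores badly), CombMAX favors `a` (the document some feature scores very highly), and CombSUM ranks `a`, `b`, `c`, which shows how the choice of strategy changes the final ordering.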
Image annotation: which approach for realistic databases?
2007
Cited by 5 (2 self)
This paper describes an efficient approach to image annotation. It ranked first on the recent scene categorization track of the ImagEVAL benchmark. We show how homogeneous global image descriptors combined with a pool of Support Vector Machines achieve very good results. We also used this approach on several well-known object recognition databases to emphasize two main aspects of this research domain: the importance of contextual information in object recognition and the unsuitability of many standard databases for this task.
Statistical Models For Automatic Video Annotation And Retrieval
In Proceedings of the IEEE ICASSP International Conference on Acoustics, Speech and Signal Processing, 2004
We apply a continuous relevance model (CRM) to the problem of directly retrieving the visual content of videos using text queries. The model computes a joint probability model for image features and words using a training set of annotated images. The model may then be used to annotate unseen test images. The probabilistic annotations are used for retrieval using text queries. We also propose a modified model, the normalized CRM, which substantially improves performance on a subset of the TREC Video dataset.
Holistic Context Models for Visual Recognition
2002
A novel framework for context modeling, based on the probability of co-occurrence of objects and scenes, is proposed. The modeling is quite simple, and builds upon the availability of robust appearance classifiers. Images are represented by their posterior probabilities with respect to a set of contextual models, built upon the bag-of-features image representation, through two layers of probabilistic modeling. The first layer represents the image in a semantic space, where each dimension encodes an appearance-based posterior probability with respect to a concept. Due to the inherent ambiguity of classifying image patches, this representation suffers from a certain amount of contextual noise. The second layer enables robust inference in the presence of this noise, by modeling the distribution of each concept in the semantic space. A thorough and systematic experimental evaluation of the proposed context modeling is presented. It is shown that it captures the contextual “gist” of natural images. Scene classification experiments show that contextual classifiers outperform their appearance-based counterparts, irrespective of the precise choice and accuracy of the latter. The effectiveness of the proposed approach to context modeling is further demonstrated through a comparison to existing approaches on scene classification and image retrieval on benchmark datasets. In all cases, the proposed approach achieves superior results.

