Image retrieval: ideas, influences, and trends of the new age
- ACM Computing Surveys, 2008
Cited by 157 (3 self)
We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid the foundation for this promise, it also paved the way for a large number of new techniques and systems, drew many new researchers into the field, and strengthened the association between weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions from the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in adapting existing image retrieval techniques to build systems that are useful in the real world. Looking back at what has been achieved so far, we also conjecture about what the future may hold for image retrieval research.
Flickr tag recommendation based on collective knowledge
- In WWW ’08: Proc. of the 17th International Conference on World Wide Web, 2008
Cited by 59 (2 self)
Online photo services such as Flickr and Zooomr allow users to share their photos with family, friends, and the online community at large. An important facet of these services is that users manually annotate their photos using so-called tags, which describe the contents of the photo or provide additional contextual and semantic information. In this paper we investigate how we can assist users in the tagging phase. The contribution of our research is twofold. First, we analyse a representative snapshot of Flickr and present the results by means of a tag characterisation focussing on how users tag photos and what information is contained in the tagging. Second, based on this analysis, we present and evaluate tag recommendation strategies that support the user in the photo annotation task by recommending a set of tags that can be added to the photo. The results of the empirical evaluation show that we can effectively recommend relevant tags for a variety of photos with different levels of exhaustiveness of original tagging.
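The abstract does not spell out the recommendation strategies. As a minimal sketch, assuming a co-occurrence approach in which each user-supplied tag votes for the candidate tags that most often appear alongside it on other photos, tag recommendation could look like the following. The function names, the asymmetric score P(c | u), and the parameters `m` (candidates per user tag) and `k` (tags returned) are illustrative assumptions, not the paper's definitions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(photos):
    """Count tag frequencies and pairwise tag co-occurrences over a list of tag lists."""
    tag_count = Counter()
    pair_count = defaultdict(Counter)
    for tags in photos:
        unique = set(tags)
        tag_count.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair_count[a][b] += 1
            pair_count[b][a] += 1
    return tag_count, pair_count


def recommend(user_tags, tag_count, pair_count, m=5, k=3):
    """Hypothetical vote aggregation: each user tag u votes for its top-m candidates,
    ranked by asymmetric co-occurrence P(c | u) = count(u, c) / count(u)."""
    votes = Counter()
    for u in user_tags:
        if u not in pair_count:
            continue
        scored = sorted(
            ((c, n / tag_count[u]) for c, n in pair_count[u].items() if c not in user_tags),
            key=lambda cs: (-cs[1], cs[0]),  # best score first, ties alphabetical
        )
        for c, _ in scored[:m]:
            votes[c] += 1
    ranked = sorted(votes.items(), key=lambda cv: (-cv[1], cv[0]))
    return [c for c, _ in ranked[:k]]


photos = [
    ["eiffel", "paris", "tower"],
    ["paris", "france", "eiffel"],
    ["paris", "louvre", "france"],
    ["tower", "london", "bridge"],
]
tc, pc = build_cooccurrence(photos)
print(recommend(["paris", "eiffel"], tc, pc))  # ['france', 'tower', 'louvre']
```

Voting (rather than summing scores) keeps a single very frequent user tag from swamping the suggestions; a score-summing aggregate is an equally plausible variant.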
TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation
- In ICCV, 2009
Cited by 23 (8 self)
Image auto-annotation is an important open problem in computer vision. For this task we propose TagProp, a discriminatively trained nearest neighbor model. Tags of test images are predicted using a weighted nearest-neighbor model to exploit labeled training images. Neighbor weights are based on neighbor rank or distance. TagProp allows the integration of metric learning by directly maximizing the log-likelihood of the tag predictions in the training set. In this manner, we can optimally combine a collection of image similarity metrics that cover different aspects of image content, such as local shape descriptors or global color histograms. We also introduce a word-specific sigmoidal modulation of the weighted neighbor tag predictions to boost the recall of rare words. We investigate the performance of different variants of our model and compare them to existing work. We present experimental results for three challenging data sets. On all three, TagProp makes a marked improvement over the current state of the art.
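The weighted nearest-neighbor prediction described above can be sketched for the distance-based variant. Assuming the usual formulation, each neighbor j of a test image i gets weight π_ij proportional to exp(−d_θ(i, j)), where d_θ is a θ-weighted sum of base distances, and the probability of a tag is the π-weighted average of per-neighbor predictions (1 − ε if the neighbor carries the tag, ε otherwise). In this sketch θ is fixed by hand rather than learned by maximizing the training log-likelihood, the rank-based variant is omitted, and the function name and the constant `eps` are assumptions.

```python
import math


def tagprop_predict(base_dists, theta, neighbor_tags, vocab, eps=1e-5, alpha=None, beta=None):
    """Distance-based TagProp-style prediction for one test image (hypothetical helper).

    base_dists[j]    : base distances (one per metric) from the test image to neighbor j
    theta            : non-negative metric weights (fixed here; TagProp learns them)
    neighbor_tags[j] : set of tags on neighbor j
    alpha, beta      : optional dicts of per-word sigmoid parameters
    """
    # Combined distance d_theta(i, j) = sum_m theta_m * d_m(i, j).
    d = [sum(t * dm for t, dm in zip(theta, dists)) for dists in base_dists]
    # pi_ij = exp(-d_ij) / sum_j' exp(-d_ij'); subtracting the minimum avoids underflow.
    dmin = min(d)
    w = [math.exp(-(dj - dmin)) for dj in d]
    z = sum(w)
    pi = [wj / z for wj in w]
    scores = {}
    for tag in vocab:
        # Each neighbor predicts the tag with prob. 1 - eps if it carries it, eps otherwise.
        x = sum(p * ((1 - eps) if tag in tags else eps) for p, tags in zip(pi, neighbor_tags))
        if alpha is not None and beta is not None:
            # Word-specific sigmoidal modulation: sigma(alpha_w * x + beta_w).
            x = 1.0 / (1.0 + math.exp(-(alpha[tag] * x + beta[tag])))
        scores[tag] = x
    return scores


# Two neighbors, two base metrics; the closer neighbor dominates the prediction.
scores = tagprop_predict(
    base_dists=[[0.5, 1.0], [2.0, 3.0]],
    theta=[1.0, 0.5],
    neighbor_tags=[{"sky", "sea"}, {"car"}],
    vocab=["sky", "sea", "car"],
)
print(scores)
```

Without the sigmoid the scores are already probabilities; the per-word α and β rescale them, which is how a rare word can be pushed above a decision threshold.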
Learning tag relevance by neighbor voting for social image retrieval
- In ACM MIR, 2008
Cited by 13 (5 self)
Social image retrieval is important for exploiting the increasing amounts of amateur-tagged multimedia such as Flickr images. Since amateur tagging is known to be uncontrolled, ambiguous, and personalized, a fundamental problem is how to reliably interpret the relevance of a tag with respect to the visual content it describes. Intuitively, if different people label visually similar images with the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose a novel algorithm that scalably and reliably learns tag relevance by accumulating votes from visually similar neighbors. Further, when treated as tag frequency, the learned tag relevance is seamlessly embedded into current tag-based social image retrieval paradigms. Preliminary experiments on one million Flickr images demonstrate the potential of the proposed algorithm. Overall comparisons for both single-word and multiple-word queries show substantial improvement over the baseline by learning and using tag relevance. Specifically, compared with the baseline using the original tags, retrieval using the improved tags increases mean average precision on average by 24%, from 0.54 to 0.67. Moreover, simulated experiments indicate that performance can be improved further by scaling up the number of images used in the proposed neighbor voting algorithm.
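The voting idea can be sketched as follows, assuming Euclidean distance on precomputed visual features and a prior-corrected vote count: a tag's relevance for an image is the number of its k visual neighbors labeled with that tag minus the number expected by chance, k · |images with the tag| / |collection|. Each user contributes at most one neighbor so that a single prolific tagger cannot dominate the vote. The data layout (dicts with hypothetical keys `feat`, `tags`, `user`) and the function name are illustrative.

```python
import math
from collections import Counter


def tag_relevance(query_feat, query_tags, collection, k):
    """Neighbor-voting tag relevance for one image that is not itself in `collection`."""
    # Rank the collection by visual distance to the query image.
    ranked = sorted(collection, key=lambda im: math.dist(query_feat, im["feat"]))
    neighbors, seen_users = [], set()
    for im in ranked:
        if im["user"] in seen_users:  # at most one neighbor per user
            continue
        seen_users.add(im["user"])
        neighbors.append(im)
        if len(neighbors) == k:
            break
    votes = Counter(t for im in neighbors for t in set(im["tags"]))
    # Prior: expected votes for a tag if the k neighbors were drawn at random.
    freq = Counter(t for im in collection for t in set(im["tags"]))
    n = len(collection)
    return {t: votes[t] - k * freq[t] / n for t in query_tags}


collection = [
    {"feat": (0.0, 0.0), "tags": {"dog"}, "user": "u1"},
    {"feat": (0.1, 0.0), "tags": {"dog", "park"}, "user": "u2"},
    {"feat": (0.0, 0.12), "tags": {"dog"}, "user": "u1"},
    {"feat": (5.0, 5.0), "tags": {"car"}, "user": "u3"},
    {"feat": (6.0, 6.0), "tags": {"party"}, "user": "u4"},
]
print(tag_relevance((0.0, 0.05), ["dog", "party", "park"], collection, k=2))
```

A positive score means the tag is backed by visually similar images from independent users more often than chance would predict; a negative score flags a likely subjective or noisy tag.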
Annotating images by mining image search results
- TPAMI
Cited by 11 (3 self)
Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation: a data-driven approach that annotates images by mining their search results. Some 2.4 million images with their surrounding text are collected from a few photo forums to support this approach. The entire process is formulated in a divide-and-conquer framework in which a query keyword is provided along with the uncaptioned image to improve both effectiveness and efficiency. This is helpful when the collected data set is not dense everywhere. Accordingly, our approach contains three steps: 1) a search process to discover visually and semantically similar search results, 2) a mining process to identify salient terms from the textual descriptions of the search results, and 3) an annotation rejection process to filter out noisy terms yielded by Step 2. To ensure real-time annotation, two key techniques are leveraged: one maps the high-dimensional image visual features into hash codes, and the other implements the approach as a distributed system in which the search and mining processes are provided as Web services. In a typical run, the entire process finishes in less than 1 second. Since no training data set is required, our approach enables annotation with an unlimited vocabulary and is highly scalable and robust to outliers. Experimental results on both real Web images and a benchmark image data set show the effectiveness and efficiency of the proposed algorithm. It is also worth noting that, although the entire approach is illustrated within the divide-and-conquer framework, a query keyword is not crucial to our current implementation. We provide experimental results to demonstrate this.
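The abstract does not say which hashing scheme is used. As a stand-in, the sketch below swaps in random-hyperplane locality-sensitive hashing: each bit is the sign of the feature's projection onto a random Gaussian vector, so similar features tend to share bits and candidates can be ranked by Hamming distance. The mining step is reduced to counting terms in the descriptions of the top hits, and the rejection step is omitted. All names and parameters are illustrative assumptions.

```python
import random
from collections import Counter


def make_hasher(dim, n_bits, seed=0):
    """Random-hyperplane LSH (a swapped-in scheme): bit b is 1 when the
    feature lies on the non-negative side of random hyperplane b."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def hash_code(vec):
        code = 0
        for plane in planes:
            bit = 1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            code = (code << 1) | bit
        return code

    return hash_code


def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def annotate(query_vec, index, hash_code, top=3, n_terms=2, stopwords=frozenset()):
    """index: list of (hash_code, description). Returns the most frequent
    terms among the `top` nearest hits in Hamming space."""
    q = hash_code(query_vec)
    hits = sorted(index, key=lambda item: hamming(q, item[0]))[:top]
    counts = Counter(
        w for _, text in hits for w in set(text.lower().split()) if w not in stopwords
    )
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]]


h = make_hasher(dim=4, n_bits=16)
index = [
    (h([2.0, 2.0, 0.0, 0.0]), "sunset beach sea"),
    (h([4.0, 4.0, 0.0, 0.0]), "Beach sunset sky"),
    (h([-1.0, -1.0, 0.0, 0.0]), "car road"),
]
print(annotate([1.0, 1.0, 0.0, 0.0], index, h, top=2))  # ['beach', 'sunset']
```

Because each bit depends only on the sign of a projection, positively scaled copies of a feature get identical codes; comparing integer codes with XOR is what makes the search cheap enough for real-time use.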
Textual Query of Personal Photos Facilitated by Large-scale Web Data
Cited by 8 (6 self)
The rapid popularization of digital cameras and mobile phone cameras has led to an explosive growth of consumers' personal photo collections. In this paper, we present a real-time, textual-query-based personal photo retrieval system that leverages millions of web images and their associated rich textual descriptions (captions, categories, etc.). After a user provides a textual query (e.g., “water”), our system uses an inverted file to automatically find the positive web images that are related to the textual query “water” as well as the negative web images that are irrelevant to it. Based on these automatically retrieved relevant and irrelevant web images, we employ three simple but effective classification methods, k-nearest neighbor (kNN), decision stumps, and linear SVM, to rank personal photos. To further improve photo retrieval performance, we propose two relevance feedback methods via cross-domain learning, which effectively utilize both the web images and the personal images. In particular, our proposed cross-domain learning methods can learn robust classifiers in real time from only a very limited number of labeled personal photos by leveraging the pre-learned linear SVM classifiers. We further propose an incremental cross-domain learning method to significantly accelerate the relevance feedback process on large consumer photo databases. Extensive experiments on two consumer photo datasets demonstrate the effectiveness and efficiency of our system, which is also inherently not limited by any predefined lexicon.
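The query-to-ranking step can be sketched as below, assuming the inverted file simply maps each lowercase word of a web image's description to the images containing it, and assuming a kNN score chosen for illustration: a personal photo ranks higher the closer it lies to its k nearest positive web images relative to its k nearest negative ones. The paper's exact kNN formulation, the decision-stump and SVM rankers, and the relevance-feedback step are not reproduced; all names are hypothetical.

```python
import math
from collections import defaultdict


def build_inverted_file(web_images):
    """Map each lowercase word in a web image's description to the ids of images containing it."""
    inverted = defaultdict(set)
    for idx, (_, text) in enumerate(web_images):
        for word in text.lower().split():
            inverted[word].add(idx)
    return inverted


def mean_knn_distance(x, pool, k):
    """Mean Euclidean distance from x to its k nearest feature vectors in pool."""
    nearest = sorted(math.dist(x, f) for f in pool)[:k]
    return sum(nearest) / len(nearest)


def rank_personal_photos(query, web_images, inverted, personal, k=2):
    """Rank (name, feature) personal photos for a one-word textual query.

    Positives: web images whose description contains the query word.
    Negatives: all other web images. Score (an assumption): mean distance to
    the k nearest negatives minus mean distance to the k nearest positives.
    """
    pos_ids = inverted.get(query.lower(), set())
    positives = [web_images[i][0] for i in sorted(pos_ids)]
    negatives = [feat for i, (feat, _) in enumerate(web_images) if i not in pos_ids]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative web image")
    scored = [
        (mean_knn_distance(feat, negatives, k) - mean_knn_distance(feat, positives, k), name)
        for name, feat in personal
    ]
    # Highest score first; ties broken by name for determinism.
    return [name for _, name in sorted(scored, key=lambda t: (-t[0], t[1]))]


web = [
    ([0.0, 0.0], "blue water lake"),
    ([0.0, 1.0], "Water sea"),
    ([10.0, 10.0], "red car"),
    ([10.0, 11.0], "city car"),
]
inv = build_inverted_file(web)
personal = [("street.jpg", [10.0, 10.5]), ("beach.jpg", [0.0, 0.5])]
print(rank_personal_photos("water", web, inv, personal))  # ['beach.jpg', 'street.jpg']
```

No personal photo needs a label here: the textual query alone selects the training examples from the web collection, which is why the approach is not tied to a predefined lexicon.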
Visual Modelling of Complex Business Processes with Trees, Overlays and Distortion-Based Displays
- Proc. VL/HCC’07, IEEE CS
I2T: Image Parsing to Text Description
Cited by 6 (0 self)
In this paper, we present an image parsing to text generation (I2T) framework that generates natural language descriptions from image and video content. This framework converts the harder problem of content-based image and video retrieval into an easier text search problem, with potential applications in Internet search and visual data mining. The proposed I2T framework follows three steps. 1) Input images or video frames are decomposed into their constituent visual patterns by an image parsing engine, which outputs a scene as a parse graph representation, in a spirit similar to parsing sentences in speech and natural language. 2) The parse graphs are converted into a semantic representation in the Web Ontology Language (OWL) format, which is a formal and unambiguous knowledge representation. 3) A text generation engine converts the semantic representation into a semantically meaningful, human-readable, and queryable text report. The success of this framework relies on two knowledge bases. The first is a visual knowledge base that provides top-down hypotheses for image parsing and serves as an image ontology for translating parse graphs into semantic representations. The core of the visual knowledge base is an And-Or graph representation. It comprises vocabularies of visual elements, including pixels, primitives, parts, objects, and scenes, and a stochastic image grammar specifying compositional, spatial, temporal, and functional relations between visual elements. We developed a large-scale ground-truth image database and interactive image annotation software to build the And-Or graph from real-world image instances. The second knowledge base is a general knowledge base that interconnects several domain-specific ontologies in the form of the Semantic Web. This knowledge base further enriches the semantic representation of visual content with domain-specific information.
Finally, we demonstrate a case study in video surveillance: an end-to-end system that automatically infers video events and generates natural language descriptions of video scenes. Experiments with maritime and urban scenes indicate the feasibility of the proposed approach.
A nonparametric statistical approach to clustering via mode identification
- Journal of Machine Learning Research
Cited by 5 (3 self)
A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is efficiently solved by an EM-style algorithm, namely, the Modal EM (MEM). This method is then extended to hierarchical clustering by recursively locating modes of kernel density estimators with increasing bandwidths. Without model fitting, the mode-based clustering yields a density description for every cluster, which is a major advantage of mixture-model-based clustering. Moreover, it ensures that every cluster corresponds to a bump of the density. The issue of diagnosing clustering results is also investigated. Specifically, a pairwise separability measure for clusters is defined using the ridgeline between the density bumps of two clusters. The ridgeline is solved for by the Ridgeline EM (REM) algorithm, an extension of MEM. Based on this new measure, a cluster merging procedure is created to enforce strong separation. Experiments on simulated and real data demonstrate that the mode-based clustering approach tends to combine the strengths of linkage and mixture-model-based clustering. In addition, the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions, both of which pose difficulty for parametric mixture modeling. A C package implementing the new algorithms has been made publicly available at
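For a Gaussian kernel density estimate f(x) = (1/n) Σ_i φ(x | x_i, σ²I), an MEM iteration alternates an E-step that computes each kernel's posterior weight p_i ∝ φ(x | x_i, σ²I) at the current point and an M-step that moves the point to the weighted mean Σ_i p_i x_i (for Gaussian kernels this coincides with Gaussian mean shift). The sketch below runs this ascent from every sample point and groups points whose modes coincide within a tolerance. The single fixed bandwidth `sigma`, the tolerances, and the function names are assumptions; the hierarchical extension and REM are not shown.

```python
import math


def modal_em(start, data, sigma, tol=1e-8, max_iter=500):
    """Climb from `start` to a mode of a Gaussian KDE over `data` via Modal EM."""
    x = list(start)
    for _ in range(max_iter):
        # E-step: posterior weight of each kernel at x (shifted by the max for stability).
        logw = [-math.dist(x, xi) ** 2 / (2 * sigma ** 2) for xi in data]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        z = sum(w)
        # M-step: for Gaussian kernels the maximizer is the weighted mean of the centers.
        new = [sum(wi * xi[d] for wi, xi in zip(w, data)) / z for d in range(len(x))]
        if math.dist(new, x) < tol:
            return new
        x = new
    return x


def mode_clusters(data, sigma, merge_tol=1e-3):
    """Label each point by the mode it ascends to; modes closer than merge_tol are merged."""
    modes, labels = [], []
    for point in data:
        mode = modal_em(point, data, sigma)
        for label, existing in enumerate(modes):
            if math.dist(mode, existing) < merge_tol:
                labels.append(label)
                break
        else:
            modes.append(mode)
            labels.append(len(modes) - 1)
    return labels, modes


data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
labels, modes = mode_clusters(data, sigma=0.5)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each cluster is identified with a density bump rather than a fitted component, so no cluster count or parametric form is specified up front; rerunning with a larger `sigma` merges bumps, which is the idea behind the hierarchical extension.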

