Evaluating Color Descriptors for Object and Scene Recognition
, 2010
Cited by 99 (14 self)
Image category recognition is important to access visual information on the level of objects and scene types. So far, intensity-based descriptors have been widely used for feature extraction at salient points. To increase illumination invariance and discriminative power, color descriptors have been proposed. Because many different descriptors exist, a structured overview is required of color invariant descriptors in the context of image category recognition. Therefore, this paper studies the invariance properties and the distinctiveness of color descriptors (software to compute the color descriptors from this paper is available from
The challenge problem for automated detection of 101 semantic concepts in multimedia
- In Proceedings of the ACM International Conference on Multimedia
, 2006
Cited by 89 (18 self)
We introduce the challenge problem for generic video indexing to gain insight into intermediate steps that affect the performance of multimedia analysis methods, while at the same time fostering repeatability of experiments. To arrive at a challenge problem, we provide a general scheme for the systematic examination of automated concept detection methods, by decomposing the generic video indexing problem into 2 unimodal analysis experiments, 2 multimodal analysis experiments, and 1 combined analysis experiment. For each experiment, we evaluate generic video indexing performance on 85 hours of international broadcast news data, from the TRECVID 2005/2006 benchmark, using a lexicon of 101 semantic concepts. By establishing a minimum performance on each experiment, the challenge problem allows for component-based optimization of the generic indexing issue, while simultaneously offering other researchers a reference for comparison during indexing methodology development. To stimulate further investigations into intermediate analysis steps that influence video indexing performance, the challenge offers to the research community a manually annotated concept lexicon, pre-computed low-level multimedia features, trained classifier models, and five experiments together with baseline performance, which are all available at
Adding semantics to detectors for video retrieval
- IEEE Transactions on Multimedia
, 2007
Cited by 36 (11 self)
In this paper, we propose an automatic video retrieval method based on high-level concept detectors. Research in video analysis has reached the point where over 100 concept detectors can be learned in a generic fashion, albeit with mixed performance. Such a set of detectors is still very small compared to ontologies aiming to capture the full vocabulary a user has. We aim to build a bridge between the two fields by building a multimedia thesaurus, i.e. a set of machine learned concept detectors that is enriched with semantic descriptions and semantic structure obtained from WordNet. Given a multimodal user query, we identify three strategies to select a relevant detector from this thesaurus, namely: text matching, ontology querying, and semantic visual querying. We evaluate the methods against the automatic search task of the TRECVID 2005 video retrieval benchmark, using a news video archive of 85 hours in combination with a thesaurus of 363 machine learned concept detectors. We assess the influence of thesaurus size on video search performance, evaluate and compare the multimodal selection strategies for concept detectors, and finally discuss their combined potential using oracle fusion. The set of queries in the TRECVID 2005 corpus is too small for our conclusions to be definite, but the results suggest promising new lines of research. Index Terms: Video retrieval, concept learning, knowledge modeling, content analysis and indexing, multimedia information systems.
Video corpus annotation using active learning
- In 30th European Conference on Information Retrieval (ECIR'08)
, 2008
Cited by 24 (3 self)
Concept indexing in multimedia libraries is very useful for users searching and browsing, but it is a very challenging research problem as well. Beyond the systems' implementation issues, semantic indexing is strongly dependent upon the size and quality of the training examples. In this paper, we describe the collaborative annotation system used to annotate the High Level Features (HLF) in the development set of TRECVID 2007. This system is web-based and takes advantage of an Active Learning approach. We show that Active Learning allows simultaneously getting the most useful information from the partial annotation and significantly reducing the annotation effort per participant relative to previous collaborative annotations.
High level feature detection from video in TRECVid: a 5-year retrospective of achievements
- In Ajay Divakaran, editor, Multimedia Content Analysis, Theory and Applications
, 2008
Cited by 24 (3 self)
Summary. Successful and effective content-based access to digital video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, or on matching one video frame against others using low-level characteristics like colour, texture, or shapes, or on determining and matching objects appearing within the video. Possibly the most important technique, however, is one which determines the presence or absence of a high-level or semantic feature within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically, however, this depends on being able to determine whether each feature is or is not present in a video clip. The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity where dozens of research groups measure the effectiveness of their techniques on common data and using an open, metrics-based approach. In this chapter we summarise the work done on the TRECVid high-level feature task, showing the progress made year-on-year. This provides a fairly comprehensive statement on where
NUS-WIDE: A real-world web image database from National University of Singapore
- In CIVR
, 2009
Cited by 23 (1 self)
This paper introduces a web image dataset created by NUS's Lab for Media Search. The dataset includes: (1) 269,648 images and the associated tags from Flickr, with a total of 5,018 unique tags; (2) six types of low-level features extracted from these images, including 64-D color histogram, 144-D color correlogram, 73-D edge direction histogram, 128-D wavelet texture, 225-D block-wise color moments extracted over 5×5 fixed grid partitions, and 500-D bag of words based on SIFT descriptions; and (3) ground-truth for 81 concepts that can be used for evaluation. Based on this dataset, we highlight characteristics of Web image collections and identify four research issues on web image annotation and retrieval. We also provide the baseline results for web image annotation by learning from the tags using the traditional k-NN algorithm. The benchmark results indicate that it is possible to learn effective models from a sufficiently large image dataset to facilitate general image retrieval.
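The 225-D figure for the block-wise color moments is consistent with 25 blocks (a 5×5 grid) times 9 values per block. The following is a minimal sketch of how such a vector could be computed. The abstract does not say which moments or which color space are used, so the choice here (mean, standard deviation, and cube root of the third central moment for each of three channels, applied to whatever channels the input array holds) is an assumption, and the function name `blockwise_color_moments` is hypothetical.

```python
import numpy as np


def blockwise_color_moments(image, grid=5):
    """Return per-block color moments as one flat vector.

    Assumption (not stated in the abstract): each block contributes
    3 moments x 3 channels = 9 values, so a 5x5 grid gives 225 values.
    """
    image = np.asarray(image, dtype=np.float64)
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image array")
    height, width, _ = image.shape
    if height < grid or width < grid:
        raise ValueError("image is smaller than the grid")

    # Block boundaries that cover the whole image, even when the
    # image size is not an exact multiple of the grid size.
    row_edges = np.linspace(0, height, grid + 1).astype(int)
    col_edges = np.linspace(0, width, grid + 1).astype(int)

    features = []
    for i in range(grid):
        for j in range(grid):
            block = image[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            pixels = block.reshape(-1, 3)
            mean = pixels.mean(axis=0)
            centered = pixels - mean
            # Second moment: standard deviation per channel.
            std = np.sqrt((centered ** 2).mean(axis=0))
            # Third moment: signed cube root of the third central moment.
            skew = np.cbrt((centered ** 3).mean(axis=0))
            features.extend([mean, std, skew])
    return np.concatenate(features)
```

With the default `grid=5`, any input of at least 5×5 pixels yields a vector of length 225; a different grid size changes the length to `grid * grid * 9`.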
Probabilistic Models for Combining Diverse Knowledge Sources in Multimedia Retrieval
- Ph.D. Thesis
, 2006
Cited by 18 (2 self)
In recent years, the multimedia retrieval community has been gradually shifting its emphasis from analyzing one media source at a time to exploring the opportunities of combining diverse knowledge sources from correlated media types and context. This thesis presents a conditional probabilistic retrieval model as a principled framework to combine diverse knowledge sources. An efficient rank-based learning approach has been developed to explicitly model the ranking relations in the learning process. Under this retrieval framework, we overview and develop a number of state-of-the-art approaches for extracting ranking features from multimedia knowledge sources. To incorporate query information in the combination model, this thesis develops a number of query analysis models that can automatically discover the mixing structure of the query space based on previous retrieval results. To adapt the combination function on a per-query basis, this thesis also presents a probabilistic local context analysis (pLCA) model to automatically leverage additional retrieval sources to improve initial retrieval outputs. All the proposed approaches are evaluated on multimedia retrieval tasks with large-scale video collections as well as meta-search tasks with large-scale text collections.
How many high-level concepts will fill the semantic gap in video retrieval?
- In International Conference on Image and Video Retrieval (CIVR)
, 2007
A comparison of color features for visual concept classification
- In CIVR
, 2008
Cited by 16 (0 self)
Concept classification is important to access visual information on the level of objects and scene types. So far, intensity-based features have been widely used. To increase discriminative power, color features have been proposed only recently. As many features exist, a structured overview is required of color features in the context of concept classification. Therefore, this paper studies (1) the invariance properties and (2) the distinctiveness of color features in a structured way. The invariance properties of color features with respect to photometric changes are summarized. The distinctiveness of color features is assessed experimentally using an image and a video benchmark: the PASCAL VOC Challenge 2007 and the Mediamill Challenge. Because color features cannot be studied independently from the points at which they are extracted, different point sampling strategies based on Harris-Laplace salient points, dense sampling and the spatial pyramid are also studied. From the experimental results, it can be derived that invariance to light intensity changes and light color changes affects concept classification. The results reveal further that the usefulness of invariance is concept-specific.
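To make "invariance to light intensity changes" concrete, the sketch below uses a standard textbook construction, normalized rg chromaticity, and is not claimed to be one of the features evaluated in the paper. It assumes a light intensity change can be modeled as multiplying all three channels by the same positive constant; under that assumption the ratio R/(R+G+B) is unchanged, so a histogram built on it is unchanged as well. The function names and the bin count are hypothetical.

```python
import numpy as np


def normalized_rg(image, eps=1e-12):
    """Map an H x W x 3 RGB array to its (r, g) chromaticity channels."""
    image = np.asarray(image, dtype=np.float64)
    total = image.sum(axis=-1, keepdims=True)
    # Dividing by R+G+B cancels any common scale factor on the channels.
    chroma = image / np.maximum(total, eps)
    return chroma[..., :2]


def rg_histogram(image, bins=8):
    """Normalized 2-D histogram over (r, g), flattened to bins*bins values."""
    rg = normalized_rg(image).reshape(-1, 2)
    hist, _, _ = np.histogram2d(
        rg[:, 0], rg[:, 1], bins=bins, range=[[0.0, 1.0], [0.0, 1.0]]
    )
    return hist.ravel() / hist.sum()
```

Calling `rg_histogram(image)` and `rg_histogram(0.5 * image)` on the same non-black image should give the same descriptor, whereas a plain RGB histogram would shift. Scaling each channel by a different factor (a simple model of a light color change) generally does alter the rg values, so this particular construction is not invariant to that kind of change.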

