Results 11 - 20
of
215
Domain transfer svm for video concept detection
- In CVPR
, 2009
"... Cross-domain learning methods have shown promising results by leveraging labeled patterns from auxiliary domains to learn a robust classifier for target domain, which has a limited number of labeled samples. To cope with the tremendous change of feature distribution between different domains in vide ..."
Abstract
-
Cited by 14 (10 self)
- Add to MetaCart
Cross-domain learning methods have shown promising results by leveraging labeled patterns from auxiliary domains to learn a robust classifier for target domain, which has a limited number of labeled samples. To cope with the tremendous change of feature distribution between different domains in video concept detection, we propose a new cross-domain kernel learning method. Our method, referred to as Domain Transfer SVM (DTSVM), simultaneously learns a kernel function and a robust SVM classifier by minimizing the both structural risk functional of SVM and distribution mismatch of labeled and unlabeled samples between the auxiliary and target domains. Comprehensive experiments on the challenging TRECVID corpus demonstrate that DTSVM outperforms existing crossdomain learning and multiple kernel learning methods. 1.
INRIA-LEARs video copy detection system
- in: Proceedings of the TREC Video Retrieval Evaluation (TRECVID
, 2008
"... 1 Copyright detection task 2 High-level feature extraction task ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
1 Copyright detection task 2 High-level feature extraction task
The MediaMill TRECVID 2008 Semantic Video Search Engine Draft notebook paper
"... In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiment ..."
Abstract
-
Cited by 10 (7 self)
- Add to MetaCart
In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiments focus on increasing the robustness of a small set of detectors. To that end, our concept detection experiments emphasize in particular the role of sampling, the value of color invariant features, the influence of codebook construction, and the effectiveness of kernel-based learning parameters. For retrieval, a robust but limited set of concept detectors necessitates the need to rely on as many auxiliary information channels as possible. Therefore, our automatic search experiments focus on predicting which information channel to trust given a certain topic, leading to a novel framework for predictive video retrieval. To improve the video retrieval results further, our interactive search experiments investigate the roles of visualizing preview results for a certain browse-dimension and active learning mechanisms that learn to solve complex search topics by analysis from user browsing behavior. The 2008 edition of the TRECVID benchmark has been the most successful MediaMill participation to date, resulting in the top ranking for both concept detection and interactive search, and a runner-up ranking for automatic retrieval. Again a lot has been learned during this year’s TRECVID campaign; we highlight the most important lessons at the end of this paper. 1
The MediaMill TRECVID 2009 Semantic Video Search Engine Draft notebook paper
"... In this paper we describe our TRECVID 2009 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Starting point for the MediaMill concept detection approach is our top-performing bag-of-words system of last year, whi ..."
Abstract
-
Cited by 10 (5 self)
- Add to MetaCart
In this paper we describe our TRECVID 2009 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Starting point for the MediaMill concept detection approach is our top-performing bag-of-words system of last year, which uses multiple color descriptors, codebooks with soft-assignment, and kernel-based supervised learning. We improve upon this baseline system by exploring two novel research directions. Firstly, we study a multi-modal extension by the inclusion of 20 audio concepts and fusing using two novel multi-kernel supervised learning methods. Secondly, with the help of recently proposed algorithmic refinements of bag-of-words, a bag-of-words GPU implementation, and compute clusters, we scale-up the amount of visual information analyzed by an order of magnitude, to a total of 1,000,000 i-frames. Our experiments evaluate the merit of these new components, ultimately leading to 64 robust concept detectors for video retrieval. For retrieval, a robust but limited set of concept detectors necessitates the need to rely on as many auxiliary information channels as possible. For automatic search we therefore explore how we can learn to rank various information channels simultaneously to maximize video search results for a given topic. To improve the video retrieval results further, our interactive search experiments investigate the roles of visualizing preview results for a certain browse-dimension and relevance feedback mechanisms that learn to solve complex search topics by analysis from user browsing behavior. The 2009 edition of the TRECVID benchmark has again been a fruitful participation for the MediaMill team, resulting in the top ranking for both concept detection and interactive search. 1
TRECVID 2007 collaborative annotation using active learning
- In Proceedings of the TRECVID 2007 Workshop
, 2007
"... Concept indexing in multimedia libraries is very useful for users searching and browsing but it is a very challenging research problem as well. Beyond the systems’ implementations issues, semantic indexing is strongly dependant upon the size and quality of the training examples. In this paper, we de ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
Concept indexing in multimedia libraries is very useful for users searching and browsing but it is a very challenging research problem as well. Beyond the systems’ implementations issues, semantic indexing is strongly dependant upon the size and quality of the training examples. In this paper, we describe the collaborative annotation system used to annotate the High Level Features (HLF) in the development set of TRECVID 2007. This system is web-based and takes advantage of Active Learning approach. We show that Active Learning allows simultaneously getting the most useful information form the partial annotation and significantly reducing the annotation effort per participant relatively to previous collaborative annotations. 1
CROSS-DOMAIN LEARNING METHODS FOR HIGH-LEVEL VISUAL CONCEPT CLASSIFICATION
"... Exploding amounts of multimedia data increasingly require automatic indexing and classification, e.g. training classifiers to produce high-level features, or semantic concepts, chosen to represent image content, like car, person, etc. When changing the applied domain (i.e. from news domain to consum ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
Exploding amounts of multimedia data increasingly require automatic indexing and classification, e.g. training classifiers to produce high-level features, or semantic concepts, chosen to represent image content, like car, person, etc. When changing the applied domain (i.e. from news domain to consumer home videos), the classifiers trained in one domain often perform poorly in the other domain due to changes in feature distributions. Additionally, classifiers trained on the new domain alone may suffer from too few positive training samples. Appropriately adapting data/models from an old domain to help classify data in a new domain is an important issue. In this work, we develop a new cross-domain SVM (CDSVM) algorithm for adapting previously learned support vectors from one domain to help classification in another domain. Better precision is obtained with almost no additional computational cost. Also, we give a comprehensive summary and comparative study of the stateof-the-art SVM-based cross-domain learning methods. Evaluation over the latest large-scale TRECVID benchmark data set shows that our CDSVM method can improve mean average precision over 36 concepts by 7.5%. For further performance gain, we also propose an intuitive selection criterion to determine which cross-domain learning method to use for each concept. Index Terms — learning systems, feature extraction, adaptive systems, image processing 1.
Comparing compact codebooks for visual categorization
- COMPUTER VISION AND IMAGE UNDERSTANDING 114 (2010) 450–462
, 2010
"... ..."
Depth Information by Stage Classification
, 2007
"... Recently, methods for estimating 3D scene geometry or absolute scene depth information from 2D image content have been proposed. However, general applicability of these methods in depth estimation may not be realizable, as inconsistencies may be introduced due to a large variety of possible pictoria ..."
Abstract
-
Cited by 8 (3 self)
- Add to MetaCart
Recently, methods for estimating 3D scene geometry or absolute scene depth information from 2D image content have been proposed. However, general applicability of these methods in depth estimation may not be realizable, as inconsistencies may be introduced due to a large variety of possible pictorial content. We identify scene categorization as the first step towards efficient and robust depth estimation from single images. To that end, we describe a limited number of typical 3D scene geometries, called stages, each having a unique depth pattern and thus providing a specific context for stage objects. This type of scene information narrows down the possibilities with respect to individual objects ’ locations, scales and identities. We show how these stage types can be efficiently learned and how they can lead to robust extraction of depth information. Our results indicate that stages without much variation and object clutter can be detected robustly, with up to 60 % success rate.
Automatic video classification: A survey of the literature
- IEEE Transactions on Systems, Man, and Cybernetics, Part C
"... Abstract—There is much video available today. To help viewers find video of interest, work has begun on methods of automatic video classification. In this paper, we survey the video classification literature. We find that features are drawn from three modalities–text, audio, and visual–and that a la ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
Abstract—There is much video available today. To help viewers find video of interest, work has begun on methods of automatic video classification. In this paper, we survey the video classification literature. We find that features are drawn from three modalities–text, audio, and visual–and that a large variety of combinations of features and classification have been explored. We describe the general features chosen and summarize the research in this area. We conclude with ideas for further research. Index Terms—video classification I.
Real-time bag of words, approximately
- In Proc. ACM Int’l Conf. Image and Video Retrieval
, 2009
"... We start from the state-of-the-art Bag of Words pipeline that in the 2008 benchmarks of TRECvid and PASCAL yielded the best performance scores. We have contributed to that pipeline, which now forms the basis to compare various fast alternatives for all of its components: (i) For descriptor extractio ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
We start from the state-of-the-art Bag of Words pipeline that in the 2008 benchmarks of TRECvid and PASCAL yielded the best performance scores. We have contributed to that pipeline, which now forms the basis to compare various fast alternatives for all of its components: (i) For descriptor extraction we propose a fast algorithm to densely sample SIFT and SURF, and we compare several variants of these descriptors. (ii) For descriptor projection we compare a k-means visual vocabulary with a Random Forest. As a preprojection step we experiment with PCA on the descriptors to decrease projection time. (iii) For classification we use Support Vector Machines and compare the χ 2 kernel with the RBF kernel. Our results lead to a 10-fold speed increase without any loss of accuracy and to a 30-fold speed increase with 17 % loss of accuracy, where the latter system does real-time classification at 26 images per second. Categories andSubjectDescriptors

