Empowering Visual Categorization With the GPU
, 2011
Cited by 4 (3 self)
Visual categorization is important to manage large collections of digital images and video, where textual metadata is often incomplete or simply unavailable. The bag-of-words model has become the most powerful method for visual categorization of images and video. Despite its high accuracy, a severe drawback of this model is its high computational cost. As the trend to increase computational power in newer CPU and GPU architectures is to increase their level of parallelism, exploiting this parallelism becomes an important direction to handle the computational cost of the bag-of-words approach. When optimizing a system based on the bag-of-words approach, the goal is to minimize the time it takes to process batches of images. In this paper, we analyze the bag-of-words model for visual categorization in terms of computational cost and identify two major bottlenecks: the quantization step and the classification step. We address these two bottlenecks by proposing two efficient algorithms for quantization and classification by exploiting the GPU hardware and the CUDA parallel programming model. The algorithms are designed to 1) keep categorization accuracy intact, 2) decompose the problem, and 3) give the same numerical results. In the experiments on large-scale datasets, it is shown that, by using a parallel implementation on the GeForce GTX 260 GPU, classifying unseen images is 4.8 times faster than a quad-core CPU version on the Core i7 920, while giving the exact same numerical results. In addition, we show how the algorithms can be generalized to other applications, such as text retrieval and video retrieval. Moreover, when the obtained speedup is used to process extra video frames in a video retrieval benchmark, the accuracy of visual categorization is improved by 29%.
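The quantization step named in the abstract above assigns each local descriptor to its nearest codebook entry and counts the assignments into a histogram. The following is a minimal pure-Python sketch of that idea, not the paper's GPU algorithm. The function names (`dot`, `nearest_codeword`, `bag_of_words`) and the toy 2-D data are assumptions made for illustration. The sketch relies on the identity ||d - c||^2 = ||d||^2 + ||c||^2 - 2 d.c, so the codeword norms can be computed once and the per-descriptor work reduces to dot products, which is one common way to split the computation into independent pieces.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest_codeword(descriptor, codebook, codeword_sq_norms):
    """Return the index of the codeword closest to the descriptor.

    ||d - c||^2 = ||d||^2 + ||c||^2 - 2 d.c, and ||d||^2 is the same for
    every codeword, so it can be dropped when looking for the minimum.
    """
    best_index, best_score = 0, float("inf")
    for index, (codeword, sq_norm) in enumerate(zip(codebook, codeword_sq_norms)):
        score = sq_norm - 2.0 * dot(descriptor, codeword)
        if score < best_score:
            best_index, best_score = index, score
    return best_index


def bag_of_words(descriptors, codebook):
    """Hard-assign every descriptor and count assignments per codeword."""
    sq_norms = [dot(c, c) for c in codebook]  # computed once, reused for all descriptors
    histogram = [0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_codeword(descriptor, codebook, sq_norms)] += 1
    return histogram


# Toy data (hypothetical): three codewords and four descriptors in 2-D.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 0.1], [0.8, -0.1], [0.1, 0.9]]
histogram = bag_of_words(descriptors, codebook)
print(histogram)
```

Each descriptor is processed independently of the others, which is the property a parallel implementation would exploit.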
The MediaMill TRECVID 2010 Semantic Video Search Engine
Cited by 2 (2 self)
In this paper we describe our TRECVID 2010 video retrieval experiments. The MediaMill team participated in three tasks: semantic indexing, known-item search, and instance search. The starting point for the MediaMill concept detection approach is our top-performing bag-of-words system of TRECVID 2009, which uses multiple color SIFT descriptors, sparse codebooks with spatial pyramids, kernel-based machine learning, and multi-frame video processing. We improve upon this baseline system by further speeding up its execution times for both training and classification using GPU-optimized algorithms, approximated histogram intersection kernels, and several multi-frame combination methods. Being more efficient allowed us to supplement the Internet video training collection with positively labeled examples from international news broadcasts and Dutch documentary video from the TRECVID 2005-2009 benchmarks. Our experimental setup covered a huge training set of 170 thousand keyframes and a test set of 600 thousand keyframes in total, ultimately leading to 130 robust concept detectors for video retrieval. For retrieval, a robust but limited set of concept detectors justifies the need to rely on as many auxiliary information channels as possible. For automatic known-item search we therefore explore how we can learn to rank various information channels simultaneously to maximize video search results for a given topic. To further improve the video retrieval results, our interactive known-item search experiments investigate how to combine metadata search and visualization into a single interface. The 2010 edition of the TRECVID benchmark has again been a fruitful participation for the MediaMill team, resulting in the top ranking for concept detection in the semantic indexing task.
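The abstract above mentions approximated histogram intersection kernels. As a point of reference, the exact histogram intersection kernel is the sum of the element-wise minima of two histograms. The sketch below shows only this exact form, not the approximation used by the authors; the function names and the toy histograms are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Exact histogram intersection kernel: sum of element-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def kernel_matrix(histograms):
    """Pairwise kernel values, as a kernel-based classifier would consume them."""
    return [[histogram_intersection(a, b) for b in histograms] for a in histograms]


# Toy L1-normalized histograms (hypothetical data).
hists = [
    [0.5, 0.5, 0.0],
    [0.25, 0.25, 0.5],
    [0.0, 0.0, 1.0],
]
gram = kernel_matrix(hists)
for row in gram:
    print(row)
```

For L1-normalized histograms the kernel value lies between 0 (no overlap) and 1 (identical histograms), and the resulting matrix is symmetric.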
Differences in Video Search Behavior between Novices and Archivists
Improving the user’s interaction with a video retrieval system requires examining the search behavior of real users. In this article we present a study comparing the video search behavior of professional archivists and novice users. The comparison focuses on the use and effectiveness of different state-of-the-art video search methods offered by our retrieval system, and on how the two user groups investigate results. We conducted our experiments in the context of TRECVID’s 2009 interactive search task, using the provided collection and topics for our evaluation. The findings are based on a qualitative questionnaire analysis and a quantitative examination of the logged user actions on the search interface. The experimental results indicate that today’s visual search techniques have improved in effectiveness, confirming a trend found in previous user studies. To our surprise, professional archivists used visual concept search in many of their searches. Queries containing visual concepts were more effective, resulting in more relevant shots found than the alternative methods. Overall, we conclude that professional archivists are more focused on recall in carrying out their search tasks, and are better at reflecting on their own search performance.
The University of Amsterdam’s Concept
Our group within the University of Amsterdam participated in the large-scale visual concept detection task of ImageCLEF 2010. The submissions from our visual concept detection system have resulted in the best visual-only run in the per-concept evaluation. In the per-image evaluation, it achieves the highest score in terms of example-based F-measure across all types of runs.
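The example-based F-measure mentioned above scores each image separately by comparing its predicted concept set with its annotated concept set, then averages over images. The sketch below follows the standard F1 definition; the exact ImageCLEF evaluation protocol may differ in details, and the function names, the convention for two empty sets, and the toy labels are assumptions.

```python
def example_f_measure(predicted, actual):
    """F1 between the predicted and annotated label sets of one image."""
    predicted, actual = set(predicted), set(actual)
    if not predicted and not actual:
        return 1.0  # assumed convention: two empty sets count as a perfect match
    overlap = len(predicted & actual)
    # F1 = 2 * |P & A| / (|P| + |A|), the harmonic mean of precision and recall
    return 2.0 * overlap / (len(predicted) + len(actual))


def mean_example_f_measure(predictions, ground_truth):
    """Average the per-image F-measure over all images."""
    scores = [example_f_measure(p, a) for p, a in zip(predictions, ground_truth)]
    return sum(scores) / len(scores)


# Toy labels (hypothetical): two images.
predictions = [{"sky", "sea"}, {"dog"}]
ground_truth = [{"sky"}, {"cat"}]
score = mean_example_f_measure(predictions, ground_truth)
print(score)
```

This contrasts with a per-concept evaluation, which scores each concept's ranked list across all images instead of scoring each image's label set.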
Florida International University and University of Miami TRECVID 2010- Semantic Indexing
The paper presents the framework and results of team Florida International University - University of Miami (FIU-UM) for the task of semantic indexing of TRECVID 2010. In this task, we submitted four runs of results:
• F_A_FIU-UM-1_1: KF+RERANK - apply subspace learning and classification using key frame-based low-level features (KF) and co-occurrence probability re-ranking method (RERANK).
• F_A_FIU-UM-2_2: LF+KF+SF+RERANK - apply subspace learning and classification using late fusion (LF), i.e., key frame-based low-level features (KF) and shot-based low-level features (SF) separately. Then co-occurrence probability re-ranking method (RERANK) is used for both the keyframe-based model and the shot-based model. Finally, a fusion method combines ranking scores from each model and generates the final ranked shots.
• F_A_FIU-UM-3_3: EF+KF+SF+RERANK - apply subspace learning and classification using early fusion (EF), i.e., combined features from the selected key frame-based low-level features (KF) and shot-based low-level features (SF). Then co-occurrence probability re-ranking method (RERANK) is used.
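The late-fusion run described above combines ranking scores from a keyframe-based model and a shot-based model into one ranked list. The abstract does not say which fusion method is used, so the sketch below is only one simple possibility: min-max normalize each model's scores and take a weighted average. The function names, the `keyframe_weight` parameter, and the toy scores are all assumptions.

```python
def min_max_normalize(scores):
    """Rescale a {shot_id: score} mapping to the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {shot: 0.0 for shot in scores}
    return {shot: (value - low) / (high - low) for shot, value in scores.items()}


def late_fusion(keyframe_scores, shot_scores, keyframe_weight=0.5):
    """Fuse two score lists and return shot ids ranked by fused score.

    Assumes both models scored the same set of shots.
    """
    kf = min_max_normalize(keyframe_scores)
    sf = min_max_normalize(shot_scores)
    fused = {
        shot: keyframe_weight * kf[shot] + (1.0 - keyframe_weight) * sf[shot]
        for shot in kf
    }
    return sorted(fused, key=fused.get, reverse=True)


# Toy scores (hypothetical) from the two models for three shots.
keyframe_scores = {"shot1": 0.9, "shot2": 0.4, "shot3": 0.1}
shot_scores = {"shot1": 0.2, "shot2": 0.8, "shot3": 0.5}
ranking = late_fusion(keyframe_scores, shot_scores)
print(ranking)
```

Early fusion, by contrast, would concatenate the keyframe and shot features before training a single model, so no score combination step is needed.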
TRECVID 2010 Known-item Search (KIS) Task by I²R
The KIS task can be regarded as an extreme case of target-specific video search, in which the query aims to uniquely locate a single true answer. Locating the unique video for a query, however, poses new challenges over existing information retrieval approaches. Our participation in TRECVID this year focuses on how to adapt traditional information retrieval methods, specifically video search methods, to KIS in both automatic and interactive settings. In automatic KIS, as there exists a single true answer for each query, the input queries are expected to present distinctive information locating a unique entity rather than a broad topic covering a number of relevant videos. Therefore, query formulation is one of our focuses in automatic KIS. On the other end of the spectrum, our emphasis in interactive KIS is two-fold. First, an intuitive and user-friendly interface is developed to facilitate the browsing of returned videos. As the query is usually specific, we postulate that searchers can quickly reject most of the negative videos after seeing a few keyframes of the video. This premise of “fast rejection” motivates us to leverage the storyboard to pre-visualize a video. When a user cannot reject a returned video as negative, he or she may mark it as relevant. By collecting a number of relevant videos, the searchers can perform relevance feedback to refine the retrieval and continue the search. The automatic and interactive KIS runs achieve MAP of 0.454 and 0.727 respectively, showing the effectiveness of the proposed methods.
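Because each known-item query has a single true answer, the average precision of a query reduces to the reciprocal of the rank at which that answer is returned, and MAP becomes the mean of those reciprocal ranks. The sketch below illustrates this arithmetic only; it is not the official TRECVID scoring tool, and the function names and toy result lists are hypothetical.

```python
def average_precision_single_target(ranked_ids, target_id):
    """Average precision when exactly one item is relevant: 1 / rank, or 0 if missing."""
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id == target_id:
            return 1.0 / rank
    return 0.0


def mean_average_precision(runs):
    """Mean over queries; each run is a (ranked result list, known item) pair."""
    scores = [average_precision_single_target(ranked, target) for ranked, target in runs]
    return sum(scores) / len(scores)


# Toy results (hypothetical): found at rank 1, found at rank 2, not found.
runs = [
    (["v3", "v1", "v7"], "v3"),
    (["v2", "v5", "v9"], "v5"),
    (["v4", "v6", "v8"], "v0"),
]
map_score = mean_average_precision(runs)
print(map_score)
```

Under this reading, a MAP of 0.5 would correspond to the known item appearing at rank 2 on average in the reciprocal sense, not the arithmetic mean of the ranks.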
Long Term Activity Analysis in Surveillance Video Archives
, 2010
Surveillance video recording is becoming ubiquitous in daily life for public areas such as supermarkets, banks, and airports. The rate at which surveillance video is being generated has accelerated demand for machine understanding to enable better content-based search capabilities. Analyzing human activity is one of the key tasks to understand and search surveillance videos. In this thesis, we perform a comprehensive study on analyzing human activities from short term to long term and from simple to complicated activities in surveillance video archives. A general, efficient and robust human activity recognition framework is proposed. We extract local descriptors at salient points from videos to represent human activities. The local descriptor is called Motion SIFT (MoSIFT), which explicitly augments appearance features with motion information. A quantization and classification framework then applies the descriptors to recognize activities of interest in surveillance
The MediaMill TRECVID 2011 Semantic Video Search Engine
In this paper we describe our TRECVID 2011 video retrieval experiments. The MediaMill team participated in two tasks: semantic indexing and multimedia event detection. The starting point for the MediaMill detection approach is our top-performing bag-of-words system of TRECVID 2010, which uses multiple color SIFT descriptors, sparse codebooks with spatial pyramids, and kernel-based machine learning, all supported by GPU-optimized algorithms, approximated histogram intersection kernels, and multi-frame video processing. This year our experiments focus on 1) the soft assignment of descriptors with the use of difference coding, 2) the exploration of bag-of-words for event detection, and 3) the selection of informative concepts out of 1,346 concept detectors as a representation for event detection. The 2011 edition of the TRECVID benchmark has again been a fruitful participation for the MediaMill team, resulting in the runner-up ranking for concept detection in the semantic indexing task.
Accelerating Visual Categorization with the GPU
, 2010
Visual categorization is important to manage large collections of digital images and video, where textual metadata is often incomplete or simply unavailable. The bag-of-words model has become the most powerful method for visual categorization of images and video. Despite its high accuracy, a severe drawback of this model is its high computational cost. As the trend to increase computational power in newer CPU and GPU architectures is to increase their level of parallelism, exploiting this parallelism becomes an important direction to handle the computational cost of the bag-of-words approach. In this paper, we analyze the bag-of-words model for visual categorization in terms of computational cost and identify two major bottlenecks: the quantization step and the classification step. We address these two bottlenecks by proposing two efficient algorithms for quantization and classification by exploiting the GPU hardware and the CUDA parallel programming model. The algorithms are designed to keep categorization accuracy intact and give the same numerical results. In the experiments on large-scale datasets, it is shown that, by using a parallel implementation on the GPU, quantization is 28 times faster and classification is 35 times faster than a single-threaded CPU version, while giving the exact same numerical results. The GPU accelerations are applicable to both the learning phase and the testing phase of visual categorization systems. For software visit

