A survey on visual content-based video indexing and retrieval (2011)

by W. Hu, N. Xie, L. Li, X. Zeng, S. Maybank
Venue: IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews

Results 1 - 10 of 26

Zero-Shot Video Retrieval Using Content and Concepts

by Jeffrey Dalton, James Allan, Pranav Mirajkar
"... Recent research in video retrieval has been successful at finding videos when the query consists of tens or hundreds of sample relevant videos for training supervised models. Instead, we investigate unsupervised zero-shot retrieval where no training videos are provided: a query consists only of a te ..."
Abstract - Cited by 11 (0 self)
Recent research in video retrieval has been successful at finding videos when the query consists of tens or hundreds of sample relevant videos for training supervised models. Instead, we investigate unsupervised zero-shot retrieval where no training videos are provided: a query consists only of a text statement. For retrieval, we use text extracted from images in the videos, text recognized in the speech of their audio tracks, and automatically detected, semantically meaningful visual video concepts identified with widely varying confidence in the videos. In this work we introduce a new method for automatically identifying relevant concepts given a text query using the Markov Random Field (MRF) retrieval framework. We use source expansion to build rich textual representations of semantic video concepts from large external sources such as the web. We find that concept-based retrieval significantly outperforms text-based approaches in recall. Using an evaluation derived from the TRECVID MED’11 track, we present early results that an approach using multi-modal fusion can compensate for inadequacies in each modality, resulting in substantial effectiveness gains. With relevance feedback, our approach provides additional improvements of over 50%.
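A minimal sketch of the concept-matching idea described above, assuming each concept carries a source-expanded text description and each video carries per-concept detector confidences. The bag-of-words cosine scoring is an illustrative stand-in for the paper's MRF retrieval framework, and all names and data are hypothetical.

```python
# Sketch: zero-shot, concept-based video scoring. Not the authors' system;
# a simple cosine similarity stands in for the MRF retrieval framework.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def score_video(query: str, concept_texts: dict, video_confidences: dict) -> float:
    """Weight each concept detector's confidence by how well the concept's
    expanded text description matches the query."""
    q = Counter(query.lower().split())
    return sum(cosine(q, Counter(text.lower().split())) * video_confidences.get(c, 0.0)
               for c, text in concept_texts.items())

# Hypothetical example: rank two videos for a textual event query.
concepts = {"dog": "dog canine pet barking animal",
            "skateboard": "skateboard trick ramp wheels street"}
videos = {"clip1": {"dog": 0.9, "skateboard": 0.1},
          "clip2": {"dog": 0.2, "skateboard": 0.8}}
query = "a dog performing a skateboard trick"
ranked = sorted(videos, key=lambda v: score_video(query, concepts, videos[v]), reverse=True)
print(ranked)
```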

Citation Context

...lt of human judgment of the top ten returned results. 2. RELATED WORK For an overview of recent work we refer the reader to surveys of content-based video indexing and retrieval provided by Hu et al. [5] and Snoek and Worring [16]. Previous work mapping text to concepts relies upon exact or approximate string matching or by associating ASR transcripts [10] with the concepts. Snoek et al. [18] use the...

Saying what you’re looking for: Linguistics meets video search

by Andrei Barbu, N. Siddharth, Jeffrey Mark Siskind , 2013
"... We present an approach to searching large video corpora for video clips which depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences which have identica ..."
Abstract - Cited by 2 (0 self)
We present an approach to searching large video corpora for video clips which depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences which have identical words but entirely different meaning: The person rode the horse vs. The horse rode the person. Given a video-sentence pair and a natural-language parser, along with a grammar that describes the space of sentential queries, we produce a score which indicates how well the video depicts the sentence. We produce such a score for each video clip in a corpus and return a ranked list of clips. Furthermore, this approach addresses two fundamental problems simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, this approach uses knowledge about the intended sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While earlier work was limited to single-word queries which correspond to either verbs or nouns, we show how one can search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 141 queries involving people and horses interacting with each other in 10 full-length Hollywood movies.
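As a small illustration of the word-order point above, here is a sketch (not the authors' system, which uses a full natural-language parser and sentence grammar) of how two sentences with identical words yield different agent/patient role assignments, and so demand different track evidence from the video:

```python
# Toy illustration of compositional role assignment. parse_svo is a
# hypothetical, whitespace-level stand-in for a real natural-language parser.
def parse_svo(sentence: str) -> dict:
    words = sentence.strip(".").lower().split()
    # assumes the pattern "the AGENT <verb> the PATIENT"
    return {"agent": words[1], "verb": words[2], "patient": words[4]}

print(parse_svo("The person rode the horse"))
# {'agent': 'person', 'verb': 'rode', 'patient': 'horse'}
print(parse_svo("The horse rode the person"))
# {'agent': 'horse', 'verb': 'rode', 'patient': 'person'}
```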

Evaluation of Local Spatio-temporal Salient Feature Detectors for Human Action Recognition

by Amir H. Shabani
"... Local spatio-temporal salient features are used for a sparse and compact representation of video contents in many computer vision tasks such as human action recog-nition. To localize these features (i.e., key point detection), existing methods perform either symmetric or asymmetric multi-resolution ..."
Abstract - Cited by 1 (0 self)
Local spatio-temporal salient features are used for a sparse and compact representation of video contents in many computer vision tasks such as human action recognition. To localize these features (i.e., key point detection), existing methods perform either symmetric or asymmetric multi-resolution temporal filtering and use a structural or a motion saliency criterion. In a common discriminative framework for action classification, we compare different saliency criteria of the structure-based detectors and different temporal filters of the motion-based detectors. We have two main observations. (1) The motion-based detectors localize features which are more effective than those of structure-based detectors. (2) The salient motion features detected using asymmetric temporal filtering perform better than all other sparse salient detectors and dense sampling. Based on these two observations, we recommend the use of asymmetric motion features for effective sparse video content representation and action recognition.
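To make the symmetric/asymmetric distinction concrete, here is a minimal numpy sketch of 1-D temporal filtering of a single pixel's intensity over time; the Gaussian and causal-exponential kernels are illustrative stand-ins for the filters compared in the paper.

```python
# Sketch: symmetric vs. asymmetric temporal filtering of one pixel's signal.
import numpy as np

def gaussian_kernel(radius=5, sigma=2.0):
    """Symmetric: weights past and future frames equally."""
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    return k / k.sum()

def causal_exp_kernel(length=11, tau=3.0):
    """Asymmetric: weights only past frames, like a causal temporal filter."""
    k = np.exp(-np.arange(length) / tau)
    return k / k.sum()

signal = np.zeros(50)
signal[25] = 1.0                                   # a sudden temporal change
sym = np.convolve(signal, gaussian_kernel(), mode="same")
asym = np.convolve(signal, causal_exp_kernel(), mode="full")[:50]
print(np.nonzero(sym > 1e-6)[0])    # responds before and after frame 25
print(np.nonzero(asym > 1e-6)[0])   # responds only from frame 25 onward
```

The symmetric filter spreads its response to frames before the change, while the causal filter responds only at and after it; this is one intuition for why the two filter families localize sudden changes differently.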

Citation Context

... computer vision applications such as human action recognition [1, 2, 3, 4], video super-resolution [5], unusual event detection [6], human-computer interaction [7], and content-based video retrieval [8]. These features are typically localized in spatio-temporal key points where a sudden change in both space and time occurs. For example, 3D Harris corners occur when a spatially salient structure such...

Broadcasting oneself: Visual Discovery of Vlogging Styles

by Oya Aran, Joan-isaac Biel, Daniel Gatica-perez
"... We present a data-driven approach to discover different styles that people use to present themselves in online video blogging (vlogging). By vlogging style, we denote the combination of conscious and unconscious choices that the vlogger made during the production of the vlog, affecting the video qua ..."
Abstract - Cited by 1 (0 self)
We present a data-driven approach to discover different styles that people use to present themselves in online video blogging (vlogging). By vlogging style, we denote the combination of conscious and unconscious choices that the vlogger made during the production of the vlog, affecting the video quality, appearance, and structure. A compact set of vlogging styles is discovered using clustering methods based on a fast and robust spatio-temporal descriptor to characterize the visual activity in a vlog. On 2268 YouTube vlogs, our results show that the vlogging styles are differentiated with respect to the vloggers' level of editing and conversational activity in the video. Furthermore, we show that these automatically discovered styles relate to vloggers with different personality trait impressions and to vlogs that receive different levels of social attention.
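A minimal sketch of the clustering step, assuming each vlog has already been reduced to a fixed-length visual-activity descriptor; the descriptor extraction itself is omitted, and k-means with five clusters is an illustrative choice, not necessarily the paper's.

```python
# Sketch: grouping vlogs into a compact set of styles by clustering
# per-vlog activity descriptors. Descriptors here are random placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.random((2268, 32))      # placeholder activity descriptors

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(descriptors)
styles = kmeans.labels_                   # style assignment per vlog
print(np.bincount(styles))                # how many vlogs fall in each style
```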

Citation Context

...classification. In [29], spatio-temporal video retrieval is reviewed. In [33], a review on concept-based video retrieval is presented. A more recent survey on video indexing and retrieval is given in [20]. In video classification, the task is to classify a given video into one of the categories. The categories can be broadly defined, such as movie or news, or can be more specific, e.g. identifying var...

Analysis of Histogram Based Shot Segmentation Techniques for Video Summarization

by P. Mankodia, Prof. Satish K. Shah, 2014
"... Content based video indexing and retrieval has its foundations in the analyses of the prime video temporal structures. Thus, technologies for video segmentation have become important for the development of such digital video systems. Dividing a video sequence into shots is the first step towards VCA ..."
Abstract
Content-based video indexing and retrieval has its foundations in the analysis of the prime video temporal structures. Thus, technologies for video segmentation have become important for the development of such digital video systems. Dividing a video sequence into shots is the first step towards video content analysis (VCA) and content-based video browsing and retrieval. This paper presents an analysis of histogram-based techniques on compressed video features. A graphical user interface is also designed in MATLAB to demonstrate the performance using common performance parameters such as precision, recall, and F1.
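A minimal sketch of the histogram-comparison idea, assuming decoded grayscale frames as numpy arrays; the bin count, L1 distance, and threshold are illustrative choices, and the F1 helper mirrors the evaluation parameters the paper reports.

```python
# Sketch: histogram-based shot boundary detection plus F1 evaluation.
import numpy as np

def histogram(frame, bins=64):
    """Normalized intensity histogram of one grayscale frame (values 0-255)."""
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def detect_shot_boundaries(frames, threshold=0.35):
    """Flag a boundary wherever the L1 distance between consecutive
    frame histograms exceeds the threshold."""
    boundaries, prev = [], histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = histogram(frame)
        if np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries

def f1(detected, truth):
    """F1 score of detected boundaries against ground truth."""
    tp = len(set(detected) & set(truth))
    p = tp / len(detected) if detected else 0.0
    r = tp / len(truth) if truth else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```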

Citation Context

...ogram Comparison. 1 Introduction There are many approaches for shot boundary detection, like pixel-based methods, block-based methods, feature-based methods, histogram-based methods, etc. [1][2][3][4][5][24][25][26][27][28][29][30]. A histogram is a graphical representation of the tonal distribution in a digital image or frame. It plots the number of pixels for each tonal value. By viewing the histogram...

Semi-automated Query Construction for Content-based Endomicroscopy Video Retrieval

by Marzieh Kohandani Tafreshi, Nicolas Linard, Nicholas Ayache, Tom Vercauteren, 2014
"... HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Citation Context

...le visual signature to query the database. The most visually similar cases are presented to the physician along with their annotations. 3 Temporal Segmentation from Kinematic Stability As outlined in [6,7], temporal video segmentation is a key step in most existing video management tools. Many different types of algorithms have been developed to perform temporal video segmentation. Early techniques foc...

Computer-Aided Retinal Surgery using Data from the Video Compressed Stream

by Zakarya Droueche, Mathieu Lamard, Guy Cazuguel, Béatrice Cochener, Christian Roux
"... This paper introduces ongoing research on computer-aided ophthalmic surgery. We propose a Content-Based Video Retrieval (CBVR) system for surgeons decision aid: given the video stream captured by a digital camera monitoring a surgery, the system retrieves similar annotated video streams in video arc ..."
Abstract
This paper introduces ongoing research on computer-aided ophthalmic surgery. We propose a Content-Based Video Retrieval (CBVR) system as a decision aid for surgeons: given the video stream captured by a digital camera monitoring a surgery, the system retrieves similar annotated video streams from video archives. For comparing videos, we propose to characterize them by features extracted from compression data. First, motion vectors are extracted from the MPEG-4 AVC/H.264 compressed video stream. Second, images are segmented into regions with homogeneous motion vectors, using region growing. Third, region displacements between consecutive frames are tracked, using the well-known Kalman filter, in order to extract features characterizing region trajectories. Other features are also extracted from the residual information encoded in the MPEG-4 AVC/H.264 compressed video stream. This residual information consists of the difference between original input images and predicted images. Once features are extracted, videos are compared using an extension of the fast dynamic time warping to multidimensional time series. In this paper, the system is applied to two medical datasets: a small dataset of 69 video-recorded retinal surgery steps and a dataset of 1,400 video-recorded cataract surgery steps. In order to assess its generality, the system is also applied to a large dataset of 1,707 movie clips with
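The comparison step above rests on dynamic time warping over multidimensional feature sequences. Below is a plain O(nm) DTW sketch over d-dimensional per-frame features; the paper uses a fast DTW variant, and the Euclidean frame distance here is an assumption.

```python
# Sketch: dynamic time warping between two multidimensional time series
# (e.g., per-frame motion/residual feature vectors). Plain recurrence,
# not the fast variant used in the paper.
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    """a: (n, d) and b: (m, d) sequences of d-dimensional feature vectors."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])    # Euclidean frame distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

# Hypothetical example: two videos summarized as sequences of 4-D features.
rng = np.random.default_rng(1)
video_a, video_b = rng.random((30, 4)), rng.random((40, 4))
print(dtw(video_a, video_b))
```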

Citation Context

...ems Initially popularized in video surveillance applications [10], the importance and popularity of content-based video retrieval (CBVR) have led to several survey papers; a recent review is given in [11]. Other approaches for video browsing have been proposed in [12]. CBVR recently started developing in other applications. For instance, very few systems have addressed the issue of medical training...

TOWARDS CROSS-MODAL SEARCH AND SYNCHRONIZATION OF MUSIC AND VIDEO STREAMS

by Holger Grossmann , Anna Kruspe , Jakob Abeßer , Hanna Lukashevich
"... With music markets shifting, the commercial use of music in video productions for film, TV or advertisement worldwide grows increasingly important. Our novel research project «SyncGlobal» addresses this global music licensing opportunity by developing adequate time-aware search technologies to effi ..."
Abstract
With music markets shifting, the commercial use of music in video productions for film, TV, or advertisement worldwide grows increasingly important. Our novel research project «SyncGlobal» addresses this global music licensing opportunity by developing adequate time-aware search technologies to efficiently handle sync licensing requests. The goal is to find the best acoustic or semantic matches to any video sequence from large-scale intercultural music catalogs with minimal human effort involved. In this paper we outline conceptual issues and technology requirements derived from different application scenarios. We briefly introduce music and video segmentation, cross-modal semantic mapping strategies, and time-aware music search techniques based on audio signal analysis.

Citation Context

...earch task. It is natural when similar video content is accompanied by similar music. In order to tackle this problem it is important to find the segment borders and the similar and/or repeating sections for both audio and video. The task of structure analysis is well-studied for both video and music. A survey of shot boundary detection (SBD) work from the annual Text Retrieval Conference Video Retrieval Evaluation (TRECVid) can be found in [14]. A less specific survey on scene segmentation and many other issues important to video analysis and video retrieval can be found in [7]. In the SyncGlobal project we propose to use a two-pass algorithm for scene segmentation which uses content-based features in the first step and video production grammar rules as scene merging criteria in the second step. An overview of state-of-the-art methods for computational music structure analysis is given in [11]. Here an audio recording is divided into temporal segments corresponding to musical parts which are further grouped into musically meaningful categories. For music segmentation in the SyncGlobal project we apply a method derived from the DISTBIC algorithm successfully applied for Spe...
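A minimal sketch of such a two-pass segmentation, with made-up stand-ins: pass 1 proposes borders from content-based feature distances, and pass 2 applies a simple merging rule (a minimum scene length here, standing in for real production-grammar rules).

```python
# Sketch: two-pass scene segmentation. Features, threshold, and the merging
# rule are all illustrative placeholders, not the SyncGlobal implementation.
import numpy as np

def propose_borders(features, threshold):
    """Pass 1: border wherever consecutive per-frame features differ strongly."""
    dists = np.linalg.norm(np.diff(features, axis=0), axis=1)
    return [i + 1 for i, d in enumerate(dists) if d > threshold]

def merge_short_segments(borders, min_len):
    """Pass 2: keep a border only if the segment it closes is long enough."""
    merged, last = [], 0
    for b in borders:
        if b - last >= min_len:
            merged.append(b)
            last = b
    return merged

features = np.random.default_rng(2).random((200, 16))  # placeholder features
scene_borders = merge_short_segments(propose_borders(features, 1.2), min_len=25)
print(scene_borders)
```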

Memory Recall Based Video Search: Finding Videos You Have Seen Before Based on Your Memory

by unknown authors
"... We often remember images and videos that we have seen or recorded before but cannot quite recall the exact venues or details of the contents. We typically have vague memories of the contents, which can often be expressed as a textual description and/or rough visual descriptions of the scenes. Using ..."
Abstract
We often remember images and videos that we have seen or recorded before but cannot quite recall the exact venues or details of the contents. We typically have vague memories of the contents, which can often be expressed as a textual description and/or rough visual descriptions of the scenes. Using these vague memories, we then want to search for the corresponding videos of interest. We call this "Memory Recall based Video Search" (MRVS). To tackle this problem, we propose a video search system that permits a user to input his/her vague and incomplete query as a combination of a text query, a sequence of visual queries, and/or concept queries. Here, a visual query is often in the form of a visual sketch depicting the outline of scenes within the desired video, while each corresponding concept query depicts a list of visual concepts that appear in that scene. As the query specified by users is generally approximate or incomplete, we need to develop techniques to handle this inexact and incomplete specification, also leveraging user feedback to refine the specification. We utilize several innovative approaches to enhance the automatic search. First, we employ a visual query suggestion model to automatically suggest potential visual features to users as better queries. Second, we utilize a color similarity matrix to help compensate for inexact color specification in visual queries. Third, we leverage the ordering of visual queries and/or concept queries to rerank the results using a greedy algorithm. Moreover, as the query is inexact and there is likely to be only one or a few possible answers, we incorporate an interactive feedback loop to permit the users to label related samples which are visually similar or semantically close to the relevant sample. Based on the labeled samples, we then propose optimization algorithms to update visual queries and concept
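A minimal sketch of the color-similarity-matrix idea, assuming sketch and frame colors are summarized as histograms over a small named palette; the palette and similarity values are invented for illustration, not the paper's actual matrix.

```python
# Sketch: soft color matching so an inexactly remembered color still scores.
import numpy as np

palette = ["red", "orange", "yellow", "green", "blue"]
# Symmetric similarity matrix: near-neighbors on the color wheel score high.
S = np.array([
    [1.0, 0.7, 0.3, 0.1, 0.1],   # red
    [0.7, 1.0, 0.7, 0.2, 0.1],   # orange
    [0.3, 0.7, 1.0, 0.4, 0.1],   # yellow
    [0.1, 0.2, 0.4, 1.0, 0.5],   # green
    [0.1, 0.1, 0.1, 0.5, 1.0],   # blue
])

def color_match(sketch_hist, frame_hist):
    """Score = sketch^T * S * frame: a 'red' sketch gets partial credit for
    an 'orange' frame instead of a hard mismatch."""
    return float(sketch_hist @ S @ frame_hist)

sketch = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # user sketched pure red
frame = np.array([0.1, 0.7, 0.1, 0.1, 0.0])    # frame is mostly orange
print(color_match(sketch, frame))              # > 0 despite inexact memory
```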

Citation Context

...leteness of text annotations; however, it also introduces noise, which may decrease the search performance. To complement text-based video search, content-based video search (CBVR) approaches [Hu et al. 2011; Liu et al. 2009b] were proposed. In CBVR, users first present a query in the form of a video/image sample (query-by-example) [Snoek et al. 2008] or a sketch (query-by-sketch) [Hu et al. 2007]. The sy...

The Case for Offload Shaping

by Wenlu Hu , Brandon Amos , Zhuo Chen , Kiryong Ha , Wolfgang Richter , Padmanabhan Pillai , Benjamin Gilbert , Jan Harkes , Mahadev Satyanarayanan
"... ABSTRACT When offloading computation from a mobile device, we show that it can pay to perform additional on-device work in order to reduce the offloading workload. We call this offload shaping, and demonstrate its application at many different levels of abstraction using a variety of techniques. We ..."
Abstract
When offloading computation from a mobile device, we show that it can pay to perform additional on-device work in order to reduce the offloading workload. We call this offload shaping, and demonstrate its application at many different levels of abstraction using a variety of techniques. We show that offload shaping can produce significant reduction in resource demand, with little loss of application-level fidelity.
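A minimal sketch of the pattern, using a cheap red-pixel test (echoing the paper's red-filter example, see the citation context below) as the on-device filter; the thresholds and the send callback are hypothetical.

```python
# Sketch: offload shaping via an on-device early-discard filter. Frames that
# fail a cheap local test are dropped before transmission to the server.
import numpy as np

def red_fraction(frame_rgb):
    """Cheap on-device test: fraction of pixels dominated by the red channel."""
    rgb = frame_rgb.astype(int)           # avoid uint8 overflow in comparisons
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return ((r > 120) & (r > g + 40) & (r > b + 40)).mean()

def shape_offload(frames, send, threshold=0.01):
    """Offload only frames that pass the cheap filter; return how many were sent."""
    sent = 0
    for frame in frames:
        if red_fraction(frame) >= threshold:
            send(frame)                   # expensive: transmit for remote recognition
            sent += 1
    return sent
```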

Citation Context

...64 encoding, we did not find a clear relationship with the size of the compressed frame, but did find a statistical correlation between similarity and data size when normalized to the preceding keyframe size (Figure 12). Here, the encoding was based on a GOP (interval between keyframes) of 10, used an x264 “medium” preset with a “zero-latency” tuning option, and omitted B-frames. The correlation is very noisy, and whether normalized encoded frame size is useful for predicting similarity is left for future research. Furthermore, given the vast literature on video indexing and key frame selection [12], there may be other encoding techniques that provide a better correlation with similarity that can be leveraged for offload shaping.

[Figure 13: Example frames for red filter. (a) Video with a large Coke can (446 frames); (b) video with a normal-size Coke can (510 frames).]

                         No shaping    Send red only   Improvement
Bytes transferred        8.6M          2.8M            67%
Frames recognized        396 (3)       380 (5)         -4%
E2E latency (ms)         471 (12)      153 (2)         68%
Glass power (W)          1.80 (0.01)   1.99 (0.02)     -11%
Glass energy (J/frame)   0.84 (0.01)   0.28 (0.01)     67%

MJPEG encoding is used to deal with varying frame size. [Figure 14: Red filter with MOPED server.] 5...
