Results 1 - 10 of 26
Zero-Shot Video Retrieval Using Content and Concepts
"... Recent research in video retrieval has been successful at finding videos when the query consists of tens or hundreds of sample relevant videos for training supervised models. Instead, we investigate unsupervised zero-shot retrieval where no training videos are provided: a query consists only of a te ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
(Show Context)
Recent research in video retrieval has been successful at finding videos when the query consists of tens or hundreds of sample relevant videos for training supervised models. Instead, we investigate unsupervised zero-shot retrieval, where no training videos are provided: a query consists only of a text statement. For retrieval, we use text extracted from images in the videos, text recognized from speech in their audio tracks, and automatically detected, semantically meaningful visual video concepts identified with widely varying confidence in the videos. In this work we introduce a new method for automatically identifying relevant concepts given a text query using the Markov Random Field (MRF) retrieval framework. We use source expansion to build rich textual representations of semantic video concepts from large external sources such as the web. We find that concept-based retrieval significantly outperforms text-based approaches in recall. Using an evaluation derived from the TRECVID MED’11 track, we present early results showing that multi-modal fusion can compensate for inadequacies in each modality, resulting in substantial effectiveness gains. With relevance feedback, our approach provides additional improvements of over 50%.
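As a rough illustration of the concept-selection step this abstract describes, the sketch below ranks visual concepts against a text query by TF-IDF cosine similarity over "source-expanded" textual concept descriptions. It is a minimal stand-in under stated assumptions, not the paper's method: plain cosine similarity replaces the full MRF retrieval model, and the concept names and descriptions are toy examples.

```python
# Minimal sketch: rank detector concepts against a text query by TF-IDF
# cosine similarity over source-expanded concept descriptions. Plain cosine
# similarity stands in for the paper's MRF retrieval model; all names and
# descriptions below are illustrative, not from the paper.
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build TF-IDF weight dicts for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t] + 1.0) for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy source-expanded descriptions of visual concepts (illustrative).
concepts = {
    "skateboard": "board wheels trick ramp skate park jump street".split(),
    "birthday_party": "cake candles balloons singing children party".split(),
    "fishing": "rod reel lake boat catch fish water bait".split(),
}
query = "a person attempting a board trick at a skate park".split()

vecs = tf_idf_vectors(list(concepts.values()) + [query])
qvec = vecs[-1]
ranked = sorted(zip(concepts, vecs[:-1]),
                key=lambda kv: cosine(qvec, kv[1]), reverse=True)
for name, vec in ranked:
    print(name, round(cosine(qvec, vec), 3))
```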
Saying what you’re looking for: Linguistics meets video search
, 2013
"... We present an approach to searching large video corpora for video clips which depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences which have identica ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
We present an approach to searching large video corpora for video clips which depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences which have identical words but entirely different meaning: The person rode the horse vs. The horse rode the person. Given a video-sentence pair and a natural-language parser, along with a grammar that describes the space of sentential queries, we produce a score which indicates how well the video depicts the sentence. We produce such a score for each video clip in a corpus and return a ranked list of clips. Furthermore, this approach addresses two fundamental problems simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, our approach uses knowledge about the intended sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While earlier work was limited to single-word queries which correspond to either verbs or nouns, we show how one can search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 141 queries involving people and horses interacting with each other in 10 full-length Hollywood movies.
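The sketch below illustrates only the core scoring idea: detector confidences along the participants' tracks are combined with a predicate that enforces the sentence's role structure, so "The person rode the horse" and "The horse rode the person" receive different scores. The rides predicate, the box-center detections, and all numbers are toy assumptions, not the paper's compositional-semantics machinery.

```python
# Minimal sketch: score a clip against a sentential query by combining
# per-frame detector confidences with a predicate over the participants'
# tracks. Detections are (x, y, score) box centers; the "ride" predicate
# (rider roughly above the mount) is a toy stand-in for the paper's
# compositional semantics, and all data below is illustrative.

def rides(agent, patient, max_dx=30.0):
    """Toy predicate: agent sits above patient, horizontally aligned."""
    ax, ay, _ = agent
    px, py, _ = patient
    return ay < py and abs(ax - px) < max_dx

def clip_score(agent_track, patient_track):
    """Average per-frame evidence that the agent rode the patient."""
    score = 0.0
    for agent, patient in zip(agent_track, patient_track):
        if rides(agent, patient):
            score += agent[2] * patient[2]  # product of detector confidences
    return score / max(len(agent_track), 1)

# Two toy readings of the same words: in the first the person is above the
# horse, in the second the roles are reversed -- different meaning, different score.
person = [(100, 50, 0.9), (102, 52, 0.8)]
horse  = [(101, 90, 0.9), (103, 91, 0.9)]
print("person rode horse:", clip_score(person, horse))  # high
print("horse rode person:", clip_score(horse, person))  # zero
```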
Evaluation of Local Spatio-temporal Salient Feature Detectors for Human Action Recognition
"... Local spatio-temporal salient features are used for a sparse and compact representation of video contents in many computer vision tasks such as human action recog-nition. To localize these features (i.e., key point detection), existing methods perform either symmetric or asymmetric multi-resolution ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Local spatio-temporal salient features are used for a sparse and compact representation of video contents in many computer vision tasks such as human action recog-nition. To localize these features (i.e., key point detection), existing methods perform either symmetric or asymmetric multi-resolution temporal filtering and use a structural or a motion saliency criteria. In a common discriminative framework for action classification, different saliency cri-teria of the structured-based detectors and different tempo-ral filters of the motion-based detectors are compared. We have two main observations. (1) The motion-based detec-tors localize features which are more effective than those of structured-based detectors. (2) The salient motion fea-tures detected using an asymmetric temporal filtering per-form better than all other sparse salient detectors and dense sampling. Based on these two observations, we recommend the use of asymmetric motion features for effective sparse video content representation and action recognition. 1
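As a concrete illustration of motion-based saliency via temporal filtering, the sketch below applies a quadrature pair of temporal Gabor filters along a pixel's intensity time series and sums the squared responses, which is the general shape of symmetric motion-saliency detectors. The parameters tau and omega and the toy signal are illustrative choices, not the exact filters evaluated in the paper.

```python
# Minimal numpy sketch of motion-based temporal saliency: the response is
# the summed squared output of even/odd (quadrature) temporal Gabor filters.
# tau, omega, and the toy signal are illustrative, not the paper's settings.
import numpy as np

def temporal_gabor_response(signal, tau=1.5, omega=0.25):
    t = np.arange(-3 * tau, 3 * tau + 1)
    env = np.exp(-t**2 / (2 * tau**2))
    h_even = env * np.cos(2 * np.pi * omega * t)
    h_odd  = env * np.sin(2 * np.pi * omega * t)
    r_even = np.convolve(signal, h_even, mode="same")
    r_odd  = np.convolve(signal, h_odd, mode="same")
    return r_even**2 + r_odd**2  # high where intensity oscillates in time

# Toy pixel intensity over time: flat, then a periodic motion burst, then flat.
burst = 1 + 0.5 * np.sin(2 * np.pi * 0.25 * np.arange(40))
sig = np.concatenate([np.ones(40), burst, np.ones(40)])
resp = temporal_gabor_response(sig)
print("peak response index:", int(resp.argmax()))  # lands inside the burst
```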
Broadcasting Oneself: Visual Discovery of Vlogging Styles
"... We present a data-driven approach to discover different styles that people use to present themselves in online video blogging (vlogging). By vlogging style, we denote the combination of conscious and unconscious choices that the vlogger made during the production of the vlog, affecting the video qua ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
We present a data-driven approach to discover different styles that people use to present themselves in online video blogging (vlogging). By vlogging style, we denote the combination of conscious and unconscious choices that the vlogger made during the production of the vlog, affecting the video quality, appearance, and structure. A compact set of vlogging styles is discovered using clustering methods based on a fast and robust spatio-temporal descriptor to characterize the visual activity in a vlog. On 2268 YouTube vlogs, our results show that the vlogging styles are differentiated with respect to the vloggers’ level of editing and conversational activity in the video. Furthermore, we show that these automatically discovered styles relate to vloggers with different personality trait impressions and to vlogs that receive different levels of social attention.
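The style-discovery step reduces to clustering per-vlog activity descriptors. The sketch below runs a small k-means over toy 2-D descriptors (fraction of frames with shot cuts, fraction with talking-head motion); these features are illustrative stand-ins for the paper's spatio-temporal descriptor, though the clustering idea is the same.

```python
# Minimal sketch of style discovery: k-means over per-vlog activity
# descriptors. The 2-D toy features are illustrative stand-ins for the
# paper's spatio-temporal descriptor.
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Toy descriptors: heavily edited vlogs vs. single-take conversational vlogs.
rng = np.random.default_rng(1)
edited = rng.normal([0.8, 0.2], 0.05, size=(20, 2))
single = rng.normal([0.1, 0.7], 0.05, size=(20, 2))
X = np.vstack([edited, single])
labels, centers = kmeans(X, k=2)
print("cluster centers (edit level, conversational activity):\n", centers)
```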
Analysis of Histogram Based Shot Segmentation Techniques for Video Summarization
, 2014
"... Content based video indexing and retrieval has its foundations in the analyses of the prime video temporal structures. Thus, technologies for video segmentation have become important for the development of such digital video systems. Dividing a video sequence into shots is the first step towards VCA ..."
Abstract
- Add to MetaCart
(Show Context)
Content-based video indexing and retrieval has its foundations in the analysis of a video's primary temporal structures. Thus, technologies for video segmentation have become important for the development of such digital video systems. Dividing a video sequence into shots is the first step towards video content analysis (VCA) and content-based video browsing and retrieval. This paper presents an analysis of histogram-based techniques on compressed video features. A graphical user interface is also designed in MATLAB to demonstrate the performance using the common performance measures: precision, recall, and F1.
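The basic histogram technique is simple to state: declare a shot cut wherever the histogram difference between consecutive frames exceeds a threshold, then score detected cuts against ground truth with precision, recall, and F1. The sketch below shows that pipeline on synthetic frames; the bin count and threshold are illustrative, not values from the paper.

```python
# Minimal sketch of histogram-based shot boundary detection: declare a cut
# where the absolute histogram difference between consecutive frames exceeds
# a threshold, then score against ground truth with precision/recall/F1.
# Frames are random toy arrays; bins and threshold are illustrative.
import numpy as np

def hist_diff_cuts(frames, bins=32, thresh=0.5):
    hists = [np.histogram(f, bins=bins, range=(0, 256))[0] / f.size
             for f in frames]
    return {i for i in range(1, len(hists))
            if np.abs(hists[i] - hists[i - 1]).sum() > thresh}

def prf1(detected, truth):
    tp = len(detected & truth)
    p = tp / len(detected) if detected else 0.0
    r = tp / len(truth) if truth else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy video: two "shots" with different intensity distributions.
rng = np.random.default_rng(0)
shot_a = [rng.uniform(0, 100, (48, 64)) for _ in range(10)]
shot_b = [rng.uniform(150, 255, (48, 64)) for _ in range(10)]
cuts = hist_diff_cuts(shot_a + shot_b)
print("detected cuts:", cuts, " P/R/F1:", prf1(cuts, truth={10}))
```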
Semi-automated Query Construction for Content-based Endomicroscopy Video Retrieval
, 2014
"... HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract
- Add to MetaCart
(Show Context)
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Computer-Aided Retinal Surgery using Data from the Video Compressed Stream
"... This paper introduces ongoing research on computer-aided ophthalmic surgery. We propose a Content-Based Video Retrieval (CBVR) system for surgeons decision aid: given the video stream captured by a digital camera monitoring a surgery, the system retrieves similar annotated video streams in video arc ..."
Abstract
- Add to MetaCart
(Show Context)
This paper introduces ongoing research on computer-aided ophthalmic surgery. We propose a Content-Based Video Retrieval (CBVR) system as a decision aid for surgeons: given the video stream captured by a digital camera monitoring a surgery, the system retrieves similar annotated video streams from video archives. For comparing videos, we propose to characterize them by features extracted from compression data. First, motion vectors are extracted from the MPEG-4 AVC/H.264 compressed video stream. Second, images are segmented into regions with homogeneous motion vectors, using region growing. Third, region displacements between consecutive frames are tracked, using the well-known Kalman filter, in order to extract features characterizing region trajectories. Other features are also extracted from the residual information encoded in the MPEG-4 AVC/H.264 compressed video stream. This residual information consists of the difference between original input images and predicted images. Once features are extracted, videos are compared using an extension of fast dynamic time warping to multidimensional time series. In this paper, the system is applied to two medical datasets: a small dataset of 69 video-recorded retinal surgery steps and a dataset of 1,400 video-recorded cataract surgery steps. In order to assess its generality, the system is also applied to a large dataset of 1,707 movie clips ...
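The comparison step extends dynamic time warping to multidimensional time series: each frame is a feature vector, and the local cost is a vector distance. The sketch below shows the plain O(nm) DTW recurrence with Euclidean local cost; it is a baseline sketch, not the accelerated ("fast") variant the paper uses, and the motion-feature sequences are toy data.

```python
# Minimal sketch of DTW over multidimensional time series: frames are
# feature vectors (e.g., region-trajectory features) and the local cost is
# the Euclidean distance between them. Plain O(nm) recurrence, not the
# paper's accelerated variant.
import numpy as np

def dtw_distance(A, B):
    """DTW cost between two sequences of d-dimensional feature vectors."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy motion-feature sequences: B is a time-stretched version of A, C differs.
t = np.linspace(0, 2 * np.pi, 20)
A = np.stack([np.sin(t), np.cos(t)], axis=1)
B = np.stack([np.sin(t[::2]), np.cos(t[::2])], axis=1)  # same motion, shorter
C = np.stack([np.sin(2 * t), np.cos(2 * t)], axis=1)    # different motion
print("A~B:", round(dtw_distance(A, B), 3), " A~C:", round(dtw_distance(A, C), 3))
```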
TOWARDS CROSS-MODAL SEARCH AND SYNCHRONIZATION OF MUSIC AND VIDEO STREAMS
"... With music markets shifting, the commercial use of music in video productions for film, TV or advertisement worldwide grows increasingly important. Our novel research project «SyncGlobal» addresses this global music licensing opportunity by developing adequate time-aware search technologies to effi ..."
Abstract
- Add to MetaCart
(Show Context)
With music markets shifting, the commercial use of music in video productions for film, TV, or advertisement worldwide grows increasingly important. Our novel research project «SyncGlobal» addresses this global music licensing opportunity by developing adequate time-aware search technologies to efficiently handle sync licensing requests. The goal is to find the best acoustic or semantic matches to any video sequence from large-scale intercultural music catalogs with minimal human effort involved. In this paper we outline conceptual issues and technology requirements derived from different application scenarios. We briefly introduce music and video segmentation, cross-modal semantic mapping strategies, and time-aware music search techniques based on audio signal analysis.
Memory Recall Based Video Search: Finding Videos You Have Seen Before Based on Your Memory
"... We often remember images and videos that we have seen or recorded before but cannot quite recall the exact venues or details of the contents. We typically have vague memories of the contents, which can often be expressed as a textual description and/or rough visual descriptions of the scenes. Using ..."
Abstract
- Add to MetaCart
(Show Context)
We often remember images and videos that we have seen or recorded before but cannot quite recall the exact venues or details of the contents. We typically have vague memories of the contents, which can often be expressed as a textual description and/or rough visual descriptions of the scenes. Using these vague memories, we then want to search for the corresponding videos of interest. We call this “Memory Recall based Video Search” (MRVS). To tackle this problem, we propose a video search system that permits a user to input his/her vague and incomplete query as a combination of a text query, a sequence of visual queries, and/or concept queries. Here, a visual query is often in the form of a visual sketch depicting the outline of scenes within the desired video, while each corresponding concept query depicts a list of visual concepts that appear in that scene. As the query specified by users is generally approximate or incomplete, we need to develop techniques to handle this inexact and incomplete specification, also leveraging user feedback to refine the specification. We utilize several innovative approaches to enhance the automatic search. First, we employ a visual query suggestion model to automatically suggest potential visual features to users as better queries. Second, we utilize a color similarity matrix to help compensate for inexact color specification in visual queries. Third, we leverage the ordering of visual queries and/or concept queries to rerank the results using a greedy algorithm. Moreover, as the query is inexact and there are likely to be only one or a few possible answers, we incorporate an interactive feedback loop to permit users to label related samples which are visually similar or semantically close to the relevant sample. Based on the labeled samples, we then propose optimization algorithms to update the visual queries and concept ...
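The color similarity matrix mentioned above can be read as a quadratic-form similarity between color histograms: bins for perceptually close hues get nonzero cross-similarity, so a sketch colored "orange" still matches a "red" scene. The sketch below shows that idea; the hue-only histogram, Gaussian bin similarity, and all values are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: quadratic-form histogram similarity with a color
# similarity matrix, so mass in nearby hue bins still earns credit.
# Hue-only bins and Gaussian cross-bin similarity are illustrative choices.
import numpy as np

def hue_similarity_matrix(bins=12, sigma=1.0):
    idx = np.arange(bins)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, bins - d)          # hue wraps around the color wheel
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

def quadratic_form_sim(h1, h2, S):
    """Similarity that credits mass in nearby (similar-color) bins."""
    return float(h1 @ S @ h2)

S = hue_similarity_matrix()
red    = np.eye(12)[0]   # all mass in hue bin 0
orange = np.eye(12)[1]   # adjacent hue bin
blue   = np.eye(12)[6]   # opposite side of the wheel
print("red~orange:", round(quadratic_form_sim(red, orange, S), 3))  # nonzero
print("red~blue:  ", round(quadratic_form_sim(red, blue, S), 3))    # near zero
```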
The Case for Offload Shaping
"... ABSTRACT When offloading computation from a mobile device, we show that it can pay to perform additional on-device work in order to reduce the offloading workload. We call this offload shaping, and demonstrate its application at many different levels of abstraction using a variety of techniques. We ..."
Abstract
- Add to MetaCart
(Show Context)
When offloading computation from a mobile device, we show that it can pay to perform additional on-device work in order to reduce the offloading workload. We call this offload shaping, and demonstrate its application at many different levels of abstraction using a variety of techniques. We show that offload shaping can produce a significant reduction in resource demand, with little loss of application-level fidelity.
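One simple instance of offload shaping is an on-device early-discard filter: cheap local work drops inputs that are unlikely to matter before anything is shipped to the server. The sketch below discards near-duplicate frames before "offloading"; the duplicate test and threshold are illustrative assumptions, one of many possible shapers rather than the paper's specific pipeline.

```python
# Minimal sketch of offload shaping: a cheap on-device filter discards
# near-duplicate frames so only informative ones are offloaded. The
# duplicate test and threshold are illustrative; real shapers range from
# blur checks to small on-device detectors.
import numpy as np

def shape_and_offload(frames, thresh=10.0):
    """Yield only frames that differ enough from the last one sent."""
    last_sent = None
    for frame in frames:
        if last_sent is None or np.abs(frame - last_sent).mean() > thresh:
            last_sent = frame
            yield frame          # expensive path: send to the server

# Toy stream: mostly static scene with one change partway through.
rng = np.random.default_rng(0)
static = rng.uniform(0, 255, (48, 64))
stream = [static + rng.normal(0, 1, static.shape) for _ in range(20)]
stream[12:] = [s + 60 for s in stream[12:]]  # scene change at frame 12
sent = list(shape_and_offload(stream))
print(f"offloaded {len(sent)} of {len(stream)} frames")
```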