Results 1 - 10 of 12
Learning Mixtures of Submodular Functions for Image Collection Summarization
"... We address the problem of image collection summarization by learning mixtures of submodular functions. Submodularity is useful for this problem since it naturally represents characteristics such as fidelity and diversity, desirable for any summary. Several previously proposed image summarization sco ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
(Show Context)
We address the problem of image collection summarization by learning mixtures of submodular functions. Submodularity is useful for this problem since it naturally represents characteristics such as fidelity and diversity, desirable for any summary. Several previously proposed image summarization scoring methodologies, in fact, instinctively arrived at submodularity. We provide classes of submodular component functions (including some which are instantiated via a deep neural network) over which mixtures may be learnt. We formulate the learning of such mixtures as a supervised problem via large-margin structured prediction. As a loss function, and for automatic summary scoring, we introduce a novel summary evaluation method called V-ROUGE, and test both submodular and non-submodular optimization (using the submodular-supermodular procedure) to learn a mixture of submodular functions. Interestingly, using non-submodular optimization to learn submodular functions provides the best results. We also provide a new data set consisting of 14 real-world image collections along with many human-generated ground truth summaries collected using Amazon Mechanical Turk. We compare our method with previous work on this problem and show that our learning approach outperforms all competitors on this new data set. This paper provides, to our knowledge, the first systematic approach for quantifying the problem of image collection summarization, along with a new data set of image collections and human summaries.
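The learning side (large-margin structured prediction with V-ROUGE as the loss) does not fit in a snippet, but the inference side does. The sketch below is purely illustrative and not the authors' code: it greedily maximizes a hand-weighted mixture of two submodular-style components (a facility-location coverage term and a diversity term) under a cardinality budget. The component choices, the fixed weights, and the cosine-similarity construction are assumptions made for the example; in the paper the mixture weights are learned from human summaries.

```python
import numpy as np

def facility_location(S, sim):
    """Coverage component: how well the selected set S represents every image."""
    return float(np.max(sim[:, S], axis=1).sum()) if S else 0.0

def diversity_penalty(S, sim):
    """Diversity component: discourage selecting mutually similar images."""
    if len(S) < 2:
        return 0.0
    sub = sim[np.ix_(S, S)]
    return float(-np.triu(sub, k=1).sum())

def mixture(S, sim, w):
    """Weighted mixture of components; in the paper the weights are learned."""
    return w[0] * facility_location(S, sim) + w[1] * diversity_penalty(S, sim)

def greedy_summary(sim, w, budget):
    """Greedy maximization of the mixture under a cardinality budget."""
    selected, remaining = [], list(range(sim.shape[0]))
    for _ in range(budget):
        gains = [mixture(selected + [i], sim, w) - mixture(selected, sim, w)
                 for i in remaining]
        best = remaining[int(np.argmax(gains))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: random image features, cosine similarities, hand-set weights.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 128))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
sim = feats @ feats.T
print(greedy_summary(sim, w=(1.0, 0.2), budget=5))
```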
Category-specific video summarization
"... Abstract. In large video collections with clusters of typical categories, such as “birthday party ” or “flash-mob”, category-specific video summa-rization can produce higher quality video summaries than unsupervised approaches that are blind to the video category. Given a video from a known category ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
(Show Context)
In large video collections with clusters of typical categories, such as “birthday party” or “flash-mob”, category-specific video summarization can produce higher quality video summaries than unsupervised approaches that are blind to the video category. Given a video from a known category, our approach first efficiently performs a temporal segmentation into semantically-consistent segments, delimited not only by shot boundaries but also general change points. Then, equipped with an SVM classifier, our approach assigns importance scores to each segment. The resulting summary assembles the sequence of segments with the highest scores and is therefore both short and highly informative. Experimental results on videos from the multimedia event detection (MED) dataset of TRECVID’11 show that our approach produces video summaries with higher relevance than the state of the art.
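As a rough illustration of the scoring-and-selection step described above (not the authors' pipeline), the sketch below trains a linear SVM on per-segment features, uses its decision value as an importance score, and keeps the highest-scoring segments until a duration budget is met. All features, labels, and durations are synthetic placeholders.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic stand-ins: per-segment feature vectors and binary
# "important / not important" training labels (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 64))
y_train = rng.integers(0, 2, size=200)
X_segments = rng.normal(size=(30, 64))       # segments of one test video
durations = rng.uniform(1.0, 6.0, size=30)   # seconds per segment

# Train a per-category importance classifier and use its margin as a score.
clf = LinearSVC(C=1.0, max_iter=5000).fit(X_train, y_train)
scores = clf.decision_function(X_segments)

# Keep the highest-scoring segments until a target summary length is reached,
# then play them back in temporal order.
budget, total, keep = 30.0, 0.0, []
for idx in np.argsort(-scores):
    if total + durations[idx] <= budget:
        keep.append(int(idx))
        total += durations[idx]
summary = sorted(keep)
print(summary, round(total, 1))
```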
Joint summarization of large-scale collections of web images and videos for storyline reconstruction
- In: CVPR
"... In this paper, we address the problem of jointly sum-marizing large sets of Flickr images and YouTube videos. Starting from the intuition that the characteristics of the two media types are different yet complementary, we de-velop a fast and easily-parallelizable approach for creating not only high- ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
(Show Context)
In this paper, we address the problem of jointly summarizing large sets of Flickr images and YouTube videos. Starting from the intuition that the characteristics of the two media types are different yet complementary, we develop a fast and easily parallelizable approach for creating not only high-quality video summaries but also novel structural summaries of online images as storyline graphs. The storyline graphs can illustrate various events or activities associated with the topic in the form of a branching network. The video summarization is achieved by diversity ranking on the similarity graphs between images and video frames. The reconstruction of storyline graphs is formulated as the inference of sparse time-varying directed graphs from a set of photo streams with the assistance of videos. For evaluation, we collect datasets of 20 outdoor activities, consisting of 2.7M Flickr images and 16K YouTube videos. Due to the large-scale nature of our problem, we evaluate our algorithm via crowdsourcing using Amazon Mechanical Turk. In our experiments, we demonstrate that the proposed joint summarization approach outperforms other baselines and our own methods that use videos or images only.
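The diversity-ranking step mentioned above can be approximated with a simple greedy rule over the similarity graph, shown below. This is an MMR-style stand-in for illustration only; the centrality measure, the trade-off parameter lam, and the random graph are assumptions, not the paper's formulation.

```python
import numpy as np

def diversity_ranking(sim, k, lam=0.7):
    """Greedy diversity ranking on a similarity graph: repeatedly pick the
    node that is central (similar to many others) yet dissimilar to the
    nodes already chosen."""
    n = sim.shape[0]
    centrality = sim.mean(axis=1)
    chosen = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            redundancy = max((sim[i, j] for j in chosen), default=0.0)
            val = centrality[i] - lam * redundancy
            if val > best_val:
                best, best_val = i, val
        chosen.append(best)
    return chosen

# Toy usage on a random symmetric similarity graph of frames and images.
rng = np.random.default_rng(1)
A = rng.uniform(size=(40, 40))
sim = (A + A.T) / 2
print(diversity_ranking(sim, k=5))
```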
Video Summarization by Learning Submodular Mixtures of Objectives
- IEEE Conf. Comput. Vis. Pattern Recognit
, 2015
"... We present a novel method for summarizing raw, casu-ally captured videos. The objective is to create a short sum-mary that still conveys the story. It should thus be both, interesting and representative for the input video. Previous methods often used simplified assumptions and only opti-mized for o ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
We present a novel method for summarizing raw, casually captured videos. The objective is to create a short summary that still conveys the story. It should thus be both interesting and representative of the input video. Previous methods often used simplified assumptions and only optimized for one of these goals. Alternatively, they used hand-defined objectives that were optimized sequentially by making consecutive hard decisions, which limits their use to a particular setting. Instead, we introduce a new method that (i) uses a supervised approach in order to learn the importance of global characteristics of a summary and (ii) jointly optimizes for multiple objectives and thus creates summaries that possess multiple properties of a good summary. Experiments on two challenging and very diverse datasets demonstrate the effectiveness of our method, which outperforms or matches the current state of the art.
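To make the idea of jointly optimizing several objectives concrete, here is a minimal sketch, not the authors' method: a weighted combination of an interestingness term and a representativeness (coverage) term, maximized with a cost-aware greedy that picks the best marginal gain per second of summary. The two objectives, the fixed weights, and the synthetic inputs are illustrative assumptions; in the paper the weights are learned from data.

```python
import numpy as np

def interestingness(S, scores):
    # Modular term: sum of per-segment interestingness scores.
    return float(scores[S].sum()) if len(S) else 0.0

def representativeness(S, sim):
    # Coverage term: how well S covers all segments of the video.
    return float(np.max(sim[:, S], axis=1).sum()) if len(S) else 0.0

def summarize(scores, sim, durations, budget, w=(1.0, 1.0)):
    """Cost-aware greedy over a weighted combination of objectives:
    at each step take the segment with the best marginal gain per second."""
    def score(S):
        return w[0] * interestingness(S, scores) + w[1] * representativeness(S, sim)
    S, spent = [], 0.0
    while True:
        best, best_ratio = None, 0.0
        for i in range(len(scores)):
            if i in S or spent + durations[i] > budget:
                continue
            ratio = (score(S + [i]) - score(S)) / durations[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break
        S.append(best)
        spent += durations[best]
    return sorted(S)

# Toy usage with random per-segment features, scores, and durations.
rng = np.random.default_rng(2)
n = 25
feats = rng.normal(size=(n, 32))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(summarize(rng.uniform(size=n), feats @ feats.T,
                rng.uniform(2, 8, n), budget=30.0))
```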
Creating summaries from user videos
- In ECCV
, 2014
"... Abstract. This paper proposes a novel approach and a new benchmark for video summarization. Thereby we focus on user videos, which are raw videos containing a set of interesting events. Our method starts by seg-menting the video by using a novel “superframe ” segmentation, tailored to raw videos. Th ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
This paper proposes a novel approach and a new benchmark for video summarization. We focus on user videos, which are raw videos containing a set of interesting events. Our method starts by segmenting the video using a novel “superframe” segmentation, tailored to raw videos. Then, we estimate visual interestingness per superframe using a set of low-, mid- and high-level features. Based on this scoring, we select an optimal subset of superframes to create an informative and interesting summary. The introduced benchmark comes with multiple human-created summaries, which were acquired in a controlled psychological experiment. This data paves the way to evaluating summarization methods objectively and to gaining new insights into video summarization. When evaluating our method, we find that it generates high-quality results, comparable to manual, human-created summaries.
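The "optimal subset under a length budget" selection can be phrased as a 0/1 knapsack, which the short sketch below illustrates with dynamic programming. The interestingness scores and superframe lengths are synthetic placeholders; this is a generic knapsack solver, not the paper's exact optimization.

```python
import numpy as np

def knapsack_superframes(interest, lengths, budget):
    """Pick the subset of superframes that maximizes total interestingness
    while keeping the summary under `budget` frames (classic 0/1 knapsack)."""
    n = len(interest)
    best = np.zeros((n + 1, budget + 1))
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i, b] = best[i - 1, b]
            if lengths[i - 1] <= b:
                take = best[i - 1, b - lengths[i - 1]] + interest[i - 1]
                best[i, b] = max(best[i, b], take)
    # Backtrack to recover the selected superframes.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i, b] != best[i - 1, b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    return sorted(chosen)

# Toy usage with random interestingness scores and superframe lengths.
rng = np.random.default_rng(3)
interest = rng.uniform(size=20)
lengths = rng.integers(20, 120, size=20)   # superframe lengths in frames
print(knapsack_superframes(interest, lengths, budget=400))
```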
Detecting Snap Points in Egocentric Video with a Web Photo Prior
"... Abstract. Wearable cameras capture a first-person view of the world, and offer a hands-free way to record daily experiences or special events. Yet, not every frame is worthy of being captured and stored. We propose to automatically predict “snap points ” in unedited egocentric video— that is, those ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
Wearable cameras capture a first-person view of the world, and offer a hands-free way to record daily experiences or special events. Yet, not every frame is worthy of being captured and stored. We propose to automatically predict “snap points” in unedited egocentric video, that is, those frames that look like they could have been intentionally taken photos. We develop a generative model for snap points that relies on a Web photo prior together with domain-adapted features. Critically, our approach avoids strong assumptions about the particular content of snap points, focusing instead on their composition. Using 17 hours of egocentric video from both human and mobile robot camera wearers, we show that the approach accurately isolates those frames that human judges would believe to be intentionally snapped photos. In addition, we demonstrate the utility of snap point detection for improving object detection and keyframe selection in egocentric video.
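A crude way to see the role of the Web photo prior (not the paper's generative model) is to score each video frame by how close its features lie to a bank of intentionally taken web photos, for example with a k-nearest-neighbor distance, as sketched below. The feature dimensions and random data are placeholders.

```python
import numpy as np

def snap_point_scores(frame_feats, web_feats, k=5):
    """Score each egocentric frame by how 'photo-like' it is: frames whose
    features lie close to intentionally taken web photos score higher.
    A simple k-NN density proxy, not the paper's generative model."""
    # Squared Euclidean distances between every frame and every web photo.
    d2 = (frame_feats ** 2).sum(1)[:, None] + (web_feats ** 2).sum(1)[None, :] \
         - 2.0 * frame_feats @ web_feats.T
    knn = np.sort(d2, axis=1)[:, :k]     # k closest web photos per frame
    return -knn.mean(axis=1)             # closer to the prior => higher score

# Toy usage with random features standing in for real image descriptors.
rng = np.random.default_rng(4)
web_feats = rng.normal(size=(1000, 64))      # the web photo prior
frame_feats = rng.normal(size=(200, 64))     # egocentric video frames
scores = snap_point_scores(frame_feats, web_feats)
print(np.argsort(-scores)[:10])              # top snap-point candidates
```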
Diverse Sequential Subset Selection for Supervised Video Summarization
"... Video summarization is a challenging problem with great application potential. Whereas prior approaches, largely unsupervised in nature, focus on sampling use-ful frames and assembling them as summaries, we consider video summarization as a supervised subset selection problem. Our idea is to teach t ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
Video summarization is a challenging problem with great application potential. Whereas prior approaches, largely unsupervised in nature, focus on sampling useful frames and assembling them as summaries, we consider video summarization as a supervised subset selection problem. Our idea is to teach the system to learn from human-created summaries how to select informative and diverse subsets, so as to best meet evaluation metrics derived from human-perceived quality. To this end, we propose the sequential determinantal point process (seqDPP), a probabilistic model for diverse sequential subset selection. Our novel seqDPP heeds the inherent sequential structures in video data, thus overcoming the deficiency of the standard DPP, which treats video frames as randomly permutable items. Meanwhile, seqDPP retains the power of modeling diverse subsets, essential for summarization. Our extensive results of summarizing videos from 3 datasets demonstrate the superior performance of our method, compared to not only existing unsupervised methods but also naive applications of the standard DPP model.
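For intuition, the sketch below runs standard greedy MAP inference for a (non-sequential) DPP: items are added one at a time to maximize the log-determinant of the selected kernel submatrix, so diverse, high-quality frames are preferred. It is a simplified stand-in for illustration; the paper's seqDPP additionally conditions each selection on the previous video segment.

```python
import numpy as np

def greedy_dpp_map(L, k):
    """Greedy MAP inference for a determinantal point process: repeatedly add
    the item that yields the largest log det(L_S) of the selected submatrix."""
    selected = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            S = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(S, S)])
            if sign > 0 and logdet > best_val:
                best, best_val = i, logdet
        if best is None:
            break
        selected.append(best)
    return selected

# Toy kernel: quality * similarity, so diverse, high-quality frames win.
rng = np.random.default_rng(5)
feats = rng.normal(size=(60, 32))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
quality = rng.uniform(0.5, 1.5, size=60)
L = np.outer(quality, quality) * (feats @ feats.T)
print(greedy_dpp_map(L, k=6))
```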
Kernel Methods for Unsupervised Domain Adaptation
, 2015
"... This thesis concludes a wonderful four-year journey at USC. I would like to take the chance to express my sincere gratitude to my amazing mentors and friends during my Ph.D. training. First and foremost I would like to thank my adviser, Prof. Fei Sha, without whom there would be no single page of th ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
This thesis concludes a wonderful four-year journey at USC. I would like to take the chance to express my sincere gratitude to my amazing mentors and friends during my Ph.D. training. First and foremost I would like to thank my adviser, Prof. Fei Sha, without whom there would be no single page of this thesis. Fei is smart, knowledgeable, and inspiring. Being truly fortunate, I got an enormous amount of guidance and support from him, financially, academically, and emotionally. He consistently and persuasively conveyed the spirit of adventure in research and academia, which I appreciate very much and from which my interest in trying out the faculty life started. On one hand, Fei is tough and sets a high standard on my research at “home”, the TEDS lab he leads. On the other hand, Fei is enthusiastically supportive when I reach out to conferences and the job market. These combined make a wonderful mix. I cherish every mind-blowing discussion with him, which sometimes lasted for hours. I would like to thank our long-term collaborator, Prof. Kristen Grauman, whom I see as my other academic adviser. Like Fei, she has set such a great model for me to follow on the road to becoming a good researcher. She is a deep thinker, a fantastic writer, and a hardworking professor. I will never forget how she praised our good work, how she hesitated on my poor
Assessing the Quality of Actions
"... Abstract. While recent advances in computer vision have provided reli-able methods to recognize actions in both images and videos, the problem of assessing how well people perform actions has been largely unexplored in computer vision. Since methods for assessing action quality have many real-world ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
While recent advances in computer vision have provided reliable methods to recognize actions in both images and videos, the problem of assessing how well people perform actions has been largely unexplored in computer vision. Since methods for assessing action quality have many real-world applications in healthcare, sports, and video retrieval, we believe the computer vision community should begin to tackle this challenging problem. To spur progress, we introduce a learning-based framework that takes steps towards assessing how well people perform actions in videos. Our approach works by training a regression model from spatiotemporal pose features to scores obtained from expert judges. Moreover, our approach can provide interpretable feedback on how people can improve their action. We evaluate our method on a new Olympic sports dataset, and our experiments suggest our framework is able to rank the athletes more accurately than a non-expert human. While promising, our method is still a long way from rivaling the performance of expert judges, indicating that there is significant opportunity in computer vision research to improve on this difficult yet important task.
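The core recipe, regressing expert scores from pose features and checking how well the predicted ranking matches the judges, can be sketched in a few lines. Everything below is synthetic, and the choice of kernel SVR is an assumption for illustration, not necessarily the regressor used in the paper.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.svm import SVR

# Synthetic stand-ins for spatiotemporal pose features and expert judge
# scores (real data would come from the Olympic sports annotations).
rng = np.random.default_rng(6)
X = rng.normal(size=(150, 200))                   # one feature vector per performance
true_w = rng.normal(size=200)
y = X @ true_w + rng.normal(scale=5.0, size=150)  # judge scores with noise

X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

# Regress judge scores from pose features; kernel SVR is one reasonable choice.
model = SVR(kernel="rbf", C=10.0).fit(X_train, y_train)
pred = model.predict(X_test)

# Rank correlation measures whether the model orders athletes like the judges.
rho, _ = spearmanr(pred, y_test)
print(f"Spearman rank correlation: {rho:.2f}")
```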
Fusion of Foreground Object, Spatial and Frequency Domain Motion Information for Video Summarization
"... Abstract. Surveillance video camera captures a large amount of continuous video stream every day. To analyze or investigate any significant events from the huge video data, it is laborious and boring job to identify these events. To solve this problem, a video summarization technique combining foreg ..."
Abstract
- Add to MetaCart
(Show Context)
Surveillance video cameras capture a large amount of continuous video stream every day, and identifying significant events for analysis or investigation in this huge volume of data is a laborious and tedious job. To address this problem, we propose a video summarization technique that combines foreground objects with motion information in the spatial and frequency domains. We extract foreground objects using background modeling, and we extract motion information in both the spatial and the frequency domain. Frame transitions are used to obtain motion information in the spatial domain, while the phase correlation (PC) technique is applied to acquire motion information in the frequency domain. The foreground objects and the spatial- and frequency-domain motion cues are then fused, and key frames are extracted. Experimental results reveal that the proposed method performs better than the state-of-the-art method.
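Of the ingredients listed above, the phase correlation step is easy to illustrate in isolation. The sketch below is a generic FFT-based phase correlation between two grayscale frames, not the authors' full fusion pipeline: the peak of the inverse-transformed, normalized cross-power spectrum reveals the dominant inter-frame translation, and the peak height indicates how coherent the motion is.

```python
import numpy as np

def phase_correlation(frame_a, frame_b):
    """Phase correlation between two grayscale frames: the location of the
    peak in the inverse FFT of the normalized cross-power spectrum gives the
    dominant translation from frame_a to frame_b."""
    F_a = np.fft.fft2(frame_a)
    F_b = np.fft.fft2(frame_b)
    cross = np.conj(F_a) * F_b
    cross /= np.abs(cross) + 1e-8          # keep only phase information
    corr = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap peak coordinates into signed shifts.
    shift = [int(p) if p <= s // 2 else int(p) - s for p, s in zip(peak, corr.shape)]
    return shift, float(corr.max())

# Toy usage: frame_b is frame_a shifted by (5, -3) pixels.
rng = np.random.default_rng(7)
frame_a = rng.uniform(size=(128, 128))
frame_b = np.roll(frame_a, shift=(5, -3), axis=(0, 1))
print(phase_correlation(frame_a, frame_b))   # recovers a shift of about (5, -3)
```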