Results 1 - 10 of 12
Transductive multi-view embedding for zero-shot recognition and annotation
, 2014
Cited by 10 (7 self)
Most existing zero-shot learning approaches exploit transfer learning via an intermediate-level semantic representation such as visual attributes or semantic word vectors. Such a semantic representation is shared between an annotated auxiliary dataset and a target dataset with no annotation. A projection from a low-level feature space to the semantic space is learned from the auxiliary dataset and is applied without adaptation to the target dataset. In this paper we identify an inherent limitation with this approach. That is, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. It is ‘transductive’ in that unlabelled target data points are explored for projection adaptation, and ‘multi-view’ in that both the low-level feature (view) and multiple semantic representations (views) are embedded to rectify the projection shift. We demonstrate through extensive experiments that our framework (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) achieves state-of-the-art recognition results on image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.
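The feature-to-semantic projection described in this abstract is commonly learned as a linear regressor. The following is a minimal, hedged sketch of that step (not the authors' code): closed-form ridge regression on synthetic data, with all shapes and names invented for illustration.

```python
# Sketch of a feature-to-semantic projection learned on an "auxiliary"
# dataset via ridge regression: W = (X^T X + lam*I)^(-1) X^T S.
# Synthetic data only; this is not the paper's implementation.
import numpy as np

def learn_projection(X, S, lam=1.0):
    """Learn a linear map from features X (n x d) to semantics S (n x k)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)

rng = np.random.default_rng(0)
W_true = rng.standard_normal((5, 3))   # ground-truth projection
X = rng.standard_normal((200, 5))      # low-level features
S = X @ W_true                         # noiseless semantic targets
W = learn_projection(X, S, lam=1e-6)   # near-exact recovery expected
```

Applying such a fixed W to a target domain with different classes is precisely where the projection domain shift identified in the abstract arises.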
Leveraging multiview video coding in clustered multimedia sensor networks
- in Proc. IEEE Globecom 2012
, 2012
Cited by 3 (3 self)
Abstract—We experimentally characterize the compression efficiency of Multiview Video Coding (MVC) techniques in a Wireless Multimedia Sensor Network (WMSN) composed of multiple video cameras with possibly overlapping fields of view. We derive an empirical model that predicts the compression efficiency as a function of the common sensed area (CSA) between different camera views. We show that the CSA depends not only on geometrical relationships among the relative positions of different cameras, but also on several object-related phenomena, e.g., occlusions and motion, and on low-level phenomena such as variations in illumination. We then apply the model to a WMSN, where we create clusters based on the CSA as estimated by exchanging local data. Based on these estimates, we form clusters and measure the resulting transmission rate. Numerical simulation results show that building clusters based on a CSA criterion can bring significant performance gains in terms of bandwidth efficiency. These promising results pave the way for clustering optimization that takes different network constraints and conditions into account.
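The CSA-based clustering step can be illustrated in miniature: cameras whose estimated pairwise CSA exceeds a threshold are grouped by connected components. This is a hedged sketch, with CSA values, threshold, and function names invented for illustration rather than taken from the paper.

```python
# Toy CSA-threshold clustering via union-find connected components.
# csa maps an unordered camera pair to an illustrative overlap fraction.
def cluster_by_csa(csa, threshold):
    """Group cameras whose pairwise CSA is >= threshold."""
    cams = sorted({c for pair in csa for c in pair})
    parent = {c: c for c in cams}
    def find(c):                      # union-find with path halving
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c
    for pair, overlap in csa.items():
        if overlap >= threshold:
            a, b = pair
            parent[find(a)] = find(b)
    clusters = {}
    for c in cams:
        clusters.setdefault(find(c), set()).add(c)
    return sorted(map(sorted, clusters.values()))

csa = {frozenset({0, 1}): 0.6, frozenset({1, 2}): 0.5,
       frozenset({2, 3}): 0.1, frozenset({3, 4}): 0.7}
print(cluster_by_csa(csa, 0.4))  # [[0, 1, 2], [3, 4]]
```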
Scene-Based Movie Summarization Via Role-Community Networks
- IEEE Trans. Circuits Syst. Video Technol
, 2013
Cited by 3 (0 self)
Abstract—Video summarization techniques aim at condensing a full-length video to a significantly shortened version that still preserves the major semantic content of the original video. Movie summarization, being a special class of video summarization, is particularly challenging since a large variety of movie scenarios and film styles complicates the problem. In this paper, we propose a two-stage scene-based movie summarization method based on mining the relationships between role-communities, since the role-communities in earlier scenes are usually used to develop the role relationships in later scenes. In the analysis stage, we construct a social network to characterize the interactions between role-communities. As a result, the social power of each role-community is evaluated by the community’s centrality value, and the role-communities are clustered into relevant groups based on the centrality values. In the summarization stage, a set of feasible summary combinations of scenes is identified and an information-rich summary is selected from these candidates based on social power preservation. Our evaluation results show that in most test cases the proposed method achieves better subjective performance than attention-based and role-based summarization methods in terms of semantic content preservation for a movie summary. Index Terms—Movie analysis, movie summarization, social network analysis, video adaptation, video summarization.
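The "social power from centrality" idea can be sketched with the simplest centrality measure. The paper evaluates communities by network centrality; weighted degree centrality is used here purely as a minimal stand-in, and the interaction graph is invented for the example.

```python
# Toy ranking of role-communities by weighted degree centrality over
# their interaction network (a stand-in for the paper's measure).
def degree_centrality(edges):
    """edges: iterable of (community_u, community_v, interaction_weight)."""
    score = {}
    for u, v, w in edges:
        score[u] = score.get(u, 0) + w
        score[v] = score.get(v, 0) + w
    return score

# Scenes contribute weighted co-appearance edges between role-communities.
interactions = [("hero", "mentor", 5), ("hero", "villain", 8),
                ("mentor", "villain", 2), ("hero", "sidekick", 4)]
power = degree_centrality(interactions)
ranked = sorted(power, key=power.get, reverse=True)
print(ranked[0])  # "hero" carries the highest social power here
```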
Multi-camera scheduling for video production
Cited by 3 (0 self)
We present a novel algorithm for automated video production based on content ranking. The proposed algorithm generates videos by performing camera selection while minimizing the number of inter-camera switches. We model the problem as a finite-horizon Partially Observable Markov Decision Process over temporal windows, and we use a multivariate Gaussian distribution to represent the content-quality score for each camera. The performance of the proposed approach is demonstrated on a multi-camera setup of fixed cameras with partially overlapping fields of view. Subjective experiments based on the Turing test confirmed the quality of the automatically produced videos. The proposed approach is also compared with recent methods based on Recursive Decision and on Dynamic Bayesian Networks, and outperforms both.
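The trade-off between per-window content score and switch count admits a much-simplified deterministic sketch: dynamic programming that maximizes total score minus a per-switch penalty. This does not reproduce the paper's POMDP or Gaussian score model; the scores and penalty below are invented.

```python
# Toy DP camera selection: maximize summed content score minus a fixed
# cost per inter-camera switch (deterministic stand-in for the POMDP).
def select_cameras(scores, switch_cost):
    """scores[t][c]: content score of camera c in temporal window t."""
    n_cams = len(scores[0])
    best = list(scores[0])          # best total ending at camera c
    back = []                       # backpointers per window
    for t in range(1, len(scores)):
        back_t, new_best = [], []
        for c in range(n_cams):
            cands = [best[p] - (switch_cost if p != c else 0)
                     for p in range(n_cams)]
            p = max(range(n_cams), key=cands.__getitem__)
            back_t.append(p)
            new_best.append(cands[p] + scores[t][c])
        back.append(back_t)
        best = new_best
    c = max(range(n_cams), key=best.__getitem__)
    path = [c]
    for back_t in reversed(back):   # recover the optimal camera sequence
        c = back_t[c]
        path.append(c)
    return path[::-1]

scores = [[5, 1], [4, 2], [1, 6], [1, 7]]
print(select_cameras(scores, switch_cost=2))  # [0, 0, 1, 1]
```

Raising the switch cost makes the selector stay on a single camera, mirroring the paper's goal of few inter-camera switches.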
An Empirical Model of Multiview Video Coding Efficiency for Wireless Multimedia Sensor Networks
- Proc
Cited by 2 (1 self)
Abstract—We develop an empirical model of Multiview Video Coding (MVC) performance that can be used to identify and separate situations when MVC is beneficial from cases when its use is detrimental in wireless multimedia sensor networks (WMSNs). The model predicts the compression performance of MVC as a function of the correlation between cameras with overlapping fields of view. We define the common sensed area (CSA) between different views, and emphasize that it depends not only on geometrical relationships among the relative positions of different cameras, but also on various object-related phenomena, e.g., occlusions and motion, and on low-level phenomena such as variations in illumination. With these premises, we first experimentally characterize the relationship between MVC compression gain (with respect to single-view video coding) and the CSA between views. Our experiments are based on the H.264 MVC standard, and on a low-complexity estimator of the CSA that can be computed with low inter-node signaling overhead. Then, we propose a compact empirical model of the efficiency of MVC as a function of the CSA between views, and we validate the model with different multiview video sequences. Finally, we show how the model can be applied to typical scenarios in WMSNs, i.e., to clustered or multi-hop topologies, and we show a few promising results of its application in the definition of cross-layer clustering and data aggregation procedures. Index Terms—Multiview video coding, MVC efficiency model, video sensor networks.
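Fitting an empirical gain-versus-CSA model can be sketched with ordinary least squares on a single variable. The linear form and the data points below are illustrative only; the paper fits its own model to measured sequences.

```python
# Toy one-variable least-squares fit of compression gain against CSA.
# Data is fabricated for illustration and happens to lie on a line.
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

csa  = [0.1, 0.3, 0.5, 0.7, 0.9]   # fraction of overlapping view
gain = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative MVC gain vs. single view
slope, intercept = fit_line(csa, gain)
print(slope, intercept)  # 5.0 0.5 for this synthetic data
```

Such a fitted curve is what lets a node predict, from an estimated CSA alone, whether joint MVC encoding will pay off.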
Within and Between Shot Information Utilisation in Video Key Frame Extraction
- In: Journal of Information & Knowledge Management (JIKM)
Cited by 1 (0 self)
Abstract. With the popularity of family video recorders and the surge of Web 2.0, increasing amounts of video have made the management and integration of the information in videos an urgent and important issue in video retrieval. Key frames, as a high-quality summary of videos, play an important role in the areas of video browsing, searching, categorisation, and indexing. An effective set of key frames should include the major objects and events of the video sequence, and should contain minimum content redundancy. In this paper, an innovative key frame extraction method is proposed to select representative key frames for a video. By analysing the differences between frames and utilising the clustering technique, a set of key frame candidates (KFCs) is first selected at the shot level, and then the information within a video shot and between video shots is used to filter the candidate set to generate the final set of key frames. Experimental results on the TRECVID 2007 video dataset have demonstrated the effectiveness of our proposed key frame extraction method in terms of the percentage of extracted key frames and the retrieval precision.
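The difference-based candidate-selection step can be sketched as follows: a frame becomes a key frame candidate when its histogram distance from the last selected candidate exceeds a threshold. The 4-bin "histograms" and threshold are toy stand-ins, not the paper's features.

```python
# Toy key frame candidate selection by thresholded histogram distance.
def hist_dist(a, b):
    """L1 distance between normalized histograms, halved to lie in [0, 1]."""
    return sum(abs(x - y) for x, y in zip(a, b)) / 2.0

def key_frame_candidates(hists, threshold):
    keys = [0]                      # first frame always seeds the set
    for i in range(1, len(hists)):
        if hist_dist(hists[i], hists[keys[-1]]) > threshold:
            keys.append(i)
    return keys

frames = [[0.7, 0.1, 0.1, 0.1],     # shot A
          [0.68, 0.12, 0.1, 0.1],   # near-duplicate of frame 0
          [0.1, 0.7, 0.1, 0.1],     # shot B: large histogram change
          [0.1, 0.68, 0.12, 0.1]]
print(key_frame_candidates(frames, threshold=0.3))  # [0, 2]
```

The within-shot and between-shot filtering the abstract describes would then prune this candidate set further.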
Disparity Map Generation Based on Trapezoidal Camera Architecture for Multi-View Video
Visual content acquisition is a strategic functional block of any visual system. Despite its wide possibilities, the arrangement of cameras for the acquisition of good-quality visual content for use in multi-view video remains a huge challenge. This paper presents the mathematical description of the trapezoidal camera architecture and the relationships which facilitate the determination of camera positions for visual content acquisition in multi-view video and for depth map generation. The strong point of the trapezoidal camera architecture is that it allows for an adaptive camera topology by which points within the scene, especially the occluded ones, can be optically and geometrically viewed from several different viewpoints either on the edge of the trapezoid or inside it. The concept of the maximum independent set, the trapezoid's characteristics, and the fact that the positions of the cameras (with the exception of a few) differ in their vertical coordinates could very well be used to address the issue of occlusion, which continues to be a major problem in computer vision with regard to depth map generation.
Transductive Multi-view Zero-Shot Learning
, 2015
Most existing zero-shot learning approaches exploit transfer learning via an intermediate semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. The second limitation is the prototype sparsity problem which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.
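The label propagation component can be sketched in miniature as iterative propagation of seed labels over a single similarity graph. This is far simpler than the paper's heterogeneous multi-view hypergraph formulation; the graph, weights, and labels below are invented.

```python
# Toy iterative label propagation over a similarity graph (stand-in for
# the paper's multi-view hypergraph propagation). Seeds stay clamped.
def propagate(weights, labels, n_iter=50):
    """weights[i][j]: similarity between nodes; labels: seed node -> class."""
    n = len(weights)
    classes = sorted(set(labels.values()))
    # one score per (node, class); seed nodes get one-hot rows
    f = [[1.0 if labels.get(i) == c else 0.0 for c in classes]
         for i in range(n)]
    for _ in range(n_iter):
        for i in range(n):
            if i in labels:
                continue                  # keep seed labels fixed
            row = [sum(weights[i][j] * f[j][k] for j in range(n))
                   for k in range(len(classes))]
            z = sum(row) or 1.0
            f[i] = [v / z for v in row]   # normalize to a distribution
    return [classes[max(range(len(classes)), key=f[i].__getitem__)]
            for i in range(n)]

W = [[0, 1, 0, 0],                        # chain 0-1-2-3
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(propagate(W, {0: "cat", 3: "dog"}))  # ['cat', 'cat', 'dog', 'dog']
```

With a single prototype per class acting as the seed, this illustrates how unlabelled target points help each other, which is the paper's answer to prototype sparsity.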
Improving Video Streams Summarization Using Synthetic Noisy Video Data
Abstract—Surveillance camera systems are used for monitoring public domains. Reviewing and processing subsequences from large amounts of raw video streams is time- and space-consuming. Many efficient approaches to video summarization have been proposed to reduce the amount of irrelevant information. Most of these approaches do not take into consideration the illumination or lighting changes that cause noise in video sequences. In this work, a video summarization algorithm for video streams is proposed using Histogram of Oriented Gradients and correlation-coefficient techniques. The algorithm is applied to the proposed multi-model dataset, which is created by combining the original data with dynamic synthetic data generated using a random number generator. Experiments on this dataset showed the effectiveness of the proposed algorithm compared with the traditional dataset. Keywords—Video summarization; Histogram of Oriented Gradients (HOG); correlation coefficients (R); key frames
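The correlation-coefficient step can be sketched as redundancy filtering: a frame is kept only when its Pearson correlation with the previously kept frame drops below a threshold. The toy feature vectors stand in for real HOG descriptors, and the threshold is invented.

```python
# Toy redundancy filter: drop frames highly correlated with the last
# kept frame (feature vectors stand in for HOG descriptors).
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def summarize(frames, max_r=0.98):
    kept = [0]
    for i in range(1, len(frames)):
        if pearson(frames[i], frames[kept[-1]]) < max_r:
            kept.append(i)
    return kept

frames = [[1, 2, 3, 4], [1, 2, 3, 4.1],   # nearly identical pair
          [4, 3, 2, 1], [4, 3, 2.1, 1]]   # new content, then a duplicate
print(summarize(frames))  # [0, 2]
```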
Gangneung-Wonju National
Mobile-platform-based multi-view video applications have gained significant attention due to the increase in processing power of mobile processors. Significant performance improvements have been reported using H.264/AVC-based video encoding-decoding procedures. Multi-view coding (MVC) is an extension of the H.264/AVC scheme employed for high-performance compression of multi-view videos. The increase in spatial/temporal information has increased the difficulty of real-time decoding. In existing implementations, multi-processor-based architectures have been employed to achieve real-time processing in H.264/AVC. In this paper, a parallel MVC decoding scheme is presented. The implementation is based on the pipelined architecture of a Cortex-A8 processor and a graphics processing unit (GPU). The decoding procedure is divided into block-based and pixel-based operations. The GPU is employed for processing pixel-based operations while the CPU performs the block-based sequential operations. The implementation results show that the proposed scheme can achieve the same peak signal-to-noise ratio (PSNR) performance with a more than 29% increase in decoding speed. Hence, the proposed scheme can achieve real-time performance and can be used in mobile-platform multi-view video processing applications.