Image and Video Segmentation by Anisotropic Kernel Mean Shift
In Proc. ECCV, 2004
Cited by 34 (1 self)
Mean shift is a nonparametric estimator of density which has been applied to image and video segmentation. Traditional mean shift based segmentation uses a radially symmetric kernel to estimate local density, which is not optimal in view of the often structured nature of image and more particularly video data. In this paper we present an anisotropic kernel mean shift in which the shape, scale, and orientation of the kernels adapt to the local structure of the image or video. We decompose the anisotropic kernel to provide handles for modifying the segmentation based on simple heuristics. Experimental results show that the anisotropic kernel mean shift outperforms the original mean shift on image and video segmentation in the following aspects: 1) it gets better results on general images and video in a smoothness sense; 2) the segmented results are more consistent with human visual saliency; 3) the algorithm is robust to initial parameters.
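For orientation, here is a minimal sketch of the traditional, radially symmetric mean shift that the abstract uses as its baseline. It is not the anisotropic variant the paper proposes: the 1-D data, the Gaussian kernel, and the names `mean_shift_mode`, `bandwidth`, and `samples` are all assumptions made for illustration.

```python
import math


def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Follow mean shift steps from `start` until a local density mode is reached.

    Uses an isotropic Gaussian kernel with a fixed bandwidth (illustrative choice).
    """
    x = float(start)
    for _ in range(max_iter):
        # Kernel weight of every sample relative to the current estimate.
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for p in points]
        # The mean shift update is the kernel-weighted mean of the samples.
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Two well-separated 1-D clusters (hypothetical feature values).
samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
modes = [mean_shift_mode(samples, s, bandwidth=0.5) for s in samples]
```

Samples whose iterations end at the same mode would be assigned to the same segment. The anisotropic method described above would replace the single fixed `bandwidth` with a kernel whose shape, scale, and orientation depend on local structure.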
Probabilistic Space-Time Video Modeling via Piecewise GMM
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
Cited by 25 (0 self)
In this paper, we describe a statistical video representation and modeling scheme. Video representation schemes are needed to segment a video stream into meaningful video-objects, useful for later indexing and retrieval applications. In the proposed methodology, unsupervised clustering via Gaussian mixture modeling extracts coherent space-time regions in feature space, and corresponding coherent segments (video-regions) in the video content. A key feature of the system is the analysis of video input as a single entity as opposed to a sequence of separate frames. Space and time are treated uniformly. The probabilistic space-time video representation scheme is extended to a piecewise GMM framework in which a succession of GMMs is extracted for the video sequence, instead of a single global model for the entire sequence. The piecewise GMM framework allows for the analysis of extended video sequences and the description of nonlinear, nonconvex motion patterns. The extracted space-time regions allow for the detection and recognition of video events. Results of segmenting video content into static versus dynamic video regions and video content editing are presented. Index Terms: Video representation, video segmentation, detection of events in video, Gaussian mixture model.
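The clustering step named in this abstract, Gaussian mixture modeling fitted without supervision, is commonly done with expectation-maximization (EM). The sketch below is a toy 1-D version under stated assumptions: the paper works in a multi-dimensional space-time feature space, whereas the scalar feature, the initial means, and the names `fit_gmm_1d` and `gaussian_pdf` here are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, means, n_iter=50, min_var=1e-3):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            likes = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, var)  # floor keeps the density finite
            weights[j] = nj / len(data)
    return weights, means, variances


# Hypothetical scalar features forming two coherent groups.
features = [0.0, 0.2, 0.4, 10.0, 10.2, 10.4]
weights, means, variances = fit_gmm_1d(features, means=[1.0, 9.0])
```

Each fitted component plays the role of one coherent region. A piecewise scheme like the one described above would fit a succession of such models over consecutive portions of the sequence rather than one global model.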
Context-based segmentation of image sequences
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 9 (0 self)
We describe an algorithm for context-based segmentation of visual data. New frames in an image sequence (video) are segmented based on the prior segmentation of earlier frames in the sequence. The segmentation is performed by adapting a probabilistic model learned on previous frames according to the content of the new frame. We utilize the maximum a posteriori version of the EM algorithm to segment the new image. The Gaussian mixture distribution that is used to model the current frame is transformed into a conjugate-prior distribution for the parametric model describing the segmentation of the new frame. This semi-supervised method improves segmentation quality and consistency and enables the propagation of segments along the segmented images. The performance of the proposed approach is illustrated on both simulated and real image data.
Segmentation Framework Based on Label Field Fusion
Cited by 3 (1 self)
In this paper, we put forward a novel fusion framework that mixes together label fields instead of observation data as is usually the case. Our framework takes as input two label fields: a quickly estimated and to-be-refined segmentation map and a spatial region map that exhibits the shape of the main objects of the scene. These two label fields are fused together with a global energy function that is minimized with a deterministic iterative conditional mode algorithm. As explained in the paper, the energy function may implement a pure fusion strategy or a fusion-reaction function. In the latter case, a data-related term is used to make the optimization problem well posed. We believe that the conceptual simplicity, the small number of parameters, and the use of a simple and fast deterministic optimizer that admits a natural implementation on a parallel architecture are among the main advantages of our approach. Our fusion framework is adapted to various computer vision applications, among which are motion segmentation, motion estimation, and occlusion detection. Index Terms: Color segmentation, label fusion, motion estimation, motion segmentation, occlusion.
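To illustrate the kind of deterministic optimizer mentioned above, here is a minimal sketch of a greedy iterated-conditional-modes style sweep on a 1-D chain of labels. The energy used here (one unit for departing from the input label plus `beta` per disagreeing neighbour) is an assumption chosen for illustration and is not the fusion energy defined in the paper; `icm_smooth`, `beta`, and `noisy` are hypothetical names.

```python
def icm_smooth(labels, n_labels, beta=1.0, max_sweeps=10):
    """Greedy site-by-site label refinement on a 1-D chain.

    Local energy of a candidate label at site i (illustrative choice):
        [candidate != input label at i] + beta * (disagreeing neighbours)
    """
    observed = list(labels)
    current = list(labels)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(current)):
            neighbours = [current[j] for j in (i - 1, i + 1)
                          if 0 <= j < len(current)]

            def local_energy(candidate):
                data = 0 if candidate == observed[i] else 1
                smooth = sum(1 for n in neighbours if n != candidate)
                return data + beta * smooth

            # Pick the label with the lowest local energy, holding others fixed.
            best = min(range(n_labels), key=local_energy)
            if local_energy(best) < local_energy(current[i]):
                current[i] = best
                changed = True
        if not changed:
            break  # a full sweep with no change means a local minimum
    return current


noisy = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
cleaned = icm_smooth(noisy, n_labels=2, beta=1.0)
```

Because each site update only reads its immediate neighbours, updates of non-adjacent sites are independent, which is the property that makes this family of optimizers easy to parallelize.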
Motion segmentation using a k-nearest-neighbor-based fusion procedure of spatial and temporal label cues
In Proc. ICIAR, 2005
Cited by 3 (1 self)
Traditional motion segmentation techniques generally depend on a pre-estimated optical flow. Unfortunately, the lack of precision over edges of most popular motion estimation methods makes them unsuited to recover the exact shape of moving objects. In this contribution, we present an original motion segmentation technique using a K-nearest-neighbor-based fusion of spatial and temporal label cues. Our fusion model takes as input a spatial segmentation of a still image and an estimated version of the motion label field. It minimizes an energy function made of spatial and temporal label cues extracted from the two input fields. The proposed algorithm is intuitive, simple to implement, and remains sufficiently general to be applied to other segmentation problems. Furthermore, the method does not depend on the estimation of any threshold or any weighting function between the spatial and temporal energy terms, as is sometimes required by energy-based segmentation models. Experiments on synthetic and real image sequences indicate that the proposed method is robust and accurate.
Using co-occurrence and segmentation to learn feature-based object models from video
WACV/MOTION’05, 2005
Cited by 2 (1 self)
A number of recent systems for unsupervised feature-based learning of object models take advantage of co-occurrence: broadly, they search for clusters of discriminative features that tend to coincide across multiple still images or video frames. An intuition behind these efforts is that regularly co-occurring image features are likely to refer to physical traits of the same object, while features that do not often co-occur are more likely to belong to different objects. In this paper we discuss a refinement to these techniques in which multiple segmentations establish meaningful contexts for co-occurrence, or limit the spatial regions in which two features are deemed to co-occur. This approach can reduce the variety of image data necessary for model learning and simplify the incorporation of less discriminative features into the model.
Video denoising using separable 4-D nonlocal spatiotemporal transforms
Cited by 2 (2 self)
We propose a powerful video denoising algorithm that exploits temporal and spatial redundancy characterizing natural video sequences. The algorithm implements the paradigm of nonlocal grouping and collaborative filtering, where a higher-dimensional transform-domain representation is leveraged to enforce sparsity and thus regularize the data. The proposed algorithm exploits the mutual similarity between 3-D spatiotemporal volumes constructed by tracking blocks along trajectories defined by the motion vectors. Mutually similar volumes are grouped together by stacking them along an additional fourth dimension, thus producing a 4-D structure, termed group, where different types of data correlation exist along the different dimensions: local correlation along the two dimensions of the blocks, temporal correlation along the motion trajectories, and nonlocal spatial correlation (i.e. self-similarity) along the fourth dimension. Collaborative filtering is realized by transforming each group through a decorrelating 4-D separable transform and then by shrinkage and inverse transformation. In this way, collaborative filtering provides estimates for each volume stacked in the group, which are then returned and adaptively aggregated to their original position in the video. Experimental results demonstrate the effectiveness of the proposed procedure which outperforms the state of the art.
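The transform, shrinkage, and inverse-transform sequence described above can be sketched on a deliberately tiny case. This is not the 4-D separable transform of the paper: it assumes a group of just two mutually similar blocks, applies a 2-point orthonormal Haar transform along the grouping dimension only, and uses hard thresholding as the shrinkage rule. The names `collaborative_filter_pair` and `threshold` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(a, b):
    """Orthonormal 2-point Haar transform: (average-like, difference-like)."""
    return (a + b) / SQRT2, (a - b) / SQRT2


def haar_inverse(s, d):
    """Inverse of haar_forward."""
    return (s + d) / SQRT2, (s - d) / SQRT2


def collaborative_filter_pair(block_a, block_b, threshold):
    """Jointly denoise two similar blocks by shrinking across the group."""
    out_a, out_b = [], []
    for a, b in zip(block_a, block_b):
        s, d = haar_forward(a, b)
        # Hard thresholding: small coefficients are treated as noise.
        if abs(s) < threshold:
            s = 0.0
        if abs(d) < threshold:
            d = 0.0
        est_a, est_b = haar_inverse(s, d)
        out_a.append(est_a)
        out_b.append(est_b)
    return out_a, out_b


# Two noisy observations of the same underlying block (hypothetical values).
block_a = [10.0, 20.0]
block_b = [10.4, 19.6]
est_a, est_b = collaborative_filter_pair(block_a, block_b, threshold=1.0)
```

Because the two blocks are similar, their difference coefficients are small and get zeroed, so each block receives an estimate informed by the other. In the full method, as the abstract states, the per-volume estimates are then returned to their original positions and adaptively aggregated.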
Joint Key-frame Extraction and Object Segmentation for Content-based Video Analysis
Cited by 1 (0 self)
Key-frame extraction and object segmentation are usually implemented independently and separately due to the fact that they are on different semantic levels and involve different features. In this work, we propose a joint key-frame extraction and object segmentation method by constructing a unified feature space for both processes, where key-frame extraction is formulated as a feature selection process for object segmentation in the context of Gaussian mixture model (GMM)-based video modeling. Specifically, two divergence-based criteria are introduced for key-frame extraction. One recommends key-frame extraction that leads to the maximum pairwise interclass divergence between GMM components. The other aims at maximizing the marginal divergence that shows the intraframe variation of the mean density. The proposed methods can extract representative key-frames for object segmentation, and some interesting characteristics of key-frames are also discussed. This work provides a unique paradigm for content-based video analysis.
Spatio-Temporal Segmentation and Regions Tracking of High Definition Video Sequences Based on a Markov Random Field Model
2008
In this paper, we propose a Markov Random Field sequence segmentation and regions tracking model, which aims at combining color, texture, and motion features. First, a motion-based segmentation is performed: the global motion of the video sequence is estimated and compensated, and a rough motion segmentation is obtained from the remaining motion information. Then, we use a Markovian approach to update and track the video objects over time. The spatio-temporal map is updated and compensated using our Markov Random Field segmentation model to keep the tracking of video objects consistent.

