Graph Cut based Inference with Co-occurrence Statistics
Cited by 85 (12 self)
Abstract. Markov and conditional random fields (MRFs and CRFs) used in computer vision typically model only local interactions between variables, as this keeps inference computationally tractable. In this paper we consider a class of global potentials defined over all variables in the CRF. We show how they can be readily optimised using standard graph cut algorithms at little extra expense compared to a standard pairwise field. This result applies directly to class-based image segmentation, which has seen increasing interest within computer vision. Here the aim is to assign each pixel of a given image a label from a set of possible object classes. Typically these methods use random fields to model local interactions between pixels or superpixels. One cue that helps recognition is global object co-occurrence statistics, a measure of which classes (such as chair or motorbike) are likely to occur together in the same image. Several approaches have been proposed to exploit this property, but all of them suffer from different limitations and typically carry a high computational cost, preventing their application to large images. We find that the proposed model improves the labelling compared with using a pairwise model alone.
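To make the structure of such a model concrete, the following is a minimal sketch that only evaluates the energy of a given labelling: unary costs, a Potts pairwise term on a 4-connected grid, and a global term that depends only on the set of labels present in the image. The abstract does not give the exact form of the co-occurrence potential, so the per-label plus per-pair cost used here (`pairwise_cooc_cost`) is a hypothetical stand-in, and the graph cut optimisation itself is not implemented.

```python
from itertools import combinations


def pairwise_cooc_cost(label_cost, pair_cost):
    """Hypothetical global cost C(L): a cost per label present in the image
    plus a cost for each pair of labels that co-occur."""
    def cost(used):
        c = sum(label_cost[l] for l in used)
        c += sum(pair_cost.get(frozenset(p), 0.0)
                 for p in combinations(sorted(used), 2))
        return c
    return cost


def energy(labels, unary, potts_weight, cooc_cost):
    """E(x) = sum_i U_i(x_i) + sum_{ij} w * [x_i != x_j] + C(set of used labels).

    labels: H x W nested list of integer class labels
    unary: unary[y][x][l] is the cost of giving pixel (y, x) label l
    potts_weight: weight w of the Potts pairwise term (4-connectivity)
    cooc_cost: function mapping the frozenset of used labels to a cost
    """
    height, width = len(labels), len(labels[0])
    e = 0.0
    for y in range(height):
        for x in range(width):
            e += unary[y][x][labels[y][x]]
            # Look only right and down so every grid edge is counted once.
            if x + 1 < width and labels[y][x] != labels[y][x + 1]:
                e += potts_weight
            if y + 1 < height and labels[y][x] != labels[y + 1][x]:
                e += potts_weight
    used = frozenset(l for row in labels for l in row)
    return e + cooc_cost(used)
```

Because the global term sees only the set of used labels, a labelling that introduces an extra class pays for it once, however many pixels take that class.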
Track to the Future: Spatiotemporal Video Segmentation with Long-range Motion Cues
Cited by 48 (2 self)
Video provides not only rich visual cues such as motion and appearance, but also much less explored long-range temporal interactions among objects. We aim to capture such interactions and to construct a powerful intermediate-level video representation for subsequent recognition. Motivated by this goal, we seek a spatiotemporal oversegmentation of a video into regions that respect object boundaries and, at the same time, associate object pixels over many video frames. The contributions of this paper are twofold. First, we develop an efficient spatiotemporal video segmentation algorithm that naturally incorporates long-range motion cues from past and future frames in the form of clusters of point tracks with coherent motion. Second, we devise a new track clustering cost function that includes occlusion reasoning, in the form of depth ordering constraints, as well as motion similarity along the tracks. We evaluate the proposed approach on a challenging set of video sequences of office scenes from feature-length movies.
A Comparative Study of Modern Inference Techniques for Discrete Energy Minimization Problems
Cited by 38 (9 self)
Seven years ago, Szeliski et al. published an influential study on energy minimization methods for Markov random fields (MRFs). This study provided valuable insights for choosing the best optimization technique for certain classes of problems. While these insights remain generally useful today, the phenomenal success of random field models means that the kinds of inference problems we solve have changed significantly. Specifically, today's models often include higher-order interactions, flexible connectivity structures, large label spaces of different cardinalities, or learned energy tables. To reflect these changes, we provide a modernized and enlarged study. We present an empirical comparison of 24 state-of-the-art techniques on a corpus of 2,300 energy minimization instances from 20 diverse computer vision applications. To ensure reproducibility, we evaluate all methods in the OpenGM2 framework and report extensive results on runtime and solution quality. Key insights from our study agree with the results of Szeliski et al. for the types of models they studied. However, on new and challenging types of models our findings disagree and suggest that polyhedral methods and integer programming solvers are competitive in runtime and solution quality over a large range of model types.
Discrete-Continuous Optimization for Multi-Target Tracking
Cited by 34 (5 self)
The problem of multi-target tracking comprises two distinct but tightly coupled challenges: (i) the naturally discrete problem of data association, i.e. assigning image observations to the appropriate target; and (ii) the naturally continuous problem of trajectory estimation, i.e. recovering the trajectories of all targets. To go beyond simple greedy solutions for data association, recent approaches often perform multi-target tracking using discrete optimization. This has the disadvantage that trajectories need to be precomputed or represented discretely, which limits accuracy. In this paper we instead formulate multi-target tracking as a discrete-continuous optimization problem that handles each aspect in its natural domain and allows leveraging powerful methods for multi-model fitting. Data association is performed using discrete optimization with label costs, yielding near-optimal solutions. Trajectory estimation is posed as a continuous fitting problem with a simple closed-form solution, which is in turn used to update the label costs. We demonstrate the accuracy and robustness of our approach with state-of-the-art performance on several standard datasets.
Probabilistic image segmentation with closedness constraints
In ICCV, 2011
Cited by 27 (15 self)
We propose a novel graphical model for probabilistic image segmentation that contributes both to perceptual grouping in connection with image segmentation and to globally optimal inference with higher-order graphical models. We represent image partitions in terms of cellular complexes in order to make the duality between connected regions and their contours explicit. This allows us to formulate a graphical model with higher-order factors that represent the requirement that all contours must be closed. The model induces a probability measure on the space of all partitions, concentrated on perceptually meaningful segmentations. We give a complete polyhedral characterization of the resulting global inference problem in terms of the multicut polytope and efficiently compute global optima by a cutting-plane method. Competitive results on the Berkeley segmentation benchmark confirm the consistency of our approach.
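The closedness requirement can be stated on a region adjacency graph: a set of switched-on boundary pieces describes a valid partition (a multicut) exactly when no switched-on piece lies between two regions that are still joined by a path of switched-off pieces. This is the property the cycle inequalities of the multicut polytope encode. The sketch below only checks that property with union-find for a given boundary labelling; the function name `is_closed` and the graph encoding are illustrative assumptions, and it does not implement the paper's probabilistic model or cutting-plane solver.

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def is_closed(num_nodes, edges, cut):
    """Return True iff every 'on' boundary separates two different regions.

    num_nodes: number of elementary regions (e.g. superpixels)
    edges: list of (u, v) adjacencies between regions
    cut: list of bools, True if the boundary between u and v is switched on
    """
    parent = list(range(num_nodes))
    # Merge regions across every boundary piece that is switched off.
    for (u, v), c in zip(edges, cut):
        if not c:
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv
    # A dangling contour is an 'on' boundary inside a single merged region.
    return all(find(parent, u) != find(parent, v)
               for (u, v), c in zip(edges, cut) if c)
```

For four regions in a 2 x 2 layout, switching on only the boundary between the top two regions yields a dangling contour (the regions stay connected around it), whereas switching on both vertical boundaries splits the image into a valid left and right part.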
Photo-inspired model-driven 3D object modeling
ACM Trans. on Graphics (Proc. SIGGRAPH), 2011
Cited by 25 (11 self)
We introduce an algorithm for 3D object modeling where the user draws creative inspiration from an object captured in a single photograph. Our method leverages the rich source of photographs for creative 3D modeling. However, with only a photo as a guide, creating a 3D model from scratch is a daunting task. We support the modeling process by utilizing an available set of 3D candidate models. Specifically, the user creates a digital 3D model as a geometric variation of a 3D candidate. Our modeling technique consists of two major steps. The first step is a user-guided image-space object segmentation that reveals the structure of the photographed object. The core step is the second one, in which a 3D candidate is automatically deformed to fit the photographed target under the guidance of silhouette correspondence. The candidate models have been pre-analyzed to possess useful high-level structural information, which is heavily utilized in both steps to compensate for the ill-posedness of analysis and modeling problems based only on the content of a single image. Equally important, the structural information is preserved by the geometric variation, so that the final product is coherent, with its inherited structural information readily usable for subsequent model refinement or processing.
Energy-based multiple model fitting for non-rigid structure from motion
In Proceedings of IEEE Conference on Computer Vision and Pattern, 2007
Cited by 20 (6 self)
In this paper we reformulate the 3D reconstruction of deformable surfaces from monocular video sequences as a labeling problem. We solve simultaneously for the assignment of feature points to multiple local deformation models and the fitting of models to points, minimizing a geometric cost subject to a spatial constraint that neighboring points should also belong to the same model. Piecewise reconstruction methods rely on features shared between models to enforce global consistency on the 3D surface. To account for this overlap between regions, we consider a superset of the classic labeling problem in which a set of labels, instead of a single one, is assigned to each variable. We propose a mathematical formulation of this new model and show how it can be efficiently optimized with a variant of α-expansion. We demonstrate how this framework can be applied to Non-Rigid Structure from Motion, where it leads to simpler explanations of the same data. Compared to existing methods run on the same data, our approach has up to half the reconstruction error and is more robust to overfitting and outliers.
Submodularity beyond submodular energies: coupling edges in graph cuts
In CVPR, 2011
Cited by 19 (14 self)
We propose a new family of non-submodular global energy functions that still use submodularity internally to couple edges in a graph cut. We show it is possible to develop an efficient approximation algorithm that, thanks to the internal submodularity, can use standard graph cuts as a subroutine. We demonstrate the advantages of edge coupling in a natural setting, namely image segmentation. In particular, for fine-structured objects and objects with shading variation, our structured edge coupling leads to significant improvements over standard approaches.
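The effect of coupling edges can be illustrated by how a cut is priced. In a standard graph cut, the cost is the sum of the weights of the cut edges. In the coupled version sketched here, edges are grouped into classes and a concave function is applied to the total cut weight within each class, so cutting many edges of the same class (for example, along a long thin structure with a consistent boundary appearance) costs less than cutting the same number of unrelated edges. The sketch only evaluates the two costs for a given labelling; the grouping `edge_class` and the use of the square root as the concave function are illustrative assumptions, and minimizing the coupled cost requires the approximation algorithm described in the paper.

```python
import math


def cut_edges(edges, labels):
    """Indices of edges (u, v, w) whose endpoints receive different labels."""
    return [i for i, (u, v, _) in enumerate(edges) if labels[u] != labels[v]]


def standard_cut_cost(edges, labels):
    """Ordinary graph cut cost: sum of the weights of all cut edges."""
    return sum(edges[i][2] for i in cut_edges(edges, labels))


def coupled_cut_cost(edges, edge_class, labels, f=math.sqrt):
    """Coupled cost: concave f applied to the cut weight of each edge class.

    edge_class: edge_class[i] is the (hypothetical) class id of edge i
    f: concave, non-decreasing function with f(0) = 0 (sqrt by assumption)
    """
    totals = {}
    for i in cut_edges(edges, labels):
        totals[edge_class[i]] = totals.get(edge_class[i], 0.0) + edges[i][2]
    return sum(f(w) for w in totals.values())
```

Applying a concave function to a non-negative sum of edge weights keeps each class term submodular as a function of the set of cut edges, which is the kind of internal submodularity the abstract refers to, even though the resulting energy over node labels is no longer a standard graph-representable one.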
A Hierarchical Conditional Random Field Model for Labeling and Classifying Images of Man-made Scenes
Cited by 12 (0 self)
Semantic scene interpretation as a collection of meaningful regions in images is a fundamental problem in both photogrammetry and computer vision. Images of man-made scenes exhibit strong contextual dependencies in the form of spatial and hierarchical structures. In this paper, we introduce a hierarchical conditional random field that addresses image classification by modeling these spatial and hierarchical structures. The probability outputs of an efficient randomized decision forest classifier are used as unary potentials. The spatial and hierarchical structures of the regions are integrated into pairwise potentials. The model is built on multiscale image analysis in order to aggregate evidence from the local to the global level. Experimental results on images from the eTRIMS dataset demonstrate the performance of the proposed method; we focus on the object classes building, car, door, pavement, road, sky, vegetation, and window.