Statistical Approaches to Feature-Based Object Recognition
, 1997
Cited by 53 (1 self)
This paper examines statistical approaches to model-based object recognition. Evidence is presented indicating that, in some domains, normal (Gaussian) distributions are more accurate than uniform distributions for modeling feature fluctuations. This motivates the development of new maximum-likelihood and MAP recognition formulations which are based on normal feature models. These formulations lead to an expression for the posterior probability of the pose and correspondences given an image. Several avenues are explored for specifying a recognition hypothesis. In the first approach, correspondences are included as a part of the hypotheses. Search for solutions may be ordered as a combinatorial search in correspondence space, or as a search over pose space, where the same criterion can equivalently be viewed as a robust variant of chamfer matching. In the second approach, correspondences are not viewed as being a part of the hypotheses. This leads to a criterion that is a smooth funct...
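The pose-space view described in this abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes 2-D point features, a translation-only pose, and a fixed truncation threshold, and the names `translate`, `robust_chamfer_score`, `best_translation`, and `cap` are hypothetical. Each model point is charged the squared distance to its nearest image point, capped so that an unmatched point contributes only a bounded cost.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def translate(points: Sequence[Point], dx: float, dy: float) -> List[Point]:
    """Apply a translation-only pose (an assumption made for brevity)."""
    return [(x + dx, y + dy) for x, y in points]


def robust_chamfer_score(model: Sequence[Point], image: Sequence[Point], cap: float) -> float:
    """Sum of capped squared nearest-neighbour distances (lower is better)."""
    total = 0.0
    for mx, my in model:
        nearest = min((mx - ix) ** 2 + (my - iy) ** 2 for ix, iy in image)
        # Capping makes the score robust: a missing feature costs at most `cap`.
        total += min(nearest, cap)
    return total


def best_translation(model, image, candidates, cap):
    """Exhaustive search over a hypothetical list of candidate translations."""
    return min(
        candidates,
        key=lambda t: robust_chamfer_score(translate(model, t[0], t[1]), image, cap),
    )
```

A search over a coarse grid of candidate translations then picks the pose with the lowest capped score; clutter points in the image do not affect the result, and occluded model points cost at most `cap` each.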
Embedding Gestalt Laws in Markov Random Fields -- a theory for shape modeling and perceptual organization
, 1999
Cited by 48 (7 self)
The goal of this paper is to study a mathematical framework of 2D object shape modeling and learning for middle level vision problems, such as image segmentation and perceptual organization. For this purpose, we pursue generic shape models which characterize the most common features of 2D object shapes. In this paper, shape models are learned from observed natural shapes based on a minimax entropy learning theory (Zhu and Mumford 1997, Zhu, Wu and Mumford 1997) [31, 32]. The learned shape models are Gibbs distributions defined on Markov random fields (MRFs). The neighborhood structures of these MRFs correspond to Gestalt laws: co-linearity, co-circularity, proximity, parallelism, and symmetry. Thus both contour-based and region-based features are accounted for. Stochastic Markov chain Monte Carlo (MCMC) algorithms are proposed for learning and model verification. Furthermore, this paper provides a quantitative measure for the so-called non-accidental statistics, and thus justifies some empi...
Recognition Using Region Correspondences
- International Journal of Computer Vision
, 1995
Cited by 30 (7 self)
A central problem in object recognition is to determine the transformation that relates the model to the image, given some partial correspondence between the two. This is useful in determining whether an object is present in an image, and if so, determining where the object is. We present a novel method of solving this problem that uses region information. In our approach the model is divided into volumes, and the image is divided into regions. Given a match between subsets of volumes and regions (without any explicit correspondence between different pieces of the regions) the alignment transformation is computed. The method applies to planar objects under similarity, affine, and projective transformations and to projections of 3-D objects undergoing affine and projective transformations.
1 Introduction A fundamental problem in recognition is pose estimation. Given a correspondence between some portions of an object model and some portions of an image, determine the transformation th...
Early completion of occluded objects
- Vision Research
, 1998
Cited by 23 (6 self)
We show that early vision can use monocular cues to rapidly complete partially-occluded objects. Visual search for easily detected fragments becomes difficult when the completed shape is similar to others in the display; conversely, search for fragments that are difficult to detect becomes easy when the completed shape is distinctive. Results indicate that completion occurs via the occlusion-triggered removal of occlusion edges and linking of associated regions. We fail to find evidence for a visible filling-in of contours or surfaces, but do find evidence for a "functional" filling-in that prevents the constituent fragments from being rapidly accessed. As such, it is only the completed structures, and not the fragments themselves, that serve as the basis for rapid recognition.
Extracting Salient Curves from Images: An Analysis of the Saliency Network
, 1998
Cited by 19 (2 self)
The Saliency Network proposed by Shashua and Ullman (1988) is a well-known approach to the problem of extracting salient curves from images while performing gap completion. This paper analyzes the Saliency Network. The Saliency Network is attractive for several reasons. First, the network generally prefers long and smooth curves over short or wiggly ones. While computing saliencies, the network also fills in gaps with smooth completions and tolerates noise. Finally, the network is locally connected, and its size is proportional to the size of the image. Nevertheless, our analysis reveals certain weaknesses with the method. In particular, we show cases in which the most salient element does not lie on the perceptually most salient curve. Furthermore, in some cases the saliency measure changes its preferences when curves are scaled uniformly. Also, we show that for certain fragmented curves the measure prefers large gaps over a few small gaps of the same total size. In addition, we analyze the time complexity required by the method. We show that the number of steps required for convergence in serial implementations is quadratic in the size of the network, and in parallel implementations is linear in the size of the network. We discuss problems due to coarse sampling of the range of possible orientations. Finally, we consider the possibility of using the Saliency Network for grouping. We show that the Saliency Network recovers the most salient curve efficiently, but it has problems with identifying any salient curve other than the most salient one.
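As a rough illustration of the kind of local, iterative computation this abstract refers to, the sketch below uses a simplified form of the update commonly given for the Saliency Network: each element adds its own contribution to the attenuated, coupling-weighted best saliency among its successors. The function name, the argument names (`sigma`, `rho`, `coupling`, `neighbors`), and the toy values are assumptions made for illustration; the actual network operates on orientation elements with curvature-dependent coupling.

```python
from typing import List, Sequence


def saliency(
    sigma: Sequence[float],            # local contribution: 1.0 for a real element, 0.0 for a gap
    rho: Sequence[float],              # attenuation: 1.0 for a real element, < 1.0 for a gap
    coupling: Sequence[Sequence[float]],  # coupling[i][j] in [0, 1]; smoother continuation -> larger
    neighbors: Sequence[Sequence[int]],   # neighbors[i] lists the possible successors of element i
    iterations: int,
) -> List[float]:
    """Run a fixed number of synchronous updates and return the saliencies."""
    e = list(sigma)
    for _ in range(iterations):
        e = [
            sigma[i]
            + rho[i] * max((coupling[i][j] * e[j] for j in neighbors[i]), default=0.0)
            for i in range(len(sigma))
        ]
    return e
```

On a straight chain of three real elements with unit coupling, two iterations give the first element a saliency of 3.0, which reflects the preference for longer curves; replacing the middle element with a gap lowers that value.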
Indexing Based on Algebraic Functions of Views
- Computer Vision and Image Understanding
, 1998
Cited by 6 (5 self)
In this paper, we propose the use of algebraic functions of views for indexing-based object recognition. During indexing, we consider groups of model points and we represent all the views (i.e., images) that they can produce in a hash table. The images that a group of model points can produce are computed by combining a small number of reference views which contain the group using algebraic functions of views. Fundamental to this procedure is a methodology, based on Singular Value Decomposition and Interval Arithmetic, for estimating the allowable ranges of values that the parameters of algebraic functions can assume. During recognition, scene groups are used to retrieve from the hash table the most feasible model groups that might have produced the scene groups. The use of algebraic functions of views for indexing-based recognition offers a number of advantages. First of all, the hash table can be built using a small number of reference views per object. This is in contrast to current approaches which build the hash table using either a large number of reference views or 3D models. Most importantly, recognition does not rely on the similarity between reference views and novel views; all that is required for the novel views is to contain common groups of points with a small number of reference views. Second, verification becomes simpler. This is because candidate models can now be back-projected onto the scene by applying a linear transformation on a small number of reference views of the candidate model. Finally, the proposed approach is more general and extendible. This is because algebraic functions of views have been shown to exist over a wide range of transformations and projections. The recognition performance of the proposed approach is demonstrated using both artific...
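The indexing and retrieval steps described in this abstract can be sketched in a few lines. This is a toy illustration under strong assumptions, not the paper's method: a view of a point group is modeled as a plain linear combination of two reference coordinate vectors, the coefficient values come from a fixed hypothetical grid `coeffs` (the paper instead estimates allowable ranges with Singular Value Decomposition and interval arithmetic), and all function names are invented for the example.

```python
from collections import defaultdict
from itertools import product


def quantize(vec, step):
    """Map a coordinate vector to a hashable grid cell."""
    return tuple(round(v / step) for v in vec)


def combine(ref_a, ref_b, alpha, beta):
    """Synthesize a view as a linear combination of two reference views."""
    return [alpha * a + beta * b for a, b in zip(ref_a, ref_b)]


def build_table(models, coeffs, step):
    """Indexing: store every synthesized view of every model group."""
    table = defaultdict(set)
    for name, (ref_a, ref_b) in models.items():
        for alpha, beta in product(coeffs, repeat=2):
            key = quantize(combine(ref_a, ref_b, alpha, beta), step)
            table[key].add(name)
    return table


def lookup(table, scene_vec, step):
    """Recognition: retrieve candidate models for a scene group."""
    return table.get(quantize(scene_vec, step), set())
```

Candidates returned by `lookup` would still need verification, which the abstract describes as back-projecting the candidate model onto the scene.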
Computational Models of Perceptual Organization
- Robotics Institute, Carnegie Mellon University
, 2003
Cited by 3 (0 self)
Perceptual organization refers to the process of organizing sensory input into coherent and interpretable perceptual structures. This process is challenging due to the chicken-and-egg relationship among the various sub-processes such as image segmentation, figure-ground segregation and object recognition. Low-level processing requires the guidance of high-level knowledge to overcome noise, while high-level processing relies on low-level processes to reduce the computational complexity. Neither process can be sufficient on its own. Consequently, any system that carries out these processes in a sequence is bound to be brittle. An alternative system is one in which all processes interact with each other simultaneously. In this thesis, we develop a set of simple yet realistic interactive processing models for perceptual organization. We model the processing in the framework of spectral graph theory, with a criterion encoding the overall goodness of perceptual organization. We derive fast solutions for near-global optima of the criterion, and demonstrate the efficacy of the models on segmenting a wide range of real images. Through these models, we are able to capture a variety of perceptual phenomena: a unified treatment of various grouping, figure-ground and depth cues to produce popout, region segmentation and depth segregation in one step; and a unified framework for integrating bottom-up and top-down information to produce an object segmentation from spatial and object attention. We achieve these goals by empowering current spectral graph methods with a principled solution for multiclass spectral graph partitioning; an expanded repertoire of grouping cues that includes similarity, dissimilarity and ordering relationships; a theory for integrating sparse grouping cues; and a model ...
3-D to 2-D Recognition with Regions
- IEEE Conference on Computer Vision and Pattern Recognition
, 1997
Cited by 3 (0 self)
This paper presents a novel approach to parts-based object recognition in the presence of occlusion. We focus on the problem of determining the pose of a 3-D object from a single 2-D image when convex parts of the object have been matched to corresponding regions in the image. We consider three types of occlusions: self-occlusion, occlusions whose locus is identified in the image, and completely arbitrary occlusions. We derive efficient algorithms for the first two cases, and characterize their performance. For the last case, we prove that the problem of finding valid poses is computationally hard, but provide an efficient, approximate algorithm. This work generalizes our previous work on region-based object recognition, which focused on the case of planar models. A preliminary version of this paper has appeared in [29]. A brief overview of these and related results has appeared in [8]. This research was supported by the United States-Israel Binational Science Foundation, Gr...
Mining Surveillance Video for Independent Motion Detection
- Proc. 2002 IEEE International Conference on Data Mining (ICDM’02)
Cited by 1 (0 self)
This paper addresses the special applications of data mining techniques in homeland defense. The problem targeted, which is frequently encountered in military/intelligence surveillance, is to mine a massive, automatically collected surveillance video database to retrieve the shots containing independently moving targets. A novel solution to this problem is presented in this paper, which offers a completely qualitative approach to solving the automatic independent motion detection problem directly from the compressed surveillance video with faster-than-real-time mining performance. This approach is based on linear system consistency analysis and is consequently called QLS. Because the QLS approach computes only what is strictly necessary for a solution, it keeps computation to a minimum while maximizing efficacy. Evaluations from real data show that QLS delivers effective mining performance at the achieved efficiency.
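The phrase "linear system consistency analysis" can be made concrete with a generic check: a system A x = b has a solution exactly when the rank of A equals the rank of the augmented matrix [A | b]. The sketch below is that textbook test only, not the QLS algorithm; how the paper builds the system from compressed video and maps consistency to independent motion is not stated in this abstract. The names `rank` and `is_consistent` are hypothetical, and exact rational arithmetic is an assumption that real, noisy data would replace with a tolerance-based test.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix via Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def is_consistent(a, b):
    """True when A x = b has at least one solution (rank(A) == rank([A | b]))."""
    augmented = [list(row) + [rhs] for row, rhs in zip(a, b)]
    return rank(a) == rank(augmented)
```

For an overdetermined system, the check answers only whether the equations agree with one another; it does not compute the solution itself, which matches the abstract's emphasis on computing no more than is necessary.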

