Recognition without Correspondence using Multidimensional Receptive Field Histograms
- International Journal of Computer Vision
, 2000
Cited by 176 (15 self)
The appearance of an object is composed of local structure. This local structure can be described and characterized by a vector of local features measured by local operators such as Gaussian derivatives or Gabor filters. This article presents a technique where appearances of objects are represented by the joint statistics of such local neighborhood operators. As such, this represents a new class of appearance-based techniques for computer vision. Based on joint statistics, the paper develops techniques for the identification of multiple objects at arbitrary positions and orientations in a cluttered scene. Experiments show that these techniques can identify over 100 objects in the presence of major occlusions. Most remarkably, the techniques have low complexity and therefore run in real-time.
1. Introduction
The paper proposes a framework for the statistical representation of the appearance of arbitrary 3D objects. This representation consists of a probability density function or jo...
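The idea of representing appearance by the joint statistics of local operators can be illustrated with a minimal sketch. It is not the paper's implementation: central differences stand in for Gaussian derivatives, the bin count and value range are arbitrary, and histogram intersection is just one plausible comparison measure. The function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]


def derivative_features(img: Image) -> List[Tuple[float, float]]:
    """Central-difference (dx, dy) at interior pixels.

    Assumption: a cheap stand-in for Gaussian derivative operators.
    """
    feats = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            dx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            dy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            feats.append((dx, dy))
    return feats


def joint_histogram(feats, bins: int = 8, lo: float = -1.0, hi: float = 1.0):
    """Normalized 2D histogram over (dx, dy); pixel positions are discarded."""
    def cell(v: float) -> int:
        i = int((v - lo) / (hi - lo) * bins)
        return min(max(i, 0), bins - 1)  # clamp out-of-range responses

    hist = [[0.0] * bins for _ in range(bins)]
    for dx, dy in feats:
        hist[cell(dx)][cell(dy)] += 1.0
    total = float(len(feats)) or 1.0
    return [[c / total for c in row] for row in hist]


def intersection(h1, h2) -> float:
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for r1, r2 in zip(h1, h2) for a, b in zip(r1, r2))
```

Because the histogram keeps only how often each combination of responses occurs, two views of the same local structure at different image positions produce the same histogram, which is why no point correspondence is needed in this sketch.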
An Active Vision Architecture based on Iconic Representations
- Artificial Intelligence
, 1995
Cited by 116 (12 self)
Active vision systems have the capability of continuously interacting with the environment. The rapidly changing environment of such systems means that it is attractive to replace static representations with visual routines that compute information on demand. Such routines place a premium on image data structures that are easily computed and used. The purpose of this paper is to propose a general active vision architecture based on efficiently computable iconic representations. This architecture employs two primary visual routines, one for identifying the visual image near the fovea (object identification), and another for locating a stored prototype on the retina (object location). This design allows complex visual behaviors to be obtained by composing these two routines with different parameters. The iconic representations are comprised of high-dimensional feature vectors obtained from the responses of an ensemble of Gaussian derivative spatial filters at a number of orientations and...
Control of Selective Perception Using Bayes Nets and Decision Theory
, 1993
Cited by 87 (1 self)
A selective vision system sequentially collects evidence to support a specified hypothesis about a scene, as long as the additional evidence is worth the effort of obtaining it. Efficiency comes from processing the scene only where necessary, to the level of detail necessary, and with only the necessary operators. Knowledge representation and sequential decision-making are central issues for selective vision, which takes advantage of prior knowledge of a domain's abstract and geometrical structure and models for the expected performance and cost of visual operators. The TEA-1 selective vision system uses Bayes nets for representation and benefit-cost analysis for control of visual and non-visual actions. It is the high-level control for an active vision system, enabling purposive behavior, the use of qualitative vision modules and a pointable multiresolution sensor. TEA-1 demonstrates that Bayes nets and decision theoretic techniques provide a general, re-usable framework for constructi...
Multiresolution Histograms and their Use for Recognition
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2004
Cited by 21 (0 self)
The histogram of image intensities is used extensively for recognition and for retrieval of images and video from visual databases. A single image histogram, however, suffers from the inability to encode spatial image variation. An obvious way to extend this feature is to compute the histograms of multiple resolutions of an image to form a multiresolution histogram. The multiresolution histogram shares many desirable properties with the plain histogram including that they are both fast to compute, space efficient, invariant to rigid motions, and robust to noise. In addition, the multiresolution histogram directly encodes spatial information. We describe a simple yet novel matching algorithm based on the multiresolution histogram that uses the differences between histograms of consecutive image resolutions. We evaluate it against five widely used image features. We show that with our simple feature we achieve or exceed the performance obtained with more complicated features. Further, we show our algorithm to be the most efficient and robust.
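A rough sketch of matching on differences between histograms of consecutive resolutions follows. It rests on assumptions not stated in the abstract: 2x2 block averaging stands in for the resolution reduction, four intensity bins are used, and the L1 norm compares features. All function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def histogram(img: Image, bins: int = 4, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Normalized intensity histogram of one resolution level."""
    counts = [0.0] * bins
    n = 0
    for row in img:
        for v in row:
            i = min(max(int((v - lo) / (hi - lo) * bins), 0), bins - 1)
            counts[i] += 1.0
            n += 1
    return [c / n for c in counts]


def downsample(img: Image) -> Image:
    """Halve resolution by 2x2 block averaging (assumed stand-in for smoothing)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def multires_feature(img: Image, levels: int = 3) -> List[float]:
    """Concatenate differences between histograms of consecutive resolutions."""
    hists = [histogram(img)]
    cur = img
    for _ in range(levels - 1):
        cur = downsample(cur)
        hists.append(histogram(cur))
    feat: List[float] = []
    for fine, coarse in zip(hists, hists[1:]):
        feat.extend(a - b for a, b in zip(fine, coarse))
    return feat


def l1_distance(f: List[float], g: List[float]) -> float:
    return sum(abs(a - b) for a, b in zip(f, g))
```

A 4x4 checkerboard and a 4x4 image split into a dark half and a bright half share the same plain histogram, yet their features differ under this sketch because averaging blends the checkerboard at the first reduction and the split image only at the second. This is the sense in which the feature encodes spatial variation.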
Approximate Orientation Steerability Based on Angular Gaussians
- IEEE Trans. Image Processing
, 2000
Cited by 16 (9 self)
Junctions are significant features in images with an intensity variation that exhibits multiple orientations. This makes the detection and characterization of junctions a challenging problem.
Top-Down Gaze Targeting for Space-Variant Active Vision
- In Proc. ARPA Image Understanding Workshop
, 1994
Cited by 11 (1 self)
The simultaneous need for a wide angle of field and high resolution has led to the use of spatially variant sensors in active vision systems. The use of such sensors however necessitates the existence of gaze control mechanisms for guiding the foveal high resolution region of the sensor to points of interest in the visual world. While bottom-up alerting cues such as motion have previously been used for this purpose, tasks such as visual search are better facilitated by top-down guidance mechanisms. In this paper, we describe the use of iconic scene descriptions for top-down foveal targeting. These descriptions take the form of a vector of responses of a bank of steerable filters at multiple scales and orientations. Such a representation has a number of useful properties such as rotation and scale invariance, partial view-insensitivity and tolerance to occlusions. Top-down control of gaze for uniform resolution sensors is achieved by the process of backprojection which matches vectors o...
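The rotation-invariance property mentioned above can be illustrated in the simplest steerable case. The sketch assumes a first-order (gradient) basis at a single scale, whereas the paper's filter bank uses multiple scales and orientations whose details the abstract does not give. The function names are hypothetical.

```python
import math
from typing import List


def steer(gx: float, gy: float, theta: float) -> float:
    """Response of a first-derivative filter steered to angle theta,
    synthesized from the two basis responses gx and gy."""
    return math.cos(theta) * gx + math.sin(theta) * gy


def normalized_responses(gx: float, gy: float, n: int = 4) -> List[float]:
    """Responses at n orientations measured relative to the dominant
    orientation, so an in-plane rotation leaves the vector unchanged."""
    dominant = math.atan2(gy, gx)
    return [steer(gx, gy, dominant + k * math.pi / n) for k in range(n)]
```

Measuring orientations relative to the dominant one means a rotated copy of the same local structure yields the same response vector, which can then be matched by a simple distance between vectors.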
Matching by local invariants
, 1995
Cited by 10 (2 self)
Research report, ISSN 0249-6399. Matching by local invariants
Visual Routines for Autonomous Driving
- In Proceedings of the 6th International Conference on Computer Vision (ICCV-98)
, 1998
Cited by 9 (2 self)
The paper describes visual routines based on models of color and shape, as well as crucial issues involving the scheduling of such routines. The visual routines are developed in a unique platform. The view from a car driving in a simulated world is fed into a Datacube pipeline video processor. The use of this simulation provides a flexible environment from which to set crucial image processing parameters of the individual routines. In addition to the simulations, the routines are also tested in similar images generated by driving in the real world, to assure the generalizability of the simulation.
1 Introduction
The advent of faster processors has moved the focus of computer vision from analysis of a single image to dealing with long sequences of images and, more important, extracting the parts of those images that are needed for behaviors. Nowhere is this more apparent than in the application of automated driving. Steering corrections in very complex environments normally have to be...
Learning to Recognize and Grasp Objects
- Autonomous Robots
, 1998
Cited by 8 (1 self)
We apply techniques of computer vision and neural network learning to get a versatile robot manipulator. All work conducted follows the principle of autonomous learning from visual demonstration. The user must demonstrate the relevant objects, situations, and/or actions, and the robot vision system must learn from those. For approaching and grasping technical objects three principal tasks have to be done: calibrating the camera-robot coordination, detecting the desired object in the images, and choosing a stable grasping pose. These procedures are based on (nonlinear) functions, which are not known a priori and therefore have to be learned. We uniformly approximate the necessary functions by networks of Gaussian basis functions (GBF networks). By modifying the number of basis functions and/or the size of the Gaussian support the quality of the function approximation changes. The appropriate configuration is learned in the training phase and applied during the operation phase. All ex...
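A minimal one-dimensional sketch of approximating an unknown function with a network of Gaussian basis functions is given below. The choices here are assumptions for illustration and not the paper's training procedure: one basis function is centered on each training sample, all share one width sigma, and the weights come from a direct linear solve. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def gbf(x: float, center: float, sigma: float) -> float:
    """One Gaussian basis function; sigma controls the size of its support."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def solve(a: List[List[float]], b: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_gbf(xs: Sequence[float], ys: Sequence[float], sigma: float) -> List[float]:
    """Weights that make the network reproduce the training targets."""
    phi = [[gbf(x, c, sigma) for c in xs] for x in xs]
    return solve(phi, ys)


def predict(x: float, centers: Sequence[float], weights: Sequence[float],
            sigma: float) -> float:
    """Network output: weighted sum of the Gaussian basis functions."""
    return sum(w * gbf(x, c, sigma) for c, w in zip(centers, weights))
```

Changing the number of centers or the value of sigma changes how smoothly the network fills in between samples, which corresponds to the configuration choice the abstract says is made during training.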
Seeing Behind Occlusions
- In Proceedings of the Third European Conference on Computer Vision (ECCV)
, 1994
Cited by 7 (4 self)
The location of objects in images is difficult owing to the view variance of geometric features but can be determined by developing view-insensitive descriptions of the intensities local to image points. View-insensitive descriptions are achieved in this work by describing points in terms of the responses of steerable filters at multiple scales. Owing to the use of multiple scales, the vector for each point is, for all practical purposes, unique, and thus can be easily matched to other instances of the point in other images. We show that this method can be extended to handle the case where the area near a point of interest is partially occluded. The method uses a description of the occluder in the form of a template that can be obtained easily via active vision systems using a method such as disparity filtering. This research is supported by the Human Science Frontiers Program research grant and by the National Science Foundation under NSF research grant no. CDA-8822724. This report is...

