Results 1 - 10 of 52
A Theory of Networks for Approximation and Learning
- Laboratory, Massachusetts Institute of Technology
, 1989
Cited by 170 (25 self)
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multi-dimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
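As a rough illustration of the kind of network this abstract describes, the sketch below fits a Gaussian radial basis function approximation with fewer centers than data points. It is not the paper's formulation: the Gaussian basis, the fixed evenly spaced centers, the plain ridge penalty `lam`, and the names `gaussian_design`, `fit_rbf` and `predict_rbf` are all assumptions made for this example.

```python
import numpy as np

def gaussian_design(X, centers, width):
    """Matrix G with G[i, j] = exp(-||X[i] - centers[j]||^2 / (2 * width^2))."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def fit_rbf(X, y, centers, width, lam=1e-6):
    """Solve a ridge-regularized least-squares problem for the output weights.

    Assumption: a simple lam * I penalty stands in for the regularizer.
    """
    G = gaussian_design(X, centers, width)
    A = G.T @ G + lam * np.eye(centers.shape[0])
    return np.linalg.solve(A, G.T @ y)

def predict_rbf(X, centers, width, coeffs):
    """Evaluate the fitted network: a weighted sum of Gaussian basis units."""
    return gaussian_design(X, centers, width) @ coeffs

# Toy problem: approximate sin(x) on [0, 2*pi] from 50 samples with 10 centers.
X_train = np.linspace(0.0, 2.0 * np.pi, 50)[:, None]
y_train = np.sin(X_train[:, 0])
centers = np.linspace(0.0, 2.0 * np.pi, 10)[:, None]
coeffs = fit_rbf(X_train, y_train, centers, width=1.0)
max_err = np.max(np.abs(predict_rbf(X_train, centers, 1.0, coeffs) - y_train))
```

Each center plays the role of a prototype, and the learned coefficients determine how the prototypes are combined. Here the centers stay fixed; a fuller treatment would also adapt them during learning.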
Object-Centered Surface Reconstruction: Combining Multi-Image Stereo and Shading
- International Journal of Computer Vision
, 1995
Cited by 103 (19 self)
Our goal is to reconstruct both the shape and reflectance properties of surfaces from multiple images. We argue that an object-centered representation is most appropriate for this purpose because it naturally accommodates multiple sources of data, multiple images (including motion sequences of a rigid object), and self-occlusions. We then present a specific object-centered reconstruction method and its implementation. The method begins with an initial estimate of surface shape provided, for example, by triangulating the result of conventional stereo. The surface shape and reflectance properties are then iteratively adjusted to minimize an objective function that combines information from multiple input images. The objective function is a weighted sum of stereo, shading, and smoothness components, where the weight varies over the surface. For example, the stereo component is weighted more strongly where the surface projects onto highly textured areas in the images, and less strongly othe...
Qualitative Egomotion
- International Journal of Computer Vision
, 1993
Cited by 29 (12 self)
Due to the aperture problem, the only general unambiguous motion measurement in images is normal flow, the projection of image motion on the gradient direction. In this paper we show how a monocular observer can estimate its 3D motion relative to the scene by using normal flow measurements in a global and mostly qualitative way. The problem is addressed through a search technique. By checking constraints imposed by 3D motion parameters on the normal flow field, the possible space of solutions is gradually reduced. In the four modules that comprise the solution, constraints of increasing restriction are considered, culminating in testing every single normal flow value for its consistency with a set of motion parameters. The fact that motion is rigid defines geometric relations between certain values of the normal flow field. The selected values form patterns in the image plane that are dependent on only some of the motion parameters. These patterns, which are determined by the signs of the normal flow values, are searched for in order to find the axes of translation and rotation. The third rotational component is computed from normal flow vectors that are only due to rotational motion. Finally, by looking at the complete data set, all solutions that cannot give rise to the given normal flow field are discarded from the solution space.
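To make the notion of normal flow concrete, the following minimal sketch estimates it from two consecutive grayscale frames using the standard brightness-constancy relation Ix*u + Iy*v + It = 0. It only illustrates the measurement itself, not the paper's search over motion parameters; the function name `normal_flow`, the finite-difference derivatives, and the threshold `eps` are assumptions for this example.

```python
import numpy as np

def normal_flow(frame0, frame1, eps=1e-6):
    """Return (vx, vy), the flow component along the brightness gradient.

    Pixels whose squared gradient magnitude is below eps get zero flow,
    since normal flow is undefined where the image has no gradient.
    """
    frame0 = np.asarray(frame0, dtype=float)
    frame1 = np.asarray(frame1, dtype=float)
    Iy, Ix = np.gradient(frame0)      # axis 0 is y (rows), axis 1 is x (columns)
    It = frame1 - frame0              # temporal derivative, one frame apart
    mag_sq = Ix ** 2 + Iy ** 2
    valid = mag_sq > eps
    scale = np.zeros_like(mag_sq)
    scale[valid] = -It[valid] / mag_sq[valid]
    return scale * Ix, scale * Iy

# A horizontal brightness ramp shifted right by half a pixel per frame.
xs = np.tile(np.arange(8, dtype=float), (6, 1))
vx, vy = normal_flow(xs, xs - 0.5)
```

Because the ramp has a purely horizontal gradient, only the horizontal motion component is recoverable here, which is the aperture problem in miniature.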
Biologically motivated multi-modal processing of visual primitives
- The Interdisciplinary Journal of Artificial Intelligence and the Simulation of Behaviour
, 2003
Cited by 29 (20 self)
We describe a new kind of image representation in terms of local multi-modal Primitives. These Primitives are motivated by processing of the human visual system as well as by functional considerations. We discuss analogies of our representation to human vision and concentrate specifically on the implications of the necessity of communication of information in a complex multi-modal system.
Cue integration through discriminative accumulation
- in Proc. CVPR’04
Cited by 20 (8 self)
Object recognition systems aiming to work in real world settings should use multiple cues in order to achieve robustness. We present a new cue integration scheme which extends the idea of cue accumulation to discriminative classifiers. We derive and test the scheme for Support Vector Machines (SVMs), but we also show that it is easily extendible to any large margin classifier. Interestingly, in the case of one-class SVMs, the scheme can be interpreted as a new class of Mercer kernels for multiple cues. Experimental comparison with a probabilistic accumulation scheme is favorable to our method. Comparison with a voting scheme shows that our method may suffer as the number of object classes increases. Based on these results, we propose a recognition algorithm consisting of a decision tree where decisions at each node are taken using our accumulation scheme. Results obtained using this new algorithm compare very favorably to accumulation (both probabilistic and discriminative) and to the voting scheme.
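The abstract does not give the accumulation formula, so the sketch below shows only one plausible reading: each cue has its own classifier that outputs a margin-like decision value per class, and the values are combined by a weighted sum before taking the best class. The function name `accumulate_cues`, the array layout, and the toy numbers are hypothetical.

```python
import numpy as np

def accumulate_cues(decision_values, weights):
    """Combine per-cue decision values by a weighted sum.

    decision_values: array of shape (n_cues, n_samples, n_classes), e.g. the
        signed distances to each class hyperplane from one classifier per cue.
    weights: array of shape (n_cues,), one non-negative weight per cue.
    Returns (predicted class index per sample, combined score matrix).
    """
    dv = np.asarray(decision_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    combined = np.tensordot(w, dv, axes=(0, 0))   # shape (n_samples, n_classes)
    return combined.argmax(axis=1), combined

# Toy decision values for two cues, two samples, three classes (made up).
color_cue = [[0.2, -0.1, -0.5], [-0.3, 0.1, 0.4]]
shape_cue = [[-0.4, 0.9, -0.2], [-0.1, 0.3, -0.2]]
labels_equal, scores_equal = accumulate_cues([color_cue, shape_cue], [0.5, 0.5])
labels_color, _ = accumulate_cues([color_cue, shape_cue], [1.0, 0.0])
```

With equal weights the confident shape cue decides both samples, while giving all the weight to the color cue changes both decisions. That sensitivity is why the cue weights matter in any scheme of this kind.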
A Continuous Formulation of Intrinsic Dimension
, 2003
Cited by 17 (10 self)
The intrinsic dimension (see, e.g., [29, 11]) has proven to be a suitable descriptor to distinguish between different kinds of image structures such as edges, junctions or homogeneous image patches. In this paper, we will show that the intrinsic dimension is spanned by two axes: one axis represents the variance of the spectral energy and one represents a weighted variance in orientation. Moreover, we will show in section that the topological structure of intrinsic dimension has the form of a triangle. We will review diverse definitions of intrinsic dimension and we will show that they can be subsumed within the above mentioned scheme. We will then give a concrete continuous definition of intrinsic dimension that realizes its triangular structure.
Active Fusion - A New Method Applied to Remote Sensing Image Interpretation
- Pattern Recognition Letters
, 1996
Cited by 14 (6 self)
Today's computer vision applications often have to deal with multiple, uncertain, and incomplete visual information. In this paper, we introduce a new method, termed `active fusion', which provides a common framework for active selection and combination of information from multiple sources in order to arrive at a reliable result at reasonable costs. The implementation of active fusion on the basis of probability theory, the Dempster-Shafer theory of evidence and fuzzy sets is discussed. In a sample experiment, active fusion using Bayesian networks is applied to agricultural field classification from multitemporal Landsat imagery. This experiment shows a significant reduction of the number of information sources required for a reliable decision.
Keywords: information fusion, image understanding, active fusion, probability theory, Bayesian networks, Dempster-Shafer theory of evidence, fuzzy sets, fuzzy measures, entropy
1 Motivation
Information fusion deals with the integration of info...
Visualization of Multidimensional Shape and Texture Features in Laser Range Data using Complex-Valued Gabor Wavelets
, 1995
Cited by 12 (7 self)
This paper describes a new method for visualization and analysis of multivariate laser range data using complex-valued non-orthogonal Gabor wavelets, principal component analysis and a topological mapping network. The initial data set that provides both shape and texture information is encoded in terms of both amplitude and phase of a complex-valued 2D image function. A set of carefully designed oriented Gabor filters performs a decomposition of the data and allows for retrieving local shape and texture features. The feature vector obtained from this method is multidimensional, and in order to evaluate similar data features, further subspace methods to transform the data onto visualizable attributes, such as R, G, B, have to be determined. For this purpose, a feature-based visualization pipeline is proposed consisting of principal component analysis, normalization and a topological mapping network. This process finally renders an R, G, B subspace representation of the multidimensional fea...
Multi-modal Semantic Place Classification
, 2010
Cited by 11 (5 self)
The ability to represent knowledge about space and its position therein is crucial for a mobile robot. To this end, topological and semantic descriptions are gaining popularity for augmenting purely metric space representations. In this paper we present a multi-modal place classification system that allows a mobile robot to identify places and recognize semantic categories in an indoor environment. The system effectively utilizes information from different robotic sensors by fusing multiple visual cues and laser range data. This is achieved using a high-level cue integration scheme based on a Support Vector Machine (SVM) that learns how to optimally combine and weight each cue. Our multi-modal place classification approach can be used to obtain a real-time semantic space labeling system which integrates information over time and space. We perform an extensive experimental evaluation of the method for two different platforms and environments, on a realistic off-line database and in a live experiment on an autonomous robot. The results clearly demonstrate the effec-
Object Recognition by Active Fusion
- In Intelligent Robots and Computer Vision XV: Algorithms, Techniques, Active Vision, and Materials Handling
, 1996
Cited by 10 (8 self)
Today's computer vision applications often have to deal with multiple, uncertain, and incomplete visual information. In this paper, we apply a new method, termed `active fusion', to the problem of generic object recognition. Active fusion provides a common framework for active selection and combination of information from multiple sources in order to arrive at a reliable result at reasonable costs. In our experimental setup we use a camera mounted on a 2m x 1.5m x/z-table observing objects placed on a rotating table. Zoom, pan, tilt and aperture setting of the camera can be controlled by the system. We follow a part-based approach, trying to decompose objects into parts, which are modeled as geons. The active fusion system starts from an initial view of the objects placed on the table and is continuously trying to refine its current object hypotheses by requesting additional views. The implementation of active fusion on the basis of probability theory, Dempster-Shafer's theory of evide...

