A Theory of Networks for Approximation and Learning
 Laboratory, Massachusetts Institute of Technology
, 1989
Abstract

Cited by 194 (24 self)
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
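As a rough illustration of the regularization view the abstract describes, strict RBF interpolation amounts to solving a regularized linear system for the expansion coefficients. The sketch below is a hypothetical 1D example; the Gaussian basis, ridge parameter `lam`, and sample data are illustrative choices, not the paper's setup:

```python
import math

def gaussian(r, sigma=1.0):
    # radial basis function phi(r) = exp(-r^2 / (2 sigma^2))
    return math.exp(-r * r / (2.0 * sigma * sigma))

def solve(A, b):
    # naive Gaussian elimination with partial pivoting (illustration only)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_rbf(xs, ys, lam=1e-6, sigma=1.0):
    # solve the regularized interpolation system (G + lam * I) c = y,
    # with one Gaussian unit centered on each example
    n = len(xs)
    G = [[gaussian(abs(xs[i] - xs[j]), sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(G, ys)

def predict(x, xs, coeffs, sigma=1.0):
    # network output: weighted sum of radial units centered on the examples
    return sum(c * gaussian(abs(x - xi), sigma) for c, xi in zip(coeffs, xs))
```

With `lam` near zero the network interpolates the examples exactly; increasing it trades fit for smoothness, which is the regularization view of learning from examples.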
Object-Centered Surface Reconstruction: Combining Multi-Image Stereo and Shading
 International Journal of Computer Vision
, 1995
Abstract

Cited by 120 (19 self)
Our goal is to reconstruct both the shape and reflectance properties of surfaces from multiple images. We argue that an object-centered representation is most appropriate for this purpose because it naturally accommodates multiple sources of data, multiple images (including motion sequences of a rigid object), and self-occlusions. We then present a specific object-centered reconstruction method and its implementation. The method begins with an initial estimate of surface shape provided, for example, by triangulating the result of conventional stereo. The surface shape and reflectance properties are then iteratively adjusted to minimize an objective function that combines information from multiple input images. The objective function is a weighted sum of stereo, shading, and smoothness components, where the weight varies over the surface. For example, the stereo component is weighted more strongly where the surface projects onto highly textured areas in the images, and less strongly otherwise.
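The weighted objective outlined above can be sketched as a per-point sum whose stereo/shading balance follows local texture. This is an illustrative simplification; the complementary weighting rule and the smoothness coefficient `lam` are assumptions, not the paper's actual formulation:

```python
def combined_objective(stereo_cost, shading_cost, smooth_cost, texture, lam=0.1):
    # per-surface-point weighted sum: the stereo term dominates where the
    # surface projects onto textured image regions (texture near 1.0),
    # the shading term dominates elsewhere (texture near 0.0)
    total = 0.0
    for st, sh, sm, t in zip(stereo_cost, shading_cost, smooth_cost, texture):
        total += t * st + (1.0 - t) * sh + lam * sm
    return total
```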
Qualitative Egomotion
, 1995
Abstract

Cited by 36 (15 self)
Due to the aperture problem, the only motion measurement in images whose computation does not require any assumptions about the scene in view is normal flow: the projection of image motion on the gradient direction. In this paper we show how a monocular observer can estimate its 3D motion relative to the scene by using normal flow measurements in a global and qualitative way. The problem is addressed through a search technique. By checking constraints imposed by 3D motion parameters on the normal flow field, the possible space of solutions is gradually reduced. In the four modules that comprise the solution, constraints of increasing restriction are considered, culminating in testing every single normal flow value for its consistency with a set of motion parameters. The fact that motion is rigid defines geometric relations between certain values of the normal flow field. The selected values form patterns in the image plane that are dependent on only some of the motion parameters. These patterns, which are determined by the signs of the normal flow values, are searched for in order to find the axes of translation and rotation. The third rotational component is computed from normal flow vectors that are only due to rotational motion. Finally, by looking at the complete data set, all solutions that cannot give rise to the given normal flow field are discarded from the solution space.
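The normal flow the abstract relies on is simply the component of image motion along the local intensity gradient. A minimal sketch of both the projection form and the standard brightness-constancy form (included here for illustration, not taken from the paper):

```python
import math

def normal_flow(u, v, gx, gy):
    # aperture problem: only the flow component along the gradient is
    # measurable locally, so project the full flow (u, v) onto the
    # gradient direction (gx, gy)
    mag = math.hypot(gx, gy)
    if mag == 0.0:
        return 0.0          # no gradient, no measurable motion
    return (u * gx + v * gy) / mag

def normal_flow_from_derivs(ix, iy, it):
    # brightness-constancy form: n = -I_t / |grad I|, computable from
    # image derivatives alone, with no scene assumptions
    mag = math.hypot(ix, iy)
    return 0.0 if mag == 0.0 else -it / mag
```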
Biologically motivated multimodal processing of visual primitives
 THE INTERDISCIPLINARY JOURNAL OF ARTIFICIAL INTELLIGENCE AND THE SIMULATION OF BEHAVIOUR
, 2003
Abstract

Cited by 35 (23 self)
We describe a new kind of image representation in terms of local multi-modal Primitives. These Primitives are motivated by the processing of the human visual system as well as by functional considerations. We discuss analogies of our representation to human vision and concentrate specifically on the implications of the need to communicate information in a complex multi-modal system.
Cue integration through discriminative accumulation
 in Proc. CVPR’04
Abstract

Cited by 28 (10 self)
Object recognition systems aiming to work in real-world settings should use multiple cues in order to achieve robustness. We present a new cue integration scheme which extends the idea of cue accumulation to discriminative classifiers. We derive and test the scheme for Support Vector Machines (SVMs), but we also show that it is easily extendible to any large margin classifier. Interestingly, in the case of one-class SVMs, the scheme can be interpreted as a new class of Mercer kernels for multiple cues. Experimental comparison with a probabilistic accumulation scheme is favorable to our method. Comparison with a voting scheme shows that our method may suffer as the number of object classes increases. Based on these results, we propose a recognition algorithm consisting of a decision tree where decisions at each node are taken using our accumulation scheme. Results obtained using this new algorithm compare very favorably to accumulation (both probabilistic and discriminative) and voting schemes.
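A minimal sketch of discriminative accumulation as described: each cue's classifier produces per-class decision values (margins), the weighted margins are summed per class, and the class with the highest accumulated margin wins. The cue weights and class labels below are hypothetical:

```python
def accumulate(margins_per_cue, weights):
    # margins_per_cue[c][k] is cue c's SVM decision value for class k;
    # discriminative accumulation sums the weighted margins per class
    classes = margins_per_cue[0].keys()
    acc = {k: sum(w * m[k] for w, m in zip(weights, margins_per_cue))
           for k in classes}
    return max(acc, key=acc.get), acc
```

Note how a cue that is individually wrong (here the first one prefers "cup") can be outvoted by a more confident cue, which is the intended robustness benefit.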
Multimodal Semantic Place Classification
, 2010
Abstract

Cited by 22 (6 self)
The ability to represent knowledge about space and its position therein is crucial for a mobile robot. To this end, topological and semantic descriptions are gaining popularity for augmenting purely metric space representations. In this paper we present a multimodal place classification system that allows a mobile robot to identify places and recognize semantic categories in an indoor environment. The system effectively utilizes information from different robotic sensors by fusing multiple visual cues and laser range data. This is achieved using a high-level cue integration scheme based on a Support Vector Machine (SVM) that learns how to optimally combine and weight each cue. Our multimodal place classification approach can be used to obtain a real-time semantic space labeling system which integrates information over time and space. We perform an extensive experimental evaluation of the method for two different platforms and environments, on a realistic offline database and in a live experiment on an autonomous robot. The results clearly demonstrate the effectiveness...
Confidence-based cue integration for visual place recognition
 In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’07)
, 2007
Abstract

Cited by 20 (9 self)
A distinctive feature of intelligent systems is their capability to analyze their level of expertise for a given task; in other words, they know what they know. As a way towards this ambitious goal, this paper presents a recognition algorithm able to measure its own level of confidence and, in case of uncertainty, to seek extra information so as to increase its own knowledge and ultimately achieve better performance. We focus on the visual place recognition problem for topological localization, and we take an SVM approach. We propose a new method for measuring the confidence level of the classification output, based on the distance of a test image and the average distance of training vectors. This method is combined with a discriminative accumulation scheme for cue integration. We show with extensive experiments that the resulting algorithm achieves better performance for two visual cues than the classic single-cue SVM on the same task, while minimising the computational load. More importantly, our method provides a reliable measure of the level of confidence of the decision.
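A sketch of the confidence idea: the test sample's distance to the decision boundary is compared against the average training-sample distance, and a second (more expensive) cue is consulted only when that ratio falls below a threshold. The 0.5 threshold and the accumulation-by-addition fallback are illustrative assumptions, not the paper's exact procedure:

```python
def confidence(test_margin, avg_train_margin):
    # distance of the test sample from the hyperplane, relative to the
    # average distance of training vectors; clipped to [0, 1]
    if avg_train_margin <= 0.0:
        return 0.0
    return max(0.0, min(1.0, abs(test_margin) / avg_train_margin))

def decide(primary_margin, avg_train_margin, extra_cue, thresh=0.5):
    # use the cheap cue alone when confident; otherwise pay for the extra
    # cue (a callable here) and accumulate its decision value
    if confidence(primary_margin, avg_train_margin) >= thresh:
        return primary_margin
    return primary_margin + extra_cue()
```

The design goal is exactly the trade-off the abstract names: most samples are handled by a single cue, so the average computational load stays low, while uncertain samples still benefit from cue integration.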
A continuous formulation of intrinsic dimension
, 2003
Abstract

Cited by 19 (11 self)
The intrinsic dimension (see, e.g., [29, 11]) has proven to be a suitable descriptor for distinguishing between different kinds of image structures such as edges, junctions or homogeneous image patches. In this paper, we will show that the intrinsic dimension is spanned by two axes: one axis represents the variance of the spectral energy and one represents a weighted variance in orientation. Moreover, we will show that the topological structure of intrinsic dimension has the form of a triangle. We will review diverse definitions of intrinsic dimension and show that they can be subsumed within the above-mentioned scheme. We will then give a concrete continuous definition of intrinsic dimension that realizes its triangular structure.
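One common way to realize the two axes the abstract mentions is through the local structure tensor: its trace measures gradient energy and its eigenvalue ratio measures orientation spread, separating homogeneous patches (i0D), edges (i1D) and junctions (i2D). This is a hedged sketch of that interpretation, not the paper's continuous definition; the decision thresholds are arbitrary:

```python
import math

def structure_tensor(patch):
    # accumulate J = sum [gx*gx, gx*gy; gx*gy, gy*gy] over the patch,
    # using central-difference gradients of the intensity values
    h, w = len(patch), len(patch[0])
    jxx = jxy = jyy = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            jxx += gx * gx
            jxy += gx * gy
            jyy += gy * gy
    return jxx, jxy, jyy

def intrinsic_dimension(patch):
    jxx, jxy, jyy = structure_tensor(patch)
    energy = jxx + jyy                 # first axis: total gradient energy
    if energy < 1e-9:
        return "i0D"                   # homogeneous patch
    disc = math.sqrt(max(0.0, (jxx - jyy) ** 2 + 4.0 * jxy * jxy))
    l1 = (energy + disc) / 2.0
    l2 = (energy - disc) / 2.0
    orient_var = l2 / l1               # second axis: orientation spread
    return "i2D" if orient_var > 0.5 else "i1D"  # junction vs. edge
```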
Active Fusion: A New Method Applied to Remote Sensing Image Interpretation
 Pattern Recognition Letters
, 1996
Abstract

Cited by 14 (6 self)
Today's computer vision applications often have to deal with multiple, uncertain, and incomplete visual information. In this paper, we introduce a new method, termed 'active fusion', which provides a common framework for the active selection and combination of information from multiple sources in order to arrive at a reliable result at reasonable cost. The implementation of active fusion on the basis of probability theory, the Dempster-Shafer theory of evidence, and fuzzy sets is discussed. In a sample experiment, active fusion using Bayesian networks is applied to agricultural field classification from multi-temporal Landsat imagery. This experiment shows a significant reduction in the number of information sources required for a reliable decision. Keywords: information fusion, image understanding, active fusion, probability theory, Bayesian networks, Dempster-Shafer theory of evidence, fuzzy sets, fuzzy measures, entropy
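A toy sketch of active fusion with a Bayesian update: information sources are consulted greedily until the class-posterior entropy drops below a target, so confident decisions use fewer sources. The source names, likelihood vectors and entropy threshold are all invented for illustration:

```python
import math

def entropy(p):
    # Shannon entropy in bits of a discrete class posterior
    return -sum(q * math.log(q, 2) for q in p if q > 0.0)

def bayes_update(prior, likelihood):
    # pointwise product followed by renormalization
    post = [p * l for p, l in zip(prior, likelihood)]
    z = sum(post)
    return [p / z for p in post]

def active_fusion(prior, sources, target_entropy=0.5):
    # greedily consult sources (e.g. different acquisition dates of the
    # same scene) until the posterior is confident enough; returns the
    # final belief and the list of sources actually used
    belief, used = list(prior), []
    for name, likelihood in sources:
        if entropy(belief) <= target_entropy:
            break
        belief = bayes_update(belief, likelihood)
        used.append(name)
    return belief, used
```

In the toy run below, the first source is informative enough on its own, so the second is never queried, mirroring the reduction in required sources that the abstract reports.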
Visualization of Multidimensional Shape and Texture Features in Laser Range Data using Complex-Valued Gabor Wavelets
, 1995
Abstract

Cited by 13 (7 self)
This paper describes a new method for visualization and analysis of multivariate laser range data using complex-valued non-orthogonal Gabor wavelets, principal component analysis and a topological mapping network. The initial data set that provides both shape and texture information is encoded in terms of both amplitude and phase of a complex-valued 2D image function. A set of carefully designed oriented Gabor filters performs a decomposition of the data and allows for retrieving local shape and texture features. The feature vector obtained from this method is multidimensional, and in order to evaluate similar data features, further subspace methods to transform the data onto visualizable attributes, such as R, G, B, have to be determined. For this purpose, a feature-based visualization pipeline is proposed, consisting of principal component analysis, normalization and a topological mapping network. This process finally renders an R, G, B subspace representation of the multidimensional fea...
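A simplified sketch of the front of the described pipeline: a 1D complex-valued Gabor filter yields an amplitude and phase, amplitudes over several frequencies form a feature vector, and min-max normalization stands in here for the PCA and topological-mapping stages. This is a deliberate simplification; the frequencies, window width and use of 1D signals are arbitrary illustration choices:

```python
import math

def gabor_response(signal, freq, phase=0.0, sigma=2.0):
    # complex-valued Gabor: Gaussian-windowed complex exponential,
    # evaluated as a dot product with the signal about its midpoint
    n = len(signal)
    c = (n - 1) / 2.0
    re = im = 0.0
    for i, s in enumerate(signal):
        g = math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2))
        re += s * g * math.cos(2.0 * math.pi * freq * (i - c) + phase)
        im += s * g * math.sin(2.0 * math.pi * freq * (i - c) + phase)
    return math.hypot(re, im), math.atan2(im, re)   # amplitude, phase

def features(signal, freqs=(0.1, 0.2, 0.3)):
    # multidimensional feature vector: Gabor amplitudes over a filter bank
    return [gabor_response(signal, f)[0] for f in freqs]

def to_rgb(all_feats):
    # min-max normalize each feature dimension across samples and use the
    # first three dimensions as R, G, B (a stand-in for the PCA and
    # topological-mapping stages of the actual pipeline)
    dims = list(zip(*all_feats))
    lo = [min(d) for d in dims]
    hi = [max(d) for d in dims]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(f, lo, hi)] for f in all_feats]
```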