Regularization Theory and Neural Networks Architectures
Neural Computation, 1995
Abstract

Cited by 382 (31 self)
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well-known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
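To make the "one layer of hidden units" reading concrete, here is a minimal Python sketch that evaluates a radial basis function network and an additive network from fixed centers and coefficients. The Gaussian basis function, its width `sigma`, and the names `rbf_network` and `additive_network` are illustrative assumptions, not taken from the paper; how the coefficients are fitted is omitted.

```python
import math
from typing import Sequence

Point = Sequence[float]


def gaussian(r2: float, sigma: float = 1.0) -> float:
    """Gaussian basis function G, written in terms of a squared distance r2."""
    return math.exp(-r2 / (2.0 * sigma ** 2))


def rbf_network(x: Point, centers: Sequence[Point], coeffs: Sequence[float],
                sigma: float = 1.0) -> float:
    """f(x) = sum_i c_i G(||x - t_i||^2): one hidden unit per center t_i."""
    total = 0.0
    for t, c in zip(centers, coeffs):
        r2 = sum((xm - tm) ** 2 for xm, tm in zip(x, t))
        total += c * gaussian(r2, sigma)
    return total


def additive_network(x: Point, centers: Sequence[Point],
                     coeffs: Sequence[Sequence[float]], sigma: float = 1.0) -> float:
    """f(x) = sum_mu sum_i c_i^mu G((x^mu - t_i^mu)^2): a sum of 1-D functions,
    one per input coordinate mu, each with its own coefficients coeffs[mu]."""
    total = 0.0
    for mu, xm in enumerate(x):
        for t, c in zip(centers, coeffs[mu]):
            total += c * gaussian((xm - t[mu]) ** 2, sigma)
    return total


if __name__ == "__main__":
    centers = [(0.0, 0.0), (1.0, 1.0)]
    print(rbf_network((0.5, 0.5), centers, [1.0, 0.5]))
    print(additive_network((0.5, 0.5), centers, [[1.0, 0.0], [0.0, 0.5]]))
```

Both networks are linear in their coefficients; the difference is that the additive version can only represent functions of the form $\sum_\mu f_\mu(x^\mu)$, one term per input coordinate.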
Regularization networks and support vector machines
Advances in Computational Mathematics, 2000
Abstract

Cited by 350 (34 self)
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning, which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
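As a sketch of the common formulation (notation assumed here, following the usual statement of this framework): both techniques minimize a functional over a reproducing kernel Hilbert space $\mathcal{H}$ with kernel $K$, given $\ell$ examples $(x_i, y_i)$, and they differ mainly in the loss $V$.

```latex
\min_{f \in \mathcal{H}} \; \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \lambda \, \|f\|_K^2,
\qquad
V_{\mathrm{RN}}\big(y, f(x)\big) = \big(y - f(x)\big)^2,
\qquad
V_{\mathrm{SVM}}\big(y, f(x)\big) = \max\big(0, \, |y - f(x)| - \varepsilon\big).
```

Ignoring any bias term, the minimizer in both cases has the form $f(x) = \sum_{i=1}^{\ell} c_i K(x, x_i)$; with a Gaussian $K$ this is a Radial Basis Function expansion, which is one way to see why RBFs appear as a special case of both.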
Shape quantization and recognition with randomized trees
Neural Computation, 1997
Abstract

Cited by 259 (19 self)
We explore a new approach to shape recognition based on a virtually infinite family of binary features ("queries") of the image data, designed to accommodate prior information about shape invariance and regularity. Each query corresponds to a spatial arrangement of several local topographic codes ("tags"), which are in themselves too primitive and common to be informative about shape. All the discriminating power derives from relative angles and distances among the tags. The important attributes of the queries are (i) a natural partial ordering corresponding to increasing structure and complexity; (ii) semi-invariance, meaning that most shapes of a given class will answer the same way to two queries that are successive in the ordering; and (iii) stability, since the queries are not based on distinguished points and substructures. No classifier based on the full feature set can be evaluated, and it is impossible to determine a priori which arrangements are informative. Our approach is to select informative features and build tree classifiers at the same time by inductive learning. In effect, each tree provides an approximation to the full posterior where the features...
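The idea of selecting informative features while growing the trees can be sketched, under simplifying assumptions, in Python: each example is a tuple of binary query answers, each node examines a small random subset of queries and keeps the one with the largest entropy reduction, and the forest averages the leaf class frequencies as its posterior estimate. The entropy criterion, the averaging rule, and all names are assumptions made for illustration; the paper's actual queries (spatial arrangements of tags) are not modeled here.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]  # answers (0 or 1) to each binary query


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def leaf(labels: Sequence[str]) -> Dict:
    """Terminal node storing the empirical class frequencies."""
    counts = Counter(labels)
    return {"leaf": {c: k / len(labels) for c, k in counts.items()}}


def grow_tree(X: List[Row], y: List[str], n_candidates: int,
              rng: random.Random, depth: int = 0, max_depth: int = 6) -> Dict:
    """Grow one tree, choosing each split among a random subset of queries."""
    if depth == max_depth or len(set(y)) == 1:
        return leaf(y)
    n_queries = len(X[0])
    candidates = rng.sample(range(n_queries), min(n_candidates, n_queries))
    best, best_gain, base = None, 0.0, entropy(y)
    for q in candidates:
        no = [lab for row, lab in zip(X, y) if row[q] == 0]
        yes = [lab for row, lab in zip(X, y) if row[q] == 1]
        if not no or not yes:
            continue
        gain = base - (len(no) * entropy(no) + len(yes) * entropy(yes)) / len(y)
        if gain > best_gain:
            best, best_gain = q, gain
    if best is None:  # no sampled query reduces uncertainty
        return leaf(y)
    node: Dict = {"query": best}
    for answer in (0, 1):
        idx = [i for i, row in enumerate(X) if row[best] == answer]
        node[answer] = grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_candidates, rng, depth + 1, max_depth)
    return node


def tree_posterior(tree: Dict, x: Row) -> Dict[str, float]:
    """Follow the query answers of x down to a leaf."""
    while "leaf" not in tree:
        tree = tree[x[tree["query"]]]
    return tree["leaf"]


def forest_posterior(trees: List[Dict], x: Row) -> Dict[str, float]:
    """Average the per-tree posterior estimates."""
    total: Counter = Counter()
    for t in trees:
        for c, p in tree_posterior(t, x).items():
            total[c] += p / len(trees)
    return dict(total)


def classify(trees: List[Dict], x: Row) -> str:
    post = forest_posterior(trees, x)
    return max(post, key=post.get)


if __name__ == "__main__":
    rng = random.Random(0)
    X = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)] * 5
    y = ["a" if row[0] == 0 else "b" for row in X]
    forest = [grow_tree(X, y, n_candidates=2, rng=rng) for _ in range(10)]
    print(classify(forest, (1, 0, 1)), forest_posterior(forest, (1, 0, 1)))
```

Sampling only a few candidate queries per node keeps each tree cheap to grow and makes the trees differ from one another, which is what gives averaging something to gain.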
The mathematics of learning: Dealing with data
Notices of the American Mathematical Society, 2003
Abstract

Cited by 156 (17 self)
Draft for the Notices of the AMS. Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe one of its key algorithms.
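The abstract does not name the algorithm. Assuming it is the regularized least-squares scheme usually associated with this line of work (solve $(m\gamma I + K)\,c = y$ for the coefficients, then predict with $f(x) = \sum_i c_i K(x, x_i)$), a dependency-free Python sketch looks like this. The Gaussian kernel, its width `sigma`, and the helper names are illustrative assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def gaussian_kernel(a: Point, b: Point, sigma: float = 1.0) -> float:
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / (2.0 * sigma ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (fine for small dense systems)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def rls_fit(xs: Sequence[Point], ys: List[float], gamma: float,
            sigma: float = 1.0) -> List[float]:
    """Coefficients c solving (m * gamma * I + K) c = y, with K_ij = K(x_i, x_j)."""
    m = len(xs)
    A = [[gaussian_kernel(xs[i], xs[j], sigma) + (m * gamma if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    return solve(A, ys)


def rls_predict(x: Point, xs: Sequence[Point], coeffs: Sequence[float],
                sigma: float = 1.0) -> float:
    """f(x) = sum_i c_i K(x, x_i)."""
    return sum(c * gaussian_kernel(x, xi, sigma) for c, xi in zip(coeffs, xs))


if __name__ == "__main__":
    xs = [(0.0,), (1.0,), (2.0,)]
    ys = [0.0, 1.0, 0.5]
    for gamma in (1e-6, 0.1, 10.0):
        c = rls_fit(xs, ys, gamma)
        print(gamma, [round(rls_predict(x, xs, c), 3) for x in xs])
```

As $\gamma \to 0$ the fit approaches interpolation of the data; larger $\gamma$ shrinks the coefficients and smooths the prediction.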
A nonparametric approach to pricing and hedging derivative securities via learning networks
Journal of Finance, 1994

Cited by 143 (7 self)
Incorporating Prior Information in Machine Learning by Creating Virtual Examples
Proceedings of the IEEE, 1998
Abstract

Cited by 58 (3 self)
One of the key problems in supervised learning is the insufficient size of the training set. The natural way for an intelligent learner to counter this problem and successfully generalize is to exploit prior information that may be available about the domain or that can be learned from prototypical examples. We discuss the notion of using prior knowledge by creating virtual examples and thereby expanding the effective training set size. We show that in some contexts this idea is mathematically equivalent to incorporating the prior knowledge as a regularizer, suggesting that the strategy is well-motivated. The process of creating virtual examples in real-world pattern recognition tasks is highly nontrivial. We provide demonstrative examples from object recognition and speech recognition to illustrate the idea.
1 Learning from Examples
Recently, machine learning techniques have become increasingly popular as an alternative to knowledge-based approaches to artificial intelligence pro...
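A minimal Python sketch of the virtual-example idea: each training example is passed through transformations that, by prior knowledge, should not change its label, and the transformed copies are added to the training set. The circular shift and the names `shift` and `make_virtual_examples` are hypothetical stand-ins; in object and speech recognition the relevant transformations are domain-specific and, as the abstract notes, much harder to construct.

```python
from typing import Callable, List, Sequence, Tuple

Signal = Tuple[float, ...]
Example = Tuple[Signal, str]


def shift(signal: Signal, k: int) -> Signal:
    """Circularly shift a 1-D signal right by k samples (left if k < 0)."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else signal


def make_virtual_examples(examples: Sequence[Example],
                          transforms: Sequence[Callable[[Signal], Signal]]) -> List[Example]:
    """Return the original examples plus one labeled copy per transform."""
    expanded = list(examples)
    for x, label in examples:
        for t in transforms:
            expanded.append((t(x), label))  # assumes the label is invariant under t
    return expanded


if __name__ == "__main__":
    train = [((0.0, 1.0, 0.0, 0.0), "pulse"), ((1.0, 1.0, 1.0, 1.0), "flat")]
    transforms = [lambda s: shift(s, 1), lambda s: shift(s, -1)]
    for x, label in make_virtual_examples(train, transforms):
        print(label, x)
```

Training on the expanded list is then the usual supervised step; the quality of the result depends entirely on whether the assumed invariance actually holds for the task.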
A unified framework for Regularization Networks and Support Vector Machines
1999
Abstract

Cited by 56 (12 self)
This report describes research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by the National Science Foundation under contract No. IIS9800032, the Office of Naval Research under contract No. N000149310385 and contract No. N000149510600. Partial support was also provided by Daimler-Benz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T.
Contents
1 Introduction
2 Overview of statistical learning theory
  2.1 Uniform Convergence and the Vapnik-Chervonenkis bound
  2.2 The method of Structural Risk Minimization
  2.3 ε-uniform convergence and the V_γ dimension
  2.4 Overview of our approach
3 Reproducing Kernel Hilbert Spaces: a brief overview
4 Regularization Networks
  4.1 Radial Basis Functions
  4.2 Regularization, generalized splines and kernel smoothers
  4.3 Dual representation of Regularization Networks
  4.4 From regression to classification
5 Support vector machines
  5.1 SVM in RKHS
  5.2 From regression to classification
6 SRM for RNs and SVMs
  6.1 SRM for SVM Classification
    6.1.1 Distribution dependent bounds for SVMC
7 A Bayesian Interpretation of Regularization and SRM?
  7.1 Maximum A Posteriori Interpretation of …
  7.2 Bayesian interpretation of the stabilizer in the RN and SVM functionals
  7.3 Bayesian interpretation of the data term in the Regularization and SVM functionals
  7.4 Why a MAP interpretation may be misleading
8 Connections between SVMs and Sparse Ap...
Survey of Neural Transfer Functions
Neural Computing Surveys, 1999
Abstract

Cited by 49 (21 self)
The choice of transfer functions may strongly influence the complexity and performance of neural networks. Although sigmoidal transfer functions are the most common, there is no a priori reason why models based on such functions should always provide optimal decision borders. A large number of alternative transfer functions have been described in the literature. A taxonomy of activation and output functions is proposed, and the advantages of various nonlocal and local neural transfer functions are discussed. Several less-known types of transfer functions and new combinations of activation/output functions are described. Universal transfer functions, parametrized to change from localized to delocalized type, are of greatest interest. Other types of neural transfer functions discussed here include functions with activations based on non-Euclidean distance measures; bicentral functions, formed from products or linear combinations of pairs of sigmoids; and extensions of such functions making rotations...
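As an illustration of the bicentral idea, a product of pairs of sigmoids gives a soft window in each input dimension. The parametrization below (center `t`, half-width `b`, slope `s`, and nothing else) is a simplified assumption for illustration and may differ from the survey's exact definitions.

```python
import math
from typing import Sequence


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def bicentral(x: Sequence[float], t: Sequence[float], b: Sequence[float],
              s: Sequence[float]) -> float:
    """Product over dimensions of sigma(s_i (x_i - t_i + b_i)) * (1 - sigma(s_i (x_i - t_i - b_i))).

    Each factor rises near t_i - b_i and falls near t_i + b_i, so the output is
    a localized, box-like bump; larger slopes s_i give sharper edges.
    """
    out = 1.0
    for xi, ti, bi, si in zip(x, t, b, s):
        out *= sigmoid(si * (xi - ti + bi)) * (1.0 - sigmoid(si * (xi - ti - bi)))
    return out


if __name__ == "__main__":
    t, b, s = (0.0, 0.0), (1.0, 1.0), (5.0, 5.0)
    for point in [(0.0, 0.0), (0.9, 0.0), (3.0, 3.0)]:
        print(point, round(bicentral(point, t, b, s), 4))
```

Unlike a single sigmoid, which splits the input space with one hyperplane, this unit responds only inside a bounded region, which is what makes it a local transfer function.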
Best choices for regularization parameters in learning theory: on the bias-variance problem
Foundations of Computational Mathematics
Abstract

Cited by 48 (11 self)
The goal of learning theory (and a goal in some other contexts as well) is to find an approximation of a function $f_\rho : X \to Y$ known only through a set of pairs $z = (x_i, y_i)_{i=1}^{m}$ drawn from an unknown probability measure $\rho$ on $X \times Y$ ($f_\rho$ is the “regression function” of $\rho$).
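One standard way to frame the bias-variance problem in this setting is sketched below, in assumed notation: $\mathcal{H}_K$ is a reproducing kernel Hilbert space, $\gamma > 0$ is the regularization parameter, and the norm in the last line is that of $L^2_{\rho_X}$. It compares the regularized minimizer built from the sample $z$ with its sample-free counterpart.

```latex
\begin{aligned}
f_\rho(x) &= \int_Y y \, d\rho(y \mid x), \\
f_{z,\gamma} &= \arg\min_{f \in \mathcal{H}_K} \frac{1}{m} \sum_{i=1}^{m} \big(f(x_i) - y_i\big)^2 + \gamma \|f\|_K^2, \\
f_\gamma &= \arg\min_{f \in \mathcal{H}_K} \int_{X \times Y} \big(f(x) - y\big)^2 \, d\rho + \gamma \|f\|_K^2, \\
\|f_{z,\gamma} - f_\rho\| &\le \underbrace{\|f_{z,\gamma} - f_\gamma\|}_{\text{sample error}} + \underbrace{\|f_\gamma - f_\rho\|}_{\text{approximation error}}.
\end{aligned}
```

The inequality is just the triangle inequality. Typically a larger $\gamma$ makes $f_{z,\gamma}$ less sensitive to the particular sample $z$ (smaller sample error) but pulls $f_\gamma$ further from $f_\rho$ (larger approximation error), so choosing $\gamma$ means balancing the two terms.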
Adaptive Stochastic Resonance
Proceedings of the IEEE: special issue on intelligent signal processing, 1998
Abstract

Cited by 31 (11 self)
This paper shows how adaptive systems can learn to add an optimal amount of noise to some nonlinear feedback systems. Noise can improve the signal-to-noise ratio of many nonlinear dynamical systems. This "stochastic resonance" (SR) effect occurs in a wide range of physical and biological systems. The SR effect may also occur in engineering systems in signal processing, communications, and control. The noise energy can enhance the faint periodic signals or faint broadband signals that force the dynamical systems. Most SR studies assume full knowledge of a system's dynamics and its noise and signal structure. Fuzzy and other adaptive systems can learn to induce SR based only on samples from the process. These samples can tune a fuzzy system's if-then rules so that the fuzzy system approximates the dynamical system and its noise response. The paper derives the SR optimality conditions that any stochastic learning system should try to achieve. The adaptive system learns the SR effect as the sys...
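The basic SR effect can be reproduced with a toy model, sketched in Python: a sinusoid too weak to cross a detector's threshold on its own, plus Gaussian noise of adjustable size. The score is the average of output times signal (a covariance, since the signal has zero mean over whole periods), and a plain grid search over sampled noise levels stands in for the adaptive learning. The paper's fuzzy and stochastic-learning methods and its SR optimality conditions are not implemented here; all parameter values and names are assumptions.

```python
import math
import random
from typing import Iterable


def sr_score(noise_std: float, amplitude: float = 0.5, threshold: float = 1.0,
             period: int = 50, n_samples: int = 20000, seed: int = 0) -> float:
    """Average of output * signal for a 1-bit threshold detector driven by a
    subthreshold sinusoid plus Gaussian noise (n_samples spans whole periods)."""
    rng = random.Random(seed)
    total = 0.0
    for t in range(n_samples):
        s = amplitude * math.sin(2.0 * math.pi * t / period)
        y = 1.0 if s + rng.gauss(0.0, noise_std) > threshold else 0.0
        total += y * s
    return total / n_samples


def tune_noise(levels: Iterable[float], **kwargs) -> float:
    """Pick the noise level with the highest sampled score (grid search)."""
    return max(levels, key=lambda sd: sr_score(sd, **kwargs))


if __name__ == "__main__":
    levels = [0.01, 0.1, 0.25, 0.5, 1.0, 2.0, 10.0]
    for sd in levels:
        print(f"noise std {sd:>5}: score {sr_score(sd):.4f}")
    print("best noise level:", tune_noise(levels))
```

With too little noise the detector never fires and the score is zero; with too much, firing is nearly independent of the signal. The score peaks at an intermediate noise level, which is the resonance an adaptive system would try to find from samples.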