Results 1  10
of
37
Regularization Theory and Neural Networks Architectures
 Neural Computation
, 1995
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Ba ..."
Abstract

Cited by 309 (31 self)
 Add to MetaCart
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
Regularization networks and support vector machines
 Advances in Computational Mathematics
, 2000
"... Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization a ..."
Abstract

Cited by 266 (33 self)
 Add to MetaCart
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
Shape quantization and recognition with randomized trees
 Neural Computation
, 1997
"... We explore a new approach to shape recognition based on a virtually in nite family of binary features (\queries") of the image data, designed to accommodate prior information about shape invariance and regularity. Each query corresponds to a spatial arrangement ofseveral local topographic code ..."
Abstract

Cited by 179 (17 self)
 Add to MetaCart
We explore a new approach to shape recognition based on a virtually in nite family of binary features (\queries") of the image data, designed to accommodate prior information about shape invariance and regularity. Each query corresponds to a spatial arrangement ofseveral local topographic codes (\tags") which are in themselves too primitive and common to be informative about shape. All the discriminating power derives from relative angles and distances among the tags. The important attributes of the queries are (i) a natural partial ordering corresponding to increasing structure and complexity � (ii) semiinvariance, meaning that most shapes of a given class will answer the same way totwo queries which are successive in the ordering � and (iii) stability, since the queries are not based on distinguished points and substructures. No classi er based on the full feature set can be evaluated and it is impossible to determine a priori which arrangements are informative. Our approach istoselect informative features and build tree classi ers at the same time by inductive learning. In e ect, each tree provides an approximation to the full posterior where the features
A nonparametric approach to pricing and hedging derivative securities via learning networks
 Journal of Finance
, 1994
"... http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, noncom ..."
Abstract

Cited by 104 (4 self)
 Add to MetaCart
http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, noncommercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at
The mathematics of learning: Dealing with data
 Notices of the American Mathematical Society
, 2003
"... Draft for the Notices of the AMS Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe a key algorithm of it. 1 ..."
Abstract

Cited by 103 (15 self)
 Add to MetaCart
Draft for the Notices of the AMS Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe a key algorithm of it. 1
A unified framework for Regularization Networks and Support Vector Machines
, 1999
"... This report describers research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by theN ational Science Foundation under contractN o. IIS9800032, the O#ce ofN aval Researc ..."
Abstract

Cited by 50 (13 self)
 Add to MetaCart
This report describers research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by theN ational Science Foundation under contractN o. IIS9800032, the O#ce ofN aval Research under contractN o.N 0001493 10385 and contractN o.N 000149510600. Partial support was also provided by DaimlerBenz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T. Contents Introductic 3 2 OverviF of stati.48EF learni4 theory 5 2.1 Unifo6 Co vergence and the VapnikChervo nenkis bo und ............. 7 2.2 The metho d o Structural Risk Minimizatio ..................... 10 2.3 #unifo8 co vergence and the V # ..................... 10 2.4 Overviewo fo urappro6 h ............................... 13 3 Reproduci9 Kernel HiT ert Spaces: a briL overviE 14 4RegulariEqq.L Networks 16 4.1 Radial Basis Functio8 ................................. 19 4.2 Regularizatioz generalized splines and kernel smo oxy rs .............. 20 4.3 Dual representatio o f Regularizatio Netwo rks ................... 21 4.4 Fro regressioto 5 Support vector machiT9 22 5.1 SVMin RKHS ..................................... 22 5.2 Fro regressioto 6SRMforRNsandSVMs 26 6.1 SRMfo SVMClassificatio .............................. 28 6.1.1 Distributio dependent bo undsfo SVMC .................. 29 7 A BayesiL Interpretatiq ofRegulariTFqEL and SRM? 30 7.1 Maximum A Po terio6 Interpretatio o f ............... 30 7.2 Bayesian interpretatio o f the stabilizer in the RN andSVMfunctio6I6 ...... 32 7.3 Bayesian interpretatio o f the data term in the Regularizatio andSVMfunctioy8 33 7.4 Why a MAP interpretatio may be misleading .................... 33 Connectine between SVMs and Sparse Ap...
Incorporating Prior Information in Machine Learning by Creating Virtual Examples
 Proceedings of the IEEE
, 1998
"... One of the key problems in supervised learning is the insufficient size of the training set. The natural way for an intelligent learner to counter this problem and successfully generalize is to exploit prior information that may be available about the domain or that can be learned from prototypical ..."
Abstract

Cited by 42 (2 self)
 Add to MetaCart
One of the key problems in supervised learning is the insufficient size of the training set. The natural way for an intelligent learner to counter this problem and successfully generalize is to exploit prior information that may be available about the domain or that can be learned from prototypical examples. We discuss the notion of using prior knowledge by creating virtual examples and thereby expanding the effective training set size. We show that in some contexts, this idea is mathematically equivalent to incorporating the prior knowledge as a regularizer, suggesting that the strategy is wellmotivated. The process of creating virtual examples in real world pattern recognition tasks is highly nontrivial. We provide demonstrative examples from object recognition and speech recognition to illustrate the idea. 1 Learning from Examples Recently, machine learning techniques have become increasingly popular as an alternative to knowledgebased approaches to artificial intelligence pro...
Best choices for regularization parameters in learning theory: on the biasvariance problem
 Foundations of Computationals Mathematics
"... The goal of learning theory (and a goal in some other contexts as well) is to find an approximation of a function fρ: X → Y known only through a set of pairs z = (xi, yi) m i=1 drawn from an unknown probability measure ρ on X×Y ( fρ is the “regression function ” of ρ). ..."
Abstract

Cited by 40 (9 self)
 Add to MetaCart
The goal of learning theory (and a goal in some other contexts as well) is to find an approximation of a function fρ: X → Y known only through a set of pairs z = (xi, yi) m i=1 drawn from an unknown probability measure ρ on X×Y ( fρ is the “regression function ” of ρ).
Survey of Neural Transfer Functions
 Neural Computing Surveys
, 1999
"... The choice of transfer functions may strongly influence complexity and performance of neural networks. Although sigmoidal transfer functions are the most common there is no apriorireason why models based on such functions should always provide optimal decision borders. A large number of alternative ..."
Abstract

Cited by 35 (19 self)
 Add to MetaCart
The choice of transfer functions may strongly influence complexity and performance of neural networks. Although sigmoidal transfer functions are the most common there is no apriorireason why models based on such functions should always provide optimal decision borders. A large number of alternative transfer functions has been described in the literature. A taxonomy of activation and output functions is proposed, and advantages of various nonlocal and local neural transfer functions are discussed. Several lessknown types of transfer functions and new combinations of activation/output functions are described. Universal transfer functions, parametrized to change from localized to delocalized type, are of greatest interest. Other types of neural transfer functions discussed here include functions with activations based on nonEuclidean distance measures, bicentral functions, formed from products or linear combinations of pairs of sigmoids, and extensions of such functions making rotations...
Learning and Approximation Capabilities of Adaptive Spline Activation Function Neural Networks
 NEURAL NETWORKS
, 1998
"... In this paper, we study the theoretical properties of a new kind of artificial neural network, which is able to adapt its activation functions by varying the control points of a Catmull Rom cubic spline. Most of all, we are interested in generalization capability, and we can show that our architec ..."
Abstract

Cited by 28 (19 self)
 Add to MetaCart
In this paper, we study the theoretical properties of a new kind of artificial neural network, which is able to adapt its activation functions by varying the control points of a Catmull Rom cubic spline. Most of all, we are interested in generalization capability, and we can show that our architecture presents several advantages. First of all, it can be seen as a suboptimal realization of the additive spline based model obtained by the reguralization theory. Besides, simulations confirm that the special learning mechanism allows to use in a very effective way the network's free parameters, keeping their total number at lower values than in networks with sigmoidal activation functions. Other notable properties are a shorter training time and a reduced hardware complexity, due to the surplus in the number of neurons. # 1998 Elsevier Science Ltd. All rights reserved. Keywords: Spline neural networks; Multilayer perceptron; Generalized sigmoidal functions; Adaptive activation functions...