Results 1–10 of 19
Regularization networks and support vector machines
Advances in Computational Mathematics, 2000
Abstract

Cited by 266 (33 self)
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
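The regression setting described above can be made concrete with a minimal regularization network using a Gaussian RBF kernel (a sketch of ours, not code from the paper; the function names and toy data are illustrative). The solution has the form f(x) = Σᵢ cᵢ k(x, xᵢ), with coefficients obtained by solving a regularized linear system:

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - z_j||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_regularization_network(X, y, lam=1e-3, gamma=1.0):
    """Solve (K + lam*n*I) c = y for the coefficients of f(x) = sum_i c_i k(x, x_i)."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, c, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ c

# Approximate a noisy sine from sparse samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
c = fit_regularization_network(X, y, lam=1e-3, gamma=2.0)
y_hat = predict(X, c, X, gamma=2.0)
```

The regularization parameter `lam` trades data fit against smoothness of the interpolant, which is the tradeoff the review formalizes.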
Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003
Abstract

Cited by 183 (4 self)
Convex programming involves a convex set F ⊆ R^n and a convex function c : F → R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
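The algorithm the abstract refers to takes a gradient step against each round's cost and projects back onto the feasible set. A minimal sketch (our own, with a box as the feasible set F and step size 1/√t; the toy cost sequence is illustrative):

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box F = [lo, hi]^n."""
    return np.clip(x, lo, hi)

def online_gradient_descent(grads, x0, lo=-1.0, hi=1.0):
    """Each round: commit to x_t, then see the cost's gradient, step
    against it with eta_t = 1/sqrt(t), and project back onto F."""
    x = x0.copy()
    chosen = []
    for t, grad in enumerate(grads, start=1):
        chosen.append(x.copy())          # x_t is chosen before the cost arrives
        x = project_box(x - grad(x) / np.sqrt(t), lo, hi)
    return chosen

# Repeated cost c_t(x) = ||x - target||^2 with a fixed (unknown) target.
target = np.array([0.5, -0.3])
grads = [lambda x: 2 * (x - target)] * 200
points = online_gradient_descent(grads, x0=np.zeros(2))
```

Against a stationary cost sequence the iterates approach the minimizer; the paper's regret bound covers adversarially changing costs as well.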
A Natural Policy Gradient
Abstract

Cited by 106 (0 self)
We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
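The idea can be illustrated on the simplest case, a single-state (bandit) softmax policy (a sketch of ours, not the paper's MDP experiments): precondition the vanilla policy gradient with the inverse Fisher information of the policy. The damping term is our addition, since the softmax Fisher matrix is singular along the all-ones direction.

```python
import numpy as np

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def policy_grad_and_fisher(theta, rewards):
    """Vanilla gradient of J(theta) = sum_a pi_a r_a for a softmax policy,
    plus the Fisher information F = E[grad log pi grad log pi^T]."""
    pi = softmax(theta)
    scores = np.eye(len(theta)) - pi[None, :]   # row a = d log pi_a / d theta
    grad = (pi * rewards) @ scores              # sum_a pi_a r_a (e_a - pi)
    F = scores.T @ (pi[:, None] * scores)       # sum_a pi_a score_a score_a^T
    return grad, F

theta = np.zeros(3)
rewards = np.array([1.0, 0.0, 0.5])
for _ in range(100):
    grad, F = policy_grad_and_fisher(theta, rewards)
    # damped natural step: F is singular (softmax is shift-invariant)
    theta += 0.5 * np.linalg.solve(F + 1e-3 * np.eye(3), grad)
```

Consistent with the abstract, the natural direction drives the policy toward the greedy optimal action rather than merely increasing the expected reward a little.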
A unified framework for Regularization Networks and Support Vector Machines, 1999
Abstract

Cited by 50 (13 self)
This report describes research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by the National Science Foundation under contract No. IIS-9800032, and the Office of Naval Research under contract No. N00014-93-1-0385 and contract No. N00014-95-1-0600. Partial support was also provided by Daimler-Benz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T. Contents: 1 Introduction; 2 Overview of statistical learning theory (2.1 Uniform convergence and the Vapnik-Chervonenkis bound; 2.2 The method of Structural Risk Minimization; 2.3 ε-uniform convergence and the V_γ dimension; 2.4 Overview of our approach); 3 Reproducing Kernel Hilbert Spaces: a brief overview; 4 Regularization Networks (4.1 Radial Basis Functions; 4.2 Regularization, generalized splines and kernel smoothers; 4.3 Dual representation of Regularization Networks; 4.4 From regression to classification); 5 Support vector machines (5.1 SVM in RKHS; 5.2 From regression to classification); 6 SRM for RNs and SVMs (6.1 SRM for SVM Classification; 6.1.1 Distribution dependent bounds for SVMC); 7 A Bayesian Interpretation of Regularization and SRM? (7.1 Maximum A Posteriori interpretation; 7.2 Bayesian interpretation of the stabilizer in the RN and SVM functionals; 7.3 Bayesian interpretation of the data term in the Regularization and SVM functionals; 7.4 Why a MAP interpretation may be misleading); Connections between SVMs and Sparse Approximation ...
Bankruptcy Analysis with Self-Organizing Maps in Learning Metrics
IEEE Transactions on Neural Networks, 2001
Abstract

Cited by 48 (19 self)
We introduce a method for deriving a metric, locally based on the Fisher information matrix, into the data space. A Self-Organizing Map is computed in the new metric to explore financial statements of enterprises. The metric measures local distances in terms of changes in the distribution of an auxiliary random variable that reflects what is important in the data. In this paper the variable indicates bankruptcy within the next few years. The conditional density of the auxiliary variable is first estimated, and the change in the estimate resulting from local displacements in the primary data space is measured using the Fisher information matrix. When a Self-Organizing Map is computed in the new metric it still visualizes the data space in a topology-preserving fashion, but represents the (local) directions in which the probability of bankruptcy changes the most.
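For a binary auxiliary variable (bankrupt / not bankrupt) the local Fisher matrix has a closed form, J(x) = ∇p ∇pᵀ / (p(1−p)), where p(x) = P(bankrupt | x). A minimal sketch of ours (the toy conditional and its gradient are illustrative, not the paper's density estimator):

```python
import numpy as np

def fisher_metric_binary(p, grad_p):
    """Local Fisher information matrix for a binary auxiliary variable:
    J(x) = grad_p grad_p^T / (p (1 - p)), with p = P(bankrupt | x)."""
    return np.outer(grad_p, grad_p) / (p * (1.0 - p))

def local_distance(dx, p, grad_p):
    """Squared local distance d^2(x, x + dx) = dx^T J(x) dx. Displacements
    that leave P(bankrupt | x) unchanged have zero length in this metric."""
    J = fisher_metric_binary(p, grad_p)
    return float(dx @ J @ dx)

# Toy conditional: p(x) depends only on the first coordinate.
p, grad_p = 0.2, np.array([0.5, 0.0])
d_relevant = local_distance(np.array([0.1, 0.0]), p, grad_p)    # changes p
d_irrelevant = local_distance(np.array([0.0, 0.1]), p, grad_p)  # leaves p as-is
```

This is exactly the behavior the abstract describes: the learning metric stretches directions along which the bankruptcy probability changes and collapses irrelevant ones.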
Flexible Independent Component Analysis, 2000
Abstract

Cited by 43 (13 self)
This paper addresses an independent component analysis (ICA) learning algorithm with flexible nonlinearity, termed flexible ICA, that is able to separate instantaneous mixtures of sub- and super-Gaussian source signals. In the framework of natural Riemannian gradient, we employ the parameterized generalized Gaussian density model for hypothesized source distributions. The nonlinear function in the flexible ICA algorithm is controlled by the Gaussian exponent according to the estimated kurtosis of the demixing filter output. Computer simulation results and performance comparison with existing methods are presented.
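A minimal sketch of the idea (ours, with a crudely simplified exponent rule rather than the paper's kurtosis-to-exponent mapping): a natural-gradient ICA update W ← W + η(I − E[φ(y)yᵀ])W, where the nonlinearity φ comes from a generalized Gaussian density p(y) ∝ exp(−|y|^α) and α is switched by the sign of the estimated kurtosis.

```python
import numpy as np

def phi(y, alpha):
    """Score-like nonlinearity from p(y) ∝ exp(-|y|^alpha):
    phi(y) ∝ sign(y) |y|^(alpha - 1)."""
    return np.sign(y) * np.abs(y) ** (alpha - 1.0)

def flexible_ica(X, n_iter=200, eta=0.05):
    """Natural-gradient ICA: W <- W + eta (I - E[phi(y) y^T]) W."""
    n, T = X.shape
    W = np.eye(n)
    for _ in range(n_iter):
        Y = W @ X
        # crude exponent choice: super-Gaussian -> alpha=1, sub-Gaussian -> alpha=4
        kurt = (Y ** 4).mean(axis=1) / (Y ** 2).mean(axis=1) ** 2 - 3.0
        alphas = np.where(kurt > 0, 1.0, 4.0)
        G = np.stack([phi(Y[i], alphas[i]) for i in range(n)])
        W += eta * (np.eye(n) - (G @ Y.T) / T) @ W
    return W

rng = np.random.default_rng(1)
S = np.stack([np.sign(rng.standard_normal(2000)),  # sub-Gaussian (binary)
              rng.laplace(size=2000)])             # super-Gaussian
A = np.array([[1.0, 0.6], [0.4, 1.0]])
W = flexible_ica(A @ S)
P = W @ A  # should approach a scaled permutation
```

The kurtosis-driven switch is what lets one update rule handle a mixture of sub- and super-Gaussian sources.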
Incremental projection learning for optimal generalization
Neural Networks, 2001
Abstract

Cited by 12 (6 self)
In many practical situations in supervised learning, one wishes to further improve the generalization capability after the learning process has been completed. A common approach to improving the generalization capability is to add training examples. In view of how humans learn, it seems natural to build posterior learning results upon prior results, which is generally referred to as incremental learning. In this paper, a method of incremental projection learning (IPL) is presented. IPL provides exactly the same learning result as that obtained by batch projection learning. The effectiveness of the presented method is demonstrated through computer simulations.
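The key property claimed — incremental updates that exactly reproduce the batch result — is familiar from recursive least squares, which we use here as a loose analogue (our sketch, not the paper's projection-learning operators): each new example triggers a rank-one Sherman-Morrison update, and the incremental weights coincide with the batch ridge solution after every step.

```python
import numpy as np

class IncrementalLeastSquares:
    """Rank-one updates keeping the incremental solution exactly equal to
    the batch (ridge-regularized) least-squares solution after each example."""
    def __init__(self, dim, reg=1e-3):
        self.P = np.eye(dim) / reg   # running (X^T X + reg I)^{-1}
        self.w = np.zeros(dim)

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)           # gain vector
        self.w += k * (y - x @ self.w)    # correct the prediction error
        self.P -= np.outer(k, Px)         # Sherman-Morrison downdate

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(50)

model = IncrementalLeastSquares(3)
for x_i, y_i in zip(X, y):
    model.update(x_i, y_i)

w_batch = np.linalg.solve(X.T @ X + 1e-3 * np.eye(3), X.T @ y)
# model.w matches w_batch to numerical precision
```

No old examples are revisited, yet nothing is lost relative to retraining from scratch, which is the point of exact incremental learning.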
Multichannel Signal Separation for Cocktail Party Speech Recognition: A Dynamic Recurrent Network, 2000
Abstract

Cited by 10 (4 self)
This paper addresses a method of multichannel signal separation (MSS) with its application to cocktail party speech recognition. First, we present a fundamental principle for multichannel signal separation which uses the spatial independence of located sources as well as the temporal dependence of speech signals. Second, for practical implementation of the signal separation filter, we consider a dynamic recurrent network and develop a simple new learning algorithm. The performance of the proposed method is evaluated in terms of word recognition error rate (WER) in a large speech recognition experiment. The results show that our proposed method dramatically improves the word recognition performance in the case of two simultaneous speech inputs, and that a timing effect is involved in the segregation process. Index Terms: blind signal separation, cocktail party speech recognition, dynamic recurrent networks, multichannel signal separation.
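The stated principle — spatial independence of the sources combined with their temporal structure — can be illustrated with a simple second-order method (an AMUSE-style sketch of ours, not the paper's dynamic recurrent network): whiten the mixtures, then diagonalize a time-lagged covariance, so that sources with distinct autocorrelations separate.

```python
import numpy as np

def amuse(X, tau=1):
    """Second-order separation using temporal structure: whiten, then
    diagonalize the symmetrized lag-tau covariance of the whitened data.
    Sources with distinct lag-tau autocorrelations are recovered up to
    permutation, sign, and scale."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T   # whitening matrix
    Z = V @ X
    C = Z[:, :-tau] @ Z[:, tau:].T / (Z.shape[1] - tau)
    C = (C + C.T) / 2                         # symmetrize the sample estimate
    _, U = np.linalg.eigh(C)
    return U.T @ V                            # unmixing matrix

# Two sources with different temporal colour: a slow sinusoid and white noise.
rng = np.random.default_rng(0)
t = np.arange(5000)
S = np.stack([np.sin(0.05 * t), rng.standard_normal(5000)])
A = np.array([[1.0, 0.5], [0.3, 1.0]])
W = amuse(A @ S)
P = W @ A  # close to a scaled permutation
```

Speech signals are strongly autocorrelated, which is why temporal dependence is such a useful cue on top of spatial independence.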
Online Learning in Changing Environments with Applications in Supervised and Unsupervised Learning, 2002
Abstract

Cited by 9 (3 self)
An adaptive online algorithm extending the "learning of learning" idea is proposed and theoretically motivated. Relying only on gradient flow information, it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. The framework is applied to unsupervised and supervised learning. Its efficiency is demonstrated for drifting and switching nonstationary blind separation tasks on acoustic signals. Furthermore, applications to classification (USPS data set) and time-series prediction in changing environments are presented.
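The "learning of learning" idea can be sketched on the simplest nonstationary problem, tracking a drifting mean (our own simplified variant of a Murata-style rate-adaptation rule; the constants and the toy stream are illustrative): a leaky average r of the gradient flow drives the step size up after an environment switch and back down once the gradients start cancelling.

```python
import numpy as np

def adaptive_online_mean(stream, a=0.02, b=1.0, delta=0.1):
    """Track a drifting mean with an adapted step size: eta grows while the
    leaky-averaged gradient |r| stays large (the environment changed) and
    decays when successive gradients cancel (the environment is stationary)."""
    w, eta, r = 0.0, 0.5, 0.0
    trace = []
    for x in stream:
        g = w - x                         # gradient of 0.5 * (w - x)^2
        r = (1 - delta) * r + delta * g   # leaky average of the gradient flow
        eta = eta + a * eta * (b * abs(r) - eta)
        w = w - eta * g
        trace.append(w)
    return np.array(trace)

# The mean jumps from 0 to 5 halfway through; the rate re-grows at the switch.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0.0, 0.1, 500), rng.normal(5.0, 0.1, 500)])
trace = adaptive_online_mean(stream)
```

Note that only gradient information is used, no explicit loss value and no Hessian, which matches the setting the abstract describes.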
Gaussianization, 2001
Abstract

Cited by 8 (0 self)
High-dimensional data modeling is difficult mainly because of the so-called "curse of dimensionality". We propose a technique called "Gaussianization" for high-dimensional density estimation, which alleviates the curse of dimensionality by exploiting the independence structures in the data. Gaussianization is motivated by recent developments in the statistics literature: projection pursuit, independent component analysis, and Gaussian mixture models with semi-tied covariances. We propose an iterative Gaussianization procedure which converges weakly: at each iteration, the data is first transformed to the least dependent coordinates and then each coordinate is marginally Gaussianized by univariate techniques. Gaussianization offers density estimation sharper than traditional kernel methods and radial basis function methods. Gaussianization can be viewed as an efficient solution of nonlinear independent component analysis and high-dimensional projection pursuit.
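One iteration of the procedure can be sketched as follows (our own sketch: we use a PCA rotation as a stand-in for the "least dependent coordinates" ICA step, and a rank-based empirical CDF for the univariate Gaussianization):

```python
import numpy as np
from scipy.stats import norm

def marginal_gaussianize(x):
    """One univariate Gaussianization step: map each value through the
    empirical CDF, then through the standard normal inverse CDF."""
    ranks = np.argsort(np.argsort(x))
    u = (ranks + 0.5) / len(x)      # empirical CDF values in (0, 1)
    return norm.ppf(u)

def gaussianization_iteration(X):
    """One sketch iteration: rotate the data (PCA stand-in for the ICA
    step), then Gaussianize each coordinate marginally."""
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt.T
    return np.column_stack([marginal_gaussianize(Z[:, j])
                            for j in range(Z.shape[1])])

rng = np.random.default_rng(0)
X = rng.exponential(size=(2000, 2))   # skewed, clearly non-Gaussian marginals
Z = gaussianization_iteration(X)
# each column of Z now has approximately standard normal marginals
```

Iterating rotation plus marginal Gaussianization is what drives the joint distribution weakly toward a standard Gaussian, at which point the density estimate is the inverse image of that Gaussian.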