Regularization networks and support vector machines
- Advances in Computational Mathematics
, 2000
Cited by 215 (28 self)
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
, 2003
Cited by 125 (3 self)
Convex programming involves a convex set F ⊆ R^n and a convex function c : F → R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
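The online setting described in this abstract can be illustrated with projected online gradient descent, a standard method for such problems. This is only a sketch under stated assumptions, not necessarily the paper's algorithm as published: the feasible set F is taken to be a Euclidean ball, the step size 1/sqrt(t) is an assumed schedule, and the quadratic costs and the names `project_to_ball` and `online_gradient_descent` are hypothetical.

```python
import numpy as np

def project_to_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius} (assumed feasible set F)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def online_gradient_descent(grad_fns, x0, radius=1.0):
    """Commit to a point each round, then observe that round's cost gradient."""
    x = project_to_ball(np.asarray(x0, dtype=float), radius)
    played = []
    for t, grad in enumerate(grad_fns, start=1):
        played.append(x.copy())        # the point is chosen before the cost is seen
        eta = 1.0 / np.sqrt(t)         # decaying step size (assumed schedule)
        x = project_to_ball(x - eta * grad(x), radius)
    return np.array(played)

# Hypothetical cost sequence: c_t(x) = ||x - z_t||^2 with noisy targets z_t.
rng = np.random.default_rng(0)
center = np.array([0.5, -0.3])
targets = center + 0.1 * rng.normal(size=(200, 2))
grad_fns = [lambda x, z=z: 2.0 * (x - z) for z in targets]
points = online_gradient_descent(grad_fns, x0=[0.0, 0.0])
```

Each played point stays inside F because of the projection, and with these assumed costs the iterates settle near the average target.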
A Natural Policy Gradient
Cited by 85 (0 self)
We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
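To make the contrast between the ordinary and the natural gradient concrete, the sketch below computes both for a softmax policy in a single-state problem (a bandit). The setup is an assumption chosen for brevity rather than the MDP setting of the paper: expectations are computed exactly instead of from samples, the Fisher matrix is inverted with a pseudo-inverse because it is singular for softmax parameters, and the function names are hypothetical.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def policy_gradients(theta, rewards):
    """Return (natural, vanilla) gradients of expected reward for a softmax policy."""
    pi = softmax(theta)
    # Row a holds the score vector grad_theta log pi(a) = e_a - pi.
    scores = np.eye(len(theta)) - pi
    # Fisher information: sum_a pi(a) * score_a score_a^T.
    fisher = (scores * pi[:, None]).T @ scores
    # Ordinary policy gradient: sum_a pi(a) * r(a) * score_a.
    vanilla = scores.T @ (pi * rewards)
    # The Fisher matrix is singular along the all-ones direction, so use a pseudo-inverse.
    natural = np.linalg.pinv(fisher, rcond=1e-10) @ vanilla
    return natural, vanilla

theta = np.array([2.0, 0.0, -1.0])     # policy currently favours action 0
rewards = np.array([1.0, 0.0, 3.0])    # hypothetical rewards; action 2 is greedy
natural, vanilla = policy_gradients(theta, rewards)
```

In this toy case the natural gradient equals the rewards minus their unweighted mean, so it ranks actions by reward regardless of the current policy, whereas the ordinary gradient is scaled by the current action probabilities.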
Bankruptcy Analysis with Self-Organizing Maps in Learning Metrics
- IEEE Transactions on Neural Networks
, 2001
Cited by 46 (19 self)
We introduce a method for deriving a metric, locally based on the Fisher information matrix, into the data space. A Self-Organizing Map is computed in the new metric to explore financial statements of enterprises. The metric measures local distances in terms of changes in the distribution of an auxiliary random variable that reflects what is important in the data. In this paper the variable indicates bankruptcy within the next few years. The conditional density of the auxiliary variable is first estimated, and the change in the estimate resulting from local displacements in the primary data space is measured using the Fisher information matrix. When a Self-Organizing Map is computed in the new metric it still visualizes the data space in a topology-preserving fashion, but represents the (local) directions in which the probability of bankruptcy changes the most.
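The local metric described here can be sketched for a simple conditional model. The example assumes a logistic model p(c = 1 | x) = sigmoid(w·x + b) for the auxiliary variable, which is an illustrative choice and not the density estimator used in the paper; `w`, `b`, and the function names are hypothetical. For this model the Fisher information with respect to x works out to p(1 - p) w wᵀ, and the squared local distance of a displacement dx is dxᵀ J(x) dx.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def fisher_matrix(x, w, b):
    """Fisher information of p(c | x) with respect to x for the assumed logistic model."""
    p = sigmoid(w @ x + b)
    return p * (1.0 - p) * np.outer(w, w)

def local_distance_sq(x, dx, w, b):
    """Squared local distance dx^T J(x) dx in the learning metric."""
    J = fisher_matrix(x, w, b)
    return float(dx @ J @ dx)

w = np.array([2.0, 0.0])   # only the first coordinate affects p(c | x)
b = 0.0
x = np.array([0.0, 1.0])
d_along = local_distance_sq(x, np.array([0.1, 0.0]), w, b)
d_across = local_distance_sq(x, np.array([0.0, 0.1]), w, b)
```

A displacement along the direction that changes the class probability has positive length, while one that leaves the probability unchanged has zero length, which is the sense in which the metric reflects what is important in the data.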
A unified framework for Regularization Networks and Support Vector Machines
, 1999
Cited by 40 (11 self)
This report describes research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by the National Science Foundation under contract No. IIS-9800032, the Office of Naval Research under contract No. N00014-93-1-0385 and contract No. N00014-95-1-0600. Partial support was also provided by Daimler-Benz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T.
Contents
1 Introduction
2 Overview of statistical learning theory
2.1 Uniform Convergence and the Vapnik-Chervonenkis bound
2.2 The method of Structural Risk Minimization
2.3 ε-uniform convergence and the V_γ dimension
2.4 Overview of our approach
3 Reproducing Kernel Hilbert Spaces: a brief overview
4 Regularization Networks
4.1 Radial Basis Functions
4.2 Regularization, generalized splines and kernel smoothers
4.3 Dual representation of Regularization Networks
4.4 From regression to classification
5 Support vector machines
5.1 SVM in RKHS
5.2 From regression to classification
6 SRM for RNs and SVMs
6.1 SRM for SVM Classification
6.1.1 Distribution dependent bounds for SVMC
7 A Bayesian Interpretation of Regularization and SRM?
7.1 Maximum A Posteriori Interpretation of Regularization
7.2 Bayesian interpretation of the stabilizer in the RN and SVM functionals
7.3 Bayesian interpretation of the data term in the Regularization and SVM functionals
7.4 Why a MAP interpretation may be misleading
8 Connections between SVMs and Sparse Ap...
Flexible Independent Component Analysis
, 2000
Cited by 32 (13 self)
This paper addresses an independent component analysis (ICA) learning algorithm with flexible nonlinearity, so named as flexible ICA, that is able to separate instantaneous mixtures of sub- and super-Gaussian source signals. In the framework of natural Riemannian gradient, we employ the parameterized generalized Gaussian density model for hypothesized source distributions. The nonlinear function in the flexible ICA algorithm is controlled by the Gaussian exponent according to the estimated kurtosis of demixing filter output. Computer simulation results and performance comparison with existing methods are presented.
Multichannel Signal Separation for Cocktail Party Speech Recognition: A Dynamic Recurrent Network
, 2000
Cited by 9 (3 self)
This paper addresses a method of multichannel signal separation (MSS) with its application to cocktail party speech recognition. First, we present a fundamental principle for multichannel signal separation which uses the spatial independence of located sources as well as the temporal dependence of speech signals. Second, for practical implementation of the signal separation filter, we consider a dynamic recurrent network and develop a simple new learning algorithm. The performance of the proposed method is evaluated in terms of word recognition error rate (WER) in a large speech recognition experiment. The results show that our proposed method dramatically improves the word recognition performance in the case of two simultaneous speech inputs, and that a timing effect is involved in the segregation process. Indexing Terms: Blind signal separation, cocktail party speech recognition, dynamic recurrent networks, multichannel signal separation. submitted to Special Issue, Blind Si...
On-line Learning in Changing Environments with Applications in Supervised and Unsupervised Learning
, 2002
Cited by 8 (3 self)
An adaptive on-line algorithm extending the learning of learning idea is proposed and theoretically motivated. Relying only on gradient flow information it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. The framework is applied for unsupervised and supervised learning. Its efficiency is demonstrated for drifting and switching non-stationary blind separation tasks of acoustic signals. Furthermore, applications to classification (USPS data set) and time-series prediction in changing environments are presented.
Incremental projection learning for optimal generalization
- Neural Networks
, 2001
Cited by 7 (5 self)
In many practical situations in supervised learning, it is often expected to further improve the generalization capability after the learning process has been completed. One of the common approaches to improving the generalization capability is to add training examples. In view of the learning methods of human beings, it seems natural to build posterior learning results upon prior results, which is generally referred to as incremental learning. In this paper, a method of incremental projection learning (IPL) is presented. IPL provides exactly the same learning result as that obtained by batch projection learning. The effectiveness of the presented method is demonstrated through computer simulations.
Maximum Likelihood Estimation Of ICA Model For Wide Class Of Source Distributions
- in Signal Processing
, 2000
Cited by 5 (0 self)
We propose two blind source separation techniques that are applicable to a wide class of source distributions that may also be skewed and may even have zero kurtosis. Skewed distributions are encountered in many important application areas such as communications and biomedical signal processing. The methods are based on a maximum likelihood approach where source distributions are modeled adaptively by the Pearson system and the Extended Generalized Lambda Distribution (EGLD). To compare the developed methods with the existing methods, quantitative measures for the quality of separation are used. Simulation experiments demonstrate the good performance of the proposed methods in the cases where the standard BSS methods perform poorly.

