Exploiting Generative Models in Discriminative Classifiers
 In Advances in Neural Information Processing Systems 11
, 1998
Cited by 551 (9 self)
Generative probability models such as hidden Markov models provide a principled way of treating missing information and dealing with variable length sequences. On the other hand, discriminative methods such as support vector machines enable us to construct flexible decision boundaries and often result in classification performance superior to that of the model based approaches. An ideal classifier should combine these two complementary approaches. In this paper, we develop a natural way of achieving this combination by deriving kernel functions for use in discriminative methods such as support vector machines from generative probability models. We provide a theoretical justification for this combination as well as demonstrate a substantial improvement in the classification performance in the context of DNA and protein sequence analysis.
Pegasos: Primal Estimated subgradient solver for SVM
Cited by 542 (20 self)
We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total runtime of our method is Õ(d/(λɛ)), where d is a bound on the number of nonzero features in each example. Since the runtime does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to nonlinear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
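The abstract describes the method only at a high level. As a minimal sketch, assuming the usual linear-SVM objective (λ/2)‖w‖² plus the average hinge loss and a step size of 1/(λt) (neither is spelled out in the abstract), a stochastic subgradient update that touches one training example per iteration might look like this in Python. The function names pegasos and predict are illustrative, and any projection step is omitted.

```python
import random


def pegasos(examples, lam, num_iters, seed=0):
    """Stochastic subgradient descent for a linear SVM (illustrative sketch).

    examples: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    lam: regularization parameter lambda (> 0).
    """
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [0.0] * dim
    for t in range(1, num_iters + 1):
        x, y = rng.choice(examples)      # a single training example per iteration
        eta = 1.0 / (lam * t)            # assumed step size 1/(lambda * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Subgradient step on (lam/2)*||w||^2 + max(0, 1 - y*<w, x>):
        # the regularizer always shrinks w ...
        w = [(1.0 - eta * lam) * wi for wi in w]
        # ... and the hinge term contributes only when the margin is violated.
        if margin < 1.0:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the linear score <w, x>."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

Each iteration costs time proportional to the number of features in the sampled example and does not scan the training set, which is the property the abstract highlights for large datasets.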
Blind Signal Separation: Statistical Principles
, 2003
Cited by 529 (4 self)
Blind signal separation (BSS) and independent component analysis (ICA) are emerging techniques of array processing and data analysis, aiming at recovering unobserved signals or 'sources' from observed mixtures (typically, the output of an array of sensors), exploiting only the assumption of mutual independence between the signals. The weakness of the assumptions makes it a powerful approach but requires venturing beyond familiar second-order statistics. The objective of this paper is to review some of the approaches that have been recently developed to address this exciting problem, to show how they stem from basic principles and how they relate to each other.
Regularization networks and support vector machines
 Advances in Computational Mathematics
, 2000
Cited by 366 (38 self)
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Subgaussian and Supergaussian Sources
, 1999
Cited by 314 (22 self)
An extension of the infomax algorithm of Bell and Sejnowski (1995) is presented that is able blindly to separate mixed signals with sub- and supergaussian source distributions. This was achieved by using a simple type of learning rule first derived by Girolami (1997) by choosing negentropy as a projection pursuit index. Parameterized probability distributions that have sub- and supergaussian regimes were used to derive a general learning rule that preserves the simple architecture proposed by Bell and Sejnowski (1995), is optimized using the natural gradient by Amari (1998), and uses the stability analysis of Cardoso and Laheld (1996) to switch between sub- and supergaussian regimes. We demonstrate that the extended infomax algorithm is able to separate 20 sources with a variety of source distributions easily. Applied to high-dimensional data from electroencephalographic recordings, it is effective at separating artifacts such as eye blinks and line noise from weaker electrical signals that arise from sources in the brain.
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
, 2003
Cited by 298 (4 self)
Convex programming involves a convex set F ⊆ R^n and a convex function c : F → R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
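The abstract does not spell out the algorithm itself. As a rough sketch of the online setting it describes, the Python below plays a point in F, only then sees that round's cost, and takes a projected gradient step. The box-shaped feasible set [lo, hi]^n, the 1/√t step size, and the names project_box and online_gradient_descent are all assumptions made for illustration.

```python
import math


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (assumed feasible set F)."""
    return [min(max(xi, lo), hi) for xi in x]


def online_gradient_descent(grad_fns, x0, lo, hi):
    """Play one point per round, then update using that round's gradient.

    grad_fns: one callable per round, returning the gradient of that
              round's convex cost at a given point.
    Returns the list of points played, one per round.
    """
    x = project_box(x0, lo, hi)
    played = []
    for t, grad in enumerate(grad_fns, start=1):
        played.append(x)                  # commit to x before the cost is revealed
        g = grad(x)                       # cost (via its gradient) is revealed afterwards
        eta = 1.0 / math.sqrt(t)          # assumed decaying step size
        x = project_box([xi - eta * gi for xi, gi in zip(x, g)], lo, hi)
    return played
```

For example, if every round's cost is (x − 0.5)² on F = [0, 1], passing grad_fns = [lambda x: [2.0 * (x[0] - 0.5)]] * 50 yields played points that stay inside F and settle near 0.5.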
Efficient BackProp
, 1998
Cited by 215 (29 self)
The convergence of backpropagation learning is analyzed so as to explain common phenomena observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks, and offers explanations of why they work. Many authors have suggested that second-order optimization methods are advantageous for neural net training. It is shown that most "classical" second-order methods are impractical for large neural networks. A few methods are proposed that do not have these limitations.
1 Introduction
Backpropagation is a very popular neural network learning algorithm because it is conceptually simple, computationally efficient, and because it often works. However, getting it to work well, and sometimes to work at all, can seem more of an art than a science. Designing and training a network using backprop requires making many seemingly arbitrary choices such as the number ...
Representation learning: A review and new perspectives.
 of IEEE Conf. Comp. Vision Pattern Recog. (CVPR),
, 2005
Cited by 173 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Blind Separation of Instantaneous Mixtures of Non Stationary Sources
 IEEE Trans. Signal Processing
, 2000
Cited by 167 (12 self)
Most ICA algorithms are based on a model of stationary sources. This paper considers exploiting the (possible) nonstationarity of the sources to achieve separation. We introduce two objective functions based on the likelihood and on mutual information in a simple Gaussian non stationary model and we show how they can be optimized, off-line or on-line, by simple yet remarkably efficient algorithms (one is based on a novel joint diagonalization procedure, the other on a Newton-like technique). The paper also includes (limited) numerical experiments and a discussion contrasting non-Gaussian and nonstationary models.
1. INTRODUCTION
The aim of this paper is to develop a blind source separation procedure adapted to source signals with time varying intensity (such as speech signals). For simplicity, we shall restrict ourselves to the simplest mixture model:
$X(t) = A S(t)$ (1)
where $X(t) = [X_1(t), \ldots, X_K(t)]^T$ is the vector of observations (at time t), A is a fixed unknown $K \times K$ inver...
Nonlinear source separation: the post-nonlinear mixtures
 In: Proceedings of the ESANN’97
, 1997
Cited by 148 (26 self)
In this paper, we address the problem of separation of mutually independent sources in nonlinear mixtures. First, we propose theoretical results and prove that in the general case, it is not possible to separate the sources without nonlinear distortion. Therefore, we focus our work on specific nonlinear mixtures known as post-nonlinear mixtures. These mixtures, constituted by a linear instantaneous mixture (linear memoryless channel) followed by an unknown and invertible memoryless nonlinear distortion, are realistic models in many situations and emphasize interesting properties, i.e., in such nonlinear mixtures, sources can be estimated with the same indeterminacies as in instantaneous linear mixtures. The separation structure of nonlinear mixtures is a two-stage system, namely, a nonlinear stage followed by a linear stage, the parameters of which are updated to minimize an output independence criterion expressed as a mutual information criterion. The minimization of this criterion requires knowledge or estimation of source densities or of their log-derivatives. A first algorithm based on a Gram–Charlier expansion of densities is proposed. Unfortunately, it fails for hard nonlinear mixtures. A second algorithm based on an adaptive estimation of the log-derivative of densities leads to very good performance, even with hard nonlinearities. Experiments are proposed to illustrate these results.
Index Terms: Entropy, neural networks, nonlinear mixtures, source separation, unsupervised adaptive algorithms.