Results 1  10
of
5,363
Logistic Regression and Stochastic Gradient Training
, 2010
"... An important extension of the idea of likelihood is conditional likelihood. The conditional likelihood of θ given data x and y is L(θ; yx) = f(yx; θ). Intuitively, y follows a probability distribution that is different for different x, but x itself is never unknown, so there is no need to have a ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
probabilistic model of it. Technically, for each x there is a different distribution f(yx; θ) of y, but all these distributions share the same parameters θ. Given training data consisting of 〈xi, yi 〉 pairs, the principle of maximum conditional likelihood says to choose a parameter estimate ˆ θ that maximizes
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training
, 2013
"... Consider a family of probability distributions defined by a set of parameters θ. The distributions may be either probability mass functions (pmfs) or probability density functions (pdfs). Suppose that we have a random sample drawn from a fixed but unknown member of this family. The random sample is ..."
Abstract
 Add to MetaCart
is a training set of n examples x1 to xn. An example may also be called an observation, an outcome, an instance, or a data point. In general each xj is a vector of values, and θ is a vector of realvalued parameters. For example, for a Gaussian distribution θ = 〈µ, σ2 〉. We assume that the examples
Pegasos: Primal Estimated subgradient solver for SVM
"... We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a singl ..."
Abstract

Cited by 542 (20 self)
 Add to MetaCart
single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ2) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total
Gradientbased learning applied to document recognition
 Proceedings of the IEEE
, 1998
"... Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradientbased learning technique. Given an appropriate network architecture, gradientbased learning algorithms can be used to synthesize a complex decision surface that can classify hi ..."
Abstract

Cited by 1533 (84 self)
 Add to MetaCart
Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradientbased learning technique. Given an appropriate network architecture, gradientbased learning algorithms can be used to synthesize a complex decision surface that can classify
Stochastic Gradient Boosting
 Computational Statistics and Data Analysis
, 1999
"... Gradient boosting constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to current "pseudo"residuals by leastsquares at each iteration. The pseudoresiduals are the gradient of the loss functional being minimized, with respect to ..."
Abstract

Cited by 285 (1 self)
 Add to MetaCart
to the model values at each training data point, evaluated at the current step. It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure. Specifically, at each iteration a subsample of the training
Online Learning with Kernels
, 2003
"... Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the socalled kernel trick with the large margin idea. There has been little u ..."
Abstract

Cited by 2831 (123 self)
 Add to MetaCart
use of these methods in an online setting suitable for realtime applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple
Gaussian processes for machine learning
, 2003
"... We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters us ..."
Abstract

Cited by 720 (2 self)
 Add to MetaCart
We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks
, 1989
"... The exact form of a gradientfollowing learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks. These algorithms have: (1) the advantage that they do not require a precis ..."
Abstract

Cited by 534 (4 self)
 Add to MetaCart
The exact form of a gradientfollowing learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks. These algorithms have: (1) the advantage that they do not require a
Simple statistical gradientfollowing algorithms for connectionist reinforcement learning
 Machine Learning
, 1992
"... Abstract. This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinfor ..."
Abstract

Cited by 449 (0 self)
 Add to MetaCart
Abstract. This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected
Policy gradient methods for reinforcement learning with function approximation.
 In NIPS,
, 1999
"... Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Abstract

Cited by 439 (20 self)
 Add to MetaCart
represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actorcritic methods are examples of this approach. Our main new result is to show
Results 1  10
of
5,363