Results 1  10
of
70,091
Learning to rank using gradient descent
 In ICML
, 2005
"... We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data f ..."
Abstract

Cited by 510 (17 self)
 Add to MetaCart
We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data
descent
, 2003
"... general inefficiency of batch training for gradient ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
general inefficiency of batch training for gradient
Learning LongTerm Dependencies with Gradient Descent is Difficult
 TO APPEAR IN THE SPECIAL ISSUE ON RECURRENT NETWORKS OF THE IEEE TRANSACTIONS ON NEURAL NETWORKS
"... Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in th ..."
Abstract

Cited by 374 (35 self)
 Add to MetaCart
in the input/output sequences span long intervals. We showwhy gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a tradeoff between efficient learning by gradient descent and latching on information
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
 Information and Computation
, 1995
"... this paper, we concentrate on linear predictors . To any vector u 2 R ..."
Abstract

Cited by 325 (14 self)
 Add to MetaCart
this paper, we concentrate on linear predictors . To any vector u 2 R
Greedy Function Approximation: A Gradient Boosting Machine
 Annals of Statistics
, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest{descent minimization. A general gradient{descent \boosting" paradigm is developed for additi ..."
Abstract

Cited by 951 (12 self)
 Add to MetaCart
Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest{descent minimization. A general gradient{descent \boosting" paradigm is developed
A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm
 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS
, 1993
"... A new learning algorithm for multilayer feedforward networks, RPROP, is proposed. To overcome the inherent disadvantages of pure gradientdescent, RPROP performs a local adaptation of the weightupdates according to the behaviour of the errorfunction. In substantial difference to other adaptive tech ..."
Abstract

Cited by 917 (34 self)
 Add to MetaCart
A new learning algorithm for multilayer feedforward networks, RPROP, is proposed. To overcome the inherent disadvantages of pure gradientdescent, RPROP performs a local adaptation of the weightupdates according to the behaviour of the errorfunction. In substantial difference to other adaptive
Parallelized stochastic gradient descent
 Advances in Neural Information Processing Systems 23
, 2010
"... With the increase in available data parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms ..."
Abstract

Cited by 85 (3 self)
 Add to MetaCart
With the increase in available data parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms
A scaled conjugate gradient algorithm for fast supervised learning
 NEURAL NETWORKS
, 1993
"... A supervised learning algorithm (Scaled Conjugate Gradient, SCG) with superlinear convergence rate is introduced. The algorithm is based upon a class of optimization techniques well known in numerical analysis as the Conjugate Gradient Methods. SCG uses second order information from the neural netwo ..."
Abstract

Cited by 441 (0 self)
 Add to MetaCart
A supervised learning algorithm (Scaled Conjugate Gradient, SCG) with superlinear convergence rate is introduced. The algorithm is based upon a class of optimization techniques well known in numerical analysis as the Conjugate Gradient Methods. SCG uses second order information from the neural
Algorithms for Nonnegative Matrix Factorization
 In NIPS
, 2001
"... Nonnegative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minim ..."
Abstract

Cited by 1230 (5 self)
 Add to MetaCart
. The algorithms can also be interpreted as diagonally rescaled gradient descent, where the rescaling factor is optimally chosen to ensure convergence.
Learning by Online Gradient Descent
 Journal of Physics A
, 1995
"... We study online gradientdescent learning in multilayer networks analytically and numerically. The training is based on randomly drawn inputs and their corresponding outputs as defined by a target rule. In the thermodynamic limit we derive deterministic differential equations for the order paramete ..."
Abstract

Cited by 20 (2 self)
 Add to MetaCart
We study online gradientdescent learning in multilayer networks analytically and numerically. The training is based on randomly drawn inputs and their corresponding outputs as defined by a target rule. In the thermodynamic limit we derive deterministic differential equations for the order
Results 1  10
of
70,091