Results 1  10
of
57,038
Algorithms for Nonnegative Matrix Factorization
 In NIPS
, 2001
"... Nonnegative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minim ..."
Abstract

Cited by 1230 (5 self)
 Add to MetaCart
. The algorithms can also be interpreted as diagonally rescaled gradient descent, where the rescaling factor is optimally chosen to ensure convergence.
Learning LongTerm Dependencies with Gradient Descent is Difficult
 TO APPEAR IN THE SPECIAL ISSUE ON RECURRENT NETWORKS OF THE IEEE TRANSACTIONS ON NEURAL NETWORKS
"... Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in th ..."
Abstract

Cited by 374 (35 self)
 Add to MetaCart
in the input/output sequences span long intervals. We showwhy gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a tradeoff between efficient learning by gradient descent and latching on information
Greedy Function Approximation: A Gradient Boosting Machine
 Annals of Statistics
, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest{descent minimization. A general gradient{descent \boosting" paradigm is developed for additi ..."
Abstract

Cited by 951 (12 self)
 Add to MetaCart
Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest{descent minimization. A general gradient{descent \boosting" paradigm is developed
Optimization Flow Control, I: Basic Algorithm and Convergence
 IEEE/ACM TRANSACTIONS ON NETWORKING
, 1999
"... We propose an optimization approach to flow control where the objective is to maximize the aggregate source utility over their transmission rates. We view network links and sources as processors of a distributed computation system to solve the dual problem using gradient projection algorithm. In thi ..."
Abstract

Cited by 690 (64 self)
 Add to MetaCart
We propose an optimization approach to flow control where the objective is to maximize the aggregate source utility over their transmission rates. We view network links and sources as processors of a distributed computation system to solve the dual problem using gradient projection algorithm
Global Optimization with Polynomials and the Problem of Moments
 SIAM Journal on Optimization
, 2001
"... We consider the problem of finding the unconstrained global minimum of a realvalued polynomial p(x) : R R, as well as the global minimum of p(x), in a compact set K defined by polynomial inequalities. It is shown that this problem reduces to solving an (often finite) sequence of convex linear mat ..."
Abstract

Cited by 569 (47 self)
 Add to MetaCart
We consider the problem of finding the unconstrained global minimum of a realvalued polynomial p(x) : R R, as well as the global minimum of p(x), in a compact set K defined by polynomial inequalities. It is shown that this problem reduces to solving an (often finite) sequence of convex linear
Mean shift, mode seeking, and clustering
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1995
"... AbstractMean shift, a simple iterative procedure that shifts each data point to the average of data points in its neighborhood, is generalized and analyzed in this paper. This generalization makes some kmeans like clustering algorithms its special cases. It is shown that mean shift is a modeseeki ..."
Abstract

Cited by 620 (0 self)
 Add to MetaCart
in clustering and Hough transform are demonstrated. Mean shift is also considered as an evolutionary strategy that performs multistart global optimization. Index TermsMean shift, gradient descent, global optimization, Hough transform, cluster analysis, kmeans clustering. I.
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
 Information and Computation
, 1995
"... this paper, we concentrate on linear predictors . To any vector u 2 R ..."
Abstract

Cited by 325 (14 self)
 Add to MetaCart
this paper, we concentrate on linear predictors . To any vector u 2 R
Gradient flows in metric spaces and in the space of probability measures
 LECTURES IN MATHEMATICS ETH ZÜRICH, BIRKHÄUSER VERLAG
, 2005
"... ..."
A fast iterative shrinkagethresholding algorithm with application to . . .
, 2009
"... We consider the class of Iterative ShrinkageThresholding Algorithms (ISTA) for solving linear inverse problems arising in signal/image processing. This class of methods is attractive due to its simplicity, however, they are also known to converge quite slowly. In this paper we present a Fast Iterat ..."
Abstract

Cited by 1055 (8 self)
 Add to MetaCart
Iterative ShrinkageThresholding Algorithm (FISTA) which preserves the computational simplicity of ISTA, but with a global rate of convergence which is proven to be significantly better, both theoretically and practically. Initial promising numerical results for waveletbased image deblurring demonstrate
Understanding Normal and Impaired Word Reading: Computational Principles in QuasiRegular Domains
 PSYCHOLOGICAL REVIEW
, 1996
"... We develop a connectionist approach to processing in quasiregular domains, as exemplified by English word reading. A consideration of the shortcomings of a previous implementation (Seidenberg & McClelland, 1989, Psych. Rev.) in reading nonwords leads to the development of orthographic and phono ..."
Abstract

Cited by 583 (94 self)
 Add to MetaCart
We develop a connectionist approach to processing in quasiregular domains, as exemplified by English word reading. A consideration of the shortcomings of a previous implementation (Seidenberg & McClelland, 1989, Psych. Rev.) in reading nonwords leads to the development of orthographic and phonological representations that capture better the relevant structure among the written and spoken forms of words. In a number of simulation experiments, networks using the new representations learn to read both regular and exception words, including lowfrequency exception words, and yet are still able to read pronounceable nonwords as well as skilled readers. A mathematical analysis of the effects of word frequency and spellingsound consistency in a related but simpler system serves to clarify the close relationship of these factors in influencing naming latencies. These insights are verified in subsequent simulations, including an attractor network that reproduces the naming latency data directly in its time to settle on a response. Further analyses of the network's ability to reproduce data on impaired reading in surface dyslexia support a view of the reading system that incorporates a graded divisionoflabor between semantic and phonological processes. Such a view is consistent with the more general Seidenberg and McClelland framework and has some similarities withbut also important differences fromthe standard dualroute account.
Results 1  10
of
57,038