Results 1-10 of 193
On the limited memory BFGS method for large scale optimization
Mathematical Programming, 1989
"... this paper has appeared in ..."
First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method
Neural Computation, 1992
Cited by 126 (6 self)
Online first-order backpropagation is sufficiently fast and effective for many large-scale classification problems, but for very high precision mappings, batch processing may be the method of choice. This paper reviews first- and second-order optimization methods for learning in feedforward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimization techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations.
Efficient BackProp
1998
Cited by 125 (24 self)
The convergence of backpropagation learning is analyzed so as to explain common phenomena observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks, and offers explanations of why they work. Many authors have suggested that second-order optimization methods are advantageous for neural net training. It is shown that most "classical" second-order methods are impractical for large neural networks. A few methods are proposed that do not have these limitations.
1 Introduction
Backpropagation is a very popular neural network learning algorithm because it is conceptually simple, computationally efficient, and because it often works. However, getting it to work well, and sometimes to work at all, can seem more of an art than a science. Designing and training a network using backprop requires making many seemingly arbitrary choices such as the number ...
Representations of Quasi-Newton Matrices and Their Use in Limited Memory Methods
1994
Cited by 103 (8 self)
We derive compact representations of BFGS and symmetric rank-one matrices for optimization. These representations allow us to efficiently implement limited memory methods for large constrained optimization problems. In particular, we discuss how to compute projections of limited memory matrices onto subspaces. We also present a compact representation of the matrices generated by Broyden's update for solving systems of nonlinear equations.
Key words: quasi-Newton method, constrained optimization, limited memory method, large-scale optimization.
Abbreviated title: Representation of quasi-Newton matrices.
1. Introduction. Limited memory quasi-Newton methods are known to be effective techniques for solving certain classes of large-scale unconstrained optimization problems (Buckley and Le Nir (1983), Liu and Nocedal (1989), Gilbert and Lemaréchal (1989)). They make simple approximations of Hessian matrices, which are often good enough to provide a fast rate of linear convergence, and re...
Theory of Algorithms for Unconstrained Optimization
1992
Cited by 84 (1 self)
this article I will attempt to review the most recent advances in the theory of unconstrained optimization, and will also describe some important open questions. Before doing so, I should point out that the value of the theory of optimization is not limited to its capacity for explaining the behavior of the most widely used techniques. The question
Learning Dependency-Based Compositional Semantics
Cited by 49 (1 self)
Compositional question answering begins by mapping questions to logical forms, but training a semantic parser to perform this mapping typically requires the costly annotation of the target logical forms. In this paper, we learn to map questions to answers via latent logical forms, which are induced automatically from question-answer pairs. In tackling this challenging learning problem, we introduce a new semantic representation which highlights a parallel between dependency syntax and efficient evaluation of logical forms. On two standard semantic parsing benchmarks (GEO and JOBS), our system obtains the highest published accuracies, despite requiring no annotated logical forms.
Learning for Semantic Parsing with Statistical Machine Translation
2006
Cited by 46 (7 self)
We present a novel statistical approach to semantic parsing, WASP, for constructing a complete, formal meaning representation of a sentence. A semantic parser is learned given a set of sentences annotated with their correct meaning representations. The main innovation of WASP is its use of state-of-the-art statistical machine translation techniques. A word alignment model is used for lexical acquisition, and the parsing model itself can be seen as a syntax-based translation model. We show that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring a similar amount of supervision, and shows better robustness to variations in task complexity and word order.
L-BFGS-B - Fortran Subroutines for Large-Scale Bound Constrained Optimization
1994
Cited by 38 (2 self)
L-BFGS-B is a limited memory algorithm for solving large nonlinear optimization problems subject to simple bounds on the variables. It is intended for problems in which information on the Hessian matrix is difficult to obtain, or for large dense problems. L-BFGS-B can also be used for unconstrained problems, and in this case performs similarly to its predecessor, algorithm L-BFGS (Harwell routine VA15). The algorithm is implemented in Fortran 77.
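As a rough illustration of the limited memory idea in the unconstrained case mentioned in this abstract, the sketch below implements the standard two-loop recursion for the L-BFGS search direction in plain Python. It is not the Fortran 77 code and does not handle bounds; the names `two_loop_direction`, `lbfgs_minimize`, `quad` and `quad_grad`, the memory size, and the simple backtracking line search are all assumptions made for this example.

```python
from collections import deque


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def two_loop_direction(grad, pairs):
    """Return -H * grad, where H is the inverse-Hessian approximation
    implied by the stored (s, y) correction pairs (oldest first)."""
    q = list(grad)
    coeffs = []
    for s, y in reversed(pairs):  # newest pair first
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        coeffs.append((rho, alpha))
    # Scale by gamma = s'y / y'y from the newest pair (identity if none yet).
    if pairs:
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    for (s, y), (rho, alpha) in zip(pairs, reversed(coeffs)):  # oldest first
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


def lbfgs_minimize(f, grad_f, x0, memory=5, max_iter=200, tol=1e-8):
    """Unconstrained minimization keeping only `memory` correction pairs."""
    x = list(x0)
    g = grad_f(x)
    pairs = deque(maxlen=memory)  # old pairs are discarded automatically
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        d = two_loop_direction(g, pairs)
        # Backtracking (Armijo) line search; an assumption of this sketch.
        step, fx, slope = 1.0, f(x), dot(g, d)
        while True:
            x_new = [xi + step * di for xi, di in zip(x, d)]
            if f(x_new) <= fx + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 1e-12:  # store only pairs with positive curvature
            pairs.append((s, y))
        x, g = x_new, g_new
    return x


def quad(x):
    """Hypothetical test problem: convex quadratic with minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def quad_grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


print(lbfgs_minimize(quad, quad_grad, [0.0, 0.0]))
```

The storage cost is `memory` pairs of vectors rather than a full matrix, which is the property that makes the approach attractive when Hessian information is hard to obtain.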
Feature Forest Models for Probabilistic HPSG Parsing
In Computational Linguistics, 2008
Cited by 36 (6 self)
Probabilistic modeling of lexicalized grammars is difficult because these grammars exploit complicated data structures, such as typed feature structures. This prevents us from applying common methods of probabilistic modeling in which a complete structure is divided into substructures under the assumption of statistical independence among substructures. For example, part-of-speech tagging of a sentence is decomposed into tagging of each word, and CFG parsing is split into applications of CFG rules. These methods have relied on the structure of the target problem, namely lattices or trees, and cannot be applied to graph structures including typed feature structures. This article proposes the feature forest model as a solution to the problem of probabilistic modeling of complex data structures including typed feature structures. The feature forest model provides a method for probabilistic modeling without the independence assumption when probabilistic events are represented with feature forests. Feature forests are generic data structures that represent ambiguous trees in a packed forest structure. Feature forest models are maximum entropy models defined over feature forests. A dynamic programming algorithm is proposed for maximum entropy estimation without unpacking feature forests. Thus probabilistic modeling of
Automatic preconditioning by limited memory Quasi-Newton updating
SIAM J. Optim.
Cited by 31 (2 self)
The paper proposes a preconditioner for the conjugate gradient method (CG) that is designed for solving systems of equations Ax = b_i with different right hand side vectors, or for solving a sequence of slowly varying systems A_k x = b_k. The preconditioner has the form of a limited memory quasi-Newton matrix and is generated using information from the CG iteration. The automatic preconditioner does not require explicit knowledge of the coefficient matrix A and is therefore suitable for problems where only products of A times a vector can be computed. Numerical experiments indicate that the preconditioner has most to offer when these matrix-vector products are expensive to compute, and when low accuracy in the solution is required. The effectiveness of the preconditioner is tested within a Hessian-free Newton method for optimization, and by solving certain linear systems arising in finite element models.
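To make the matrix-free setting in this abstract concrete, here is a minimal sketch of preconditioned CG in plain Python that touches A only through a `matvec` callback. The function names, the 2x2 test system, and the Jacobi (diagonal) preconditioner are assumptions chosen for illustration; the diagonal preconditioner is a simple stand-in and is not the limited memory quasi-Newton preconditioner the paper proposes.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def pcg(matvec, b, precond=lambda r: list(r), tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite
    system A x = b. A is accessed only via matvec(v) = A v, and precond(r)
    applies an approximation of the inverse of A to r."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the zero starting point
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


def matvec(v):
    """Hypothetical operator: product with A = [[4, 1], [1, 3]]."""
    return [4.0 * v[0] + 1.0 * v[1], 1.0 * v[0] + 3.0 * v[1]]


def jacobi(r):
    """Stand-in preconditioner: divide by the diagonal of A."""
    return [r[0] / 4.0, r[1] / 3.0]


b = [1.0, 2.0]
print(pcg(matvec, b, precond=jacobi))  # exact solution is [1/11, 7/11]
```

Because `precond` is just another callback, a preconditioner built from information gathered during earlier CG runs could be swapped in without changing the solver loop.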