## More AD of Nonlinear AMPL Models: Computing Hessian Information and Exploiting Partial Separability (1996)

Venue: in Computational Differentiation: Applications, Techniques, and

Citations: 16 - 10 self

### BibTeX

@INPROCEEDINGS{Gay96moread,

author = {David M. Gay},

title = {More AD of Nonlinear AMPL Models: Computing Hessian Information and Exploiting Partial Separability},

booktitle = {in Computational Differentiation: Applications, Techniques, and},

year = {1996},

pages = {173--184}

}

### Abstract

We describe computational experience with automatic differentiation of mathematical programming problems expressed in the modeling language AMPL. Nonlinear expressions are translated to loop-free code, which makes it easy to compute gradients and Jacobians by backward automatic differentiation. The nonlinear expressions may be interpreted or, to gain some evaluation speed at the cost of increased preparation time, converted to Fortran or C. We have extended the interpretive scheme to evaluate Hessian (of Lagrangian) times vector. Detecting partially separable structure (sums of terms, each depending, perhaps after a linear transformation, on only a few variables) is of independent interest, as some solvers exploit this structure. It can be detected automatically by suitable "tree walks". Exploiting this structure permits an AD computation of the entire Hessian matrix by accumulating Hessian times vector computations for each term, and can lead to a much faster computation...
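The accumulation idea in the abstract can be illustrated with a small sketch. It assumes a partially separable objective f(x) = sum_i f_i(U_i x), where each U_i is a short, wide matrix, and it assumes each term supplies its own Hessian-times-vector routine. The names `matvec`, `assemble_hessian`, and `elements` are hypothetical and are not taken from the paper or from the AMPL/solver interface; the Hessian-vector routines are hand-written here, whereas the paper obtains them by AD.

```python
# Sketch only: all names (matvec, assemble_hessian, elements) are hypothetical
# and are not part of the paper's implementation.

def matvec(U, x):
    """Return U times x, with U given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in U]

def assemble_hessian(n, elements, x):
    """Accumulate the n-by-n Hessian of f(x) = sum_i f_i(U_i x).

    elements is a list of (U, hessvec) pairs. U is an m-by-n list of rows and
    hessvec(u, v) returns (Hessian of f_i at u) times v as a length-m list.
    """
    H = [[0.0] * n for _ in range(n)]
    for U, hessvec in elements:
        m = len(U)
        u = matvec(U, x)
        # m Hessian-vector products with unit vectors give the small
        # m-by-m element Hessian, one column per product.
        cols = []
        for j in range(m):
            e = [0.0] * m
            e[j] = 1.0
            cols.append(hessvec(u, e))
        # Scatter the element Hessian: H += U^T * H_i * U,
        # where H_i[a][b] is stored as cols[b][a].
        for r in range(n):
            for c in range(n):
                H[r][c] += sum(U[a][r] * cols[b][a] * U[b][c]
                               for a in range(m) for b in range(m))
    return H

# Toy objective: f(x) = (x0 - 1)^2 + (x1 + x2) * (x2 + x3)
elements = [
    # f_1(u) = (u0 - 1)^2, Hessian [[2]]
    ([[1, 0, 0, 0]], lambda u, v: [2.0 * v[0]]),
    # f_2(u) = u0 * u1, Hessian [[0, 1], [1, 0]]
    ([[0, 1, 1, 0], [0, 0, 1, 1]], lambda u, v: [v[1], v[0]]),
]
H = assemble_hessian(4, elements, [0.5, 1.0, 2.0, 3.0])
for row in H:
    print(row)
```

In this toy case the full 4-by-4 Hessian costs three small Hessian-vector products (one for the first term, two for the second) rather than four products with the whole function. The saving grows when the number of variables is large and each term touches only a few of them.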

### Citations

583 | CHARMM: a program for macromolecular energy, minimization, and dynamics
- Brooks, Bruccoleri, et al.
- 1983
Citation Context ...roblem. The objective for this problem started out as a Fortran subroutine (the handcoded Fortran in timings below, which evaluates an all-atom representation [22] of the empirical force field CHARMM [5]) that Teresa Head-Gordon and Frank Stillinger provided in connection with [21]; the AMPL model for the problem has various features, such as "if" expressions and heavy use of defined variables (in ef...

363 | AMPL: A Modeling Language for
- Fourer, Gay, et al.
- 2002
Citation Context ...resentation of f and c. This paper discusses computing (3) by automatic differentiation (AD) and gives some computational experience with problems expressed symbolically in the AMPL modeling language [11, 12]. For simplicity, the remainder of this paper ignores constraints and just talks about computing $\nabla^2 f$. However, the implementation discussed below is designed to compute (3), which is also relevant whe...

149 | A package for the automatic differentiation of algorithms written in C/C
- Griewank, Juedes, et al.
- 1996
Citation Context ...ations that are only about a factor of 3 slower than handcoded Fortran evaluations. (March 13, 1996) The last four lines in Table 1 are for gradient evaluations by the very general C++ package ADOL-C [18], working with C from variants of the protein-folding objective produced by nlc and by the Fortran-to-C converter f2c [10], applied to the original hand-coded Fortran. In its derivative computation...

96 | LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A
- Toint
- 1991
Citation Context ...tic detection of partially separable structure is appealing. It should make life easier for users than, say, having to state this structure explicitly in the input format SIF associated with LANCELOT [7], a solver that exploits partially separable structure. The computations described below involve automatic detection of partially separable structure, starting with the expression graphs written by th...

73 | A modelling language for mathematical programming
- Fourer, Gay
- 1993
Citation Context ...resentation of f and c. This paper discusses computing (3) by automatic differentiation (AD) and gives some computational experience with problems expressed symbolically in the AMPL modeling language [11, 12]. For simplicity, the remainder of this paper ignores constraints and just talks about computing $\nabla^2 f$. However, the implementation discussed below is designed to compute (3), which is also relevant whe...

62 | Newton-type minimization via the Lanczos method
- Nash
- 1984
Citation Context ...hus reducing VE08 to a simple secant-update (quasi-Newton) method, whereas v8 exploits the structure. The final three solvers, tn, htn-fd and htn-hv, are variants of Nash's truncated-Newton code TN [23, 24], which approximates Hessian-times-vector products by finite differences of gradients as it runs a preconditioned linear conjugate-gradient algorithm to compute an approximate Newton step. Variant tn r...

55 | The ADIFOR 2.0 system for the automatic differentiation of Fortran 77 programs
- Bischof, Carle, et al.
- 1996
Citation Context ...s. The interpreted evaluations in Table 1 are those of the AMPL/solver interface [17]; the compiled evaluations are for C and Fortran produced by the nlc program [17]. The Fortran preprocessor ADIFOR [3, 2] offers both "dense" and "sparse" modes, each of which is sometimes preferable. Table 1 shows ADIFOR evaluations for the Fortran objective produced by nlc and for the original hand-coded objective. Th...

38 | On the unconstrained optimization of partially separable functions
- Griewank, Toint
- 1982
Citation Context ... problems, $m_i = 3$, 6, or 9. The gradient and Hessian of (4) exhibit useful structure: (5) $\nabla f(x) = \sum_{i=1}^{q} U_i^T \nabla f_i(U_i x)$ and (6) $\nabla^2 f(x) = \sum_{i=1}^{q} U_i^T \nabla^2 f_i(U_i x) U_i$. Griewank and Toint [19, 20] originally pointed out the structure in (4-6) and proposed using secant updates to approximate each $\nabla^2 f_i$ separately. Since the Hessians $\nabla^2 f_i$ in (6) are $m_i \times m_i$ matrices, secant updates often give...

27 | ALGORITHM 611. Subroutines for unconstrained minimization using a model/trust-region approach
- Gay
Citation Context ...lems. solution times for some solvers to solve the 22-atom protein-folding problem discussed above, and for three instances of the minimal-surface problem. Solvers mnh and mnhp are variants of HUMSOL [14], which factors a dense Hessian matrix to carry out Newton's method and is thus impractical when n is large. The HUMSOL variants differ in that mnhp exploits partially separable structure in computing...

21 | ADIFOR 2.0 user's guide
- Bischof, Carle, et al.
- 1994
Citation Context ... be an overestimate. For example, consider the AMPL model var x{1..6}; var y{i in 0..2} = sum{j in 1..2} x[2*i+j]; minimize silly: sum{i in 1..6} (x[i] - i)^2 + if (x[1] > 1) then y[0]*y[1] else y[1]*y[2]; For this problem, $q = 7$, $U_i = e_i^T$ for $1 \le i \le 6$, and $U_7 = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}$, so (9) would give a completely dense "sparsity" pattern, whereas the true pattern is *...

18 | Automatic Hessians by reverse accumulation
- Christianson
- 1990
Citation Context ...signed to compute (3), which is also relevant when some of the constraints are changed to inequality constraints. Use of AD in Hessian computations is not new; for example, Dixon [8] and Christianson [6] discuss using AD in this context. What is new in my work is automatic detection of partially separable structure and use of this structure to give faster Hessian computations. The rest of the paper i...

16 | User’s guide for TN/TNBC: Fortran routines for nonlinear optimization
- Nash
- 1984
Citation Context ...hus reducing VE08 to a simple secant-update (quasi-Newton) method, whereas v8 exploits the structure. The final three solvers, tn, htn-fd and htn-hv, are variants of Nash's truncated-Newton code TN [23, 24], which approximates Hessian-times-vector products by finite differences of gradients as it runs a preconditioned linear conjugate-gradient algorithm to compute an approximate Newton step. Variant tn r...

14 | Use of automatic differentiation for calculating Hessians and Newton steps
- Dixon
- 1991
Citation Context ...discussed below is designed to compute (3), which is also relevant when some of the constraints are changed to inequality constraints. Use of AD in Hessian computations is not new; for example, Dixon [8] and Christianson [6] discuss using AD in this context. What is new in my work is automatic detection of partially separable structure and use of this structure to give faster Hessian computations. Th...

12 | Partitioned Variable Metric Updates for Large Structured Optimization Problems
- Toint
- 1982
Citation Context ... problems, $m_i = 3$, 6, or 9. The gradient and Hessian of (4) exhibit useful structure: (5) $\nabla f(x) = \sum_{i=1}^{q} U_i^T \nabla f_i(U_i x)$ and (6) $\nabla^2 f(x) = \sum_{i=1}^{q} U_i^T \nabla^2 f_i(U_i x) U_i$. Griewank and Toint [19, 20] originally pointed out the structure in (4-6) and proposed using secant updates to approximate each $\nabla^2 f_i$ separately. Since the Hessians $\nabla^2 f_i$ in (6) are $m_i \times m_i$ matrices, secant updates often give...

11 | Distribution of Mathematical Software by Electronic Mail
- Dongarra, Grosse
- 1987
Citation Context ...ointing conclusion that with my current implementation, it is better to use finite differences in this context. Source for the basic solvers underlying the variants in Table 2 is available from netlib [9]: mnh and mnhp use "dmnhb from port", the current PORT Library [13] variant of HUMSOL, which enforces simple-bound constraints; both VE08 and TN are available from netlib's opt directory as "ve08 from...

10 | Automatic Differentiation of Nonlinear AMPL Models
- Gay
- 1991
Citation Context ...uations of objectives and constraint bodies, as well as backward AD computations of their first derivatives. A brief overview of AMPL and a discussion of these first-derivative computations appear in [15]. In short, an operation is represented by a structure that provides storage for the partial derivatives of the operation and has pointers to its operands and to a function that carries out the operat...

8 | "A Fortran-to-C Converter," Computing Science
- Feldman, Gay, et al.
- 1990
Citation Context ...able 1 are for gradient evaluations by the very general C++ package ADOL-C [18], working with C from variants of the protein-folding objective produced by nlc and by the Fortran-to-C converter f2c [10], applied to the original hand-coded Fortran. In its derivative computations, ADOL-C uses some internal "tape" arrays recorded during its evaluation of the function f(x). ADOL-C can also repla...

4 | Poly(L-alanine) as a universal reference material for understanding protein energies and structures
- Head-Gordon, Stillinger, et al.
- 1992
Citation Context ...e handcoded Fortran in timings below, which evaluates an all-atom representation [22] of the empirical force field CHARMM [5]) that Teresa Head-Gordon and Frank Stillinger provided in connection with [21]; the AMPL model for the problem has various features, such as "if" expressions and heavy use of defined variables (in effect, named common subexpressions), that make it a good stress-test. Though par...

3 | Molecular Modeling of Proteins: A Feasibility Study of
- Neumaier
- 1996
Citation Context ...es for a 22-atom instance of the protein-folding problem. conformation of the atoms). For much more on the protein-folding problem and many pointers to the literature, see Neumaier's excellent survey [25]. Table 1 compares several ways of computing the protein-folding objective f and its gradient $\nabla f$. The times in Table 1 are relative to that for the original hand-coded Fortran, which gives the fastest ...

2 | Massive Memory Buys Little Speed for Complete, In-Core Sparse Cholesky Factorizations
- Gay
- 1988
Citation Context ...large number n of variables often have sparse Hessians, and some solvers that require explicit Hessians will need them in sparse form. By using a column-dispatch scheme much like the one described in [16], we can compute a Hessian column by column. This involves structures representing the current state of contributions from (7) and from (8); after processing a contribution to the current column, we d...

2 | "Hooking Your Solver to AMPL," Numerical Analysis Manuscript 93-10
- Gay
- 1993
Citation Context ...atives. Because of the compilation and linking times they entail, compiled evaluations only save time for expressions that are to be evaluated a great many times. A brief discussion of nlc appears in [17]. The interpreted evaluations are fast enough to be useful in many situations. Some comparisons with hand-coded Fortran evaluations appear in [15], and more appear below in 6. 3. Hessian Times Vector ...

2 | User's Guide to the Routine VE08 for Solving Partially Separable Bounded Optimization Problems
- Toint
- 1983
Citation Context ... in that mnhp exploits partially separable structure in computing the Hessian matrix, whereas mnh simply computes n Hessian-vector products $\nabla^2 f(x) e_i$. Solvers ve08 and v8 are based on Toint's VE08 [26], which is designed to exploit partially separable structure; it uses a secant update to approximate each element Hessian and applies a truncated preconditioned conjugate gradient algorithm to compute...

1 | The Minpack-2 Test Problem Collection
- Averick, Carter, Moré, Xue
- 1992
Citation Context ...ere e is a vector of ones, this may be an overestimate. For example, consider the AMPL model var x{1..6}; var y{i in 0..2} = sum{j in 1..2} x[2*i+j]; minimize silly: sum{i in 1..6} (x[i] - i)^2 + if (x[1] > 1) then y[0]*y[1] else y[1]*y[2]; For this problem, $q = 7$, $U_i = e_i^T$ for $1 \le i \le 6$, and $U_7 = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}$, so (9) would give a completely dense "sparsity" pattern, whereas the...

1 | "Computing Gradients in Large-Scale Optimization Using Automatic Differentiation," Report ANL/MCS-P488-0195
- Bischof, Bouaricha, Khademi, Moré
- 1995
Citation Context ... currently one can only provide S as a run-time argument. With more programming work, it should be possible to make the ADIFOR evaluations faster. Specifically, Bischof, Bouaricha, Khademi, and Moré [4] have described a way to use partially separable structure to speed up ADIFOR evaluations. On the minimal-surface problem considered below, for example, this idea is easy to apply, and it leads to ADI...

1 | On the Use of Conformationally Dependent Geometry Trends from Ab Initio Dipeptide Studies to Refine Potentials for the Empirical Force Field CHARMM
- Momany, Klimkowski, Schäfer
- 1990
Citation Context ...tance (22 atoms) of the protein-folding problem. The objective for this problem started out as a Fortran subroutine (the handcoded Fortran in timings below, which evaluates an all-atom representation [22] of the empirical force field CHARMM [5]) that Teresa Head-Gordon and Frank Stillinger provided in connection with [21]; the AMPL model for the problem has various features, such as "if" expressions a...