Results 1–10 of 59
SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization
, 2002
Abstract

Cited by 597 (24 self)
Sequential quadratic programming (SQP) methods have proved highly effective for solving constrained optimization problems with smooth nonlinear functions in the objective and constraints. Here we consider problems with general inequality constraints (linear and nonlinear). We assume that first derivatives are available, and that the constraint gradients are sparse. We discuss
A sparse approximate inverse preconditioner for nonsymmetric linear systems
 SIAM J. SCI. COMPUT
, 1998
Abstract

Cited by 197 (22 self)
This paper is concerned with a new approach to preconditioning for large, sparse linear systems. A procedure for computing an incomplete factorization of the inverse of a nonsymmetric matrix is developed, and the resulting factorized sparse approximate inverse is used as an explicit preconditioner for conjugate gradient-type methods. Some theoretical properties of the preconditioner are discussed, and numerical experiments on test matrices from the Harwell–Boeing collection and from Tim Davis's collection are presented. Our results indicate that the new preconditioner is cheaper to construct than other approximate inverse preconditioners. Furthermore, the new technique ensures convergence rates of the preconditioned iteration which are comparable with those obtained with standard implicit preconditioners.
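The practical point of an explicit preconditioner, as the abstract notes, is that applying M ≈ A⁻¹ is just a matrix-vector product rather than a pair of triangular solves. The sketch below illustrates this inside a plain preconditioned CG loop; it is a minimal stand-in, not the paper's factorized AINV construction, and it uses a trivial diagonal approximate inverse on a small SPD example (the paper targets nonsymmetric systems with CG-type methods such as Bi-CGSTAB).

```python
# Minimal sketch: preconditioned CG where the preconditioner M ~= A^{-1} is
# applied EXPLICITLY as a matrix-vector product -- the practical advantage of
# approximate-inverse preconditioning over implicit (ILU-style) preconditioners
# that require triangular solves. M here is just the diagonal inverse, a
# trivial "sparse approximate inverse" used only to keep the example short.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, M_apply, tol=1e-10, maxit=100):
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual r = b - A x  (x = 0 initially)
    z = M_apply(r)                # preconditioned residual: one explicit matvec
    p = z[:]
    rz = dot(r, z)
    for _ in range(maxit):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = M_apply(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
Minv_diag = [1.0 / A[i][i] for i in range(3)]
x = pcg(A, b, lambda r: [d * ri for d, ri in zip(Minv_diag, r)])
```

Because M is applied by a matvec, the preconditioning step parallelizes as easily as the operator itself, which is the motivation for approximate-inverse methods.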
The design and use of algorithms for permuting large entries to the diagonal of sparse matrices
 SIAM J. MATRIX ANAL. APPL
, 1999
Parallelizing While Loops for Multiprocessor Systems
 IN PROCEEDINGS OF THE 9TH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM
, 1995
Abstract

Cited by 40 (17 self)
Current parallelizing compilers treat while loops and do loops with conditional exits as sequential constructs because their iteration space is unknown. Motivated by the fact that these types of loops arise frequently in practice, we have developed techniques that can be used to automatically transform them for parallel execution. We succeed in parallelizing loops involving linked-list traversals, something that has not been done before. This is an important problem since linked-list traversals arise frequently in loops with irregular access patterns, such as sparse matrix computations. The methods can even be applied to loops whose data dependence relations cannot be analyzed at compile-time. We outline a cost/performance analysis that can be used to decide when the methods should be applied. Since, as we show, the expected speedups are significant, our conclusion is that they should almost always be applied, provided there is sufficient parallelism available in the original loop. We present experimental results on loops from the PERFECT Benchmarks and sparse matrix packages which substantiate our conclusion that these techniques can yield significant speedups.
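The basic transformation the abstract describes can be sketched as follows: a while loop over a linked list is split into a cheap sequential traversal that merely records the nodes (fixing the previously unknown iteration space), followed by a parallel loop over the recorded nodes. This is a hypothetical illustration, not the paper's compiler technique; the node class and `heavy_work` are invented names, and the per-node work is assumed independent.

```python
# Hypothetical sketch of the while-loop transformation: (1) a sequential pass
# collects the nodes of the list, making the iteration space explicit, then
# (2) the (assumed independent) per-node work runs in parallel.

from concurrent.futures import ThreadPoolExecutor

class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def heavy_work(v):          # stand-in for the loop body's real computation
    return v * v

# Build a small list 1 -> 2 -> 3 -> 4.
head = None
for v in reversed([1, 2, 3, 4]):
    head = Node(v, head)

# Step 1: sequential traversal collects the nodes.
nodes, p = [], head
while p is not None:
    nodes.append(p)
    p = p.next

# Step 2: the per-iteration work is parallelized over the collected nodes.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda n: heavy_work(n.value), nodes))
```

The traversal itself stays sequential, so the transformation pays off only when the loop body dominates the cost of following the pointers, which is the trade-off the paper's cost/performance analysis addresses.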
Run-Time Methods for Parallelizing Partially Parallel Loops
 Proceedings of the 9th ACM International Conference on Supercomputing
, 1995
Abstract

Cited by 40 (8 self)
In this paper we give a new run-time technique for finding an optimal parallel execution schedule for a partially parallel loop, i.e., a loop whose parallelization requires synchronization to ensure that the iterations are executed in the correct order. Given the original loop, the compiler generates inspector code that performs run-time preprocessing of the loop's access pattern, and scheduler code that schedules (and executes) the loop iterations. The inspector is fully parallel, uses no synchronization, and can be applied to any loop. In addition, it can implement at run-time the two most effective transformations for increasing the amount of parallelism in a loop: array privatization and reduction parallelization (element-wise). We also describe a new scheme for constructing an optimal parallel execution schedule for the iterations of the loop.
The design of MA48, a code for the direct solution of sparse unsymmetric linear systems of equations
, 1995
Abstract

Cited by 36 (8 self)
We describe the design of a new code for the direct solution of sparse unsymmetric linear systems of equations. The new code utilizes a novel restructuring of the symbolic and numerical phases, which increases speed and saves storage without sacrifice of numerical stability. Other features include switching to full matrix processing in all phases of the computation, enabling the use of all three levels of BLAS, treatment of rectangular or rank-deficient matrices, partial factorization, and integrated facilities for iterative refinement and error estimation.
MA48, a Fortran code for direct solution of sparse unsymmetric linear systems of equations
, 1993
Abstract

Cited by 34 (6 self)
We describe the design of a new code that supersedes the Harwell Subroutine Library (HSL) code MA28 for the direct solution of sparse unsymmetric linear systems of equations. The principal differences lie in a new factorization entry that includes row permutations for stability without an overhead of greater complexity than that of the factorization itself, switching to full processing including the use of all three levels of BLAS, better treatment of rectangular or rank-deficient matrices, partial refactorization, and integrated facilities for iterative refinement and error estimation.
A Scalable Method for Run-Time Loop Parallelization
 IJPP
, 1995
Abstract

Cited by 34 (18 self)
Current parallelizing compilers do a reasonable job of extracting parallelism from programs with regular, well-behaved, statically analyzable access patterns. However, they cannot extract a significant fraction of the available parallelism if the program has a complex and/or statically insufficiently defined access pattern, e.g., simulation programs with irregular domains and/or dynamically changing interactions. Since such programs represent a large fraction of all applications, techniques are needed for extracting their inherent parallelism at run-time. In this paper we give a new run-time technique for finding an optimal parallel execution schedule for a partially parallel loop, i.e., a loop whose parallelization requires synchronization to ensure that the iterations are executed in the correct order. Given the original loop, the compiler generates inspector code that performs run-time preprocessing of the loop's access pattern, and scheduler code that schedules (and executes) the loop iterations. The inspector is fully parallel, uses no synchronization, and can be applied to any loop (from which an inspector can be extracted). In addition, it can implement at run-time the two most effective transformations for increasing the amount of parallelism in a loop: array privatization and reduction parallelization (element-wise). The ability to identify privatizable and reduction variables is very powerful since it eliminates the data dependences involving these variables and thereby potentially increases the overall parallelism of the loop. We also describe a new scheme for constructing an optimal parallel execution schedule for the iterations of the loop. The schedule produced is a partition of the set of iterations into subsets called wavefronts so that there are n...
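The inspector/wavefront idea in the abstract can be illustrated with a small sketch: given each iteration's accessed array elements, assign every iteration to the earliest wavefront such that two iterations touching the same element fall in different wavefronts; iterations within a wavefront can then run in parallel. This is a simplified serial illustration, not the paper's fully parallel inspector, and it conservatively treats every access as a write.

```python
# Minimal sketch of the inspector idea: partition loop iterations into
# "wavefronts" so that iterations touching a common array element are
# serialized across wavefronts, while iterations inside one wavefront are
# mutually independent. Every access is treated as a write (conservative).

def build_wavefronts(accesses):
    """accesses[i] = set of array indices touched by iteration i."""
    last_wf = {}          # element -> wavefront of its most recent accessor
    wavefront = []
    for i, elems in enumerate(accesses):
        wf = 0
        for e in elems:
            if e in last_wf:
                wf = max(wf, last_wf[e] + 1)   # must run after prior accessor
        wavefront.append(wf)
        for e in elems:
            last_wf[e] = wf
    # Group iteration numbers by wavefront to obtain the execution schedule.
    nwf = max(wavefront) + 1 if wavefront else 0
    schedule = [[] for _ in range(nwf)]
    for i, wf in enumerate(wavefront):
        schedule[wf].append(i)
    return schedule

# Iterations 0, 1, 3 touch disjoint elements; iteration 2 conflicts with 0 and 1.
sched = build_wavefronts([{0}, {1}, {0, 1}, {2}])
```

Distinguishing reads from writes, and recognizing privatizable and reduction variables as the paper does, would remove many of the dependences this conservative version keeps.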
A Family of Newton Codes for Systems of Highly Nonlinear Equations
, 1991
Abstract

Cited by 33 (2 self)
This report presents new codes for the numerical solution of highly nonlinear systems. They realize the most recent variants of affine invariant Newton techniques due to Deuflhard. The standard method is implemented in the code NLEQ1, whereas the code NLEQ2 additionally contains a rank reduction device. The code NLEQ1S is the sparse version of NLEQ1, i.e. the arising linear systems are solved with sparse matrix techniques. The new implementations share a common software design with respect to the user interface and internal modularization. Numerical experiments for some rather challenging examples illustrate the robustness and efficiency of the algorithm and software.
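A toy version of the damped Newton iteration underlying such codes can be sketched for a scalar equation. The NLEQ codes use Deuflhard's affine-invariant damping strategies (natural monotonicity test); the version below simply halves the damping factor until the residual decreases, which conveys the idea of damping for highly nonlinear problems but is not the actual NLEQ algorithm.

```python
# Toy damped Newton iteration for a highly nonlinear scalar equation.
# Simplification: the step length lam is halved until |f| decreases, a crude
# residual monotonicity test standing in for the affine-invariant tests used
# in the NLEQ codes.

import math

def damped_newton(f, fprime, x0, tol=1e-12, maxit=50):
    x = x0
    for _ in range(maxit):
        fx = f(x)
        if abs(fx) < tol:
            break
        dx = -fx / fprime(x)           # ordinary Newton correction
        lam = 1.0                      # damping factor, reduced on failure
        while lam > 1e-10 and abs(f(x + lam * dx)) >= abs(fx):
            lam *= 0.5                 # halve until the residual decreases
        x = x + lam * dx
    return x

# Solve exp(x) - 2 = 0, whose root is ln 2, from a distant starting point
# where an undamped method would take large early steps.
root = damped_newton(lambda x: math.exp(x) - 2.0,
                     lambda x: math.exp(x), x0=5.0)
```

For systems, the correction becomes a linear solve with the Jacobian, which is where the sparse variant NLEQ1S substitutes sparse matrix techniques.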