GEMM-Based Level 3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark
ACM Transactions on Mathematical Software, 1998
Abstract

Cited by 89 (8 self)
The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply and triangular system solving computations. Due to the complex hardware organization of advanced computer architectures, the development of optimal level 3 BLAS code is costly and time-consuming. However, it is possible to develop a portable and high-performance level 3 BLAS library mainly relying on a highly optimized GEMM, the routine for the general matrix multiply and add operation. With suitable partitioning, all the other level 3 BLAS can be defined in terms of GEMM and a small amount of level 1 and level 2 computations. Our contribution is twofold. First, the model implementations in Fortran 77 of the GEMM-based level 3 BLAS are structured to effectively reduce data traffic in a memory hierarchy. Second, the GEMM-based level 3 BLAS performance evaluation benchmark is a tool for evaluating and comparing different implementations of the level 3 BLAS with the GEMM-based model implementations.
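To make the partitioning idea concrete, here is a minimal Python sketch, assuming plain list-of-lists matrices, of a lower-triangular solve L X = B (the TRSM operation) organized along the lines the abstract describes: small triangular solves on the diagonal blocks, with the bulk of the arithmetic done by GEMM updates of the trailing rows. The names `gemm` and `trsm_lower` and the block size `nb` are hypothetical; the naive reference `gemm` only stands in for a tuned GEMM, and this is not the paper's Fortran 77 model implementation.

```python
def gemm(alpha, A, B, beta, C):
    """C := alpha * A @ B + beta * C, updating the rows of C in place (naive reference)."""
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            s = sum(A[i][p] * B[p][j] for p in range(k))
            C[i][j] = alpha * s + beta * C[i][j]


def trsm_lower(L, B, nb=2):
    """Overwrite B with the solution X of L X = B, L lower triangular, block size nb."""
    n = len(L)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Small triangular solve on the diagonal block (level 2 style work).
        for j in range(len(B[0])):
            for i in range(k0, k1):
                s = B[i][j] - sum(L[i][p] * B[p][j] for p in range(k0, i))
                B[i][j] = s / L[i][i]
        if k1 < n:
            # GEMM update of the trailing rows: B[k1:] -= L[k1:, k0:k1] @ X[k0:k1].
            L21 = [row[k0:k1] for row in L[k1:]]
            X1 = [row[:] for row in B[k0:k1]]
            B2 = B[k1:]  # new list, but its elements are B's own row lists
            gemm(-1.0, L21, X1, 1.0, B2)
```

With a fast GEMM, only the diagonal-block solves remain level 2 work, and their share of the total shrinks as n grows relative to `nb`.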
A Fortran-to-C Converter
AT&T Bell Laboratories, 1992
Abstract

Cited by 34 (0 self)
We describe f2c, a program that translates Fortran 77 into C or C++. F2c lets one portably mix C and Fortran and makes a large body of well-tested Fortran source code available to C environments.
Correctly Rounded Binary-Decimal and Decimal-Binary Conversions
Numerical Analysis Manuscript 90-10, AT&T Bell Laboratories, 1990
Abstract

Cited by 22 (3 self)
This note discusses the main issues in performing correctly rounded decimal-to-binary and binary-to-decimal conversions. It reviews recent work by Clinger and by Steele and White on these conversions and describes some efficiency enhancements. Computational experience with several kinds of arithmetic suggests that the average computational cost for correct rounding can be small for typical conversions. Source for conversion routines that support this claim is available from netlib.
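Both conversion directions can be stated precisely in a few lines of Python, which helps pin down what "correctly rounded" means. The sketch below does not implement the note's efficient algorithms; it checks a conversion against exact rational arithmetic (`fractions.Fraction`) and, for binary to decimal, brute-forces the fewest significant digits that read back to the same double. The function names are hypothetical, `math.nextafter` needs Python 3.9 or later, inputs are assumed to lie in the finite double range, and the tie-breaking (round-half-even) rule is not checked. CPython's own `float()` and `repr()` are derived from the netlib `dtoa.c` code, so this is a test of them rather than a replacement.

```python
from fractions import Fraction
import math


def is_correctly_rounded(s: str) -> bool:
    """Decimal-to-binary: True if float(s) is a double nearest to the exact value of s."""
    q = Fraction(s)                  # exact rational value of the decimal string
    x = float(s)                     # conversion under test
    err = abs(Fraction(x) - q)
    for nb in (math.nextafter(x, math.inf), math.nextafter(x, -math.inf)):
        # A neighbouring double that is strictly closer means x was misrounded.
        if math.isfinite(nb) and abs(Fraction(nb) - q) < err:
            return False
    return True


def shortest_roundtrip(x: float) -> str:
    """Binary-to-decimal: fewest significant digits whose value rounds back to x."""
    for digits in range(1, 18):
        s = format(x, f".{digits}g")
        if float(s) == x:
            return s
    raise ValueError("17 significant digits always suffice for a finite double")
```

For example, `shortest_roundtrip(0.1)` gives `'0.1'`, while `shortest_roundtrip(0.1 + 0.2)` needs all 17 digits: `'0.30000000000000004'`.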
More AD of Nonlinear AMPL Models: Computing Hessian Information and Exploiting Partial Separability
In Computational Differentiation: Applications, Techniques, and ..., 1996
Abstract

Cited by 16 (10 self)
We describe computational experience with automatic differentiation of mathematical programming problems expressed in the modeling language AMPL. Nonlinear expressions are translated to loop-free code, which makes it easy to compute gradients and Jacobians by backward automatic differentiation. The nonlinear expressions may be interpreted or, to gain some evaluation speed at the cost of increased preparation time, converted to Fortran or C. We have extended the interpretive scheme to evaluate products of the Hessian (of the Lagrangian) with a vector. Detecting partially separable structure (sums of terms, each depending, perhaps after a linear transformation, on only a few variables) is of independent interest, as some solvers exploit this structure. It can be detected automatically by suitable "tree walks". Exploiting this structure permits an AD computation of the entire Hessian matrix by accumulating Hessian-times-vector computations for each term, and can lead to a much faster computation...
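Backward (reverse-mode) differentiation of loop-free code can be sketched in Python, assuming expressions are recorded as a graph of hypothetical `Var` nodes built from `+`, `*` and `sin`. The forward evaluation stores each operation's local partial derivatives; one backward sweep in reverse topological order then accumulates adjoints, so the whole gradient costs a small multiple of one function evaluation. This illustrates the technique only; it is not AMPL's evaluation scheme.

```python
import math


class Var:
    """A node in a loop-free expression graph, recorded for backward AD."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_partial_derivative)
        self.adjoint = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def gradient(output, inputs):
    """Return d(output)/d(input) for each input via one backward sweep."""
    order, seen = [], set()

    def visit(node):  # depth-first topological sort, output last
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    for node in order:
        node.adjoint = 0.0
    output.adjoint = 1.0
    for node in reversed(order):  # propagate adjoints toward the inputs
        for parent, partial in node.parents:
            parent.adjoint += partial * node.adjoint
    return [x.adjoint for x in inputs]
```

For f(x, y) = x y + sin x at (2, 3), `gradient(f, [x, y])` returns [3 + cos 2, 2].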
Experience with a Primal Presolve Algorithm
In Large Scale Optimization: State of the ..., 1994
Abstract

Cited by 13 (4 self)
Sometimes an optimization problem can be simplified to a form that is faster to solve. Indeed, sometimes it is convenient to state a problem in a way that admits some obvious simplifications, such as eliminating fixed variables and removing constraints that become redundant after simple bounds on the variables have been updated appropriately. Because of this convenience, the AMPL modeling system includes a "presolver" that attempts to simplify a problem before passing it to a solver. The current AMPL presolver carries out all the primal simplifications described by Brearley et al. in 1975. This paper describes AMPL's presolver, discusses reconstruction of dual values for eliminated constraints, and presents some computational results.
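The two simplifications named above can be sketched in Python for linear constraints of the form lo <= a·x <= hi, assuming hypothetical `Row` records and bound lists `lb`, `ub`: substitute out variables whose lower and upper bounds coincide, then drop any row whose activity range, computed from the variable bounds, already lies inside [lo, hi]. This covers only a small subset of the primal reductions of Brearley et al.; it does not tighten variable bounds, detect infeasibility, or reconstruct dual values, and it is not AMPL's presolver.

```python
from dataclasses import dataclass
import math


@dataclass
class Row:
    coeffs: dict  # variable index -> coefficient
    lo: float     # lower bound on the activity sum(coeffs[j] * x[j])
    hi: float     # upper bound on the activity


def presolve(rows, lb, ub):
    """Eliminate fixed variables, then drop rows made redundant by the bounds.

    Returns (kept_rows, fixed), where fixed maps variable index -> fixed value.
    """
    fixed = {j: lb[j] for j in range(len(lb)) if lb[j] == ub[j]}
    kept = []
    for row in rows:
        # Move the contribution of fixed variables into the row bounds.
        shift = sum(a * fixed[j] for j, a in row.coeffs.items() if j in fixed)
        coeffs = {j: a for j, a in row.coeffs.items() if j not in fixed and a != 0}
        lo, hi = row.lo - shift, row.hi - shift
        # Smallest and largest activity allowed by the (possibly infinite) bounds.
        min_act = sum(a * (lb[j] if a > 0 else ub[j]) for j, a in coeffs.items())
        max_act = sum(a * (ub[j] if a > 0 else lb[j]) for j, a in coeffs.items())
        if lo <= min_act and max_act <= hi:
            continue  # satisfied by every x within bounds: redundant
        kept.append(Row(coeffs, lo, hi))
    return kept, fixed
```

With x0 fixed at 1 and x1, x2 in [0, 1], the row x0 + x1 + x2 <= 10 becomes x1 + x2 <= 9, whose largest possible activity is 2, so it is dropped.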
A Fortran to C Converter
Bell Laboratories, Computer Science, 1993
Abstract

Cited by 8 (0 self)
We describe f2c, a program that translates Fortran 77 into C or C++. F2c lets one portably mix C and Fortran and makes a large body of well-tested Fortran source code available to C environments.
SMART: Towards Spatial Internet Marketplaces
CSIRO Mathematical and Information Sciences, GPO Box 664, 1997
Abstract

Cited by 5 (2 self)
Spatial Internet Marketplaces are attractive as a mechanism to enable spatial applications to be built drawing on remote services for supply of data and for data manipulation. They differ from the usual distributed systems by treating the services as published by providers on the Internet and bought by customers on an as-required basis. Design of a Spatial Internet Marketplace essentially seeks to overcome the problems of heterogeneity in large collections of autonomously provided resources while avoiding complex requirements for publication. The architecture for a marketplace is considered in terms of the components required and the special issues in constructing an infrastructure for a Spatial Internet Marketplace. We initially describe the SMART (Spatial Marketplace) basic model, with its four service types (query, function, planning and execution services) and different message types. A prototype application, the ACTTAP system, is sketched to demonstrate an application of the S...
Automatically Finding and Exploiting Partially Separable Structure in Nonlinear Programming Problems
Bell Laboratories, 1996
Abstract

Cited by 2 (1 self)
Nonlinear programming problems often involve an objective and constraints that are partially separable: the sum of terms involving only a few variables (perhaps after a linear change of variables). This paper discusses finding and exploiting such structure in nonlinear programming problems expressed symbolically in the AMPL modeling language. For some computations, such as computing Hessians by backward automatic differentiation, exploiting partial separability can give significant speedups.
Overview
To set the context for this paper, it is necessary to talk about various aspects of nonlinear programming problems and automatic differentiation. Accordingly, it is convenient to begin with brief overviews of Newton’s method, nonlinear programming, and automatic differentiation. Since I report computational experience with problems expressed symbolically in the AMPL modeling language, a brief account of AMPL is also appropriate. In the initial overviews, I will omit most references.
Newton’s Method for Nonlinear Equations
Newton’s method is in some ways an ideal algorithm for solving systems of nonlinear equations. It is easily derived by a linearization argument, and it converges quickly when started close to a “strong” solution. As a simple example, Table 1 shows the sequence of residual errors for a square-root iteration; note how the residuals are approximately squared in successive iterations (“quadratic convergence”). In more detail, if $f\colon \mathbb{R}^n \to \mathbb{R}^n$ is a differentiable mapping of real $n$-space to itself, then $f(y) \approx f(x) + f'(x)(y - x)$, where $f'(x)$ is the Jacobian matrix of $f$ at $x$. So if $f'(x)$ is nonsingular and $y = x - f'(x)^{-1} f(x)$, then $f(y) \approx 0$, which gives Newton’s method:
$$x_{k+1} = x_k - f'(x_k)^{-1} f(x_k). \qquad (1)$$
To carry out a step of Newton’s method, it is of course not necessary to form $f'(x_k)^{-1}$ explicitly; rather, it suffices to solve
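The quadratic convergence described for the square-root iteration can be reproduced with a short Python sketch; it does not reproduce Table 1 itself, whose starting point is not given here. Applying Newton's method to f(x) = x*x - a gives x_{k+1} = (x_k + a/x_k)/2, and a line of algebra shows the residuals r_k = x_k*x_k - a satisfy r_{k+1} = r_k^2 / (4 x_k^2), so each residual is roughly the square of the previous one. The function name `newton_sqrt` and its arguments are hypothetical.

```python
def newton_sqrt(a, x0, steps):
    """Newton's method for f(x) = x*x - a, starting from x0.

    Returns (x, residuals), where residuals[k] = x_k * x_k - a for k = 0..steps.
    """
    x = x0
    residuals = [x * x - a]
    for _ in range(steps):
        x = x - (x * x - a) / (2 * x)  # x - f(x) / f'(x)
        residuals.append(x * x - a)
    return x, residuals
```

For a = 2 and x0 = 1 the residuals start -1, 0.25, 1/144 (about 0.0069), then about 6e-6: the number of correct digits roughly doubles at each step.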
The Development of Parallel Optimisation Routines for the NAG Parallel Library
1998
Abstract
In this paper we consider the design, development and evaluation of parallel optimisation routines for the NAG Parallel Library. We focus on the parallel implementation of two optimisation routines from the NAG sequential library [11], E04JAF and E04UCF. E04JAF implements a quasi-Newton algorithm for the minimisation of a smooth function subject to fixed upper and lower bounds on the variables (see [4], [6] for information about quasi-Newton algorithms, and [7] for details of E04JAF). E04UCF implements a sequential quadratic programming (SQP) algorithm to minimise a smooth function subject to constraints on the variables which may include simple bounds, linear constraints and smooth nonlinear constraints (see [4], [6] for information about SQP algorithms, and [8] for details of E04UCF). The documentation of E04UCF suggests that the user supply any known partial derivatives of the objective and constraint functions to improve time to solution and robustness of the algorithm. All partial derivatives in E04JAF and any unsupplied partial derivatives in E04UCF are approximated by finite differences.
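The finite-difference approximation mentioned above can be sketched in Python as a forward-difference gradient. This is a generic illustration, not the differencing strategy of E04JAF or E04UCF, which the abstract does not describe; the function name `fd_gradient` and the default step `h` are hypothetical. It costs n + 1 evaluations of f for n variables, and the n perturbed evaluations are independent of one another, so they could in principle be carried out concurrently.

```python
def fd_gradient(f, x, h=1e-6):
    """Forward-difference approximation to the gradient of f at the point x."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        xh = list(x)
        step = h * max(1.0, abs(x[i]))  # scale the step to the size of x[i]
        xh[i] = x[i] + step
        step = xh[i] - x[i]             # use the step actually represented in floating point
        grad.append((f(xh) - fx) / step)
    return grad
```

Supplying exact derivatives, as the E04UCF documentation recommends, avoids both the extra evaluations and the truncation error of order h that this approximation incurs.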
Preliminary LAPACK Users' Guide
In the PAT algebra tutor. Journal of Interactive Media in Education, 1999
Abstract
LAPACK is a transportable library of Fortran 77 subroutines for solving the most common problems in numerical linear algebra: systems of linear equations, linear least squares problems, eigenvalue problems and singular value problems. It has been designed to be efficient on a wide range of modern high-performance computers. This is a Preliminary Users' Guide to Release 1.0 of LAPACK. It gives an informal introduction to the design of the algorithms and software, summarizes the contents of the first release, and describes the conventions used in the software and its documentation.