Results 1-10 of 19
An Extended Set of Fortran Basic Linear Algebra Subprograms
 ACM Transactions on Mathematical Software
, 1986
Abstract
Cited by 450 (71 self)
This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-vector operations, which should provide for efficient and portable implementations of algorithms for high-performance computers.
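The matrix-vector operations this extension centers on are the xGEMV family of routines. A minimal sketch of the operation they compute, written with NumPy for illustration (the function name `gemv` here is ours, not a BLAS binding):

```python
import numpy as np

def gemv(alpha, A, x, beta, y):
    """The core Level 2 BLAS operation y := alpha*A*x + beta*y,
    expressed with NumPy; in the BLAS itself this is xGEMV."""
    return alpha * (A @ x) + beta * y

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, 1.0])
y = np.array([10.0, 10.0])
print(gemv(2.0, A, x, 0.5, y))  # [11. 19.]
```

An optimized BLAS implements this same contract with machine-specific blocking and vectorization, which is what makes the interface portable while the performance is not left on the table.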
An Evaluation of Java for Numerical Computing
 In Proceedings of ISCOPE'98
, 1998
Abstract
Cited by 21 (1 self)
This paper describes the design and implementation of high-performance numerical software in Java. Our primary goals are to characterize the performance of object-oriented numerical software written in Java and to investigate whether Java is a suitable language for such endeavors. We have implemented JLAPACK, a subset of the LAPACK library, in Java. LAPACK is a high-performance Fortran 77 library used to solve common linear algebra problems. JLAPACK is an object-oriented library, using encapsulation, inheritance, and exception handling. It performs within a factor of four of the optimized Fortran version for certain platforms and test cases. When used with the native BLAS library, JLAPACK performs comparably with the Fortran version using the native BLAS library. We conclude that high-performance numerical software could be written in Java if a handful of concerns about language features and compilation strategies are adequately addressed.
A Parallel Block Implementation of Level 3 BLAS for MIMD Vector Processors
 ACM Transactions on Mathematical Software
, 1993
Abstract
Cited by 14 (9 self)
We describe an implementation of Level 3 BLAS based on the use of the matrix-matrix multiplication kernel (GEMM). Blocking techniques are used to express the BLAS in terms of operations involving triangular blocks and calls to GEMM. A principal advantage of this approach is that most manufacturers provide at least an efficient serial version of GEMM, so that our implementation can capture a significant percentage of the computer's performance. A parameter which controls the blocking allows an efficient exploitation of the memory hierarchy of the various target computers. Furthermore, this blocked version of Level 3 BLAS is naturally parallel. We present results on the ALLIANT FX/80, the CONVEX C220, the CRAY-2, and the IBM 3090/VF. For GEMM, we always use the manufacturer-supplied versions. For the operations dealing with triangular blocks, we use assembler or tuned Fortran (using loop unrolling) codes, depending on the efficiency of the available libraries. Keywords: vectorization, para...
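The blocking idea above can be sketched with a triangular solve (the TRSM operation): partition into diagonal blocks, and express all off-diagonal work as matrix-matrix multiplies that a GEMM kernel can absorb. A minimal NumPy illustration, not the paper's code (the function name and block size are ours):

```python
import numpy as np

def blocked_trsm_lower(L, B, nb):
    """Solve L @ X = B with L lower-triangular, in blocks of nb rows.
    The off-diagonal update is a pure matrix-matrix product (GEMM);
    only the small diagonal-block solves touch triangular structure."""
    n = L.shape[0]
    X = B.copy()
    for i in range(0, n, nb):
        e = min(i + nb, n)
        # GEMM update: subtract the contribution of already-solved blocks
        X[i:e] -= L[i:e, :i] @ X[:i]
        # small triangular solve on the diagonal block
        X[i:e] = np.linalg.solve(L[i:e, i:e], X[i:e])
    return X
```

Because most of the flops land in the GEMM update, a vendor-tuned GEMM lifts the whole routine, which is the portability argument the abstract makes.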
The Automatic Generation of Sparse Primitives
 ACM Transactions on Mathematical Software
, 1996
Abstract
Cited by 12 (1 self)
In this paper, we discuss some of our experiences with this new approach.
INTLIB: A Portable FORTRAN 77 Interval Standard Function Library
Abstract
Cited by 7 (2 self)
INTLIB is meant to be a readily available, portable, exhaustively documented interval arithmetic library, written in standard FORTRAN 77. Its underlying philosophy is to provide a standard for interval operations, to aid in efficiently transporting programs involving interval arithmetic. The model is the BLAS package for basic linear algebra operations. The library is composed of elementary interval arithmetic routines, standard function routines for interval data and values, and utility routines. The library can be used with INTBIS (Algorithm 681), and a Fortran 90 module to use the library to define an interval data type is available from the first author. Keywords: interval arithmetic, standard functions, BLAS, operator overloading, FORTRAN 77, Fortran 90. Subject classifications: AMS: 65G10, 65D15, 26A09; CR: G.1.0 (computer arithmetic), G.1.2 (standard function approximation), D.2.2 (software libraries), D.2.7 (documentation, portability). This work is partially supported by Nat...
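The elementary interval operations such a library provides can be sketched as follows. This is an illustration only, not INTLIB's interface: INTLIB's routines use directed (outward) rounding so the computed interval rigorously encloses the true result in floating point, which this sketch omits.

```python
def iadd(a, b):
    """Interval sum: [a_lo + b_lo, a_hi + b_hi].
    Intervals are (lo, hi) tuples; no outward rounding here."""
    return (a[0] + b[0], a[1] + b[1])

def imul(a, b):
    """Interval product: take the extremes over the four
    endpoint products, since signs may flip the ordering."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

print(imul((-1.0, 2.0), (3.0, 4.0)))  # (-4.0, 8.0)
```

A rigorous implementation would round each lower endpoint toward minus infinity and each upper endpoint toward plus infinity, which is exactly the kind of machine-dependent detail a standard library hides behind a portable interface.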
Lapack 3.1 xHSEQR: Tuning and Implementation Notes on the Small Bulge Multishift QR Algorithm with Aggressive Early Deflation
, 2007
Abstract
Cited by 7 (0 self)
This note documents implementation details of the small bulge, multishift QR algorithm with aggressive early deflation that appears as Lapack version 3.1 programs CHSEQR, DHSEQR, SHSEQR, and ZHSEQR and the subroutines they call. These codes calculate eigenvalues and optionally a Schur factorization of a Hessenberg matrix. They do the bulk of the work required to calculate eigenvalues and optionally eigenvectors of a general nonsymmetric matrix. This report is intended to provide guidance for setting the machine-dependent tuning parameters, to help maintainers identify and correct problems, and to help developers improve upon this implementation.
Exploiting separability in large-scale support vector machine training
, 2007
Abstract
Cited by 5 (3 self)
Support vector machine training can be represented as a large quadratic program. We present an efficient and numerically stable algorithm for this problem using interior point methods, which requires only O(n) operations per iteration. By exploiting the separability of the Hessian, we provide a unified approach, from an optimization perspective, to 1-norm classification, 2-norm classification, and ε-insensitive regression. Numerical experiments indicate that, in contrast to existing decomposition methods, the algorithm is largely unaffected by noisy data, for both linear and nonlinear kernels, and show our implementation outperforming all known implementations by a large margin. We discuss the effect of using multiple correctors, and of monitoring the angle of the normal to the hyperplane to determine termination.
A Block Version of the Eskow-Schnabel Modified Cholesky Factorization
, 1995
Abstract
Cited by 3 (1 self)
The modified Cholesky factorization is widely used in optimization. Let A be a symmetric n-by-n, not necessarily positive-definite matrix; then we can compute P^T (A + E) P = L L^T, where P is a permutation matrix and E is an n-by-n matrix equal to 0 if A is safely positive-definite; otherwise E is a diagonal matrix chosen so that A + E is safely positive-definite ([17], [21], and [22]). We describe a block version of the Eskow-Schnabel modified Cholesky factorization that gives exactly the same numerical results as the original algorithm but allows for the use of the Level 3 BLAS computational kernels ([12] and [13]), and thus takes advantage of the memory hierarchy of today's high-performance computers, including vector computers and RISC-based workstations. Keywords: nonlinear optimization, Level 3 BLAS, matrix-matrix kernels, block algorithms, RISC processors, vector processors.
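The contract of a modified Cholesky routine can be illustrated with a much cruder diagonal-shift variant. This is NOT the Eskow-Schnabel algorithm (which permutes and perturbs during the factorization itself); it is a minimal sketch of the same A + E = L L^T guarantee, with P taken as the identity and the tolerance `tau` chosen arbitrarily:

```python
import numpy as np

def modified_cholesky(A, tau=1e-8):
    """Illustrative diagonal-shift modified Cholesky (NOT Eskow-Schnabel):
    pick E = delta * I so that A + E is safely positive definite,
    then factor A + E = L @ L.T; here the permutation P is the identity."""
    lam_min = np.linalg.eigvalsh(A).min()
    delta = 0.0 if lam_min > tau else tau - lam_min
    E = delta * np.eye(A.shape[0])
    L = np.linalg.cholesky(A + E)
    return L, E
```

The point of the paper's block version is that the same E and L as the original algorithm can be produced while routing the bulk of the arithmetic through Level 3 BLAS kernels, something this eigenvalue-based sketch makes no attempt at.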
Local Basic Linear Algebra Subroutines (LBLAS) for the CM5/5E
 Harvard University, Division of Applied Sciences
, 1994
Abstract
Cited by 3 (1 self)
The Connection Machine Scientific Software Library (CMSSL) is a library of scientific routines designed for distributed-memory architectures. The BLAS of the CMSSL have been implemented as a two-level structure to exploit optimizations local to nodes and across nodes. This paper presents the implementation considerations and performance of the Local BLAS, or BLAS local to each node of the system. A wide variety of loop structures and unrollings have been implemented in order to achieve uniform, high performance irrespective of the data layout in node memory. The CMSSL is the only existing high-performance library capable of supporting both the data-parallel and message-passing modes of programming a distributed-memory computer. The implications of implementing BLAS on distributed-memory computers are considered in this light. The Connection Machine systems provide high-performance computation on large, data-intensive problems and have found successful applica...