An Extended Set of Fortran Basic Linear Algebra Subprograms
ACM Transactions on Mathematical Software, 1986
Cited by 409 (72 self)
This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-vector operations which should provide for efficient and portable implementations of algorithms for high performance computers.
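As a rough illustration of the kind of matrix-vector operation this abstract refers to, the general matrix-vector product y := alpha*A*x + beta*y (the GEMV operation of the Level 2 BLAS) can be sketched in plain Python. The function name `gemv`, its argument order, and the list-of-rows storage are assumptions made for this sketch; they are not the Fortran interface defined in the paper.

```python
def gemv(alpha, a, x, beta, y):
    """Return alpha * A @ x + beta * y as a new list.

    `a` is a dense matrix stored as a list of rows (an assumption of this
    sketch; the Fortran BLAS use column-major arrays with a leading dimension).
    """
    if len(a) != len(y):
        raise ValueError("A must have as many rows as y has entries")
    if any(len(row) != len(x) for row in a):
        raise ValueError("every row of A must have as many entries as x")
    # One dot product per row of A, scaled by alpha, plus the scaled old y.
    return [
        alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
        for row, y_i in zip(a, y)
    ]
```

For example, `gemv(2.0, [[1, 2], [3, 4]], [1, 1], 1.0, [10, 20])` evaluates to `[16.0, 34.0]`.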
An Evaluation of Java for Numerical Computing
In Proceedings of ISCOPE'98, 1998
Cited by 20 (1 self)
This paper describes the design and implementation of high performance numerical software in Java. Our primary goals are to characterize the performance of object-oriented numerical software written in Java and to investigate whether Java is a suitable language for such endeavors. We have implemented JLAPACK, a subset of the LAPACK library in Java. LAPACK is a high-performance Fortran 77 library used to solve common linear algebra problems. JLAPACK is an object-oriented library, using encapsulation, inheritance, and exception handling. It performs within a factor of four of the optimized Fortran version for certain platforms and test cases. When used with the native BLAS library, JLAPACK performs comparably with the Fortran version using the native BLAS library. We conclude that high-performance numerical software could be written in Java if a handful of concerns about language features and compilation strategies are adequately addressed.
A Parallel Block Implementation of Level 3 BLAS for MIMD Vector Processors
ACM Transactions on Mathematical Software, 1993
Cited by 14 (9 self)
We describe an implementation of Level 3 BLAS based on the use of the matrix-matrix multiplication kernel (GEMM). Blocking techniques are used to express the BLAS in terms of operations involving triangular blocks and calls to GEMM. A principal advantage of this approach is that most manufacturers provide at least an efficient serial version of GEMM so that our implementation can capture a significant percentage of the computer performance. A parameter which controls the blocking allows an efficient exploitation of the memory hierarchy of the various target computers. Furthermore, this blocked version of Level 3 BLAS is naturally parallel. We present results on the ALLIANT FX/80, the CONVEX C220, the CRAY-2, and the IBM 3090/VF. For GEMM, we always use the manufacturer-supplied versions. For the operations dealing with triangular blocks, we use assembler or tuned Fortran (using loop unrolling) codes, depending on the efficiency of the available libraries. Keywords: Vectorization, para...
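The blocking idea in this abstract can be sketched for one Level 3 operation, the product of a lower triangular matrix L with a general matrix B. The sketch below is a plain-Python illustration, not the authors' code: the helper names (`gemm_acc`, `trmm_diag`, `block`, `blocked_trmm`), the list-of-rows storage, and the block-size parameter `nb` are all assumptions. Diagonal blocks go through a small triangular kernel, and every off-diagonal block goes through a GEMM-style update, which is where a vendor-tuned GEMM would be called in practice.

```python
def gemm_acc(c, a, b):
    """C += A @ B for dense blocks stored as lists of rows (stand-in for GEMM)."""
    for i, a_row in enumerate(a):
        for k, a_ik in enumerate(a_row):
            for j, b_kj in enumerate(b[k]):
                c[i][j] += a_ik * b_kj


def trmm_diag(l_block, b_block):
    """Return L_block @ B_block, reading only the lower triangle of L_block."""
    cols = len(b_block[0])
    return [
        [sum(l_block[i][k] * b_block[k][j] for k in range(i + 1)) for j in range(cols)]
        for i in range(len(l_block))
    ]


def block(m, r0, r1, c0, c1):
    """Copy the sub-block m[r0:r1, c0:c1]."""
    return [row[c0:c1] for row in m[r0:r1]]


def blocked_trmm(l, b, nb):
    """Return L @ B for lower triangular L, using blocks of order nb."""
    if not l:
        return []
    n, cols = len(l), len(b[0])
    out = []
    for r0 in range(0, n, nb):
        r1 = min(r0 + nb, n)
        # Diagonal block: small triangular kernel.
        c = trmm_diag(block(l, r0, r1, r0, r1), block(b, r0, r1, 0, cols))
        # Blocks to the left of the diagonal: plain GEMM updates.
        for k0 in range(0, r0, nb):
            k1 = k0 + nb
            gemm_acc(c, block(l, r0, r1, k0, k1), block(b, k0, k1, 0, cols))
        out.extend(c)
    return out
```

Each block row of the result is computed independently of the others, which hints at why such a blocked formulation parallelizes naturally; `nb` plays the role of the blocking parameter that would be tuned to the memory hierarchy.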
The Automatic Generation of Sparse Primitives
ACM Transactions on Mathematical Software, 1996
Cited by 10 (1 self)
In this paper, we discuss some of our experiences with this new approach.
INTLIB: A Portable FORTRAN 77 Interval Standard Function Library
Cited by 7 (2 self)
INTLIB is meant to be a readily available, portable, exhaustively documented interval arithmetic library, written in standard FORTRAN 77. Its underlying philosophy is to provide a standard for interval operations to aid in efficiently transporting programs involving interval arithmetic. The model is the BLAS package, for basic linear algebra operations. The library is composed of elementary interval arithmetic routines, standard function routines for interval data and values, and utility routines. The library can be used with INTBIS (Algorithm 681), and a Fortran 90 module to use the library to define an interval data type is available from the first author.
Keywords: interval arithmetic, standard functions, BLAS, operator overloading, FORTRAN 77, Fortran 90
Subject classifications: AMS: 65G10, 65D15, 26A09. CR: G.1.0 (Computer arithmetic), G.1.2 (standard function approximation), D.2.2 (Software libraries), D.2.7 (documentation, portability)
This work is partially supported by Nat...
LAPACK 3.1 xHSEQR: Tuning and Implementation Notes on the Small Bulge Multi-shift QR Algorithm with Aggressive Early Deflation
2007
Cited by 4 (0 self)
This note documents implementation details of the small bulge, multi-shift QR algorithm with aggressive early deflation that appears as LAPACK version 3.1 programs CHSEQR, DHSEQR, SHSEQR and ZHSEQR and the subroutines they call. These codes calculate eigenvalues and optionally a Schur factorization of a Hessenberg matrix. They do the bulk of the work required to calculate eigenvalues and optionally eigenvectors of a general non-symmetric matrix. This report is intended to provide some guidance for setting the machine-dependent tuning parameters, to help maintainers identify and correct problems, and to help developers improve upon this implementation.
An Extended Set of Fortran Basic Linear Algebra Subprograms
ACM Transactions on Mathematical Software, 1986
Cited by 3 (0 self)
This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-vector operations which should provide for efficient and portable implementations of algorithms for high performance computers.
Exploiting separability in large-scale support vector machine training
2007
Cited by 3 (1 self)
Support vector machine training can be represented as a large quadratic program. We present an efficient and numerically stable algorithm for this problem using interior point methods, which requires only O(n) operations per iteration. Through exploiting the separability of the Hessian, we provide a unified approach, from an optimization perspective, to 1-norm classification, 2-norm classification and ε-insensitive regression. Numerical experiments indicate that, in contrast to existing decomposition methods, the algorithm is largely unaffected by noisy data, for both linear and non-linear kernels, and they show our implementation outperforming all known implementations by a large margin. We discuss the effect of using multiple correctors, and monitoring the angle of the normal to the hyperplane to determine termination.
A Block Version of the Eskow-Schnabel Modified Cholesky Factorization
1995
Cited by 3 (1 self)
The modified Cholesky factorization is widely used in optimization. Let A be a symmetric n-by-n matrix, not necessarily positive-definite. We can then compute $P^T (A + E) P = L L^T$, where P is a permutation matrix and E is an n-by-n matrix equal to 0 if A is safely positive-definite; otherwise E is a diagonal matrix chosen so that A + E is safely positive-definite ([17], [21], and [22]). We describe a block version of the Eskow-Schnabel Modified Cholesky factorization that gives exactly the same numerical results as the original algorithm but allows for the use of the Level 3 BLAS computational kernels ([12] and [13]), and thus takes advantage of the memory hierarchy of today's high performance computers including vector computers and RISC-based workstations. Keywords: Nonlinear optimization, Level 3 BLAS, matrix-matrix kernels, block algorithms, RISC processors, vector processors.
Local Basic Linear Algebra Subroutines (LBLAS) for the CM-5/5E
Harvard University, Division of Applied Sciences, 1994
Cited by 2 (1 self)
The Connection Machine Scientific Software Library (CMSSL) is a library of scientific routines designed for distributed memory architectures. The BLAS of the CMSSL have been implemented as a two-level structure to exploit optimizations local to nodes and across nodes. This paper presents the implementation considerations and performance of the Local BLAS, or BLAS local to each node of the system. A wide variety of loop structures and unrollings have been implemented in order to achieve uniformly high performance, irrespective of the data layout in node memory. The CMSSL is the only existing high-performance library capable of supporting both the data parallel and message passing modes of programming a distributed memory computer. The implications of implementing BLAS on distributed memory computers are considered in this light.
1 Introduction
The Connection Machine systems provide high-performance computation on large, data-intensive problems and have found successful applica...

