An Extended Set of Fortran Basic Linear Algebra Subprograms
ACM Transactions on Mathematical Software, 1986
Cited by 517 (72 self)
This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-vector operations, which should provide for efficient and portable implementations of algorithms for high-performance computers.
An Evaluation of Java for Numerical Computing
In Proceedings of ISCOPE'98, 1998
Cited by 25 (1 self)
This paper describes the design and implementation of high-performance numerical software in Java. Our primary goals are to characterize the performance of object-oriented numerical software written in Java and to investigate whether Java is a suitable language for such endeavors. We have implemented JLAPACK, a subset of the LAPACK library in Java. LAPACK is a high-performance Fortran 77 library used to solve common linear algebra problems. JLAPACK is an object-oriented library, using encapsulation, inheritance, and exception handling. It performs within a factor of four of the optimized Fortran version for certain platforms and test cases. When used with the native BLAS library, JLAPACK performs comparably with the Fortran version using the native BLAS library. We conclude that high-performance numerical software could be written in Java if a handful of concerns about language features and compilation strategies are adequately addressed.
A Parallel Block Implementation of Level 3 BLAS for MIMD Vector Processors
ACM Transactions on Mathematical Software, 1994
The Automatic Generation of Sparse Primitives
ACM Transactions on Mathematical Software, 1996
Cited by 13 (1 self)
In this paper, we discuss some of our experiences with this new approach.
INTLIB: A portable FORTRAN 77 interval standard function library
ACM Transactions on Mathematical Software, 1994
Lapack 3.1 xHSEQR: Tuning and Implementation Notes on the Small Bulge Multishift QR Algorithm with Aggressive Early Deflation
2007
Cited by 7 (0 self)
This note documents implementation details of the small bulge, multishift QR algorithm with aggressive early deflation that appears as Lapack version 3.1 programs CHSEQR, DHSEQR, SHSEQR and ZHSEQR and the subroutines they call. These codes calculate eigenvalues and optionally a Schur factorization of a Hessenberg matrix. They do the bulk of the work required to calculate eigenvalues and optionally eigenvectors of a general nonsymmetric matrix. This report is intended to provide some guidance for setting the machine dependent tuning parameters, to help maintainers to identify and correct problems, and to help developers improve upon this implementation.
Exploiting separability in large-scale support vector machine training
2007
Cited by 6 (4 self)
Support vector machine training can be represented as a large quadratic program. We present an efficient and numerically stable algorithm for this problem using interior point methods, which requires only O(n) operations per iteration. Through exploiting the separability of the Hessian, we provide a unified approach, from an optimization perspective, to 1-norm classification, 2-norm classification and ε-insensitive regression. Numerical experiments indicate that, in contrast to existing decomposition methods, the algorithm is largely unaffected by noisy data, for both linear and nonlinear kernels, and they show our implementation outperforming all known implementations by a large margin. We discuss the effect of using multiple correctors, and monitoring the angle of the normal to the hyperplane to determine termination.
Porting Industrial Codes and Developing Sparse Linear Solvers on Parallel Computers
1994
Cited by 6 (4 self)
We address the main issues when porting existing codes from serial to parallel computers and when developing portable parallel software on MIMD multiprocessors (shared memory, virtual shared memory, and distributed memory multiprocessors, and networks of computers). We discuss the use of numerical libraries as a way of developing portable and efficient parallel code. We illustrate this by using examples from our experience in porting industrial codes and in designing parallel numerical libraries. We report in some detail on the parallelization of scientific applications coming from Centre National d'Etudes Spatiales and from Aérospatiale, and we illustrate how it is possible to develop portable and efficient numerical software by considering the parallel solution of sparse linear systems of equations.
1 Introduction
One of the common problems for application scientists is the porting of codes from serial to parallel computers. Since most of the existing software has been developed on ...
Exploiting separability in large-scale linear Support Vector Machine training
2009
Cited by 4 (4 self)
Linear support vector machine training can be represented as a large quadratic program. We present an efficient and numerically stable algorithm for this problem using interior point methods, which requires only O(n) operations per iteration. Through exploiting the separability of the Hessian, we provide a unified approach, from an optimization perspective, to 1-norm classification, 2-norm classification, universum classification, ordinal regression and ε-insensitive regression. Our approach has the added advantage of obtaining the hyperplane weights and bias directly from the solver. Numerical experiments indicate that, in contrast to existing methods, the algorithm is largely unaffected by noisy data, and they show training times for our implementation are consistent and highly competitive. We discuss the effect of using multiple correctors, and monitoring the angle of the normal to the hyperplane to determine termination.
A Block Version of the Eskow-Schnabel Modified Cholesky Factorization
1995
Cited by 3 (1 self)
The modified Cholesky factorization is widely used in optimization. Let A be a symmetric n-by-n matrix, not necessarily positive-definite. Then we can compute $P^T (A + E) P = L L^T$, where P is a permutation matrix and E is an n-by-n matrix equal to 0 if A is safely positive-definite; otherwise E is a diagonal matrix chosen so that A + E is safely positive-definite ([17], [21], and [22]). We describe a block version of the Eskow-Schnabel Modified Cholesky factorization that gives exactly the same numerical results as the original algorithm but allows for the use of the Level 3 BLAS computational kernels ([12] and [13]), and thus takes advantage of the memory hierarchy of today's high-performance computers, including vector computers and RISC-based workstations.
Keywords: Nonlinear optimization, Level 3 BLAS, matrix-matrix kernels, block algorithms, RISC processors, vector processors.