Results 11–20 of 351
An Updated Set of Basic Linear Algebra Subprograms (BLAS)
 ACM Transactions on Mathematical Software
, 2001
Abstract

Cited by 72 (7 self)
This paper summarizes the BLAS Technical Forum Standard, a specification of a set of kernel routines for linear algebra, historically called the Basic Linear Algebra Subprograms and commonly known as the BLAS. The complete standard can be found in [1], and on the BLAS Technical Forum webpage, http://www.netlib.org/blas/blastforum/
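The flavor of the kernel routines the standard specifies can be illustrated with the Level 3 operation GEMM, which computes the update C ← αAB + βC. The sketch below is a plain Python reference loop for those semantics, not the Fortran/C interface the standard actually defines:

```python
import numpy as np

def gemm(alpha, A, B, beta, C):
    """Reference GEMM: returns alpha*A@B + beta*C (Level 3 BLAS semantics)."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2 and C.shape == (m, n)
    out = np.empty_like(C)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i, p] * B[p, j]
            # Scale the product and combine with the existing C entry.
            out[i, j] = alpha * acc + beta * C[i, j]
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((4, 5))
result = gemm(2.0, A, B, 0.5, C)
```

Optimized BLAS implementations compute the same result with blocked, cache-aware loops rather than this triple loop.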
Scientific Computing on Bulk Synchronous Parallel Architectures
Abstract

Cited by 70 (13 self)
We theoretically and experimentally analyse the efficiency with which a wide range of important scientific computations can be performed on bulk synchronous parallel architectures.
A Numerically Stable, Structure Preserving Method for Computing the Eigenvalues of Real Hamiltonian or Symplectic Pencils
 Numer. Math
, 1996
Abstract

Cited by 68 (30 self)
A new method is presented for the numerical computation of the generalized eigenvalues of real Hamiltonian or symplectic pencils and matrices. The method is strongly backward stable, i.e., it is numerically backward stable and preserves the structure (i.e., Hamiltonian or symplectic). In the case of a Hamiltonian matrix the method is closely related to the square-reduced method of Van Loan, but in contrast to that method, which may suffer from a loss of accuracy of order √ε, where ε is the machine precision, the new method computes the eigenvalues to full possible accuracy.
Keywords. Eigenvalue problem, Hamiltonian pencil (matrix), symplectic pencil (matrix), skew-Hamiltonian matrix
AMS subject classification. 65F15
1 Introduction. The eigenproblem for Hamiltonian and symplectic matrices has received a lot of attention in the last 25 years, since the landmark papers of Laub [13] and Paige/Van Loan [20]. The reason for this is the importance of this problem in many applications in c...
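The structure being preserved matters because the spectrum of a real Hamiltonian matrix H = [A G; Q −Aᵀ] (with G and Q symmetric) is symmetric under negation. A quick numerical check of that property, using an unstructured dense eigensolver purely for illustration (not the paper's structure-preserving method):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
G = rng.standard_normal((n, n)); G = G + G.T   # symmetric block
Q = rng.standard_normal((n, n)); Q = Q + Q.T   # symmetric block
H = np.block([[A, G], [Q, -A.T]])              # real Hamiltonian matrix

eigs = np.linalg.eigvals(H)
# For every eigenvalue lambda of H, -lambda is also an eigenvalue:
# the spectrum equals its own negation as a multiset.
spectrum_symmetric = all(np.min(np.abs(eigs + lam)) < 1e-8 for lam in eigs)
```

A structure-preserving method keeps this pairing exact in finite precision, whereas an unstructured solver only preserves it up to rounding.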
NetSolve: A Network-enabled Server for Solving Computational Science Problems
 The International Journal of Supercomputer Applications and High Performance Computing
, 2000
Abstract

Cited by 67 (4 self)
This paper presents a new system, called NetSolve, that allows users to access computational resources, such as hardware and software, distributed across the network. The development of NetSolve was motivated by the need for an easy-to-use, efficient mechanism for using computational resources remotely. Ease of use is obtained as a result of different interfaces, some of which require no programming effort from the user. Good performance is ensured by a load-balancing policy that enables NetSolve to use the computational resources available as efficiently as possible. NetSolve offers the ability to look for computational resources on a network, choose the best one available, solve a problem (with retry for fault tolerance), and return the answer to the user.
SUMMA: Scalable Universal Matrix Multiplication Algorithm
, 1997
Abstract

Cited by 65 (4 self)
In this paper, we give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less work space. MPI implementations are given, as are performance results on the Intel Paragon system.
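SUMMA's core idea is to form C = AB as a sum of outer products of block columns of A with block rows of B, with each panel broadcast along process rows and columns. A serial analogue of that panel loop (hypothetical block size, no MPI) can be sketched as:

```python
import numpy as np

def summa_serial(A, B, nb=2):
    """Serial analogue of SUMMA: accumulate C += A[:, panel] @ B[panel, :].
    In the parallel algorithm, each column panel of A is broadcast along
    process rows and each row panel of B along process columns."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for p in range(0, k, nb):
        C += A[:, p:p + nb] @ B[p:p + nb, :]   # one rank-nb update per panel
    return C

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 5))
B = rng.standard_normal((5, 4))
C = summa_serial(A, B)
```

Only the broadcast panels need temporary storage, which is why the algorithm requires less work space than transpose-based schemes.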
Parallel tiled QR factorization for multicore architectures
, 2007
Abstract

Cited by 62 (31 self)
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features of these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data. These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks, which completely hides the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization, where parallelism can only be exploited at the level of the BLAS operations.
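For a single block column, the tile tasks reduce to: factor the top tile, then eliminate each lower tile against the current R via a QR of the stacked pair. A much-simplified serial sketch of that task chain (whole-tile QR via numpy, not the paper's compact WY kernels, and without the trailing-matrix updates of a full factorization):

```python
import numpy as np

def tiled_qr_R(tiles):
    """R factor of the stacked b-by-b tiles, computed tile by tile.
    Each step is one small task: a QR of the 2b-by-b stack [R; tile],
    which eliminates the next tile against the running R factor."""
    _, R = np.linalg.qr(tiles[0])          # factor the top tile
    for T in tiles[1:]:
        _, R = np.linalg.qr(np.vstack([R, T]))
    return R

rng = np.random.default_rng(3)
b = 3
tiles = [rng.standard_normal((b, b)) for _ in range(4)]
R = tiled_qr_R(tiles)
# Same R (up to row signs) as a QR of the whole stacked block column.
R_ref = np.linalg.qr(np.vstack(tiles))[1]
```

Because each task touches only two tiles, independent tasks across block columns can be scheduled out of order as their dependencies complete.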
PUMMA: Parallel Universal Matrix Multiplication Algorithms on Distributed Memory Concurrent Computers
, 1993
Cited by 59 (10 self)
Solving Algebraic Riccati Equations on Parallel Computers Using Newton's Method with Exact Line Search
, 1999
Abstract

Cited by 52 (7 self)
We investigate the numerical solution of continuous-time algebraic Riccati equations via Newton's method on serial and parallel computers with distributed memory. We apply and extend the available theory for Newton's method endowed with exact line search to accelerate convergence. We also discuss a new stopping criterion based on recent observations regarding condition and error estimates. In each iteration step of Newton's method a stable Lyapunov equation has to be solved. We propose to solve these Lyapunov equations using iterative schemes for computing the matrix sign function. This approach can be efficiently implemented on parallel computers using ScaLAPACK. Numerical experiments on an IBM SP2 multicomputer report the accuracy, scalability, and speedup of the implemented algorithms.
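The Newton iteration in question (often called the Newton–Kleinman iteration) solves, at each step, the Lyapunov equation Aₖᵀ Xₖ₊₁ + Xₖ₊₁ Aₖ = −(Q + Xₖ G Xₖ) with Aₖ = A − G Xₖ and G = B Bᵀ. A toy sketch of the plain iteration, without the line search, sign-function Lyapunov solver, or ScaLAPACK of the paper; here the Lyapunov equation is solved by the Kronecker/vec trick, which is only viable for tiny n, and X₀ = 0 is a valid start only because A is chosen stable:

```python
import numpy as np

def solve_lyap(A, W):
    """Solve A.T @ X + X @ A = -W via column-major vec:
    (I kron A.T + A.T kron I) vec(X) = -vec(W)."""
    n = A.shape[0]
    M = np.kron(np.eye(n), A.T) + np.kron(A.T, np.eye(n))
    x = np.linalg.solve(M, -W.ravel(order="F"))
    return x.reshape((n, n), order="F")

def newton_care(A, B, Q, iters=20):
    """Newton-Kleinman iteration for A.T X + X A - X G X + Q = 0, G = B B.T."""
    G = B @ B.T
    X = np.zeros_like(A)
    for _ in range(iters):
        Ak = A - G @ X                      # closed-loop matrix for this step
        X = solve_lyap(Ak, Q + X @ G @ X)   # one stable Lyapunov solve per step
    return X

# Diagonally dominant A with all Gershgorin discs in the left half plane: stable.
A = np.array([[-1.0, 0.3, 0.0],
              [0.0, -2.0, 0.5],
              [0.1, 0.0, -1.5]])
B = np.array([[1.0], [0.0], [1.0]])
Q = np.eye(3)
X = newton_care(A, B, Q)
residual = A.T @ X + X @ A - X @ B @ B.T @ X + Q
```

The paper's exact line search damps exactly this iteration to avoid the large first Newton step that plain Newton–Kleinman can take from a poor starting guess.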
A new method for computing the stable invariant subspace of a real Hamiltonian matrix
, 1997
Abstract

Cited by 49 (28 self)
A new backward stable, structure-preserving method of complexity O(n^3) is presented for computing the stable invariant subspace of a real Hamiltonian matrix and the stabilizing solution of the continuous-time algebraic Riccati equation. The new method is based on the relationship between the invariant subspaces of the Hamiltonian matrix H and the extended matrix [0 H; H 0], and makes use of the symplectic URV-like decomposition that was recently introduced by the authors.
Keywords. Eigenvalue problem, Hamiltonian matrix, algebraic Riccati equation, sign function, invariant subspace
AMS subject classification. 65F15, 93B40, 93B36, 93C60
1 Introduction. It is a well accepted fact in numerical analysis that a numerical algorithm should reflect as many of the structural properties of the physical problem or the resulting mathematical model as possible. For the solution of eigenvalue problems this means that use of the symmetry structures of the matrix or the spectrum is made. While for symme...
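The sign-function connection in the keywords can be illustrated with the classical Newton iteration S ← (S + S⁻¹)/2, whose limit sign(A) gives the spectral projector (I − sign(A))/2 onto the stable invariant subspace. The sketch below applies it to a small unstructured matrix; the paper's contribution is precisely to compute such subspaces for Hamiltonian matrices while preserving structure, which this naive iteration does not do:

```python
import numpy as np

def matrix_sign(A, iters=50, tol=1e-12):
    """Newton iteration S <- (S + S^-1)/2, converging to sign(A) for any A
    with no purely imaginary eigenvalues."""
    S = A.copy()
    for _ in range(iters):
        S_next = 0.5 * (S + np.linalg.inv(S))
        if np.linalg.norm(S_next - S) < tol * np.linalg.norm(S):
            return S_next
        S = S_next
    return S

rng = np.random.default_rng(4)
# A with known spectrum {-1, -2, 3}: two stable and one unstable eigenvalue,
# hidden by a (well-conditioned) random similarity transform.
T = rng.standard_normal((3, 3)) + 3 * np.eye(3)
A = T @ np.diag([-1.0, -2.0, 3.0]) @ np.linalg.inv(T)

S = matrix_sign(A)                  # involutory: S @ S = I
P_stable = 0.5 * (np.eye(3) - S)    # projector onto the stable invariant subspace
```

Since sign(A) has eigenvalue −1 for each stable eigenvalue of A and +1 for each unstable one, trace(sign(A)) counts unstable minus stable eigenvalues (here 1 − 2 = −1), and P_stable has rank equal to the stable count.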