Results 11  20
of
458
Summa: Scalable universal matrix multiplication algorithm
, 1997
"... In this paper, we give a straight forward, highly e cient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less work space. MPI implementations are given, as are performance r ..."
Abstract

Cited by 78 (4 self)
 Add to MetaCart
In this paper, we give a straight forward, highly e cient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less work space. MPI implementations are given, as are performance results on the Intel Paragon system. 1
Nonlinear Array Layouts for Hierarchical Memory Systems
, 1999
"... Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory. This layout function is fixed at language definition time and constitutes an invisible, nonprogrammable array attribute. ..."
Abstract

Cited by 74 (5 self)
 Add to MetaCart
(Show Context)
Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory. This layout function is fixed at language definition time and constitutes an invisible, nonprogrammable array attribute. In reality, modern memory systems are architecturally hierarchical rather than flat, with substantial differences in performance among different levels of the hierarchy. This mismatch between the model and the true architecture of memory systems can result in low locality of reference and poor performance. Some of this loss in performance can be recovered by reordering computations using transformations such as loop tiling. We explore nonlinear array layout functions as an additional means of improving locality of reference. For a benchmark suite composed of dense matrix kernels, we show by timing and simulation that two specific layouts (4D and Morton) have low implementation costs (25% of total running time) and high performance benefits (reducing execution time by factors of 1.12.5); that they have smooth performance curves, both across a wide range of problem sizes and over representative cache architectures; and that recursionbased control structures may be needed to fully exploit their potential.
Scientific Computing on Bulk Synchronous Parallel Architectures
"... We theoretically and experimentally analyse the efficiency with which a wide range of important scientific computations can be performed on bulk synchronous parallel architectures. ..."
Abstract

Cited by 72 (13 self)
 Add to MetaCart
(Show Context)
We theoretically and experimentally analyse the efficiency with which a wide range of important scientific computations can be performed on bulk synchronous parallel architectures.
NetSolve: A Networkenabled Server for Solving Computational Science Problems
 The International Journal of Supercomputer Applications and High Performance Computing
, 2000
"... This paper presents a new system, called NetSolve, that allows users to access computational resources, such as hardware and software, distributed across the network. The development of NetSolve was motivated by the need for an easytouse, efficient mechanism for using computational resources remot ..."
Abstract

Cited by 72 (5 self)
 Add to MetaCart
(Show Context)
This paper presents a new system, called NetSolve, that allows users to access computational resources, such as hardware and software, distributed across the network. The development of NetSolve was motivated by the need for an easytouse, efficient mechanism for using computational resources remotely. Ease of use is obtained as a result of different interfaces, some of which require no programming effort from the user. Good performance is ensured by a loadbalancing policy that enables NetSolve to use the computational resources available as efficiently as possible. NetSolve offers the ability to look for computational resources on a network, choose the best one available, solve a problem (with retry for faulttolerance), and return the answer to the user.
A Numerically Stable, Structure Preserving Method for Computing the Eigenvalues of Real Hamiltonian or Symplectic Pencils
 Numer. Math
, 1996
"... A new method is presented for the numerical computation of the generalized eigenvalues of real Hamiltonian or symplectic pencils and matrices. The method is strongly backward stable, i.e., it is numerically backward stable and preserves the structure (i.e., Hamiltonian or symplectic). In the case of ..."
Abstract

Cited by 68 (31 self)
 Add to MetaCart
(Show Context)
A new method is presented for the numerical computation of the generalized eigenvalues of real Hamiltonian or symplectic pencils and matrices. The method is strongly backward stable, i.e., it is numerically backward stable and preserves the structure (i.e., Hamiltonian or symplectic). In the case of a Hamiltonian matrix the method is closely related to the square reduced method of Van Loan, but in contrast to that method which may suffer from a loss of accuracy of order p ", where " is the machine precision, the new method computes the eigenvalues to full possible accuracy. Keywords. eigenvalue problem, Hamiltonian pencil (matrix), symplectic pencil (matrix), skewHamiltonian matrix AMS subject classification. 65F15 1 Introduction The eigenproblem for Hamiltonian and symplectic matrices has received a lot of attention in the last 25 years, since the landmark papers of Laub [13] and Paige/Van Loan [20]. The reason for this is the importance of this problem in many applications in c...
Parallel tiled QR factorization for multicore architectures
, 2007
"... As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine grain parallelism becomes a major requ ..."
Abstract

Cited by 67 (35 self)
 Add to MetaCart
(Show Context)
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data. These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out of order execution of the tasks which will completely hide the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization where parallelism can only be exploited at the level of the BLAS operations.
Balanced Truncation Model Reduction of LargeScale Dense Systems on Parallel Computers
, 1999
"... ..."
PUMMA: Parallel Universal Matrix Multiplication Algorithms on Distributed Memory Concurrent Computers
, 1993
"... 05, NASA Ames Research Center, Moffet Field, CA 94035 134. William C. Skamarock, 3973 Escuela Court, Boulder, CO 80301 135. Richard Smith, Los Alamos National Laboratory, Group T3, Mail Stop B2316, Los Alamos, NM 87545 136. Peter Smolarkiewicz, National Center for Atmospheric Research, MMM Group, ..."
Abstract

Cited by 64 (10 self)
 Add to MetaCart
(Show Context)
05, NASA Ames Research Center, Moffet Field, CA 94035 134. William C. Skamarock, 3973 Escuela Court, Boulder, CO 80301 135. Richard Smith, Los Alamos National Laboratory, Group T3, Mail Stop B2316, Los Alamos, NM 87545 136. Peter Smolarkiewicz, National Center for Atmospheric Research, MMM Group, P. O. Box 3000, Boulder, CO 80307 137. Jurgen Steppeler, DWD, Frankfurterstr 135, 6050 Offenbach, WEST GERMANY 138. Rick Stevens, Mathematics and Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439 139. Paul N. Swarztrauber, National Center for Atmospheric Research, P. O. Box 3000, Boulder, CO 80307 140. Wei Pai Tang, Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1 141. Harold Trease, Los Alamos National Laboratory, Mail Stop B257, Los Alamos, NM 87545 142. Robert G. Voigt, ICASE, MS 132C, NASA Langley Research Center, Hampton, VA 23665 143. Mary F. Wheeler, Rice University, Department of Mathematical Sc
Stability of Travelling Waves
, 2002
"... An overview of various aspects related to the spectral and nonlinear stability of travellingwave solutions to partial differential equations is given. The point and the essential spectrum of the linearization about a travelling wave are discussed as is the relation between these spectra, Fredholm p ..."
Abstract

Cited by 55 (7 self)
 Add to MetaCart
(Show Context)
An overview of various aspects related to the spectral and nonlinear stability of travellingwave solutions to partial differential equations is given. The point and the essential spectrum of the linearization about a travelling wave are discussed as is the relation between these spectra, Fredholm properties, and the existence of exponential dichotomies (or Green's functions) for the linear operator. Among the other topics reviewed in this survey are the nonlinear stability of waves, the stability and interaction of wellseparated multibump pulses, the numerical computation of spectra, and the Evans function, which is a tool to locate isolated eigenvalues in the point spectrum and near the essential spectrum. Furthermore, methods for the stability of waves in Hamiltonian and monotone equations as well as for singularly perturbed problems are mentioned. Modulated waves, rotating waves on the plane, and travelling waves on cylindrical domains are also discussed briefly.