Results 1–10 of 24
The Matrix Template Library: A Unifying Framework for Numerical Linear Algebra
In Parallel Object Oriented Scientific Computing, ECOOP, 1998
Abstract

Cited by 11 (2 self)
We present a unified approach for expressing high-performance numerical linear algebra routines for a class of dense and sparse matrix formats and shapes. As with the Standard Template Library [7], we explicitly separate algorithms from data structures through the use of generic programming techniques. We conclude that such an approach does not hinder high performance. On the contrary, writing portable high-performance codes is actually enabled by such an approach, because the performance-critical code sections can be isolated from the algorithms and the data structures.

1 Introduction

The traditional approach to writing basic linear algebra routines is a combinatorial affair. There are typically four precision types that need to be handled (single and double precision real, single and double precision complex), several dense storage types (general, banded, packed), a multitude of sparse storage types (the Sparse BLAS Standard Proposal includes 13 [1]), as well as row and co...
Direct Solvers for Symmetric Eigenvalue Problems
In Modern Methods and Algorithms of Quantum Chemistry, J. Grotendorst (Editor), Proceedings, NIC Series Volume, 2000
pMapper: Automatic Mapping of Parallel Matlab Programs*
Abstract

Cited by 4 (0 self)
Algorithm implementation efficiency is key to delivering high-performance computing capabilities to demanding, high-throughput signal and image processing (SIP) applications and simulations. Significant progress has been made in compiler optimization of serial programs, but many applications require parallel processing, which brings with it the difficult task of determining efficient mappings of algorithms to multiprocessor computers. The pMapper infrastructure addresses the problem of performance optimization of multistage MATLAB® applications on parallel architectures. pMapper is an automatic performance-tuning library written as a layer on top of pMatlab. pMatlab is a parallel MATLAB toolbox that provides MATLAB users with global array semantics. While pMatlab abstracts the message-passing interface, the responsibility of generating maps for numerical arrays still falls on the user. A processor map for a numerical array is defined as an assignment of blocks of data to processing elements. Choosing the best mapping for a set of numerical arrays in a program is a nontrivial task that requires significant knowledge of programming languages, parallel computing, and processor architecture. pMapper automates the task of map generation. This paper addresses the design details of the pMapper infrastructure and presents preliminary results.
The Generalized Newton Iteration for the Matrix Sign Function
1997
Abstract

Cited by 3 (0 self)
In this paper we present modified algorithms for computing deflating subspaces of matrix pencils by means of the matrix sign function. Specifically, our new algorithms reduce the number of iterations by half, cut the cost of each Newton iteration by more than 50%, and improve the accuracy of the computed deflating subspaces. The matrix sign function is thus revealed as an effective technique for applications where a part of the spectrum has to be identified or only the deflating subspaces are required. When the complete spectrum is desired, the matrix sign function can be used as an initial divide-and-conquer technique. The high performance of the basic kernels involved in this iteration is also especially appropriate for current parallel architectures.

Key words. Matrix sign function, spectral decomposition, nonsymmetric eigenvalue problem, invariant subspaces.

AMS subject classifications. 65F15, 47A75.

1. Introduction

In recent years the matrix sign function has received a co...
On the Reusability and Numeric Efficiency of C++ Packages in Scientific Computing
In The ClusterWorld Conference and Expo, 2003
Abstract

Cited by 1 (0 self)
In this paper, we discuss the reusability and numerical efficiency of selected object-oriented numerical packages, serial and parallel, for developing high-performance scientific computing applications.
A 3D Parallel Communication-Efficient Dense Linear Solver
2000
Abstract

Cited by 1 (1 self)
We present new communication-efficient parallel dense linear solvers: a solver for triangular linear systems with multiple right-hand sides and an LU factorization algorithm. These solvers are asymptotically work-efficient and they perform a factor of P^(1/6) less communication than any existing algorithm, where P is the number of processors. In other words, these solvers are likely to run faster than any other solvers on parallel computers with a large number of processors. These provably efficient algorithms are the main contribution of this thesis. The new solvers reduce communication at the expense of using more temporary storage. Previously, algorithms that reduce communication by using more memory were only known for matrix multiplication. These so-called three-dimensional matrix-multiplication algorithms use a three-dimensional grid of processors and they replicate matrices on each two-dimensional "layer" of the 3D processor grid. (The processor grid can be a virtual grid embedded in any ...
Parallel Tools for Solving Incremental Dense Least Squares Problems. Application to Space Geodesy
LAPACK Working Note 179, 2006