Results 11-20 of 422
Optimizing the performance of sparse matrix-vector multiplication, 2000
"... Copyright 2000 by EunJin Im ..."
A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers), 2001
"... In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform blockcyclic data distribution scheme commonly used for homogeneous collections of processors limits the perform ..."
Abstract

Cited by 61 (25 self)
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous platforms with respect to the performance of the processors. When targeting unidimensional grids, the load-balancing problem can be solved rather easily. When targeting two-dimensional grids, which are the key to scalability and efficiency for numerical kernels, the problem turns out to be surprisingly difficult. We formally state the 2D load-balancing problem and prove its NP-completeness. Next, we introduce a data allocation heuristic, which turns out to be very satisfactory: its practical usefulness is demonstrated by MPI experiments conducted with a heterogeneous network of workstations.
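The easy unidimensional case lends itself to a compact illustration. Below is a minimal Python sketch, not taken from the paper, of proportional block allocation on a 1D grid: each processor receives a number of block columns proportional to its relative speed, with largest-remainder rounding. The function name and rounding scheme are illustrative assumptions.

```python
# Hypothetical sketch of the "easy" 1D load-balancing case: allocate block
# columns to processors in proportion to their relative speeds. Not the
# paper's algorithm; the rounding scheme is an illustrative choice.

def allocate_1d(num_blocks, speeds):
    """Return, for each processor, its number of block columns."""
    total = sum(speeds)
    # Ideal (fractional) share of blocks for each processor.
    ideal = [num_blocks * s / total for s in speeds]
    alloc = [int(x) for x in ideal]
    # Give the leftover blocks to the processors with the largest
    # fractional remainders (largest-remainder rounding).
    leftover = num_blocks - sum(alloc)
    order = sorted(range(len(speeds)), key=lambda i: ideal[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc

# Example: 100 block columns over processors of relative speeds 1, 2, and 5
# yields shares proportional to the speeds (sums to exactly 100).
print(allocate_1d(100, [1.0, 2.0, 5.0]))
```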
Active Disks: Remote Execution for Network-Attached Storage, 1997
"... The principal trend in the design of computer systems is the expectation of much greater computational power in future generations of microprocessors. This trend applies to embedded systems as well as host processors. As a result, devices such as storage controllers have excess capacity and growing ..."
Abstract

Cited by 57 (1 self)
The principal trend in the design of computer systems is the expectation of much greater computational power in future generations of microprocessors. This trend applies to embedded systems as well as host processors. As a result, devices such as storage controllers have excess capacity and growing computational capabilities. Storage system designers are exploiting this trend with higher-level interfaces to storage and increased intelligence inside storage devices. One development in this direction is Network-Attached Secure Disks (NASD), which attaches storage devices directly to the network and raises the storage interface above the simple (fixed-size block) memory abstraction of SCSI. This allows devices more freedom to provide efficient operations; promises more scalable subsystems by offloading file system and storage management functionality from dedicated servers; and reduces latency by executing common-case requests directly at storage devices. In this paper, we push this increa...
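To make the offloading idea concrete, here is a toy Python sketch, entirely invented and not the NASD interface: a simulated device that can either ship every block to the host or run a host-supplied filter locally and return only matching records.

```python
# Toy illustration of remote execution at storage: the host ships a small
# filter function to the (simulated) disk instead of pulling every block.
# The ActiveDisk class and its methods are invented for illustration.

class ActiveDisk:
    def __init__(self, blocks):
        self.blocks = blocks  # records stored on the device

    def read_all(self):
        # Conventional path: every block crosses the network to the host.
        return list(self.blocks)

    def execute(self, filter_fn):
        # Active path: the predicate runs on the device's embedded CPU,
        # and only matching records cross the network.
        return [b for b in self.blocks if filter_fn(b)]

disk = ActiveDisk([{"id": i, "flag": i % 100 == 0} for i in range(10_000)])
hits = disk.execute(lambda rec: rec["flag"])  # 100 records move, not 10,000
print(len(hits))
```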
Checkpointing strategies for parallel jobs, 2011
"... This work provides an analysis of checkpointing strategies for minimizing expected job execution times in an environment that is subject to processor failures. In the case of both sequential and parallel jobs, we give the optimal solution for exponentially distributed failure interarrival times, wh ..."
Abstract

Cited by 54 (35 self)
This work provides an analysis of checkpointing strategies for minimizing expected job execution times in an environment that is subject to processor failures. In the case of both sequential and parallel jobs, we give the optimal solution for exponentially distributed failure inter-arrival times, which, to the best of our knowledge, is the first rigorous proof that periodic checkpointing is optimal. For non-exponentially distributed failures, we develop a dynamic programming algorithm to maximize the amount of work completed before the next failure, which provides a good heuristic for minimizing the expected execution time. Our work considers various models of job parallelism and of parallel checkpointing overhead. We first perform extensive simulation experiments assuming that failures follow exponential or Weibull distributions, the latter being more representative of real-world systems. The obtained results not only corroborate our theoretical findings, but also show that our dynamic programming algorithm significantly outperforms previously proposed solutions in the case of Weibull failures. We then discuss results from simulation experiments that use failure logs from production clusters. These results confirm that our dynamic programming algorithm significantly outperforms existing solutions for real-world clusters.
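As background for this kind of analysis, the classical first-order Young/Daly approximation gives the checkpointing period that minimizes expected execution time under exponential failures. The sketch below implements that well-known formula, not the paper's exact solution; the numbers in the example are illustrative.

```python
from math import sqrt

# First-order Young/Daly approximation for the optimal checkpointing period
# under exponentially distributed failures:
#     T_opt ~ sqrt(2 * C * mu)
# where C is the checkpoint cost and mu the platform MTBF. Background for
# analyses like the one above, not the paper's own result.

def young_daly_period(checkpoint_cost, mtbf):
    return sqrt(2.0 * checkpoint_cost * mtbf)

# Example: a 10-minute checkpoint on a platform with a 24-hour MTBF
# gives a period of roughly 2.8 hours between checkpoints.
C = 600.0          # checkpoint cost in seconds
mu = 24 * 3600.0   # mean time between failures in seconds
print(young_daly_period(C, mu) / 3600.0, "hours between checkpoints")
```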
Matrix Multiplication on Heterogeneous Platforms, 2001
"... this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to load balance the ..."
Abstract

Cited by 53 (16 self)
In this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to load-balance the work across resources of different speeds while minimizing the communication volume. We formally state this problem in a geometric framework and prove its NP-completeness. Next, we introduce a (polynomial) column-based heuristic, which turns out to be very satisfactory: we derive a theoretical performance guarantee for the heuristic and we assess its practical usefulness through MPI experiments.
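The flavor of a column-based partition is easy to illustrate. The Python sketch below, an illustration rather than the paper's heuristic, tiles the unit square with one rectangle per processor, stacked in vertical columns, so that each rectangle's area matches the processor's relative speed. Choosing the column grouping well is the hard part the heuristic actually addresses; here it is taken as given.

```python
# Hedged sketch of a column-based partition of the unit square: rectangles
# stacked in columns, areas proportional to processor speeds. The grouping
# of processors into columns is assumed, not computed.

def column_partition(columns):
    """columns: list of lists of processor speeds, one inner list per column.
    Returns (x, y, w, h) rectangles, one per processor, tiling the unit square."""
    total = sum(sum(col) for col in columns)
    rects, x = [], 0.0
    for col in columns:
        width = sum(col) / total          # column width ~ total column speed
        y = 0.0
        for speed in col:
            height = speed / sum(col)     # stacked heights ~ relative speed
            rects.append((x, y, width, height))
            y += height
        x += width
    return rects

# Three processors: speeds 3 and 1 share one column, speed 4 gets its own.
for r in column_partition([[3.0, 1.0], [4.0]]):
    print(r)
# Areas come out as 3/8, 1/8, and 4/8, proportional to the speeds as required;
# the total rectangle perimeter is a proxy for the communication volume.
```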
Solving Algebraic Riccati Equations on Parallel Computers Using Newton's Method with Exact Line Search, 1999
"... We investigate the numerical solution of continuoustime algebraic Riccati equations via Newton's method on serial and parallel computers with distributed memory. We apply and extend the available theory for Newton's method endowed with exact line search to accelerate convergence. We also ..."
Abstract

Cited by 53 (9 self)
We investigate the numerical solution of continuous-time algebraic Riccati equations via Newton's method on serial and parallel computers with distributed memory. We apply and extend the available theory for Newton's method endowed with exact line search to accelerate convergence. We also discuss a new stopping criterion based on recent observations regarding condition and error estimates. In each iteration step of Newton's method, a stable Lyapunov equation has to be solved. We propose to solve these Lyapunov equations using iterative schemes for computing the matrix sign function. This approach can be efficiently implemented on parallel computers using ScaLAPACK. Numerical experiments on an IBM SP2 multicomputer report the accuracy, scalability, and speedup of the implemented algorithms.
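The Lyapunov solver alluded to above rests on the Newton iteration for the matrix sign function, X_{k+1} = (X_k + X_k^{-1}) / 2. The NumPy sketch below shows a serial version with standard determinantal scaling; tolerances and the test matrix are illustrative assumptions, and the parallel ScaLAPACK formulation is the paper's contribution, not what is shown here.

```python
import numpy as np

# Scaled Newton iteration for the matrix sign function, the building block
# proposed above for solving the Lyapunov equations inside Newton's method.

def matrix_sign(A, tol=1e-12, max_iter=100):
    X = A.astype(float).copy()
    n = X.shape[0]
    for _ in range(max_iter):
        Xinv = np.linalg.inv(X)
        # Determinantal scaling |det X|^(-1/n) accelerates convergence.
        c = abs(np.linalg.det(X)) ** (-1.0 / n)
        X_new = 0.5 * (c * X + Xinv / c)
        if np.linalg.norm(X_new - X, 1) <= tol * np.linalg.norm(X, 1):
            return X_new
        X = X_new
    return X

# sign(A) has the same invariant subspaces as A but eigenvalues +/-1. For a
# stable Lyapunov equation A P + P A^T + Q = 0, P is half the (1,2) block of
# sign([[A, Q], [0, -A^T]]) -- the classical sign-function connection.
A = np.array([[-1.0, 2.0], [0.0, -3.0]])
print(matrix_sign(A))  # approximately -I, since A is stable (Hurwitz)
```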
Making Sparse Gaussian Elimination Scalable by Static Pivoting. In Proceedings of Supercomputing, 1998
"... We propose several techniques as alternatives to partial pivoting to stabilize sparse Gaussian elimination. From numerical experiments we demonstrate that for a wide range of problems the new method is as stable as partial pivoting. The main advantage of the new method over partial pivoting is th ..."
Abstract

Cited by 46 (8 self)
We propose several techniques as alternatives to partial pivoting to stabilize sparse Gaussian elimination. From numerical experiments we demonstrate that for a wide range of problems the new method is as stable as partial pivoting. The main advantage of the new method over partial pivoting is that it permits a priori determination of data structures and communication pattern for Gaussian elimination, which makes it more scalable on distributed-memory machines. Based on this a priori knowledge, we design highly parallel algorithms for both sparse Gaussian elimination and triangular solve and we show that they are suitable for large-scale distributed-memory machines.

Keywords: sparse unsymmetric linear systems, static pivoting, iterative refinement, MPI, 2D matrix decomposition.

1 Introduction. In our earlier work [8, 9, 22], we developed new algorithms to solve unsymmetric sparse linear systems using Gaussian elimination with partial pivoting (GEPP). The new algorithms are hi...
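A dense toy version conveys the perturb-and-refine mechanism behind static pivoting: factor with a fixed elimination order, bound tiny pivots away from zero, then recover accuracy with iterative refinement. The sketch below is an invented illustration, not the paper's sparse distributed algorithm, which additionally pre-permutes large entries onto the diagonal before factoring.

```python
import numpy as np

# Toy dense sketch of static pivoting: no row exchanges; any pivot smaller
# than sqrt(eps)*||A|| is replaced by that threshold, and the resulting
# inexact factorization is corrected by iterative refinement.

def lu_static(A, eps=np.finfo(float).eps):
    n = A.shape[0]
    LU = A.astype(float).copy()
    thresh = np.sqrt(eps) * np.linalg.norm(A, 1)
    for k in range(n):
        if abs(LU[k, k]) < thresh:
            LU[k, k] = thresh if LU[k, k] >= 0 else -thresh  # perturb tiny pivot
        LU[k+1:, k] /= LU[k, k]
        LU[k+1:, k+1:] -= np.outer(LU[k+1:, k], LU[k, k+1:])
    return LU

def solve_refine(A, LU, b, steps=3):
    def lu_solve(r):
        y = r.copy()
        n = len(y)
        for k in range(n):                      # forward solve (unit lower)
            y[k+1:] -= LU[k+1:, k] * y[k]
        for k in range(n - 1, -1, -1):          # backward solve (upper)
            y[k] = (y[k] - LU[k, k+1:] @ y[k+1:]) / LU[k, k]
        return y
    x = lu_solve(b)
    for _ in range(steps):                      # iterative refinement
        x += lu_solve(b - A @ x)
    return x

A = np.array([[1e-20, 1.0], [1.0, 1.0]])        # tiny leading pivot
b = np.array([1.0, 2.0])
x = solve_refine(A, lu_static(A), b)
print(x, np.linalg.norm(A @ x - b))             # near [1, 1], tiny residual
```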
Nonlinear eigenvalue problems: A challenge for modern eigenvalue methods, 2004
"... We discuss the state of the art in numerical solution methods for large scale polynomial or rational eigenvalue problems. We present the currently available solution methods such as the JacobiDavidson, Arnoldi or the rational Krylov method and analyze their properties. We briefly introduce a new li ..."
Abstract

Cited by 45 (5 self)
We discuss the state of the art in numerical solution methods for large-scale polynomial or rational eigenvalue problems. We present the currently available solution methods, such as the Jacobi-Davidson, Arnoldi, or rational Krylov methods, and analyze their properties. We briefly introduce a new linearization technique and demonstrate how it can be used to improve structure preservation and, with it, the accuracy and efficiency of linearization-based methods. We present several recent applications where structured and unstructured nonlinear eigenvalue problems arise and some numerical results.
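For the polynomial case, linearization is the classical route: a quadratic eigenvalue problem (λ² M + λ C + K) x = 0 becomes a linear pencil of twice the size. The SciPy sketch below uses the standard first companion form purely for illustration; the paper's point is that better-chosen linearizations additionally preserve problem structure.

```python
import numpy as np
from scipy.linalg import eig

# Classical first companion linearization of the quadratic eigenvalue problem
# (lambda^2 M + lambda C + K) x = 0 into A z = lambda B z with z = [x; lambda x].
# Textbook technique, shown only to illustrate what "linearization" means here.

def quadeig(M, C, K):
    n = M.shape[0]
    I = np.eye(n)
    Z = np.zeros((n, n))
    A = np.block([[Z, I], [-K, -C]])
    B = np.block([[I, Z], [Z, M]])
    evals, evecs = eig(A, B)
    return evals, evecs[:n, :]   # top block carries the original eigenvectors

# Small damped mass-spring example with M = I.
M = np.eye(2)
C = np.array([[0.4, 0.0], [0.0, 0.2]])
K = np.array([[2.0, -1.0], [-1.0, 2.0]])
lam, X = quadeig(M, C, K)
residuals = [np.linalg.norm((l**2 * M + l * C + K) @ x) for l, x in zip(lam, X.T)]
print(max(residuals))  # near machine precision
```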