Results 11–20 of 277
Solving Algebraic Riccati Equations on Parallel Computers Using Newton's Method with Exact Line Search
, 1999
"... We investigate the numerical solution of continuoustime algebraic Riccati equations via Newton's method on serial and parallel computers with distributed memory. We apply and extend the available theory for Newton's method endowed with exact line search to accelerate convergence. We also discuss a ..."
Abstract

Cited by 52 (7 self)
We investigate the numerical solution of continuous-time algebraic Riccati equations via Newton's method on serial and parallel computers with distributed memory. We apply and extend the available theory for Newton's method endowed with exact line search to accelerate convergence. We also discuss a new stopping criterion based on recent observations regarding condition and error estimates. In each iteration step of Newton's method a stable Lyapunov equation has to be solved. We propose to solve these Lyapunov equations using iterative schemes for computing the matrix sign function. This approach can be efficiently implemented on parallel computers using ScaLAPACK. Numerical experiments on an IBM SP2 multicomputer demonstrate the accuracy, scalability, and speedup of the implemented algorithms.
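The Newton-plus-exact-line-search scheme described here can be sketched in a few lines for the dense, serial case. This is an illustrative Python sketch, not the paper's ScaLAPACK implementation: the function name `newton_care` is my own, and a coarse grid search stands in for the exact minimization of the line-search polynomial.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def newton_care(A, B, Q, R, X0, tol=1e-12, maxit=50):
    """Newton's method with (approximate) exact line search for the CARE
    A'X + XA - X B R^{-1} B' X + Q = 0. X0 must stabilize A - S X0."""
    S = B @ np.linalg.solve(R, B.T)            # S = B R^{-1} B'
    X = X0.copy()
    for _ in range(maxit):
        Rx = A.T @ X + X @ A - X @ S @ X + Q   # Riccati residual R(X)
        if np.linalg.norm(Rx) < tol:
            break
        Ak = A - S @ X                          # closed-loop matrix
        # Newton step: solve the stable Lyapunov equation Ak' N + N Ak = -R(X)
        N = solve_continuous_lyapunov(Ak.T, -Rx)
        # exact line search uses the identity R(X+tN) = (1-t) R(X) - t^2 N S N;
        # minimize its Frobenius norm (a quartic in t) over a grid
        V = N @ S @ N
        ts = np.linspace(1e-3, 2.0, 400)
        t = ts[np.argmin([np.linalg.norm((1 - s) * Rx - s**2 * V) for s in ts])]
        X = X + t * N
    return X
```

Results can be cross-checked against SciPy's `solve_continuous_are`; the plain Newton iteration is the step t = 1, and the line search only scales it.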
A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers)
, 2001
"... In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform blockcyclic data distribution scheme commonly used for homogeneous collections of processors limits the perform ..."
Abstract

Cited by 49 (24 self)
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous platforms with respect to the performance of the processors. When targeting one-dimensional grids, the load-balancing problem can be solved rather easily. When targeting two-dimensional grids, which are the key to scalability and efficiency for numerical kernels, the problem turns out to be surprisingly difficult. We formally state the 2D load-balancing problem and prove its NP-completeness. Next, we introduce a data allocation heuristic that turns out to be very satisfactory: its practical usefulness is demonstrated by MPI experiments conducted with a heterogeneous network of workstations.
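For the one-dimensional case the abstract calls "rather easy", a simple greedy allocation already balances the load: hand each successive block of columns to the processor whose finish time would stay smallest. A minimal sketch with a hypothetical function name; real distributions must also account for communication costs.

```python
def distribute_blocks(n_blocks, speeds):
    """Greedy 1-D heterogeneous allocation: assign each block to the
    processor p minimizing its resulting finish time (blocks / speed)."""
    counts = [0] * len(speeds)
    for _ in range(n_blocks):
        p = min(range(len(speeds)), key=lambda i: (counts[i] + 1) / speeds[i])
        counts[p] += 1
    return counts
```

With speeds (3, 2, 1), faster processors receive proportionally more blocks, so no processor finishes much later than the others.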
A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures
 SIAM J. SCI. COMPUT
, 2002
"... One approach to solving the nonsymmetric eigenvalue problem in parallel is to parallelize the QR algorithm. Not long ago, this was widely considered to be a hopeless task. Recent efforts have led to significant advances, although the methods proposed up to now have suffered from scalability problems ..."
Abstract

Cited by 36 (3 self)
One approach to solving the nonsymmetric eigenvalue problem in parallel is to parallelize the QR algorithm. Not long ago, this was widely considered to be a hopeless task. Recent efforts have led to significant advances, although the methods proposed up to now have suffered from scalability problems. This paper discusses an approach to parallelizing the QR algorithm that greatly improves scalability. A theoretical analysis indicates that the algorithm is ultimately not scalable, but the non-scalability does not become evident until the matrix dimension is enormous. Experiments on the Intel Paragon system, the IBM SP2 supercomputer, the SGI Origin 2000, and the Intel ASCI Option Red supercomputer are reported.
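The QR algorithm being parallelized here iterates A_{k+1} = R_k Q_k from the factorization A_k = Q_k R_k, driving the matrix toward (quasi-)triangular Schur form while preserving its eigenvalues. A minimal serial, unshifted sketch; production and parallel codes instead use Hessenberg reduction, shifts, and bulge chasing.

```python
import numpy as np

def qr_iterate(A, iters=200):
    """Unshifted QR iteration (serial sketch): each step is a similarity
    transform T <- Q' T Q, so eigenvalues are preserved while the
    subdiagonal decays for matrices with distinct eigenvalue moduli."""
    T = A.copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(T)
        T = R @ Q
    return T
```

For A = [[4, 1], [2, 3]] (eigenvalues 5 and 2), the iterate converges to an upper triangular matrix with the eigenvalues on its diagonal.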
Matrix Multiplication on Heterogeneous Platforms
, 2001
"... this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to load balance the ..."
Abstract

Cited by 36 (16 self)
In this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to load-balance the work across resources of different speeds while minimizing the communication volume. We formally state this problem in a geometric framework and prove its NP-completeness. Next, we introduce a (polynomial) column-based heuristic that turns out to be very satisfactory: we derive a theoretical performance guarantee for the heuristic and assess its practical usefulness through MPI experiments.
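The geometric framework mentioned here tiles the unit square with rectangles whose areas are proportional to processor speeds; a column-based partition restricts the tiling to full-height columns of row slices. A sketch of the rectangle computation for a given grouping of processors into columns (choosing the grouping itself is the hard part the heuristic addresses; the function name is hypothetical).

```python
def column_partition(groups):
    """Column-based partition of the unit square: each column's width is
    proportional to its total speed; within a column, each processor's
    height is proportional to its own speed. groups is a list of columns,
    each a list of processor speeds. Returns (x, y, width, height) tuples."""
    total = sum(sum(g) for g in groups)
    rects, x = [], 0.0
    for g in groups:
        w = sum(g) / total          # column width ~ column's aggregate speed
        y = 0.0
        for s in g:
            h = s / sum(g)          # row height ~ processor's share of column
            rects.append((x, y, w, h))
            y += h
        x += w
    return rects
```

Each rectangle's area then equals that processor's speed divided by the total speed, i.e., its fair share of the matrix product.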
Numerical Libraries And The Grid: The GrADS Experiments With ScaLAPACK
"... This paper describes an overall framework for the design of numerical libraries on a computational Grid of processors where the processors may be geographically distributed and under the control of a Gridbased scheduling system. A set of experiments are presented in the context of solving systems o ..."
Abstract

Cited by 33 (9 self)
This paper describes an overall framework for the design of numerical libraries on a computational Grid of processors, where the processors may be geographically distributed and under the control of a Grid-based scheduling system. A set of experiments is presented in the context of solving systems of linear equations using routines from the ScaLAPACK software collection along with various Grid service components, such as Globus, NWS, and Autopilot.
Motivation On The Grid
The goal of the Grid Application Development Software (GrADS) project [1] is to simplify distributed heterogeneous computing in the same way that the World Wide Web simplified information sharing over the Internet. The GrADS project is exploring the scientific and technical problems that must be solved to make Grid application development and performance tuning for real applications an everyday practice. This requires research in four key areas, each validated in a prototype infrastructure that will make programming on grids a routine task:
1. Grid software architectures that facilitate information flow and resource negotiation among applications, libraries, compilers, linkers, and runtime systems;
2. Base software technologies, such as scheduling, resource discovery, and communication, to support development and execution of performance-efficient Grid applications;
3. Languages, compilers, environments, and tools to support creation of applications for the Grid and solution of problems on the Grid; and
4. Mathematical and data structure libraries for Grid applications, including numerical methods for control of accuracy and latency tolerance.
Making Sparse Gaussian Elimination Scalable by Static Pivoting
 In Proceedings of Supercomputing
, 1998
"... We propose several techniques as alternatives to partial pivoting to stabilize sparse Gaussian elimination. From numerical experiments we demonstrate that for a wide range of problems the new method is as stable as partial pivoting. The main advantage of the new method over partial pivoting is th ..."
Abstract

Cited by 33 (8 self)
We propose several techniques as alternatives to partial pivoting to stabilize sparse Gaussian elimination. Numerical experiments demonstrate that for a wide range of problems the new method is as stable as partial pivoting. The main advantage of the new method over partial pivoting is that it permits a priori determination of the data structures and communication pattern for Gaussian elimination, which makes it more scalable on distributed memory machines. Based on this a priori knowledge, we design highly parallel algorithms for both sparse Gaussian elimination and triangular solve, and we show that they are suitable for large-scale distributed memory machines.
Keywords: sparse unsymmetric linear systems, static pivoting, iterative refinement, MPI, 2D matrix decomposition.
1 Introduction
In our earlier work [8, 9, 22], we developed new algorithms to solve unsymmetric sparse linear systems using Gaussian elimination with partial pivoting (GEPP). The new algorithms are hi...
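The static-pivoting idea can be illustrated on a small dense matrix (the paper works with sparse matrices in SuperLU_DIST; this sketch and its function names are mine, not the paper's code): choose a row permutation up front that places large entries on the diagonal, factor without any pivoting so data structures are fixed a priori, then recover accuracy with a few steps of iterative refinement.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.linalg import solve_triangular

def static_pivot_solve(A, b, steps=3):
    """Static-pivoting sketch: permute rows to maximize the product of
    diagonal magnitudes (assignment problem on -log|a_ij|), run LU with
    NO pivoting, then apply iterative refinement."""
    n = len(b)
    cost = -np.log(np.abs(A) + 1e-300)     # tiny offset guards log(0)
    rows, cols = linear_sum_assignment(cost)
    perm = np.empty(n, dtype=int)
    perm[cols] = rows                       # row rows[k] goes to position cols[k]
    PA, Pb = A[perm], b[perm]
    # LU factorization without pivoting (Doolittle)
    L, U = np.eye(n), PA.astype(float).copy()
    for k in range(n - 1):
        L[k + 1:, k] = U[k + 1:, k] / U[k, k]
        U[k + 1:] -= np.outer(L[k + 1:, k], U[k])
    x = solve_triangular(U, solve_triangular(L, Pb, lower=True))
    for _ in range(steps):                  # iterative refinement
        r = Pb - PA @ x
        x = x + solve_triangular(U, solve_triangular(L, r, lower=True))
    return x
```

On A = [[0, 2], [3, 1]], plain no-pivot LU would divide by the zero pivot; the static row permutation swaps the rows first, and refinement polishes the result.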
Efficient Numerical Algorithms for Balanced Stochastic Truncation
, 2001
"... We propose an efficient numerical algorithm for relative error model reduction based on balanced stochastic truncation. The method uses fullrank factors of the Gramians to be balanced versus each other and exploits the fact that for largescale systems these Gramians are often of low numerical rank ..."
Abstract

Cited by 30 (2 self)
We propose an efficient numerical algorithm for relative error model reduction based on balanced stochastic truncation. The method balances full-rank factors of the Gramians against each other and exploits the fact that for large-scale systems these Gramians are often of low numerical rank. We use the easy-to-parallelize sign function method as the major computational tool in determining these full-rank factors and demonstrate the numerical performance of the suggested implementation of balanced stochastic truncation model reduction.
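The sign function method mentioned here rests on the Newton iteration Z_{k+1} = (Z_k + Z_k^{-1})/2, whose dominant cost per step, a matrix inversion, parallelizes well. A minimal dense sketch (the function name is mine; production codes add scaling to speed convergence):

```python
import numpy as np

def matrix_sign(Z, tol=1e-12, maxit=100):
    """Newton iteration for the matrix sign function: quadratically
    convergent when Z has no eigenvalues on the imaginary axis."""
    for _ in range(maxit):
        Znew = 0.5 * (Z + np.linalg.inv(Z))
        if np.linalg.norm(Znew - Z, 1) < tol * np.linalg.norm(Z, 1):
            return Znew
        Z = Znew
    return Z
```

sign(Z) maps every eigenvalue to +1 or -1 according to the sign of its real part, which is what makes it useful for splitting spectra and solving Lyapunov equations.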
Checkpointing strategies for parallel jobs
, 2011
"... This work provides an analysis of checkpointing strategies for minimizing expected job execution times in an environment that is subject to processor failures. In the case of both sequential and parallel jobs, we give the optimal solution for exponentially distributed failure interarrival times, wh ..."
Abstract

Cited by 30 (20 self)
This work provides an analysis of checkpointing strategies for minimizing expected job execution times in an environment that is subject to processor failures. For both sequential and parallel jobs, we give the optimal solution for exponentially distributed failure inter-arrival times, which, to the best of our knowledge, is the first rigorous proof that periodic checkpointing is optimal. For non-exponentially distributed failures, we develop a dynamic programming algorithm to maximize the amount of work completed before the next failure, which provides a good heuristic for minimizing the expected execution time. Our work considers various models of job parallelism and of parallel checkpointing overhead. We first perform extensive simulation experiments assuming that failures follow Exponential or Weibull distributions, the latter being more representative of real-world systems. The obtained results not only corroborate our theoretical findings, but also show that our dynamic programming algorithm significantly outperforms previously proposed solutions in the case of Weibull failures. We then discuss results from simulation experiments that use failure logs from production clusters. These results confirm that our dynamic programming algorithm significantly outperforms existing solutions for real-world clusters.
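A quick way to see why the checkpointing period matters is a small Monte-Carlo model: run `work` units of computation, checkpoint (cost C) every `period` units, and on a failure pay a recovery cost R and redo the uncheckpointed segment. This is a simplified sketch with hypothetical names, not the paper's model; failures during recovery are ignored, and the memoryless exponential justifies resampling the failure time per attempt.

```python
import random

def expected_time(work, period, C, R, lam, rng, trials=2000):
    """Monte-Carlo estimate of completion time under periodic checkpointing
    with exponential failures of rate lam (mean time between failures 1/lam)."""
    total = 0.0
    for _ in range(trials):
        done, t = 0.0, 0.0
        while done < work:
            seg = min(period, work - done)
            fail = rng.expovariate(lam)
            if fail < seg + C:       # failure before segment + checkpoint finish
                t += fail + R        # segment lost; pay recovery, retry
            else:
                t += seg + C
                done += seg
        total += t
    return total / trials
```

Sweeping `period` for a given C and failure rate reproduces the familiar trade-off: too short wastes time on checkpoints, too long wastes time on re-execution.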