Results 1 – 5 of 5
AVERAGE-CASE STABILITY OF GAUSSIAN ELIMINATION
, 1990
Abstract

Cited by 40 (2 self)
Gaussian elimination with partial pivoting is unstable in the worst case: the "growth factor" can be as large as 2^(n-1), where n is the matrix dimension, resulting in a loss of n bits of precision. It is proposed that an average-case analysis can help explain why it is nevertheless stable in practice. The results presented begin with the observation that for many distributions of matrices, the matrix elements after the first few steps of elimination are approximately normally distributed. From here, with the aid of estimates from extreme value statistics, reasonably accurate predictions of the average magnitudes of elements, pivots, multipliers, and growth factors are derived. For various distributions of matrices with dimensions n <= 1024, the average growth factor (normalized by the standard deviation of the initial matrix elements) is within a few percent of n^(2/3) for partial pivoting and approximately n^(1/2) for complete pivoting. The average maximum element of the residual with both kinds of pivoting appears to be of magnitude O(n), as compared with O(n^(1/2)) for QR factorization. The experiments and analysis presented show that small multipliers alone are not enough to explain the average-case stability of Gaussian elimination; it is also important that the correction introduced in the remaining matrix at each elimination step is of rank 1. Because of this low-rank property, the signs of the elements and multipliers in Gaussian elimination are not independent, but are interrelated in such a way as to retard growth. By contrast, alternative pivoting strategies involving high-rank corrections are sometimes unstable even though the multipliers are small.
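Both quantities discussed in this abstract, the growth factor and the worst-case 2^(n-1) bound, are easy to observe numerically. The sketch below (an illustrative Python/NumPy implementation, not the paper's code; the function name is my own) runs Gaussian elimination with partial pivoting and tracks the growth factor, i.e. the largest element magnitude seen during elimination divided by the largest magnitude in the original matrix:

```python
import numpy as np

def growth_factor(A):
    """Growth factor of Gaussian elimination with partial pivoting:
    max |element| seen during elimination / max |a_ij| initially."""
    U = A.astype(float).copy()
    n = U.shape[0]
    max_orig = np.abs(U).max()
    max_seen = max_orig
    for k in range(n - 1):
        # Partial pivoting: bring the largest entry of column k up to the diagonal.
        p = k + np.argmax(np.abs(U[k:, k]))
        U[[k, p]] = U[[p, k]]
        mult = U[k + 1:, k] / U[k, k]            # multipliers, |m| <= 1 by pivoting
        U[k + 1:, k:] -= np.outer(mult, U[k, k:])  # the rank-1 correction at step k
        max_seen = max(max_seen, np.abs(U).max())
    return max_seen / max_orig
```

On the classic worst-case matrix (1 on the diagonal, -1 below it, 1 in the last column) this returns exactly 2^(n-1), while typical random matrices exhibit the far milder growth the abstract describes.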
A Compositional Framework for Developing Parallel Programs on Two Dimensional Arrays
, 2005
Abstract

Cited by 3 (2 self)
The METR technical reports are published as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.
High performance dense linear system solver with soft error resilience
 in: Proceedings of the IEEE Cluster 2011, IEEE Computer
, 2011
Abstract

Cited by 2 (2 self)
In the multi-petaflop era of supercomputers, the number of computing cores is growing exponentially. However, as integrated circuit technology scales below 65 nm, the critical charge required to flip a gate or a memory cell has been dangerously reduced, causing a higher cosmic-radiation-induced soft error rate. Soft errors threaten computing systems by producing silent data corruption, which is hard to detect and correct. Current research on soft-error resilience for dense linear solvers offers limited capability when facing large-scale computing systems, and suffers from both soft errors and round-off error due to floating-point arithmetic. This work proposes a fault-tolerant algorithm that can recover the solution of a dense linear system Ax = b from multiple spatial and temporal soft errors. Experimental results on the Kraken supercomputer confirm scalable performance of the proposed fault-tolerance functionality and negligible overhead in solution recovery.
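A standard building block for this kind of soft-error resilience is checksum encoding in the style of algorithm-based fault tolerance (ABFT). The sketch below (an illustrative Python/NumPy example, not this paper's algorithm; the function name is my own) shows the core idea for matrix multiplication: append a column-sum row to A and a row-sum column to B, and the product carries checksums that can be verified afterwards to detect corruption:

```python
import numpy as np

def abft_matmul(A, B):
    """Checksum-encoded matrix multiply (ABFT-style sketch).
    Returns the product and a flag saying whether its row/column
    checksums are still consistent after the computation."""
    Ac = np.vstack([A, A.sum(axis=0)])                  # add column-checksum row to A
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # add row-checksum column to B
    C = Ac @ Br                                         # product carries both checksums
    data = C[:-1, :-1]                                  # the actual A @ B block
    ok = (np.allclose(C[-1, :-1], data.sum(axis=0)) and
          np.allclose(C[:-1, -1], data.sum(axis=1)))
    return data, ok
```

A corrupted element breaks exactly one row checksum and one column checksum, so its location can be identified and (for a single error) corrected; extending this to survive multiple errors during LU factorization is the substance of the paper.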
Finding and Exploiting Parallelism in a Production Combustion Simulation Program
, 1990
Abstract

Cited by 1 (0 self)
In pursuit of a systematic method for parallelizing large production FORTRAN codes, a parallel version of a combustion simulation was developed. The development was aided by an examination of the problem being solved, its mathematical formulation, and the computation methods employed. The ease with which the simulation was modified for parallel execution lends support to the hypothesis that the modular nature of production codes has important implications for parallelization. The success of the various analysis techniques and transformations lends insight into the design of automatic parallelization tools. ACKNOWLEDGMENTS I am indebted to a number of people for their assistance in producing this thesis. Rudi Eigenmann was instrumental in providing insight into the current state of restructuring compiler technology and the need for interprocedural analysis techniques. My analysis of the Premix code benefited from Paul Petersen's powerful tools for symbolic and runtime dependence analyses...
HPP/TUD Cholesky by BLAS 1, 2 and 3
, 1998
Abstract
... a column at a time, by the following algorithm.

Sdot Cholesky
for j = 1:n
    for i = 1:j-1
        r(i,j) = (a(i,j) - sum_{k=1}^{i-1} r(k,i) r(k,j)) / r(i,i)
    end
    r(j,j) = (a(j,j) - sum_{k=1}^{j-1} r(k,j)^2)^(1/2)
end

This algorithm can variously be described as the sdot, inner product, dot product, `jik', or Doolittle version of Cholesky factorization. The key point to note is that the bulk of the work is in evaluating the summations in the inner loops; these are inner products and can be done with the level-1 BLAS routine sdot. Level 2 BLAS: To derive a higher level algorithm it is natural to equate
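The column-at-a-time algorithm in this snippet translates directly into NumPy, with each inner summation computed as a single dot product (the role played by the level-1 BLAS routine sdot). A minimal sketch, assuming an upper-triangular factor R with A = R^T R for a symmetric positive definite A; the function name is my own:

```python
import numpy as np

def sdot_cholesky(A):
    """Cholesky factorization A = R^T R, computed a column at a time.
    Each inner sum is a dot product over already-computed entries of R,
    mirroring the level-1 BLAS (sdot) formulation."""
    n = A.shape[0]
    R = np.zeros_like(A, dtype=float)
    for j in range(n):
        for i in range(j):
            # r(i,j) = (a(i,j) - sum_{k<i} r(k,i) r(k,j)) / r(i,i)
            R[i, j] = (A[i, j] - np.dot(R[:i, i], R[:i, j])) / R[i, i]
        # r(j,j) = sqrt(a(j,j) - sum_{k<j} r(k,j)^2)
        R[j, j] = np.sqrt(A[j, j] - np.dot(R[:j, j], R[:j, j]))
    return R
```

Higher-level (BLAS 2 and 3) variants replace these scalar dot products with matrix-vector and matrix-matrix operations over blocks of columns, which is the direction the snippet is heading when it breaks off.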