Results 1 - 5 of 5
Locality of Reference in LU Decomposition with Partial Pivoting
 SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS
, 1997
Abstract

Cited by 96 (10 self)
This paper presents a new partitioned algorithm for LU decomposition with partial pivoting. The new algorithm, called the recursively partitioned algorithm, is based on a recursive partitioning of the matrix. The paper analyzes the locality of reference in the new algorithm and the locality of reference in a known and widely used partitioned algorithm for LU decomposition called the right-looking algorithm. The analysis reveals that the new algorithm performs a factor of $\Theta(\sqrt{M/n})$ fewer I/O operations (or cache misses) than the right-looking algorithm, where $n$ is the order of the matrix and $M$ is the size of primary memory. The analysis also determines the optimal block size for the right-looking algorithm. Experimental comparisons between the new algorithm and the right-looking algorithm show that an implementation of the new algorithm outperforms a similarly coded right-looking algorithm on six different RISC architectures, that the new algorithm performs fewer cache misses than any other algorithm tested, and that it benefits more from Strassen's matrix-multiplication algorithm.
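The recursive partitioning the abstract describes can be sketched compactly: factor the left block column of the matrix recursively, apply its pivots and a triangular solve to the right block column, update the trailing submatrix, and recurse on it. The following NumPy sketch illustrates that scheme; it is an illustrative implementation under my own naming and conventions, not the paper's tuned code.

```python
import numpy as np

def recursive_lu(A):
    """Recursively partitioned LU with partial pivoting, in place (sketch).

    A is m-by-n with m >= n. On return, the strictly lower part of A holds
    L (unit diagonal implicit) and the upper part holds U. Returns a row
    permutation p such that (original A)[p] == L @ U for square A.
    """
    m, n = A.shape
    if n == 1:
        p = np.arange(m)
        piv = int(np.argmax(np.abs(A[:, 0])))
        A[[0, piv]] = A[[piv, 0]]      # swap the pivot row to the top
        p[[0, piv]] = p[[piv, 0]]
        if A[0, 0] != 0:
            A[1:, 0] /= A[0, 0]        # scale the column below the pivot
        return p

    k = n // 2
    # Factor the left block column (an m-by-k panel) recursively.
    p1 = recursive_lu(A[:, :k])
    A[:, k:] = A[p1, k:]               # apply the panel's pivots to the right part

    # Triangular solve: U12 = L11^{-1} A12, with L11 unit lower triangular.
    L11 = np.tril(A[:k, :k], -1) + np.eye(k)
    A[:k, k:] = np.linalg.solve(L11, A[:k, k:])

    # Schur-complement update: A22 -= L21 @ U12.
    A[k:, k:] -= A[k:, :k] @ A[:k, k:]

    # Factor the trailing submatrix recursively, then fix up L21's row order.
    p2 = recursive_lu(A[k:, k:])
    A[k:, :k] = A[k:, :k][p2]

    # Compose the two permutations.
    p = np.concatenate([np.arange(k), k + p2])
    return p1[p]
```

The Schur-complement update is where the bulk of the flops live; because it is a large matrix multiplication, it can be handed to Strassen's algorithm, which is one reason the abstract notes the recursive algorithm benefits more from it.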
A Survey of Out-of-Core Algorithms in Numerical Linear Algebra
 DIMACS SERIES IN DISCRETE MATHEMATICS AND THEORETICAL COMPUTER SCIENCE
, 1999
Abstract

Cited by 59 (3 self)
This paper surveys algorithms that efficiently solve linear equations or compute eigenvalues even when the matrices involved are too large to fit in the main memory of the computer and must be stored on disks. The paper focuses on scheduling techniques that result in mostly sequential data accesses and in data reuse, and on techniques for transforming algorithms that cannot be effectively scheduled. The survey covers out-of-core algorithms for solving dense systems of linear equations, for the direct and iterative solution of sparse systems, for computing eigenvalues, for fast Fourier transforms, and for N-body computations. The paper also discusses reasonable assumptions on memory size, approaches for the analysis of out-of-core algorithms, and relationships between out-of-core, cache-aware, and parallel algorithms.
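The scheduling idea the survey emphasizes (sequential access plus data reuse) shows up in its simplest form in blocked out-of-core matrix multiplication: only a few blocks are resident at a time, and each output block is written exactly once. This is a toy sketch under my own naming, with `np.memmap` (or any array-like) standing in for disk-resident matrices; the block size `b` is illustrative.

```python
import numpy as np

def out_of_core_matmul(A, B, C, b):
    """Blocked C = A @ B for square matrices that may be disk-backed.

    Only three b-by-b blocks need to be in memory at once, so the working
    set is O(b^2) regardless of n, and each block of C is written once.
    """
    n = A.shape[0]
    for i in range(0, n, b):
        for j in range(0, n, b):
            acc = np.zeros((min(b, n - i), min(b, n - j)))
            for k in range(0, n, b):
                # Each iteration reads one block of A and one block of B.
                acc += np.asarray(A[i:i+b, k:k+b]) @ np.asarray(B[k:k+b, j:j+b])
            C[i:i+b, j:j+b] = acc   # single sequential write per output block
```

Choosing $b \approx \sqrt{M/3}$ for a memory of size $M$ keeps all three resident blocks in core while minimizing the number of block reads, which is the kind of tradeoff the survey's analyses formalize.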
Communication Lower Bounds for Distributed-Memory Matrix Multiplication
, 2004
Abstract

Cited by 46 (1 self)
this paper. More specifically, we use the definitions of [10]: $\Theta(g(n))$ is the set of functions $f(n)$ such that there exist positive constants $c_1$, $c_2$, and $n_0$ such that $0 \le c_1 g(n) \le f(n) \le c_2 g(n)$ for all $n \ge n_0$; $O(g(n))$ is defined similarly using the weaker condition $0 \le f(n) \le c_2 g(n)$; $\Omega(g(n))$ is defined with the condition $0 \le c_1 g(n) \le f(n)$. The set $o(g(n))$ consists of functions $f(n)$ such that for any $c_2 > 0$ there exists a constant $n_0 > 0$ such that $0 \le f(n) < c_2 g(n)$ for all $n \ge n_0$.
Locality of reference in sparse Cholesky factorization methods. Submitted to the Electronic Transactions on Numerical Analysis
, 2004
Abstract

Cited by 2 (1 self)
This paper analyzes the cache efficiency of two high-performance sparse Cholesky factorization algorithms: the multifrontal algorithm and the left-looking algorithm. These two are essentially the only two algorithms that are used in current codes; generalizations of these algorithms are used in general symmetric and general unsymmetric sparse triangular factorization codes. Our theoretical analysis shows that while both algorithms sometimes enjoy a high level of data reuse in the cache, they are incomparable: there are matrices on which one is cache efficient and the other is not, and vice versa. The theoretical analysis is backed up by detailed experimental evidence, which shows that our theoretical analyses do predict cache-miss rates and performance in practice, even though the theory uses a fairly simple cache model. We also show, experimentally, that on matrices arising from finite-element structural analysis, the left-looking algorithm consistently outperforms the multifrontal algorithm. Direct cache-miss measurements indicate that the difference in performance is largely due to differences in the number of level-2 cache misses that the two algorithms generate. Finally, we also show that there are matrices where the multifrontal algorithm may require significantly more memory than the left-looking algorithm. On the other hand, the left-looking algorithm never uses more memory than the multifrontal one.
Key words. Cholesky factorization, sparse Cholesky, multifrontal methods, cache efficiency, locality of reference
AMS subject classifications. 15A23, 65F05, 65F50, 65Y10, 65Y20
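The "left-looking" pattern the abstract compares against the multifrontal method is easiest to see in its dense form: before column $j$ of $L$ is scaled, it gathers updates from all previously computed columns to its left. The sparse variants analyzed in the paper add elimination-tree structure on top of this; the sketch below is only the dense illustration, under my own naming.

```python
import numpy as np

def left_looking_cholesky(A):
    """Dense left-looking Cholesky sketch: A = L @ L.T, A symmetric
    positive definite. Column j is updated by columns 0..j-1 (the part
    of L to its "left") and only then square-rooted and scaled."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Gather the left-looking updates into column j.
        s = A[j:, j] - L[j:, :j] @ L[j, :j]
        L[j, j] = np.sqrt(s[0])          # diagonal entry
        L[j+1:, j] = s[1:] / L[j, j]     # scale the subdiagonal
    return L
```

In the sparse setting, the inner update `L[j:, :j] @ L[j, :j]` only touches the columns that actually update column $j$, and it is the cache behavior of exactly these scattered updates, versus the multifrontal method's dense frontal matrices, that the paper's analysis compares.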
Quantitative Performance Modeling of Scientific Computations and Creating Locality in Numerical Algorithms
, 1995
Abstract
you design an efficient out-of-core iterative algorithm? These are the two questions answered in this thesis.