Results 1 - 2 of 2
Locality of Reference in LU Decomposition with Partial Pivoting
SIAM Journal on Matrix Analysis and Applications, 1997
Cited by 96 (10 self)
This paper presents a new partitioned algorithm for LU decomposition with partial pivoting. The new algorithm, called the recursively partitioned algorithm, is based on a recursive partitioning of the matrix. The paper analyzes the locality of reference in the new algorithm and the locality of reference in a known and widely used partitioned algorithm for LU decomposition called the right-looking algorithm. The analysis reveals that the new algorithm performs a factor of $\Theta(\sqrt{M/n})$ fewer I/O operations (or cache misses) than the right-looking algorithm, where $n$ is the order of the matrix and $M$ is the size of primary memory. The analysis also determines the optimal block size for the right-looking algorithm. Experimental comparisons between the new algorithm and the right-looking algorithm show that an implementation of the new algorithm outperforms a similarly coded right-looking algorithm on six different RISC architectures, that the new algorithm performs fewer cache misses than any other algorithm tested, and that it benefits more from Strassen's matrix-multiplication algorithm.
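The abstract does not spell out the recursion, so the following is only a minimal sketch of one common way to organize a recursively partitioned LU factorization with partial pivoting: split the columns in half, factor the left half recursively, update the right half, then factor the right half recursively. The function names (`recursive_lu`, `_factor`, `_swap_rows`), the pure-Python list-of-lists storage, and the pivot-vector convention are assumptions made for illustration, not the paper's implementation, and the code makes no attempt to reproduce the cache behavior the paper analyzes.

```python
def _swap_rows(lu, piv, k0, k1, c0, c1):
    """Apply the row swaps piv[k0:k1] to columns c0..c1-1 only."""
    for k in range(k0, k1):
        p = piv[k]
        if p != k:
            rk, rp = lu[k], lu[p]
            for j in range(c0, c1):
                rk[j], rp[j] = rp[j], rk[j]


def _factor(lu, piv, c0, c1):
    """Factor columns c0..c1-1 in place, using rows c0..m-1."""
    m = len(lu)
    if c1 - c0 == 1:
        # Base case: one column. Pick the largest entry as pivot,
        # swap it to the diagonal, and scale the entries below it.
        k = c0
        p = max(range(k, m), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0:
            raise ValueError("matrix is singular")
        piv[k] = p
        lu[k][k], lu[p][k] = lu[p][k], lu[k][k]
        d = lu[k][k]
        for i in range(k + 1, m):
            lu[i][k] /= d
        return
    mid = (c0 + c1) // 2
    _factor(lu, piv, c0, mid)               # factor the left half
    _swap_rows(lu, piv, c0, mid, mid, c1)   # carry its pivots to the right half
    for i in range(c0, mid):                # triangular solve: U12 = L11^{-1} A12
        for k in range(c0, i):
            l = lu[i][k]
            for j in range(mid, c1):
                lu[i][j] -= l * lu[k][j]
    for i in range(mid, m):                 # Schur complement: A22 -= L21 * U12
        for k in range(c0, mid):
            l = lu[i][k]
            for j in range(mid, c1):
                lu[i][j] -= l * lu[k][j]
    _factor(lu, piv, mid, c1)               # factor the right half
    _swap_rows(lu, piv, mid, c1, c0, mid)   # carry its pivots back to the left half


def recursive_lu(a):
    """Return (lu, piv) for an m-by-n matrix with m >= n.

    lu holds L (unit diagonal, below the diagonal) and U (on and above it);
    piv[k] is the row that was swapped with row k at step k.
    """
    lu = [list(map(float, row)) for row in a]
    piv = list(range(len(lu[0])))
    _factor(lu, piv, 0, len(lu[0]))
    return lu, piv


a = [[2, 1, 1, 0], [4, 3, 3, 1], [8, 7, 9, 5], [6, 7, 9, 8]]
lu, piv = recursive_lu(a)
```

Applying the swaps in `piv` to the rows of `a`, in order, gives the permuted matrix that equals the product of the unit lower-triangular and upper-triangular factors stored in `lu`. A production code would replace the two update loops with calls to optimized triangular-solve and matrix-multiply kernels, which is where a fast matrix-multiplication routine would enter.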
Locality of Reference in Sparse Cholesky Factorization Methods
Submitted to the Electronic Transactions on Numerical Analysis, 2004
Cited by 2 (1 self)
This paper analyzes the cache efficiency of two high-performance sparse Cholesky factorization algorithms: the multifrontal algorithm and the left-looking algorithm. These two are essentially the only two algorithms that are used in current codes; generalizations of these algorithms are used in general-symmetric and general-unsymmetric sparse triangular factorization codes. Our theoretical analysis shows that while both algorithms sometimes enjoy a high level of data reuse in the cache, they are incomparable: there are matrices on which one is cache efficient and the other is not, and vice versa. The theoretical analysis is backed up by detailed experimental evidence, which shows that our theoretical analyses do predict cache-miss rates and performance in practice, even though the theory uses a fairly simple cache model. We also show, experimentally, that on matrices arising from finite-element structural analysis, the left-looking algorithm consistently outperforms the multifrontal algorithm. Direct cache-miss measurements indicate that the difference in performance is largely due to differences in the number of level-2 cache misses that the two algorithms generate. Finally, we also show that there are matrices where the multifrontal algorithm may require significantly more memory than the left-looking algorithm. On the other hand, the left-looking algorithm never uses more memory than the multifrontal one.
Key words. Cholesky factorization, sparse Cholesky, multifrontal methods, cache efficiency, locality of reference
AMS subject classifications. 15A23, 65F05, 65F50, 65Y10, 65Y20
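To make the term "left-looking" concrete, here is a minimal dense sketch of the order of operations: each column is first updated by all earlier columns (the ones to its left) and only then scaled. This is an illustrative assumption-laden simplification, not the sparse codes studied in the paper, which operate on sparse data structures that are not shown here. The function name `left_looking_cholesky` and the list-of-lists storage are hypothetical.

```python
import math


def left_looking_cholesky(a):
    """Return the lower-triangular factor L with a = L * L^T.

    a is a symmetric positive definite matrix given as a list of rows;
    only its lower triangle is read.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Start from the lower part of column j of the input.
        col = [float(a[i][j]) for i in range(j, n)]
        # "Look left": subtract the contribution of every earlier column k.
        for k in range(j):
            ljk = L[j][k]
            if ljk != 0.0:  # a column with a zero in row j contributes nothing
                for i in range(j, n):
                    col[i - j] -= L[i][k] * ljk
        if col[0] <= 0.0:
            raise ValueError("matrix is not positive definite")
        # Scale the updated column to obtain column j of L.
        d = math.sqrt(col[0])
        L[j][j] = d
        for i in range(j + 1, n):
            L[i][j] = col[i - j] / d
    return L


L = left_looking_cholesky([[4, 12, -16], [12, 37, -43], [-16, -43, 98]])
```

In this dense form every earlier column is visited for every column `j`; the zero test on `L[j][k]` hints at how a sparse code can skip columns that do not affect column `j`.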