Results 1 -
6 of
6
Locality Of Reference In Lu Decomposition With Partial Pivoting
- SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS
, 1997
"... This paper presents a new partitioned algorithm for LU decomposition with partial pivoting. The new algorithm, called the recursively partitioned algorithm, is based on a recursive partitioning of the matrix. The paper analyzes the locality of reference in the new algorithm and the locality of refer ..."
Abstract
-
Cited by 84 (6 self)
- Add to MetaCart
This paper presents a new partitioned algorithm for LU decomposition with partial pivoting. The new algorithm, called the recursively partitioned algorithm, is based on a recursive partitioning of the matrix. The paper analyzes the locality of reference in the new algorithm and the locality of reference in a known and widely used partitioned algorithm for LU decomposition called the right-looking algorithm. The analysis reveals that the new algorithm performs a factor of $\Theta(\sqrt{M/n})$ fewer I/O operations (or cache misses) than the right-looking algorithm, where $n$ is the order of the matrix and $M$ is the size of primary memory. The analysis also determines the optimal block size for the right-looking algorithm. Experimental comparisons between the new algorithm and the right-looking algorithm show that an implementation of the new algorithm outperforms a similarly coded right-looking algorithm on six different RISC architectures, that the new algorithm performs fewer cache misses than any other algorithm tested, and that it benefits more from Strassen's matrix-multiplication algorithm.
The Design and Implementation of SOLAR, a Portable Library for Scalable Out-of-Core Linear Algebra Computations
- WORKSHOP ON I/O IN PARALLEL AND DISTRIBUTED SYSTEMS
, 1996
"... SOLAR is a portable high-performance library for out-of-core dense matrix computations. It combines portability with high performance by using existing high-performance in-core subroutine libraries and by using an optimized matrix input-output library. SOLAR works on parallel computers, workstations ..."
Abstract
-
Cited by 61 (4 self)
- Add to MetaCart
SOLAR is a portable high-performance library for out-of-core dense matrix computations. It combines portability with high performance by using existing high-performance in-core subroutine libraries and by using an optimized matrix input-output library. SOLAR works on parallel computers, workstations, and personal computers. It supports in-core computations on both shared-memory and distributed-memory machines, and its matrix input-output library supports both conventional I/O interfaces and parallel I/O interfaces. This paper discusses the overall design of SOLAR, its interfaces, and the design of several important subroutines. Experimental results show that SOLAR can factor on a single workstation an out-of-core positive-definite symmetric matrix at a rate exceeding 215 Mflops, and an out-of-core general matrix at a rate exceeding 195 Mflops. Less than 16 % of the running time is spent on I/O in these computations. These results indicate that SOLAR's portability does not compromise its performance. We expect that the combination of portability, modularity, and the use of a high-level I/O interface will make the library an important platform for research on out-of-core algorithms and on parallel I/O.
The LINPACK benchmark: Past, present, and future. Concurrency and Computation: Practice and Experience
- Concurrency and Computation: Practice and Experience
, 2003
"... This paper describes the LINPACK Benchmark [41] and some of its variations commonly used to assess performance of computer systems. Aside from the LINPACK benchmark suite, the TOP500 [43], and the HPL [48] code are presented. The latter is frequently used to obtained results for TOP500 submissions. ..."
Abstract
-
Cited by 55 (24 self)
- Add to MetaCart
This paper describes the LINPACK Benchmark [41] and some of its variations commonly used to assess performance of computer systems. Aside from the LINPACK benchmark suite, the TOP500 [43], and the HPL [48] code are presented. The latter is frequently used to obtained results for TOP500 submissions. Information is also given on how to interpret results of the benchmark and how the results fit into performance evaluation process.
A Survey of Out-of-Core Algorithms in Numerical Linear Algebra
- DIMACS SERIES IN DISCRETE MATHEMATICS AND THEORETICAL COMPUTER SCIENCE
, 1999
"... This paper surveys algorithms that efficiently solve linear equations or compute eigenvalues even when the matrices involved are too large to fit in the main memory of the computer and must be stored on disks. The paper focuses on scheduling techniques that result in mostly sequential data acces ..."
Abstract
-
Cited by 44 (2 self)
- Add to MetaCart
This paper surveys algorithms that efficiently solve linear equations or compute eigenvalues even when the matrices involved are too large to fit in the main memory of the computer and must be stored on disks. The paper focuses on scheduling techniques that result in mostly sequential data accesses and in data reuse, and on techniques for transforming algorithms that cannot be effectively scheduled. The survey covers out-of-core algorithms for solving dense systems of linear equations, for the direct and iterative solution of sparse systems, for computing eigenvalues, for fast Fourier transforms, and for N-body computations. The paper also discusses reasonable assumptions on memory size, approaches for the analysis of out-of-core algorithms, and relationships between out-of-core, cache-aware, and parallel algorithms.
Quantitative Performance Modeling of Scientific Computations and Creating Locality in Numerical Algorithms
, 1995
"... you design an efficient out-of-core iterative algorithm? These are the two questions answered in this thesis. ..."
Abstract
- Add to MetaCart
you design an efficient out-of-core iterative algorithm? These are the two questions answered in this thesis.
Numerical Algorithms Group, Ltd. and
"... Subprograms (Level 2 BLAS). Level 2 BLAS are targeted at matrix-vector operations with the aim of providing more efficient, but portable, implementations of algorithms on high-performance com-puters. The model implementation provides a portable set of FORTRAN 77 Level 2 BLAS for machines where speci ..."
Abstract
- Add to MetaCart
Subprograms (Level 2 BLAS). Level 2 BLAS are targeted at matrix-vector operations with the aim of providing more efficient, but portable, implementations of algorithms on high-performance com-puters. The model implementation provides a portable set of FORTRAN 77 Level 2 BLAS for machines where specialized implementations do not exist or are not required. The test software aims to verify that specialized implementations meet the specification of Level 2 BLAS and that imple-mentations are correctly installed. Categories and Subject Descriptors: F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems-compututiott-s on matrices; G.l.O [Numerical Analysis]:

