Results 11  20
of
882
A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers
, 1992
"... This paper describes ScaLAPACK, a distributed memory version of the LAPACK software package for dense and banded matrix computations. Key design features are the use of distributed versions of the Level LAS as building blocks, and an ob ectbased interface to the library routines. The square block s ..."
Abstract

Cited by 179 (32 self)
 Add to MetaCart
This paper describes ScaLAPACK, a distributed memory version of the LAPACK software package for dense and banded matrix computations. Key design features are the use of distributed versions of the Level LAS as building blocks, and an ob ectbased interface to the library routines. The square block scattered decomposition is described. The implementation of a distributed memory version of the rightlooking LU factorization algorithm on the Intel Delta multicomputer is discussed, and performance results are presented that demonstrated the scalability of the algorithm.
An UnsymmetricPattern Multifrontal Method for Sparse LU Factorization
 SIAM J. MATRIX ANAL. APPL
, 1994
"... Sparse matrix factorization algorithms for general problems are typically characterized by irregular memory access patterns that limit their performance on parallelvector supercomputers. For symmetric problems, methods such as the multifrontal method avoid indirect addressing in the innermost loops ..."
Abstract

Cited by 151 (27 self)
 Add to MetaCart
Sparse matrix factorization algorithms for general problems are typically characterized by irregular memory access patterns that limit their performance on parallelvector supercomputers. For symmetric problems, methods such as the multifrontal method avoid indirect addressing in the innermost loops by using dense matrix kernels. However, no efficient LU factorization algorithm based primarily on dense matrix kernels exists for matrices whose pattern is very unsymmetric. We address this deficiency and present a new unsymmetricpattern multifrontal method based on dense matrix kernels. As in the classical multifrontal method, advantage is taken of repetitive structure in the matrix by factorizing more than one pivot in each frontal matrix thus enabling the use of Level 2 and Level 3 BLAS. The performance is compared with the classical multifrontal method and other unsymmetric solvers on a CRAY YMP.
An overview of the trilinos project
 ACM Transactions on Mathematical Software
"... The Trilinos Project is an effort to facilitate the design, development, integration and ongoing support of mathematical software libraries within an objectoriented framework for the solution of largescale, complex multiphysics engineering and scientific problems. Trilinos addresses two fundament ..."
Abstract

Cited by 145 (17 self)
 Add to MetaCart
(Show Context)
The Trilinos Project is an effort to facilitate the design, development, integration and ongoing support of mathematical software libraries within an objectoriented framework for the solution of largescale, complex multiphysics engineering and scientific problems. Trilinos addresses two fundamental issues of developing software for these problems: (i) Providing a streamlined process and set of tools for development of new algorithmic implementations and (ii) promoting interoperability of independently developed software. Trilinos uses a twolevel software structure designed around collections of packages. A Trilinos package is an integral unit usually developed by a small team of experts in a particular algorithms area such as algebraic preconditioners, nonlinear solvers, etc. Packages exist underneath the Trilinos top level, which provides a common lookandfeel, including configuration, documentation, licensing, and bugtracking. Here we present the overall Trilinos design, describing our use of abstract interfaces and default concrete implementations. We discuss the services that Trilinos provides to a prospective package and how these services are used by various packages. We also illustrate how packages can be combined to rapidly develop new algorithms. Finally, we discuss how Trilinos facilitates highquality software engineering practices that are increasingly required from simulation software. Sandia is a multiprogram laboratory operated by Sandia Corporation, a LockheedMartin Company, for the United States Department of Energy under Contract DEAC0494AL85000. Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and
Intlab  Interval Laboratory
"... . INTLAB is a Matlab toolbox supporting real and complex interval scalars, vectors, and matrices, as well as sparse real and complex interval matrices. It is designed to be very fast. In fact, it is not much slower than the fastest pure floating point algorithms using the fastest compilers available ..."
Abstract

Cited by 134 (12 self)
 Add to MetaCart
. INTLAB is a Matlab toolbox supporting real and complex interval scalars, vectors, and matrices, as well as sparse real and complex interval matrices. It is designed to be very fast. In fact, it is not much slower than the fastest pure floating point algorithms using the fastest compilers available (the latter, of course, without verification of the result). Portability is assured by implementing all algorithms in Matlab itself with exception of exactly three routines for switching the rounding downwards, upwards and to nearest. Timing comparisons show that the used concept achieves the anticipated speed with identical code on a variety of computers, ranging from PC's to parallel computers. INTLAB may be freely copied from our home page. 1. Introduction. The INTLAB concept splits into two parts. First, a new concept of a fast interval library is introduced. The main advantage (and difference to existing interval libraries) is that identical code can be used on a variety of computer a...
An Updated Set of Basic Linear Algebra Subprograms (BLAS)
 ACM Transactions on Mathematical Software
, 2001
"... This paper summarizes the BLAS Technical Forum Standard, a speci #cation of a set of kernel routines for linear algebra, historically called the Basic Linear Algebra Subprograms and commonly known as the BLAS. The complete standard can be found in #1#, and on the BLAS Technical Forum webpage #http: ..."
Abstract

Cited by 119 (7 self)
 Add to MetaCart
(Show Context)
This paper summarizes the BLAS Technical Forum Standard, a speci #cation of a set of kernel routines for linear algebra, historically called the Basic Linear Algebra Subprograms and commonly known as the BLAS. The complete standard can be found in #1#, and on the BLAS Technical Forum webpage #http:##www.netlib.org#blas#blastforum##
The Uniform Memory Hierarchy Model of Computation
 Algorithmica
, 1992
"... The Uniform Memory Hierarchy (UMH) model introduced in this paper captures performancerelevant aspects of the hierarchical nature of computer memory. It is used to quantify architectural requirements of several algorithms and to ratify the faster speeds achieved by tuned implementations that use im ..."
Abstract

Cited by 114 (9 self)
 Add to MetaCart
The Uniform Memory Hierarchy (UMH) model introduced in this paper captures performancerelevant aspects of the hierarchical nature of computer memory. It is used to quantify architectural requirements of several algorithms and to ratify the faster speeds achieved by tuned implementations that use improved datamovement strategies. A sequential computer's memory is modelled as a sequence hM 0 ; M 1 ; :::i of increasingly large memory modules. Computation takes place in M 0 . Thus, M 0 might model a computer's central processor, while M 1 might be cache memory, M 2 main memory, and so on. For each module M U , a bus B U connects it with the next larger module M U+1 . All buses may be active simultaneously. Data is transferred along a bus in fixedsized blocks. The size of these blocks, the time required to transfer a block, and the number of blocks that fit in a module are larger for modules farther from the processor. The UMH model is parameterized by the rate at which the blocksizes i...
Algorithm 887: Cholmod, supernodal sparse cholesky factorization and update/downdate
 ACM Transactions on Mathematical Software
, 2008
"... CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or A A T, updating/downdating a sparse Cholesky factorization, solving linear systems, updating/downdating the solution to the triangular system Lx = b, and many other sparse matrix functions for b ..."
Abstract

Cited by 109 (8 self)
 Add to MetaCart
CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or A A T, updating/downdating a sparse Cholesky factorization, solving linear systems, updating/downdating the solution to the triangular system Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric matrices. Its supernodal Cholesky factorization relies on LAPACK and the Level3 BLAS, and obtains a substantial fraction of the peak performance of the BLAS. Both real and complex matrices are supported. CHOLMOD is written in ANSI/ISO C, with both C and MATLAB TM interfaces. It appears in MATLAB 7.2 as x=A\b when A is sparse symmetric positive definite, as well as in several other sparse matrix functions.
GEMMBased Level 3 BLAS: HighPerformance Model Implementations and Performance Evaluation Benchmark
 ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE
, 1998
"... The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply and triangular system solving computations. Due to the complex hardware organization of advanced computer architectures the development of optimal level 3 BLAS code is costly and time consuming. Howev ..."
Abstract

Cited by 107 (8 self)
 Add to MetaCart
(Show Context)
The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply and triangular system solving computations. Due to the complex hardware organization of advanced computer architectures the development of optimal level 3 BLAS code is costly and time consuming. However, it is possible to develop a portable and highperformance level 3 BLAS library mainly relying on a highly optimized GEMM, the routine for the general matrix multiply and add operation. With suitable partitioning, all the other level 3 BLAS can be defined in terms of GEMM and a small amount of level 1 and level 2 computations. Our contribution is twofold. First, the model implementations in Fortran 77 of the GEMMbased level 3 BLAS are structured to reduced effectively data traffic in a memory hierarchy. Second, the GEMMbased level 3 BLAS performance evaluation benchmark is a tool for evaluating and comparing different implementations of the level 3 BLAS with the GEMMbased model implementations.
Interpolating Implicit Surfaces From Scattered Surface Data Using Compactly Supported Radial Basis Functions
, 2001
"... We describe algebraic methods for creating implicit surfaces using linear combinations of radial basis interpolants to form complex models from scattered surface points. Shapes with arbitrary topology are easily represented without the usual interpolation or aliasing errors arising from discrete sam ..."
Abstract

Cited by 107 (4 self)
 Add to MetaCart
(Show Context)
We describe algebraic methods for creating implicit surfaces using linear combinations of radial basis interpolants to form complex models from scattered surface points. Shapes with arbitrary topology are easily represented without the usual interpolation or aliasing errors arising from discrete sampling. These methods were first applied to implicit surfaces by Savchenko, et al. and later developed independently by Turk and O'Brien as a means of performing shape interpolation. Earlier approaches were limited as a modeling mechanism because of the order of the computational complexity involved. We explore and extend these implicit interpolating methods to make them suitable for systems of large numbers of scattered surface points by using compactly supported radial basis interpolants. The use of compactly supported elements generates a sparse solution space, reducing the computational complexity and making the technique practical for large models. The local nature of compactly supported radial basis functions permits the use of computational techniques and data structures such as kd trees for spatial subdivision, promoting fast solvers and methods to divide and conquer many of the subproblems associated with these methods. Moreover, the representation of complex models permits the exploration of diverse surface geometry. This reduction in computational complexity enables the application of these methods to the study of shape properties of large complex shapes.
An annotation language for optimizing software libraries
 In Second Conference on Domain Specific Languages
, 1999
"... Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. ..."
Abstract

Cited by 97 (15 self)
 Add to MetaCart
Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.