Results 1–6 of 6
A Rational Approach to Portable High Performance: The Basic Linear Algebra Instruction Set (BLAIS) and the Fixed Algorithm Size Template (FAST) Library
 In Proceedings of ECOOP
, 1998
Abstract

Cited by 19 (1 self)
We introduce a collection of high performance kernels for basic linear algebra. The kernels encapsulate small fixed-size computations in order to provide building blocks for numerical libraries in C++. The sizes are template parameters of the kernels, so they can be easily configured to a specific architecture for portability. In this way the BLAIS delivers the power of such code generation systems as PHiPAC [1] and ATLAS [8]. The BLAIS has a simple and elegant interface, so that one can write flexible-sized block algorithms without the complications of a code generation system. The BLAIS are implemented on the Fixed Algorithm Size Template (FAST) Library, which we also introduce in this paper. The FAST routines provide equivalent functionality to the algorithms in the Standard Template Library [7], but are tailored specifically for high performance kernels.

1 Introduction
The bane of portable high performance numerical linear algebra is the need to tailor key routines to specifi...
Evaluation of Programs and Parallelizing Compilers Using Dynamic Analysis Techniques
, 1993
Abstract

Cited by 16 (1 self)
The dynamic evaluation of parallelizing compilers and the programs to which they are applied is a field of abundant opportunity. Observing the dynamic behavior of a program provides insights into the structure of a computation that may be unavailable by static analysis methods. A program may be represented by a dataflow graph generated from the dynamic flow of information between the operations in the program. The minimum parallel execution time of the program, as it is written, is the longest (critical) path through the dynamic dataflow graph. An efficient method of finding the length of the critical path is presented for several parallel execution models. The inherent parallelism is defined as the ratio of the total number of operations executed to the number of operations in the critical path. The effectiveness of a commercial parallelizing compiler is measured by comparing, for the programs in the Perfect Benchmarks, the inherent parallelism of each program with the parallelism explicitly recognized by the compiler. The general method of critical path analysis produces results for an unlimited number of processors. Upper and lower bounds of the inherent parallelism, for the case of limited ...
The Matrix Template Library: A Unifying Framework for Numerical Linear Algebra
 In Parallel Object Oriented Scientific Computing. ECOOP
, 1998
Abstract

Cited by 11 (2 self)
We present a unified approach for expressing high performance numerical linear algebra routines for a class of dense and sparse matrix formats and shapes. As with the Standard Template Library [7], we explicitly separate algorithms from data structures through the use of generic programming techniques. We conclude that such an approach does not hinder high performance. On the contrary, writing portable high performance codes is actually enabled with such an approach, because the performance critical code sections can be isolated from the algorithms and the data structures.

1 Introduction
The traditional approach to writing basic linear algebra routines is a combinatorial affair. There are typically four precision types that need to be handled (single and double precision real, single and double precision complex), several dense storage types (general, banded, packed), a multitude of sparse storage types (the Sparse BLAS Standard Proposal includes 13 [1]), as well as row and co...
Generic Programming for High Performance Numerical Linear Algebra
, 1998
Abstract

Cited by 9 (1 self)
We present a generic programming methodology for expressing data structures and algorithms for high-performance numerical linear algebra. As with the Standard Template Library [14], our approach explicitly separates algorithms from data structures, allowing a single set of numerical routines to operate with a wide variety of matrix types, including sparse, dense, and banded. Through the use of C++ template programming, in conjunction with modern optimizing compilers, this generality does not come at the expense of performance. In fact, writing portable high-performance codes is actually enabled through the use of generic programming because performance critical code sections can be concentrated into a small number of basic kernels. Two libraries based on our approach are described. The Matrix Template Library (MTL) is a high-performance library providing comprehensive linear algebra functionality. The Iterative Template Library, based on MTL, extends the generic programming a...
Mayfly: A Pattern for Lightweight Generic Interfaces
 In PLOP99
, 1999
Abstract

Cited by 2 (2 self)
The Mayfly pattern describes an implementation approach to constructing interfaces for efficient data structures. In nature, the mayfly is a creature well known for its short life span. Similarly, for our purposes, a Mayfly is a temporary object that resides on the stack or only in registers (never in the heap). All of its member functions are typically inlined and it is always passed by value. These characteristics make Mayfly objects ideal for providing lightweight interfaces to efficient array-based data structures such as compressed matrices, graphs, heaps, and trees. The Mayfly pattern cuts across several other patterns. For instance, many of the iterators in the Standard Template Library (STL [1, 19]) are Mayflies. Among the Mayflies described in this paper will be some Adapter [3] and Aggregate [3, 14] objects.

1 Intent
Implement lightweight interfaces for efficient data structures using small temporary objects.

2 Motivation
Many times the most efficient implementa...
Performance Benchmarking of Object Oriented MPI (OOMPI) Version 1.0.2g
, 1999
Abstract

Cited by 1 (0 self)
This paper describes performance testing of an object oriented approach to the Message Passing Interface (MPI) [36, 10]. Object Oriented MPI (OOMPI) is a class library specification that encapsulates the functionality of MPI into a functional class hierarchy to provide a simple, flexible, and intuitive interface. This paper was written to validate the "thin layer" C++ library concept. The performance numbers cited in this paper show that OOMPI version 1.0.2g creates negligible overhead on top of the underlying MPI. This is because of both the thin design of OOMPI ...