"GEMM-Based Level 3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark" (1995)

by B. Kågström, P. Ling, C. Van Loan

Results 1 - 10 of 56

Automated empirical optimizations of software and the ATLAS project

by R. Clint Whaley, Antoine Petitet, Jack J. Dongarra - Parallel Computing, 2001
Cited by 233 (31 self)
This paper describes the automatically tuned linear algebra software (ATLAS) project, as well as the fundamental principles that underlie it. ATLAS is an instantiation of a new paradigm in high-performance library production and maintenance, which we term automated empirical optimization of software (AEOS); this style of library management has been created in order to allow software to keep pace with the incredible rate of hardware advancement inherent in Moore's Law. ATLAS is the application of this new paradigm to linear algebra software, with the present emphasis on the basic linear algebra subprograms (BLAS), a widely used, performance-critical, ...
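The core of the AEOS idea is to measure candidate code variants on the target machine and keep the fastest, rather than fixing parameters by hand. The sketch below shows only that measure-and-select loop, assuming a single tunable parameter (the tile size of a pure-Python tiled matrix multiply); `tiled_matmul` and `empirically_tune` are hypothetical names, not ATLAS's code generator or search, which explore many more parameters and emit compiled kernels.

```python
import time


def tiled_matmul(a, b, nb):
    """Return C = A @ B for square matrices stored as lists of lists,
    visiting the i/k/j iteration space in nb-sized tiles."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                for i in range(ii, min(ii + nb, n)):
                    a_row, c_row = a[i], c[i]
                    for k in range(kk, min(kk + nb, n)):
                        aik, b_row = a_row[k], b[k]
                        for j in range(jj, min(jj + nb, n)):
                            c_row[j] += aik * b_row[j]
    return c


def empirically_tune(n, candidates, repeats=3):
    """Time tiled_matmul for each candidate tile size on an n x n problem.

    Returns (best_nb, timings), where timings maps each candidate to its
    best observed wall-clock time. The search space is just `candidates`.
    """
    a = [[float((i * n + j) % 7) for j in range(n)] for i in range(n)]
    b = [[float((i + 2 * j) % 5) for j in range(n)] for i in range(n)]
    timings = {}
    for nb in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            tiled_matmul(a, b, nb)
            best = min(best, time.perf_counter() - start)
        timings[nb] = best
    best_nb = min(timings, key=timings.get)
    return best_nb, timings


if __name__ == "__main__":
    nb, timings = empirically_tune(64, [8, 16, 32, 64])
    print("selected tile size:", nb)
```

The selected value depends on the machine and interpreter it runs on, which is exactly the point of empirical tuning: the result is measured, not predicted.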

CSDP, a C library for semidefinite programming.

by Brian Borchers, 1997
Cited by 104 (1 self)
This paper is organized as follows. First, we discuss the formulation of the semidefinite programming problem used by CSDP. We then describe the predictor-corrector algorithm used by CSDP to solve the SDP. We discuss the storage requirements of the algorithm as well as its computational complexity. Finally, we present results from the solution of a number of test problems. [Section 2: The SDP Problem] We consider semidefinite programming problems of the form max tr(CX) ...

Self adapting linear algebra algorithms and software

by Jim Demmel, Jack Dongarra, Victor Eijkhout, Erika Fuentes, Antoine Petitet, Rich Vuduc, R. Clint Whaley, Katherine Yelick - Proceedings of the IEEE, 2005
Cited by 65 (19 self)
One of the main obstacles to the efficient solution of scientific problems is the problem of tuning software, both to the available architecture and to the user problem at hand. We describe approaches for obtaining tuned high-performance kernels, and for automatically choosing suitable algorithms. Specifically, we describe the generation of dense and sparse BLAS kernels, and the selection of linear solver algorithms. However, the ideas presented here extend beyond these areas, which can be considered proof of concept.

ScaLAPACK: A Linear Algebra Library for Message-Passing Computers

by L.S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, R.C. Whaley - In SIAM Conference on Parallel Processing, 1997
Cited by 26 (4 self)
This article outlines the content and performance of some of the ScaLAPACK software. ScaLAPACK is a collection of mathematical software for linear algebra computations on distributed-memory computers. The importance of developing standards for computational and message-passing interfaces is discussed. We present the different components and building blocks of ScaLAPACK and provide initial performance results for selected PBLAS routines and a subset of ScaLAPACK driver routines.

Algorithmic redistribution methods for block cyclic decompositions

by Antoine Petitet - IEEE Trans. on PDS, 1996
Cited by 22 (2 self)
To my parents. Acknowledgments: The writer expresses gratitude and appreciation to the members of his dissertation committee, Michael Berry, Charles Collins, Jack Dongarra, Mark Jones and David Walker, for their encouragement and participation throughout my doctoral experience. Special appreciation is due to Professor Jack Dongarra, Chairman, who provided sound guidance, support and appropriate commentaries during the course of my graduate study. I also would like to thank Yves Robert and R. Clint Whaley for many useful and instructive discussions on general parallel algorithms and message-passing software libraries. Many valuable comments for improving the presentation of this document were received from L. Susan Blackford. Finally, I am grateful to the Department of Computer Science at the University of Tennessee for allowing me to do this doctoral research work here. A special debt of gratitude is owed to Joanne Martin, IBM POWERparallel Division, for awarding me an IBM Corporation Fellowship covering the tuition as well as a stipend for the 1994-96 academic years. This work was also supported ...

A recursive formulation of Cholesky factorization of a matrix in packed storage

by Bjarne S. Andersen, Fred G. Gustavson, Jerzy Wasniewski, 2001
Cited by 19 (2 self)
A new compact way to store a symmetric or triangular matrix, called RPF for Recursive Packed Format, is fully described. Novel ways to transform RPF to and from standard packed format are included. A new algorithm, called RPC for Recursive Packed Cholesky, that operates on the RPF format is presented. Algorithm RPC is level 3 BLAS based and requires algorithms TRSM and SYRK that work on RPF. We thus introduce and fully describe novel recursive algorithms RP TRSM and RP SYRK that the RPC algorithm requires. It turns out that both RP TRSM and RP SYRK only call GEMM. Hence RPC mostly calls GEMM during execution. The advantage of this storage scheme compared to traditional packed storage is demonstrated. First, both storage schemes use the minimal amount of storage for the symmetric or triangular matrix. Second, RPC gives a level 3 implementation of Cholesky factorization that only requires standard full-format GEMM, whereas standard packed implementations are only level 2. Hence ...
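The recursive splitting behind RPC can be illustrated on an ordinary full-storage matrix: partition A into a 2 x 2 block form, factor A11 recursively, compute L21 with a triangular solve (the TRSM step), update A22 with a symmetric rank-k update (the SYRK step), and recurse on the updated A22. This is a minimal sketch under that assumption only: it does not implement the RPF layout or the conversion to and from packed format, the TRSM and SYRK steps are plain loops rather than the recursive, GEMM-calling RP TRSM and RP SYRK of the paper, and `recursive_cholesky` is a hypothetical name.

```python
import math


def _chol_inplace(a, lo, hi):
    """Overwrite the lower triangle of a[lo:hi][lo:hi] with its Cholesky factor."""
    n = hi - lo
    if n == 1:
        if a[lo][lo] <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[lo][lo] = math.sqrt(a[lo][lo])
        return
    mid = lo + n // 2
    # 1) Factor the leading block: A11 = L11 L11^T.
    _chol_inplace(a, lo, mid)
    # 2) TRSM step: solve L21 L11^T = A21 for L21, one row at a time.
    for i in range(mid, hi):
        for j in range(lo, mid):
            s = a[i][j]
            for k in range(lo, j):
                s -= a[i][k] * a[j][k]
            a[i][j] = s / a[j][j]
    # 3) SYRK step: A22 <- A22 - L21 L21^T (lower triangle only).
    for i in range(mid, hi):
        for j in range(mid, i + 1):
            s = 0.0
            for k in range(lo, mid):
                s += a[i][k] * a[j][k]
            a[i][j] -= s
    # 4) Factor the updated trailing block.
    _chol_inplace(a, mid, hi)


def recursive_cholesky(a):
    """Return lower-triangular L with L L^T = A for symmetric positive definite A.

    Only the lower triangle of A is read; A itself is not modified.
    """
    n = len(a)
    work = [row[:] for row in a]
    if n:
        _chol_inplace(work, 0, n)
    # Zero the strictly upper triangle so the result is exactly L.
    return [[work[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    a = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    print(recursive_cholesky(a))  # prints [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
```

Because every off-diagonal step operates on whole blocks, a production version can hand steps 2 and 3 to level 3 kernels, which is the property the abstract emphasizes.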

Multishift Variants of the QZ Algorithm with Aggressive Early Deflation

by Bo Kågström, Daniel Kressner
Cited by 16 (11 self)
New variants of the QZ algorithm for solving the generalized eigenvalue problem are proposed. An extension of the small-bulge multishift QR algorithm is developed, which chases chains of many small bulges instead of only one bulge in each QZ iteration. This allows the effective use of level 3 BLAS operations, which in turn can provide efficient utilization of high performance computing systems with deep memory hierarchies. Moreover, an extension of the aggressive early deflation strategy is proposed, which can identify and deflate converged eigenvalues long before classic deflation strategies would. Consequently, the number of overall QZ iterations needed until convergence is considerably reduced. As a third ingredient, we reconsider the deflation of infinite eigenvalues and present a new deflation algorithm, which is particularly effective in the presence of a large number of infinite eigenvalues. Combining all these developments, our implementation significantly improves existing implementations of the QZ algorithm. This is demonstrated by numerical experiments with random matrix pairs as well as with matrix pairs arising from various applications.

Execution Time of Symmetric Eigensolvers

by Kendall Swenson Stanley, 1997
Cited by 10 (1 self)
Abstract not available.

Anatomy of high-performance matrix multiplication

by Kazushige Goto, Robert A. Van De Geijn - ACM Transactions on Mathematical Software, 2008
Cited by 10 (2 self)
We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified by successively refining a model of architectures with multilevel memories. A simple but effective algorithm for executing this operation results. Implementations on a broad selection of architectures are shown to achieve near-peak performance.
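The layered structure such an algorithm arrives at can be sketched as follows, assuming a decomposition in which the k dimension is split into kc-wide panels, the matching panel of B is packed once, and each mc x kc block of A is packed contiguously and multiplied against that panel by a macro-kernel. Pure Python cannot show the cache and TLB effects that motivate the design; it only shows the loop nesting and the packing. `goto_style_gemm` and the default `mc`/`kc` values are hypothetical; in a real library the block sizes are derived per architecture from the memory-hierarchy model the paper develops, and the inner kernel is hand-written assembly.

```python
def goto_style_gemm(a, b, c, mc=32, kc=64):
    """Accumulate C += A @ B in place and return C.

    A is m x k, B is k x n, C is m x n, all lists of lists of floats.
    Structure only: k is split into kc-wide panels, the kc x n panel of B
    is packed once per panel, and each mc x kc block of A is packed and
    multiplied against it by a simple macro-kernel.
    """
    m = len(a)
    k = len(b)
    n = len(b[0]) if k else 0
    for pc in range(0, k, kc):
        kb = min(kc, k - pc)
        # Pack B[pc:pc+kb, :] row-major into one flat list.
        b_pack = [b[pc + p][j] for p in range(kb) for j in range(n)]
        for ic in range(0, m, mc):
            mb = min(mc, m - ic)
            # Pack A[ic:ic+mb, pc:pc+kb] row-major into one flat list.
            a_pack = [a[ic + i][pc + p] for i in range(mb) for p in range(kb)]
            # Macro-kernel: C[ic:ic+mb, :] += A_pack @ B_pack.
            for i in range(mb):
                c_row = c[ic + i]
                for p in range(kb):
                    aip = a_pack[i * kb + p]
                    base = p * n
                    for j in range(n):
                        c_row[j] += aip * b_pack[base + j]
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = [[0.0, 0.0], [0.0, 0.0]]
    print(goto_style_gemm(a, b, c, mc=1, kc=1))  # prints [[19.0, 22.0], [43.0, 50.0]]
```

Packing costs an extra copy, but it places each block in contiguous memory so the innermost loops stream through it with unit stride, which is the trade-off the packed layout is meant to win.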

Parallel ScaLAPACK-style Algorithms for Solving Continuous-Time Sylvester Equations

by Robert Granat, Peter Poromaa - In Euro-Par 2003 Parallel Processing, H. Kosch et al., Eds., Lecture Notes in Computer Science, 2003
Cited by 8 (7 self)
An implementation of a parallel ScaLAPACK-style solver for the general Sylvester equation, op(A)X − X op(B) = C, where op(A) denotes A or its transpose A^T, is presented. The parallel algorithm is based on explicit blocking of the Bartels-Stewart method. An initial transformation of the coefficient matrices A and B to Schur form leads to a reduced triangular matrix equation. We use different matrix traversing strategies to handle the transposes in the problem to solve, leading to different new parallel wave-front algorithms. We also present a strategy to handle the problem when 2 x 2 diagonal blocks of the matrices in Schur form, corresponding to complex conjugate pairs of eigenvalues, are split between several blocks in the block-partitioned matrices. Finally, the solution of the reduced matrix equation is transformed back to the original coordinate system. The implementation acts in a ScaLAPACK environment using 2-dimensional block cyclic mapping of the matrices onto a rectangular grid of processes. Real performance results are presented which verify that our parallel algorithms are reliable and scalable. Keywords: Sylvester matrix equation, continuous-time, Bartels–Stewart ...
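Once A and B are in Schur form, the reduced equation can be solved by substitution, one column of X at a time: because B is upper triangular, column j satisfies (A − b_jj I) x_j = c_j + Σ_{k<j} b_kj x_k, and that shifted upper-triangular system is solved by back substitution. The sketch below assumes op(A) = A and op(B) = B, and that both matrices are upper triangular with real diagonals (so there are none of the 2 x 2 blocks the paper handles explicitly). It is serial and unblocked, with no block cyclic distribution; `solve_triangular_sylvester` is a hypothetical name.

```python
def solve_triangular_sylvester(a, b, c):
    """Solve A X - X B = C for X.

    A is m x m and B is n x n, both upper triangular, with no diagonal entry
    of A equal to a diagonal entry of B (otherwise X is not unique).
    All matrices are lists of lists.
    """
    m, n = len(a), len(b)
    x = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Right-hand side for column j: C[:, j] + sum_{k<j} B[k][j] * X[:, k].
        rhs = [c[i][j] + sum(b[k][j] * x[i][k] for k in range(j)) for i in range(m)]
        shift = b[j][j]
        # Back substitution with the shifted upper-triangular matrix A - shift*I.
        for i in range(m - 1, -1, -1):
            s = rhs[i] - sum(a[i][l] * x[l][j] for l in range(i + 1, m))
            d = a[i][i] - shift
            if d == 0.0:
                raise ValueError("A and B share an eigenvalue; X is not unique")
            x[i][j] = s / d
    return x


if __name__ == "__main__":
    # Built so that the exact solution is X = [[1, 2], [3, 4]].
    a = [[1, 2], [0, 3]]
    b = [[-1, 1], [0, -2]]
    c = [[8, 13], [12, 17]]
    print(solve_triangular_sylvester(a, b, c))  # prints [[1.0, 2.0], [3.0, 4.0]]
```

Handling the transposed variants changes only the traversal order (for example, sweeping columns from the last to the first), which mirrors the abstract's remark that different traversal strategies are needed for the transposes.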