Results 1-10 of 380
Strassen’s Matrix Multiplication on GPUs
"... Abstract: We provide efficient single-precision and integer GPU implementations of Strassen’s algorithm as well as of Winograd’s variant. On an NVIDIA C1060 GPU, a speedup of 32% (35%) is obtained for Strassen’s 4-level implementation and 33% (36%) for Winograd’s variant relative to the sgemm (inte ..."
Cited by 2 (0 self)
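The GPU entry reports a "4-level" Strassen implementation measured against sgemm. Reading "level" as the depth of Strassen recursion before the remaining blocks are handed to a conventional multiply (an assumption; the snippet does not define it), a minimal pure-Python sketch of that structure could look like this. `naive` is a hypothetical stand-in for the sgemm call, and nothing here models the GPU side.

```python
def add(X, Y):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def naive(X, Y):
    """Conventional triple-loop product; stands in for a vendor sgemm call."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def split(M):
    """Split a square matrix of even size into quadrants M11, M12, M21, M22."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one matrix."""
    return [a + b for a, b in zip(C11, C12)] + [a + b for a, b in zip(C21, C22)]


def strassen(A, B, levels):
    """Apply `levels` levels of Strassen recursion, then multiply conventionally.

    Assumes A and B are n x n with n divisible by 2**levels.
    """
    if levels == 0:
        return naive(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    def rec(X, Y):
        return strassen(X, Y, levels - 1)

    # Seven half-size products instead of the conventional eight.
    M1 = rec(add(A11, A22), add(B11, B22))
    M2 = rec(add(A21, A22), B11)
    M3 = rec(A11, sub(B12, B22))
    M4 = rec(A22, sub(B21, B11))
    M5 = rec(add(A11, A12), B22)
    M6 = rec(sub(A21, A11), add(B11, B12))
    M7 = rec(sub(A12, A22), add(B21, B22))
    # Recombine the products into the four result quadrants.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    return join(C11, C12, C21, C22)
```

With `levels` levels the sketch performs 7**levels leaf multiplications where conventional blocking would perform 8**levels; the extra block additions at each level are the usual reason implementations stop recursing after a few levels.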
GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm
, 1994
"... Matrix-matrix multiplication is normally computed using one of the BLAS or a reinvention of part of the BLAS. Unfortunately, the BLAS were designed with small matrices in mind. When huge, well conditioned matrices are multiplied together, the BLAS perform like the blahs, even on vector machines. ..."
Cited by 39 (1 self)
BLAS, matrix multiplication, Winograd's variant of Strassen's algorithm, multilevel algorithms. AMS(MOS) subject classification: Numerical Analysis: Numerical Linear Algebra. 1. Preliminaries. Matrix-matrix multiplication is a very basic computer operation. A very clear description of how
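The GEMMW entry concerns Winograd's variant of Strassen's algorithm, which keeps seven block products per level but needs 15 block additions instead of Strassen's 18. A minimal sketch of the commonly cited form of that variant follows; it is not taken from GEMMW, whose exact operation schedule and cutoff rule may differ. `winograd` and `cutoff` are hypothetical names, and matrices are assumed square with power-of-two size.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def naive(X, Y):
    """Conventional product used at or below the cutoff (a BLAS call in practice)."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def split(M):
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11, C12, C21, C22):
    return [a + b for a, b in zip(C11, C12)] + [a + b for a, b in zip(C21, C22)]


def winograd(A, B, cutoff=1):
    """Strassen-Winograd recursion; power-of-two sizes, cutoff >= 1."""
    if len(A) <= cutoff:
        return naive(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # 8 block additions on the inputs.
    S1 = add(A21, A22)
    S2 = sub(S1, A11)
    S3 = sub(A11, A21)
    S4 = sub(A12, S2)
    T1 = sub(B12, B11)
    T2 = sub(B22, T1)
    T3 = sub(B22, B12)
    T4 = sub(T2, B21)
    # 7 recursive block products.
    M1 = winograd(A11, B11, cutoff)
    M2 = winograd(A12, B21, cutoff)
    M3 = winograd(S4, B22, cutoff)
    M4 = winograd(A22, T4, cutoff)
    M5 = winograd(S1, T1, cutoff)
    M6 = winograd(S2, T2, cutoff)
    M7 = winograd(S3, T3, cutoff)
    # 7 block additions on the products, reusing partial sums.
    U1 = add(M1, M2)  # C11
    U2 = add(M1, M6)
    U3 = add(U2, M7)
    U4 = add(U2, M5)
    U5 = add(U4, M3)  # C12
    U6 = sub(U3, M4)  # C21
    U7 = add(U3, M5)  # C22
    return join(U1, U5, U6, U7)
```

The savings come from reusing the partial sums U2 and U3 across several output quadrants.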
Implementation of Strassen's Algorithm for Matrix Multiplication
 In Proceedings of Supercomputing '96
, 1996
"... In this paper we report on the development of an efficient and portable implementation of Strassen's matrix multiplication algorithm. Our implementation is designed to be used in place of DGEMM, the Level 3 BLAS matrix multiplication routine. Efficient performance will be obtained for all matri ..."
Cited by 42 (0 self)
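The Supercomputing '96 entry describes a Strassen routine designed to be used in place of DGEMM. DGEMM computes C := alpha*op(A)*op(B) + beta*C, where op(X) is X or its transpose, so a drop-in replacement has to honour that contract around whatever fast core it uses. A minimal sketch of such a wrapper, assuming a hypothetical `gemm_like` name and a pluggable `core` function (the conventional product by default; a Strassen routine with the same `(A, B) -> A*B` signature would be passed in its place):

```python
def naive(X, Y):
    """Conventional product; the default core."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def op(X, trans):
    """BLAS TRANS convention: 'N' = as is, 'T' or 'C' = transposed (real data)."""
    return transpose(X) if trans.upper() in ("T", "C") else X


def gemm_like(transa, transb, alpha, A, B, beta, C, core=naive):
    """Return alpha*op(A)*op(B) + beta*C, using `core` for the product.

    Unlike Fortran DGEMM, this returns a new matrix instead of overwriting C.
    """
    P = core(op(A, transa), op(B, transb))
    if beta == 0:
        # As in the BLAS, C is not read when beta is zero (it may be unset).
        return [[alpha * p for p in row] for row in P]
    return [[alpha * p + beta * c for p, c in zip(rp, rc)]
            for rp, rc in zip(P, C)]
```

Keeping the scaling and transposition outside the core means the fast multiply only ever sees a plain product, which is one simple way to keep it interchangeable with the conventional one.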
GEMMW: A PORTABLE LEVEL 3 BLAS WINOGRAD VARIANT OF STRASSEN'S MATRIX-MATRIX MULTIPLY ALGORITHM
"... Abstract. Matrix-matrix multiplication is normally computed using one of the BLAS or a reinvention of part of the BLAS. Unfortunately, the BLAS were designed with small matrices in mind. When huge, well conditioned matrices are multiplied together, the BLAS perform like the blahs, even on vector mac ..."
Level 3 BLAS, matrix multiplication, Winograd's variant of Strassen's algorithm, multilevel algorithms. AMS(MOS) subject classifications: Numerical Analysis: Numerical Linear Algebra. 1. Preliminaries. Matrix-matrix multiplication is a very basic computer operation. A very clear description
Adaptive Winograd’s Matrix Multiplications
, 2008
"... Modern architectures have complex memory hierarchies and increasing parallelism (e.g., multicores). These features make achieving and maintaining good performance across rapidly changing architectures increasingly difficult. Performance has become a complex trade-off, not just a simple matter of cou ..."
Cited by 8 (3 self)
... of counting cost of simple CPU operations. We present a novel, hybrid, and adaptive recursive Strassen-Winograd’s matrix multiplication (MM) that uses automatically tuned linear algebra software (ATLAS) or GotoBLAS. Our algorithm applies to any size and shape matrices stored in either row or column major
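The adaptive Strassen-Winograd entry stresses handling matrices of any size and shape, with a tuned BLAS (ATLAS or GotoBLAS) doing the work at the leaves. The snippet does not say how odd dimensions are treated, so the sketch below swaps in the simplest option, padding each odd dimension with one zero row or column per level; this is not necessarily the paper's method, and it uses the classic Strassen formulas rather than Winograd's for brevity. `hybrid` and `cutoff` are hypothetical names, `naive` stands in for the BLAS call, and inputs are assumed non-empty.

```python
def naive(X, Y):
    """Conventional product; stands in for an ATLAS/GotoBLAS gemm call."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def pad(M, rows, cols):
    """Return M extended with zeros to shape rows x cols."""
    out = [r + [0] * (cols - len(r)) for r in M]
    return out + [[0] * cols for _ in range(rows - len(M))]


def split(M):
    """Split an even-by-even (possibly rectangular) matrix into quadrants."""
    h, w = len(M) // 2, len(M[0]) // 2
    return ([r[:w] for r in M[:h]], [r[w:] for r in M[:h]],
            [r[:w] for r in M[h:]], [r[w:] for r in M[h:]])


def join(C11, C12, C21, C22):
    return [a + b for a, b in zip(C11, C12)] + [a + b for a, b in zip(C21, C22)]


def hybrid(A, B, cutoff=16):
    """Strassen recursion for an m x k times k x n product of any shape."""
    if cutoff < 1:
        raise ValueError("cutoff must be >= 1 so the recursion terminates")
    m, k, n = len(A), len(A[0]), len(B[0])
    if min(m, k, n) <= cutoff:
        return naive(A, B)
    # Pad odd dimensions up to even so the 2x2 block split is conformal.
    m2, k2, n2 = m + m % 2, k + k % 2, n + n % 2
    A11, A12, A21, A22 = split(pad(A, m2, k2))
    B11, B12, B21, B22 = split(pad(B, k2, n2))

    def rec(X, Y):
        return hybrid(X, Y, cutoff)

    M1 = rec(add(A11, A22), add(B11, B22))
    M2 = rec(add(A21, A22), B11)
    M3 = rec(A11, sub(B12, B22))
    M4 = rec(A22, sub(B21, B11))
    M5 = rec(add(A11, A12), B22)
    M6 = rec(sub(A21, A11), add(B11, B12))
    M7 = rec(sub(A12, A22), add(B21, B22))
    C = join(add(sub(add(M1, M4), M5), M7), add(M3, M5),
             add(M2, M4), add(add(sub(M1, M2), M3), M6))
    # Drop the padding rows and columns.
    return [row[:n] for row in C[:m]]
```

Padding wastes a little arithmetic on zero rows and columns but keeps every level a plain 2x2 block split; the cutoff decides where the tuned library takes over.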
EXPERIMENTS WITH STRASSEN’S ALGORITHM: FROM SEQUENTIAL TO PARALLEL
"... This paper studies Strassen’s matrix multiplication algorithm by implementing it in a variety of methods: sequential, workflow, and in parallel. All the methods show better performance than the well-known scientific libraries for medium to large size matrices. The sequential recursive program is imp ..."
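The "Experiments" entry compares sequential, workflow, and parallel implementations. One source of parallelism in Strassen's algorithm is that the seven block products at a level do not depend on each other. A minimal sketch of running the top-level products as concurrent tasks, assuming Python's `concurrent.futures.ThreadPoolExecutor` as the task runner (the paper's own parallel framework is not described in the snippet); `strassen_top_parallel` is a hypothetical name. In CPython, threads doing pure-Python arithmetic will not run faster because of the global interpreter lock, so this only shows the task structure.

```python
from concurrent.futures import ThreadPoolExecutor


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def naive(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def split(M):
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11, C12, C21, C22):
    return [a + b for a, b in zip(C11, C12)] + [a + b for a, b in zip(C21, C22)]


def strassen_top_parallel(A, B, workers=7):
    """One Strassen level whose seven products run as independent tasks.

    Assumes square A and B of even size; the leaf products are conventional.
    """
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Operand pairs for M1..M7; the additions are cheap next to the products.
    tasks = [
        (add(A11, A22), add(B11, B22)),
        (add(A21, A22), B11),
        (A11, sub(B12, B22)),
        (A22, sub(B21, B11)),
        (add(A11, A12), B22),
        (sub(A21, A11), add(B11, B12)),
        (sub(A12, A22), add(B21, B22)),
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in task order, so unpacking gives M1..M7.
        M1, M2, M3, M4, M5, M6, M7 = pool.map(lambda xy: naive(*xy), tasks)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    return join(C11, C12, C21, C22)
```

Swapping `ThreadPoolExecutor` for a process pool or a native runtime is what would turn this task graph into actual speedup.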
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm
 Appl. Math Letters
, 1990
"... In this paper, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication alg ..."
Cited by 28 (13 self)
... algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this paper, we present a non-recursive implementation of Strassen's algorithm
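The tensor-product entry expresses block-recursive Strassen as algebraic formulas built from tensor (Kronecker) products, which can be evaluated without recursion. A minimal sketch of that idea, under stated assumptions: one level of Strassen is written as c = W((U a) * (V b)), where `*` is the element-wise product, U and V are 7x4 coefficient matrices and W is 4x7; k levels then use the k-fold Kronecker powers of U, V, and W, provided the matrices are flattened in recursive block (Z/Morton) order. This is not the paper's program-generation strategy; it forms the Kronecker powers densely, which real code would avoid, and `strassen_tensor`, `z_index`, and the other names are hypothetical.

```python
# Coefficients of Strassen's seven products over a 2x2 block partition.
# Columns of U and V index blocks (11, 12, 21, 22); rows index M1..M7.
U = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
     [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]]
V = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
     [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
# Rows of W give C11, C12, C21, C22 as combinations of M1..M7.
W = [[1, 0, 0, 1, -1, 0, 1], [0, 0, 1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0, 0, 0], [1, -1, 1, 0, 0, 1, 0]]


def kron(X, Y):
    """Kronecker (tensor) product of two dense matrices."""
    return [[x * y for x in rx for y in ry] for rx in X for ry in Y]


def kron_power(X, k):
    """k-fold Kronecker power of X (the 1x1 identity when k == 0)."""
    R = [[1]]
    for _ in range(k):
        R = kron(R, X)
    return R


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def z_index(i, j, k):
    """Position of entry (i, j) of a 2**k x 2**k matrix in recursive block order."""
    idx = 0
    for level in range(k - 1, -1, -1):  # outermost block level first
        idx = 4 * idx + 2 * ((i >> level) & 1) + ((j >> level) & 1)
    return idx


def strassen_tensor(A, B, k):
    """Non-recursive k-level Strassen for 2**k x 2**k matrices."""
    n = 2 ** k
    a, b = [0] * (n * n), [0] * (n * n)
    for i in range(n):
        for j in range(n):
            a[z_index(i, j, k)] = A[i][j]
            b[z_index(i, j, k)] = B[i][j]
    ua = matvec(kron_power(U, k), a)
    vb = matvec(kron_power(V, k), b)
    p = [x * y for x, y in zip(ua, vb)]  # the 7**k scalar multiplications
    c = matvec(kron_power(W, k), p)
    return [[c[z_index(i, j, k)] for j in range(n)] for i in range(n)]
```

Forming the Kronecker powers explicitly is only for readability: for k = 3 they are already 343 x 64 (U and V) and 64 x 343 (W).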
Tuning Strassen's Matrix Multiplication for Memory Efficiency
 In Proceedings of SC98 (CD-ROM)
, 1998
"... Strassen's algorithm for matrix multiplication gains its lower arithmetic complexity at the expense of reduced locality of reference, which makes it challenging to implement the algorithm efficiently on a modern machine with a hierarchical memory system. We report on an implementation of thi ..."
Cited by 43 (5 self)