Results 1–10 of 15
ON THE IDEALS OF SECANT VARIETIES OF SEGRE VARIETIES
2003
"... We establish basic techniques for studying the ideals of secant varieties of Segre varieties. We solve a conjecture of Garcia, Stillman and Sturmfels on the generators of the ideal of the first secant variety in the case of three factors and solve the conjecture settheoretically for an arbitrary n ..."
Abstract

Cited by 66 (13 self)
 Add to MetaCart
We establish basic techniques for studying the ideals of secant varieties of Segre varieties. We solve a conjecture of Garcia, Stillman and Sturmfels on the generators of the ideal of the first secant variety in the case of three factors and solve the conjecture set-theoretically for an arbitrary number of factors. We determine the low degree components of the ideals of secant varieties of small dimension in a few cases.
Geometry and the complexity of matrix multiplication
2007
"... Abstract. We survey results in algebraic complexity theory, focusing on matrix multiplication. Our goals are (i) to show how open questions in algebraic complexity theory are naturally posed as questions in geometry and representation theory, (ii) to motivate researchers to work on these questions, ..."
Abstract

Cited by 35 (5 self)
 Add to MetaCart
(Show Context)
We survey results in algebraic complexity theory, focusing on matrix multiplication. Our goals are (i) to show how open questions in algebraic complexity theory are naturally posed as questions in geometry and representation theory, (ii) to motivate researchers to work on these questions, and (iii) to point out relations with more general problems in geometry. The key geometric objects for our study are the secant varieties of Segre varieties. We explain how these varieties are also useful for algebraic statistics, the study of phylogenetic invariants, and quantum computing.
Communication-optimal parallel algorithm for Strassen’s matrix multiplication
In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’12)
2012
"... Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen’s fast matrix multiplication and minimizes communication. The algorithm outperforms all known parallel matrix mul ..."
Abstract

Cited by 28 (17 self)
 Add to MetaCart
(Show Context)
Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen’s fast matrix multiplication and minimizes communication. The algorithm outperforms all known parallel matrix multiplication algorithms, classical and Strassen-based, both asymptotically and in practice. A critical bottleneck in parallelizing Strassen’s algorithm is the communication between the processors. Ballard, Demmel, Holtz, and Schwartz (SPAA ’11) prove lower bounds on these communication costs, using expansion properties of the underlying computation graph. Our algorithm matches these lower bounds, and so is communication-optimal. It exhibits perfect strong scaling within the maximum possible range.
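The classical Strassen recursion that this line of work builds on can be sketched in pure Python. This is a minimal illustration for square matrices whose size is a power of two, not the communication-avoiding parallel scheme of the paper:

```python
# Minimal sketch of Strassen's recursive matrix multiplication for
# square matrices of power-of-two size (illustration only; not the
# parallel, communication-optimal algorithm described above).

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split both operands into quadrants.
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    # Strassen's seven recursive products (instead of the naive eight).
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Recombine the quadrants of the product.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```

Each level replaces 8 recursive multiplications with 7, giving O(n^log2(7)) ≈ O(n^2.81) arithmetic; the cost of communicating the seven subproblems between processors is what the paper's lower bounds and algorithm address.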
A Lower Bound for Matrix Multiplication
SIAM J. Comput.
1988
"... We prove that computing the product of two n × n matrices over the binary field requires at least 2.5 n 2  o ( n 2 ) multiplications. Key Words : matrix multiplication, arithmetic complexity, lower bounds, linear codes. 1. INTRODUCTION Let x = ( x 1 , . . . , x n ) T and y = ( y 1 , . . . , y ..."
Abstract

Cited by 19 (2 self)
 Add to MetaCart
(Show Context)
We prove that computing the product of two n × n matrices over the binary field requires at least 2.5n² − o(n²) multiplications. Key words: matrix multiplication, arithmetic complexity, lower bounds, linear codes. 1. Introduction. Let x = (x₁, …, xₙ)ᵀ and y = (y₁, …, yₘ)ᵀ be column vectors of indeterminates. A straight-line algorithm for computing a set of bilinear forms in x and y is called quadratic (respectively bilinear) if all its non-scalar multiplications are of the shape l(x, y) · l′(x, y) (respectively l(x) · l′(y)), where l and l′ are linear forms in the indeterminates. In this paper we establish the new 2.5n² − o(n²) lower bound on the multiplicative complexity of quadratic algorithms for multiplying n × n matrices over the binary field Z₂. Let M_F(n, m, k) and M#_F(n, m, k) denote the number of multiplications required to compute the product of n × m and m × k matrices by means of quadratic …
Strassen’s Matrix Multiplication on GPUs
"... Abstract—We provide efficient singleprecision and integer GPU implementations of Strassen’s algorithm as well as of Winograd’s variant. On an NVIDIA C1060 GPU, a speedup of 32 % (35%) is obtained for Strassen’s 4level implementation and 33 % (36%) for Winograd’s variant relative to the sgemm (inte ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
(Show Context)
We provide efficient single-precision and integer GPU implementations of Strassen’s algorithm as well as of Winograd’s variant. On an NVIDIA C1060 GPU, a speedup of 32% (35%) is obtained for Strassen’s 4-level implementation and 33% (36%) for Winograd’s variant relative to the sgemm (integer version of sgemm) code in CUBLAS 3.0 when multiplying 16384×16384 matrices. The maximum numerical error for the single-precision implementations is about 2 orders of magnitude higher than that for sgemm when n = 16384 and is zero for the integer implementations. Keywords: GPU; CUDA; matrix multiplication; Strassen’s algorithm; Winograd’s variant; accuracy.
unknown title
"... Most chapters in this handbook are concerned with various aspects and implications of linearity; Chapter 14 and this chapter are unusual in that they are about multilinearity. Just as linear operators and their coordinate representations, i.e., matrices, are the main objects of interest in other ch ..."
Abstract
 Add to MetaCart
Most chapters in this handbook are concerned with various aspects and implications of linearity; Chapter 14 and this chapter are unusual in that they are about multilinearity. Just as linear operators and their coordinate representations, i.e., matrices, are the main objects of interest in other chapters, tensors and their coordinate representations, i.e., hypermatrices, are the main objects of interest in this chapter. The parallel is summarized in the following schematic:

linearity → linear operators, bilinear forms, dyads → matrices
multilinearity → tensors → hypermatrices

Chapter 14, or indeed the monographs on multilinear algebra such as [Gre78, Mar23, Nor84, Yok92], are about properties of a whole space of tensors. This chapter is about properties of a single tensor and its coordinate representation, a hypermatrix. The first two sections introduce (1) a hypermatrix, (2) a tensor as an element of a tensor product of vector spaces, its coordinate representation as a hypermatrix, and a tensor as a multilinear functional. The next sections discuss the various generalizations of well-known linear algebraic and matrix-theoretic notions, such as rank, norm, and determinant, to
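The "tensor as a multilinear functional" view mentioned in this abstract can be illustrated with a 3-way hypermatrix stored as nested lists; the function name below is illustrative, not from the chapter:

```python
# A hypermatrix A = (a_ijk) represents the trilinear functional
#   T(u, v, w) = sum_{i,j,k} a_ijk * u_i * v_j * w_k,
# just as a matrix (a_ij) represents the bilinear form sum a_ij u_i v_j.

def trilinear(A, u, v, w):
    return sum(A[i][j][k] * u[i] * v[j] * w[k]
               for i in range(len(u))
               for j in range(len(v))
               for k in range(len(w)))
```

Evaluating against standard basis vectors recovers the individual entries a_ijk, and the value is linear in each of u, v, w separately, which is the defining property of a multilinear functional.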
Improving Numerical Accuracy for Non-Negative Matrix Multiplication on GPUs using Recursive Algorithms
"... Scientific computing is only bound by the limits of Moore’s Law and the scalability of high performance mathematical library implementations. Most mathematical libraries however tend to focus only on general inputs, limiting their potential performance and scalability by not tailoring their implemen ..."
Abstract
 Add to MetaCart
(Show Context)
Scientific computing is bound only by the limits of Moore’s Law and the scalability of high-performance mathematical library implementations. Most mathematical libraries, however, tend to focus only on general inputs, limiting their potential performance and scalability by not tailoring their implementations to specific inputs, such as non-negative inputs. Removing this limitation makes it possible to improve the performance and accuracy of a range of problems. In this paper we explore the limitations of hardware for improving the accuracy of non-negative matrix multiplication, specifically comparing implementations on the GPU and CPU, and propose algorithmic solutions to improve accuracy. Next, we demonstrate a matrix multiply implementation that takes advantage of asymptotically fast matrix multiplication algorithms, which have been shown to scale better than O(N³) implementations, and improves accuracy by up to a whole digit while increasing performance by up to 27% for matrices with positive input. Finally, we propose extending the BLAS level 3 specification to non-negative matrices to allow easy integration of our solution and to allow other library authors to implement their own solutions as part of an existing standard.
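The general idea of using recursion to improve floating-point accuracy can be illustrated with pairwise (recursive) summation of the terms of a dot product, which reduces worst-case rounding-error growth from O(n) to O(log n) for non-negative inputs. This is a generic sketch, not the paper's GPU implementation:

```python
# Pairwise (recursive) summation vs. naive left-to-right accumulation
# for a dot product. For long non-negative inputs, recursive halving
# keeps intermediate sums balanced, so rounding errors accumulate
# logarithmically rather than linearly (generic sketch, not the
# paper's algorithm).
import math

def dot_naive(x, y):
    # Left-to-right accumulation: error can grow linearly with len(x).
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s

def dot_pairwise(x, y):
    # Recursive halving: error growth is only logarithmic in len(x).
    def psum(terms, lo, hi):
        if hi - lo == 1:
            return terms[lo]
        mid = (lo + hi) // 2
        return psum(terms, lo, mid) + psum(terms, mid, hi)
    terms = [a * b for a, b in zip(x, y)]
    return psum(terms, 0, len(terms))
```

Comparing both against math.fsum (a correctly rounded reference) on a long vector of repeated 0.1 values shows the pairwise result drifting less from the exact sum than the naive accumulation.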