Results 1 - 8 of 8
QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment
"... Previous studies have reported that common dense linear algebra operations do not achieve speed up by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant ..."
Cited by 21 (7 self)
"... that trade flops for communication. In this paper, we present a new approach for computing a QR factorization – one of the main dense linear algebra kernels – of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to articulate a recently ..."
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures
arXiv:1301.1071 [cs.DC], 2013
"... Abstract. The QR factorization and the SVD are two fundamental matrix decompositions with applications throughout scientific computing and data analysis. For matrices with many more rows than columns, so-called “tall-and-skinny matrices,” there is a numerically stable, efficient, communication-avoi ..."
Cited by 5 (1 self)
Computing the R of the QR factorization of tall and skinny matrices using MPI_Reduce
2010
"... Abstract. A QR factorization of a tall and skinny matrix with n columns can be represented as a reduction. The operation used along the reduction tree has in input two n-by-n upper triangular matrices and in output an n-by-n upper triangular matrix which is defined as the R factor of the two input ..."
Cited by 2 (1 self)
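The reduction described in this abstract can be sketched in a few lines. The following is a minimal illustration in plain Python, not the paper's MPI implementation: the helper names `r_factor` and `tsqr_r`, the Givens-rotation kernel, the binary tree shape, and the sign normalization are all assumptions made for the example. Each row block is reduced to an n-by-n upper triangular factor, and pairs of factors are then combined by taking the R factor of the two stacked triangles.

```python
import math

def r_factor(rows):
    """Return the n-by-n upper triangular R of an m-by-n matrix (m >= n).

    Hypothetical helper: uses Givens rotations and normalizes signs so the
    diagonal is non-negative, which makes R unique for full-rank input.
    """
    a = [[float(x) for x in row] for row in rows]
    m, n = len(a), len(a[0])
    for j in range(n):
        for i in range(j + 1, m):
            if a[i][j] == 0.0:
                continue
            # Rotate rows j and i so that entry (i, j) becomes zero.
            r = math.hypot(a[j][j], a[i][j])
            c, s = a[j][j] / r, a[i][j] / r
            for k in range(j, n):
                top, bot = a[j][k], a[i][k]
                a[j][k] = c * top + s * bot
                a[i][k] = -s * top + c * bot
            a[i][j] = 0.0  # remove rounding residue
    return [[-x if a[j][j] < 0 else x for x in a[j]] for j in range(n)]

def tsqr_r(rows, num_blocks=4):
    """Compute R of a tall-and-skinny matrix as a reduction over row blocks."""
    size = math.ceil(len(rows) / num_blocks)
    # Leaves of the tree: one local R per row block (each block needs >= n rows).
    rs = [r_factor(rows[i:i + size]) for i in range(0, len(rows), size)]
    # Reduction operator: R factor of two stacked n-by-n upper triangles.
    while len(rs) > 1:
        merged = [r_factor(rs[i] + rs[i + 1]) for i in range(0, len(rs) - 1, 2)]
        if len(rs) % 2:
            merged.append(rs[-1])  # odd factor is carried to the next level
        rs = merged
    return rs[0]
```

Calling `tsqr_r(a, num_blocks=4)` on a full-rank list-of-rows matrix `a` should agree with `r_factor(a)` up to rounding, because the R factor with a non-negative diagonal is unique.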
Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures
"... Abstract. To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes represent tasks, either panel factorization or update of a block-column, ..."
Cited by 2 (1 self)
"... block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on moderate and large square matrices, their way of processing a panel in sequence leads to limited performance when factorizing tall and skinny matrices or small square matrices. We present a fully ..."
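To see why processing a panel in sequence limits performance on tall and skinny matrices, it helps to count dependent steps. The sketch below is illustrative only: the function names are hypothetical, and the binary tree is one possible alternative to a sequential panel, not necessarily the scheme this paper proposes. Each pair `(i, j)` means "eliminate tile `j` of the panel against tile `i`"; pairs in the same round are independent and could run in parallel.

```python
def flat_tree_rounds(p):
    """Sequential panel: every tile is eliminated against tile 0, one per round."""
    return [[(0, j)] for j in range(1, p)]

def binary_tree_rounds(p):
    """Binary reduction: neighbouring survivors are paired at each level."""
    rounds = []
    stride = 1
    while stride < p:
        # Tiles at multiples of 2*stride absorb the tile `stride` rows below.
        rounds.append([(i, i + stride) for i in range(0, p - stride, 2 * stride)])
        stride *= 2
    return rounds

if __name__ == "__main__":
    p = 8  # number of tiles in the panel (assumed for the example)
    print(len(flat_tree_rounds(p)), "sequential rounds")
    print(len(binary_tree_rounds(p)), "rounds with a binary tree")
```

Both schedules perform the same number of eliminations (p - 1); only the number of dependent rounds differs, which is what matters when the panel has many tiles and there is little trailing matrix to update.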
Communication-avoiding QR decomposition for GPU
GPU Technology Conference, Research Poster A01, 2010
"... Abstract. We describe an implementation of the Communication-Avoiding QR (CAQR) factorization that runs entirely on a single graphics processor (GPU). We show that the reduction in memory traffic provided by CAQR allows us to outperform existing parallel GPU implementations of QR for a large class of ..."
Cited by 23 (2 self)
"... of tall-skinny matrices. Other GPU implementations of QR handle panel factorizations by either sending the work to a general-purpose processor or using entirely bandwidth-bound operations, incurring data transfer overheads. In contrast, our QR is done entirely on the GPU using compute-bound kernels ..."
Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems
"... Abstract. As tile linear algebra algorithms continue achieving high performance on shared-memory multicore architectures, it is a challenging task to make them scalable on distributed-memory multicore cluster machines. The main contribution of this paper is the extension to the distributed-memory env ..."
Cited by 12 (2 self)
"... distributed-memory environment of the previous work done by Hadri et al. on Communication-Avoiding QR (CAQR) factorizations for tall and skinny matrices (initially done on shared-memory multicore systems). The fine granularity of tile algorithms associated with communication-avoiding techniques for the QR factorization presents ..."
Enhancing Parallelism of Tile QR Factorization for Multicore Architectures
"... To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist of scheduling a Directed Acyclic Graph (DAG) of fine granularity tasks where nodes represent tasks, either panel factorization or update of a block-column, and edges re ..."
Cited by 7 (0 self)
"... represent dependencies among them. Although past approaches already achieve high performance on moderate and large square matrices, their way of processing a panel in sequence leads to limited performance when factorizing tall and skinny matrices or small square matrices. We present a new, fully ..."
Tile QR Factorization with Parallel Panel Processing for Multicore Architectures
2009
"... To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes represent tasks, either panel factorization or update of a block-column, and edges ..."
Cited by 14 (7 self)
"... block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on moderate and large square matrices, their way of processing a panel in sequence leads to limited performance when factorizing tall and skinny matrices or small square matrices. We present a new fully ..."