Results 1  10
of
382,557
Communicationavoiding QR decomposition for
 GPU,” GPU Technology Conference, Research Poster A01
, 2010
"... Abstract—We describe an implementation of the CommunicationAvoiding QR (CAQR) factorization that runs entirely on a single graphics processor (GPU). We show that the reduction in memory traffic provided by CAQR allows us to outperform existing parallel GPU implementations of QR for a large class of ..."
Abstract

Cited by 19 (2 self)
 Add to MetaCart
Abstract—We describe an implementation of the CommunicationAvoiding QR (CAQR) factorization that runs entirely on a single graphics processor (GPU). We show that the reduction in memory traffic provided by CAQR allows us to outperform existing parallel GPU implementations of QR for a large class
CommunicationAvoiding SymmetricIndefinite Factorization
, 2013
"... Abstract. We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen’s triangular tridiagonalization. It factors a dense symmetric matrix A as the product A = P LT L T P T where P is a permutation matrix, L is lower triangular, ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
, and T is block tridiagonal and banded. The algorithm is the first symmetricindefinite communicationavoiding factorization: it performs an asymptotically optimal amount of communication in a twolevel memory hierarchy for almost any cacheline size. Adaptations of the algorithm to parallel computers are likely
Parallel Numerical Linear Algebra
, 1993
"... We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illust ..."
Abstract

Cited by 766 (23 self)
 Add to MetaCart
illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem
GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems
 SIAM J. SCI. STAT. COMPUT
, 1986
"... We present an iterative method for solving linear systems, which has the property ofminimizing at every step the norm of the residual vector over a Krylov subspace. The algorithm is derived from the Arnoldi process for constructing an l2orthogonal basis of Krylov subspaces. It can be considered a ..."
Abstract

Cited by 2046 (40 self)
 Add to MetaCart
We present an iterative method for solving linear systems, which has the property ofminimizing at every step the norm of the residual vector over a Krylov subspace. The algorithm is derived from the Arnoldi process for constructing an l2orthogonal basis of Krylov subspaces. It can be considered
LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares
 ACM Trans. Math. Software
, 1982
"... An iterative method is given for solving Ax ~ffi b and minU Ax b 112, where the matrix A is large and sparse. The method is based on the bidiagonalization procedure of Golub and Kahan. It is analytically equivalent to the standard method of conjugate gradients, but possesses more favorable numerica ..."
Abstract

Cited by 649 (21 self)
 Add to MetaCart
gradient algorithms, indicating that I~QR is the most reliable algorithm when A is illconditioned. Categories and Subject Descriptors: G.1.2 [Numerical Analysis]: ApprorJmationleast squares approximation; G.1.3 [Numerical Analysis]: Numerical Linear Algebralinear systems (direct and
Fast Parallel Algorithms for ShortRange Molecular Dynamics
 JOURNAL OF COMPUTATIONAL PHYSICS
, 1995
"... Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of interatomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dyn ..."
Abstract

Cited by 622 (6 self)
 Add to MetaCart
. The algorithms are tested on a standard LennardJones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers  the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray YMP and C90 algorithm shows
The nas parallel benchmarks
 The International Journal of Supercomputer Applications
, 1991
"... A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of ve \parallel kernel " benchmarks and three \simulated application" benchmarks. Together they mimic the computation and data movement characterist ..."
Abstract

Cited by 686 (10 self)
 Add to MetaCart
benchmarking approaches on highly parallel systems are avoided. 1
Planning Algorithms
, 2004
"... This book presents a unified treatment of many different kinds of planning algorithms. The subject lies at the crossroads between robotics, control theory, artificial intelligence, algorithms, and computer graphics. The particular subjects covered include motion planning, discrete planning, planning ..."
Abstract

Cited by 1108 (51 self)
 Add to MetaCart
This book presents a unified treatment of many different kinds of planning algorithms. The subject lies at the crossroads between robotics, control theory, artificial intelligence, algorithms, and computer graphics. The particular subjects covered include motion planning, discrete planning
Results 1  10
of
382,557