Results 1 - 10 of 172
Scalable Dense Linear Algebra on Heterogeneous Hardware
"... Design of systems exceeding 1 Pflop/s and the push toward 1 Eflop/s forced a dramatic shift in hardware design. Various physical and engineering constraints resulted in the introduction of massive parallelism and functional hybridization with the use of accelerator units. This paradigm change brings a ..."
common scenarios. In the context of shared-memory multicore installations, we show how high performance and scalability go hand in hand when the well-known linear algebra algorithms are recast in terms of Directed Acyclic Graphs (DAGs), which are then transparently scheduled at runtime inside the Parallel
Benchmarking GPUs to tune dense linear algebra
2008
"... We present performance results for dense linear algebra using recent NVIDIA GPUs. Our matrix-matrix multiply routine (GEMM) runs up to 60% faster than the vendor’s implementation and approaches the peak of hardware capabilities. Our LU, QR and Cholesky factorizations achieve up to 80–90% of the pe ..."
Cited by 242 (2 self)
NetSolve: A Network Server for Solving Computational Science Problems
The International Journal of Supercomputer Applications and High Performance Computing, 1995
"... This paper presents a new system, called NetSolve, that allows users to access computational resources, such as hardware and software, distributed across the network. This project has been motivated by the need for an easy-to-use, efficient mechanism for using computational resources remotely. Ease ..."
Cited by 304 (30 self)
on any heterogeneous network and is implemented as a fault-tolerant client-server application. Keywords: Distributed System, Heterogeneity, Load Balancing, Client-Server, Fault Tolerance, Linear Algebra, Virtual Library. University of Tennessee, Technical Report No. cs-95-313, Department of Computer
A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers
1992
"... This paper describes ScaLAPACK, a distributed memory version of the LAPACK software package for dense and banded matrix computations. Key design features are the use of distributed versions of the Level 3 BLAS as building blocks, and an object-based interface to the library routines. The square block s ..."
Cited by 176 (29 self)
Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach
2012
"... Among the various factors that drive the momentous changes occurring in the design of microprocessors and high end systems [1], three stand out as especially notable: 1. the number of transistors per chip will continue the current trend, i.e. double roughly every 18 months, while the speed of proces ..."
Cited by 3 (1 self)
Among the various factors that drive the momentous changes occurring in the design of microprocessors and high end systems [1], three stand out as especially notable: 1. the number of transistors per chip will continue the current trend, i.e. double roughly every 18 months, while the speed of processor clocks will cease to in-
A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers)
2001
"... In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the perform ..."
Cited by 59 (24 self)
Heterogenous Acceleration for Linear Algebra in Multi-Coprocessor Environments
"... We present an efficient and scalable programming model for the development of linear algebra in heterogeneous multi-coprocessor environments. The model incorporates some of the current best design and implementation practices for the heterogeneous acceleration of dense linear algebra (DLA). Exampl ..."
ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers -- Design Issues and Performance (Technical Paper)
1996
"... This paper outlines the content and performance of ScaLAPACK, a collection of mathematical software for linear algebra computations on distributed memory computers. The importance of developing standards for computational and message passing interfaces is discussed. We present the different componen ..."
Cited by 170 (56 self)
components and building blocks of ScaLAPACK. This paper outlines the difficulties inherent in producing correct codes for networks of heterogeneous processors. We define a theoretical model of parallel computers dedicated to linear algebra applications: the Distributed Linear Algebra Machine (DLAM
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures
Concurrency and Computation: Practice and Experience, 2011
"... In the field of HPC, the current hardware trend is to design multiprocessor architectures that feature heterogeneous technologies such as specialized coprocessors (e.g., Cell/BE SPUs) or data-parallel accelerators (e.g., GPGPUs). Approaching the theoretical performance of these architectu ..."
Cited by 172 (15 self)
Scalability Issues Affecting the Design of a Dense Linear Algebra Library
Journal of Parallel and Distributed Computing, 1994
"... This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distributed memory concurrent computers. These routines form part of the ScaLAPACK mathematical software library that extends the widely-used LAPACK library to run efficiently on scalable concurrent computers ..."
Cited by 29 (15 self)