Results 1 - 10 of 172

Scalable Dense Linear Algebra on Heterogeneous Hardware

by George Bosilca, et al.
"... Design of systems exceeding 1 Pflop/s and the push toward 1 Eflop/s, forced a dramatic shift in hardware design. Various physical and engineering constraints resulted in introduction of massive parallelism and functional hybridization with the use of accelerator units. This paradigm change brings a ..."
... common scenarios. In the context of shared-memory multicore installations, we show how high performance and scalability go hand in hand when the well-known linear algebra algorithms are recast in terms of Directed Acyclic Graphs (DAGs), which are then transparently scheduled at runtime inside the Parallel ...
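The recasting of factorizations as task graphs that this snippet describes can be pictured with a short sketch. The code below is not the paper's runtime; it only builds the DAG for a tiled Cholesky factorization (POTRF, TRSM, and update tasks, with dependencies inferred from the last writer of each tile) and lists the tasks in one valid execution order, which is the information a DAG scheduler would use to dispatch ready tasks to cores or accelerators.

# Minimal sketch, not the paper's runtime: a tiled Cholesky factorization
# expressed as a DAG of tasks. Only flow (read-after-write) and output
# dependencies are tracked, which suffices for this access pattern.
from collections import defaultdict

def cholesky_dag(T):
    """Build (tasks, edges) for a Cholesky factorization on a T x T tile grid."""
    tasks, edges = [], set()
    last_writer = {}                            # tile -> task that last wrote it

    def add(name, reads, writes):
        tasks.append(name)
        for tile in reads + writes:
            if tile in last_writer:
                edges.add((last_writer[tile], name))
        for tile in writes:
            last_writer[tile] = name

    for k in range(T):
        add(f"POTRF({k})", [], [(k, k)])        # factor diagonal tile
        for i in range(k + 1, T):
            add(f"TRSM({i},{k})", [(k, k)], [(i, k)])                    # panel solve
        for i in range(k + 1, T):
            for j in range(k + 1, i + 1):
                add(f"UPDATE({i},{j},{k})", [(i, k), (j, k)], [(i, j)])  # trailing update
    return tasks, edges

def topological_order(tasks, edges):
    """A trivial 'scheduler': repeatedly release every task whose predecessors are done."""
    preds = defaultdict(set)
    for a, b in edges:
        preds[b].add(a)
    done, order = set(), []
    while len(order) < len(tasks):
        ready = [t for t in tasks if t not in done and preds[t] <= done]
        order.extend(ready)                     # a real runtime would run these in parallel
        done.update(ready)
    return order

if __name__ == "__main__":
    tasks, edges = cholesky_dag(3)
    print(len(tasks), "tasks,", len(edges), "dependencies")
    print(topological_order(tasks, edges))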

Benchmarking GPUs to tune dense linear algebra

by Vasily Volkov, James W. Demmel, 2008
"... We present performance results for dense linear algebra using recent NVIDIA GPUs. Our matrix-matrix multiply routine (GEMM) runs up to 60 % faster than the vendor’s implementation and approaches the peak of hardware capabilities. Our LU, QR and Cholesky factorizations achieve up to 80–90 % of the pe ..."
Cited by 242 (2 self)
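For readers unfamiliar with how such percentages are obtained, the sketch below shows the usual accounting: time a GEMM, convert to Gflop/s using the 2*m*n*k flop count of C = A*B, and compare with a peak figure. numpy on the CPU stands in for the GPU kernel here, and PEAK_GFLOPS is a placeholder, not a number from the paper.

# Minimal sketch of the measurement, with numpy as a stand-in for a GPU GEMM.
import time
import numpy as np

def gemm_gflops(m, n, k, repeats=5):
    """Best-of-N Gflop/s for C = A @ B with single-precision operands."""
    a = np.random.rand(m, k).astype(np.float32)
    b = np.random.rand(k, n).astype(np.float32)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - t0)
    return 2.0 * m * n * k / best / 1e9     # 2*m*n*k flops / fastest time

if __name__ == "__main__":
    PEAK_GFLOPS = 100.0                     # hypothetical machine peak, not from the paper
    rate = gemm_gflops(1024, 1024, 1024)
    print(f"{rate:.1f} Gflop/s = {100 * rate / PEAK_GFLOPS:.0f}% of assumed peak")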

NetSolve: A Network Server for Solving Computational Science Problems

by Henri Casanova, Jack Dongarra - The International Journal of Supercomputer Applications and High Performance Computing, 1995
"... This paper presents a new system, called NetSolve, that allows users to access computational resources, such as hardware and software, distributed across the network. This project has been motivated by the need for an easy-to-use, efficient mechanism for using computational resources remotely. Ease ..."
Cited by 304 (30 self)
... on any heterogeneous network and is implemented as a fault-tolerant client-server application. Keywords: Distributed System, Heterogeneity, Load Balancing, Client-Server, Fault Tolerance, Linear Algebra, Virtual Library. University of Tennessee Technical Report No. cs-95-313, Department of Computer ...
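The client-server mechanism described above can be pictured with a toy RPC example. This is an illustration of the idea only, not NetSolve's actual interface or protocol: a server registers a linear-system solve, and a client ships the problem over the network and receives the solution; the host and port are placeholders.

# Toy illustration only, not NetSolve's protocol: a server exposes a dense
# solve over XML-RPC and a client calls it remotely. Host/port are placeholders.
import threading
import numpy as np
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def solve(a_rows, b):
    """Server side: solve A x = b and return x as a plain list."""
    return np.linalg.solve(np.array(a_rows), np.array(b)).tolist()

def start_server(host="localhost", port=8000):
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(solve, "solve")
    threading.Thread(target=server.serve_forever, daemon=True).start()

if __name__ == "__main__":
    start_server()
    client = ServerProxy("http://localhost:8000")
    x = client.solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])   # problem travels over the network
    print(x)   # approximately [0.0909, 0.6364]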

A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers

by Jaeyoung Choi, Jack J. Dongarra, Roldan Pozo, David W. Walker, 1992
"... This paper describes ScaLAPACK, a distributed memory version of the LAPACK software package for dense and banded matrix computations. Key design features are the use of distributed versions of the Level LAS as building blocks, and an ob ect-based interface to the library routines. The square block s ..."
Cited by 176 (29 self)

Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach

by George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Herault, Piotr Luszczek, Jack J. Dongarra, 2012
"... Among the various factors that drive the momentous changes occurring in the design of microprocessors and high end systems [1], three stand out as especially notable: 1. the number of transistors per chip will continue the current trend, i.e. double roughly every 18 months, while the speed of proces ..."
Cited by 3 (1 self)

A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers)

by Olivier Beaumont, Vincent Boudet, Antoine Petitet, Fabrice Rastello, Yves Robert, 2001
"... In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the perform ..."
Cited by 59 (24 self)
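To make the distribution issue concrete, the sketch below shows the standard ScaLAPACK-style 2D block-cyclic owner computation alongside a simplified heterogeneous variant that hands out tile columns in proportion to processor speed. The heterogeneous routine illustrates the general idea only; it is not the allocation algorithm proposed in the paper.

# owner_block_cyclic: standard 2D block-cyclic mapping of tile (I, J) onto a
# P x Q process grid. hetero_1d_columns: a simplified illustration (not the
# paper's algorithm) of giving each processor tile columns in proportion to speed.
def owner_block_cyclic(I, J, P, Q):
    """Process coordinates owning tile (I, J) on a P x Q grid."""
    return (I % P, J % Q)

def hetero_1d_columns(n_tile_cols, speeds):
    """Assign tile columns to processors roughly in proportion to their speeds."""
    owners, given = [], [0] * len(speeds)
    for _ in range(n_tile_cols):
        # pick the processor currently furthest below its ideal share
        p = min(range(len(speeds)),
                key=lambda i: (given[i] + 1) / speeds[i])
        owners.append(p)
        given[p] += 1
    return owners

if __name__ == "__main__":
    print(owner_block_cyclic(5, 7, P=2, Q=3))       # -> (1, 1)
    print(hetero_1d_columns(8, speeds=[3.0, 1.0]))  # fast processor gets 3/4 of the columns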

Heterogeneous Acceleration for Linear Algebra in Multi-Coprocessor Environments

by Azzam Haidar, Piotr Luszczek, Stanimire Tomov, Jack Dongarra
"... We present an efficient and scalable programming model for the development of linear algebra in heterogeneous multi-coprocessor environments. The model incorporates some of the current best design and implementation practices for the heterogeneous acceleration of dense linear algebra (DLA). Exampl ..."

ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers -- Design Issues and Performance (Technical Paper)

by J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker, R. C. Whaley, 1996
"... This paper outlines the content and performance of ScaLAPACK, a collection of mathematical software for linear algebra computations on distributed memory computers. The importance of developing standards for computational and message passing interfaces is discussed. We present the different componen ..."
Cited by 170 (56 self)
... components and building blocks of ScaLAPACK. This paper outlines the difficulties inherent in producing correct codes for networks of heterogeneous processors. We define a theoretical model of parallel computers dedicated to linear algebra applications: the Distributed Linear Algebra Machine (DLAM) ...

StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

by Cédric Augonnet, Samuel Thibault, Raymond Namyst, Pierre-André Wacrenier - Concurrency and Computation: Practice and Experience, 2011
"... Abstract. In the field of HPC, the current hardware trend is to design multiprocessor architectures that feature heterogeneous technologies such as specialized coprocessors (e.g., Cell/BE SPUs) or data-parallel accelerators (e.g., GPGPUs). Approaching the theoretical performance of these architectu ..."
Cited by 172 (15 self)
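The scheduling problem StarPU addresses can be sketched with a greedy heuristic: given predicted task costs on each processing unit, place every task on the worker that would finish it earliest. This is an illustration of the idea, not the StarPU API; it ignores data-transfer costs, priorities, and dependencies that real schedulers handle, and the task names and costs are hypothetical.

# Greedy earliest-finish placement of independent tasks on heterogeneous workers.
# Illustration only; not the StarPU API or its actual scheduling policies.
def schedule(tasks, workers):
    """tasks: {name: {worker: predicted_cost}}; workers: list of worker names."""
    finish_at = {w: 0.0 for w in workers}
    placement = {}
    for name, costs in tasks.items():              # tasks assumed independent here
        w = min(costs, key=lambda w: finish_at[w] + costs[w])
        placement[name] = w
        finish_at[w] += costs[w]
    return placement, finish_at

if __name__ == "__main__":
    tasks = {                                      # hypothetical cost estimates (ms)
        "gemm_0": {"cpu": 40.0, "gpu": 5.0},
        "gemm_1": {"cpu": 40.0, "gpu": 5.0},
        "small_solve": {"cpu": 2.0, "gpu": 6.0},
    }
    placement, finish_at = schedule(tasks, ["cpu", "gpu"])
    print(placement)                               # GEMMs go to the GPU, the small task to the CPU
    print(finish_at)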

Scalability Issues Affecting the Design of a Dense Linear Algebra Library

by Jack J. Dongarra, Robert A. van de Geijn, David W. Walker - Journal of Parallel and Distributed Computing, 1994
"... This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distributed memory concurrent computers. These routines form part of the ScaLAPACK mathematical software library that extends the widely-used LAPACK library to run efficiently on scalable concurrent computers ..."
Cited by 29 (15 self)