Results 1 - 7 of 7
Fault-tolerant linear solvers via selective reliability, 2014
"... Energy increasingly constrains modern computer hardware, yet protecting computations and data against errors costs energy. This holds at all scales, but especially for the largest parallel computers being built and planned today. As processor counts continue to grow, the cost of ensuring reliability ..."
Cited by 12 (6 self)
Energy increasingly constrains modern computer hardware, yet protecting computations and data against errors costs energy. This holds at all scales, but especially for the largest parallel computers being built and planned today. As processor counts continue to grow, the cost of ensuring reliability consistently throughout an application will become unbearable. However, many algorithms only need reliability for certain data and phases of computation. This suggests an algorithm and system codesign approach. We show that if the system lets applications apply reliability selectively, we can develop algorithms that compute the right answer despite faults. These “fault-tolerant” iterative methods either converge eventually, at a rate that degrades gracefully with increased fault rate, or return a clear failure indication in the rare case that they cannot converge. Furthermore, they store most of their data unreliably, and spend most of their time in unreliable mode. We demonstrate this for the specific case of detected but uncorrectable memory faults, which we argue are representative of all kinds of faults. We developed a cross-layer application / operating system framework that intercepts and reports uncorrectable memory faults to the application, rather than killing the application, as current operating systems do. The application in turn can mark memory allocations as subject to such faults. Using this framework, we wrote a fault-tolerant iterative linear solver using components from the Trilinos solvers library. Our solver exploits hybrid parallelism (MPI and threads). It performs just as well as other solvers if no faults occur, and converges where other solvers do not in the presence of faults. We show convergence results for representative test problems. Near-term future work will include performance tests.
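To make the selective-reliability idea concrete, the following is a minimal sketch, not the paper's actual Trilinos/FT-GMRES implementation: a reliable outer iteration wraps an inner solver whose data may be corrupted, and a reliably computed residual test decides whether to accept each inner correction. The model problem, the Jacobi inner solve, and the random fault injection are all illustrative stand-ins.

    // Sketch of selective reliability: outer state is "reliable" storage; the
    // inner solver's data is "unreliable" and may be corrupted at any time.
    // The real framework in the paper intercepts actual uncorrectable memory
    // faults via the OS; here a fault is simulated by random corruption.
    #include <cstdio>
    #include <cstdlib>
    #include <cmath>
    #include <vector>

    using Vec = std::vector<double>;

    // y = A*x for the 1-D Laplacian stencil, A = tridiag(-1, 2, -1).
    static void apply_A(const Vec& x, Vec& y) {
        const int n = (int)x.size();
        for (int i = 0; i < n; ++i) {
            double left  = (i > 0)     ? x[i - 1] : 0.0;
            double right = (i + 1 < n) ? x[i + 1] : 0.0;
            y[i] = 2.0 * x[i] - left - right;
        }
    }

    int main() {
        const int n = 100;
        Vec x(n, 0.0), b(n, 1.0), r(n), y(n);      // reliable state
        double rnorm = std::sqrt((double)n);       // ||b - A*0|| = ||b||

        for (int outer = 0; outer < 200 && rnorm > 1e-8; ++outer) {
            // Reliable residual: r = b - A*x.
            apply_A(x, y);
            for (int i = 0; i < n; ++i) r[i] = b[i] - y[i];

            // Inner correction on UNRELIABLE storage: Jacobi sweeps on A*z = r.
            Vec z(r);
            for (int sweep = 0; sweep < 10; ++sweep) {
                Vec znew(n);
                for (int i = 0; i < n; ++i) {
                    double left  = (i > 0)     ? z[i - 1] : 0.0;
                    double right = (i + 1 < n) ? z[i + 1] : 0.0;
                    znew[i] = (r[i] + left + right) / 2.0;
                }
                z.swap(znew);
                if (std::rand() % 50 == 0)          // simulated memory fault
                    z[std::rand() % n] = 1e30;
            }

            // Reliable acceptance test: keep the correction only if it helps,
            // so a corrupted inner result delays convergence, never poisons x.
            Vec xtrial(x);
            for (int i = 0; i < n; ++i) xtrial[i] += z[i];
            apply_A(xtrial, y);
            double tnorm = 0.0;
            for (int i = 0; i < n; ++i) {
                double ri = b[i] - y[i];
                tnorm += ri * ri;
            }
            tnorm = std::sqrt(tnorm);
            if (tnorm < rnorm) { x.swap(xtrial); rnorm = tnorm; }
            std::printf("outer %3d  residual %.3e\n", outer, rnorm);
        }
        return 0;
    }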
Multi-Target Vectorization with MTPS C++ Generic Library
- in "PARA 2010 - 10th International Conference on Applied Parallel and Scientific Computing
"... Abstract. This article introduces a C++ template library dedicated at vectorizing algorithms for different target architectures: Multi-Target Parallel Skeleton (MTPS). Skeletons describing the data structures and algorithms are provided and allow MTPS to generate a code with optimized memory access ..."
Cited by 2 (0 self)
This article introduces a C++ template library dedicated to vectorizing algorithms for different target architectures: Multi-Target Parallel Skeleton (MTPS). Skeletons describing the data structures and algorithms are provided and allow MTPS to generate code with memory access patterns optimized for the chosen architecture. MTPS currently supports x86-64 multicore CPUs and CUDA-enabled GPUs. On these architectures, performance close to hardware limits is observed.
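The kind of layout optimization such skeletons automate can be sketched by hand (this is not MTPS's actual API): the same per-element update is written over an array-of-structures and a structure-of-arrays layout, and only the latter yields the unit-stride memory streams that vectorize well on both CPU and GPU targets.

    // Array-of-structures (AoS) vs structure-of-arrays (SoA) for one kernel.
    #include <cstdio>
    #include <vector>

    struct ParticleAoS { float x, y, z, pad; };   // AoS: fields interleaved

    struct ParticlesSoA {                         // SoA: one array per field
        std::vector<float> x, y, z;
    };

    // AoS kernel: stride-4 accesses per field, hard to auto-vectorize.
    void shift_aos(std::vector<ParticleAoS>& p, float d) {
        for (auto& q : p) { q.x += d; q.y += d; q.z += d; }
    }

    // SoA kernel: three unit-stride streams, vectorizes well.
    void shift_soa(ParticlesSoA& p, float d) {
        for (size_t i = 0; i < p.x.size(); ++i) p.x[i] += d;
        for (size_t i = 0; i < p.y.size(); ++i) p.y[i] += d;
        for (size_t i = 0; i < p.z.size(); ++i) p.z[i] += d;
    }

    int main() {
        const size_t n = 1 << 20;
        std::vector<ParticleAoS> aos(n, {0, 0, 0, 0});
        ParticlesSoA soa{std::vector<float>(n), std::vector<float>(n),
                         std::vector<float>(n)};
        shift_aos(aos, 1.0f);
        shift_soa(soa, 1.0f);
        std::printf("aos[0].x=%f soa.x[0]=%f\n", aos[0].x, soa.x[0]);
    }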
Report on the Workshop on Extreme-Scale Solvers: Transition to Future Architectures, 2012
"... (Top left) Communication pattern between processors (each of which holds a column and a row communication group) in an implementation of the Lanczos algorithm for computing eigenvales of a sparse matrix; courtesy of H. M. Aktulga, Lawrence Berkeley National Laboratory. (Top middle) Communication pat ..."
(Top left) Communication pattern between processors (each of which holds a column and a row communication group) in an implementation of the Lanczos algorithm for computing eigenvalues of a sparse matrix; courtesy of H. M. Aktulga, Lawrence Berkeley National Laboratory. (Top middle) Communication pattern for one of the coarse levels of the setup phase in an implementation of the algebraic multigrid method; courtesy of H. Gahvari and W. Gropp, Univ. of Illinois at Urbana-Champaign. (Top right) Matrix from the solution of Maxwell's equations for a high-frequency circuit using finite-element modeling. The matrix is taken …
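For background on the first pattern: the Lanczos iteration reduces a sparse symmetric matrix to a small tridiagonal matrix whose eigenvalues approximate those of the original, and in parallel implementations the communication the figure depicts comes from the matrix-vector product and the dot-product reductions. Below is a plain serial sketch with a stand-in stencil as the sparse matrix; reorthogonalization and the small tridiagonal eigenproblem are omitted.

    // Serial Lanczos three-term recurrence (illustrative sketch only).
    #include <cstdio>
    #include <cmath>
    #include <vector>

    using Vec = std::vector<double>;

    // Sparse symmetric matvec for tridiag(-1, 2, -1) as a stand-in matrix.
    static void spmv(const Vec& x, Vec& y) {
        int n = (int)x.size();
        for (int i = 0; i < n; ++i)
            y[i] = 2.0 * x[i] - (i > 0 ? x[i-1] : 0.0)
                              - (i + 1 < n ? x[i+1] : 0.0);
    }

    int main() {
        const int n = 50, m = 20;              // matrix size, Lanczos steps
        Vec v(n, 1.0 / std::sqrt((double)n));  // v_1, normalized
        Vec vprev(n, 0.0), w(n);
        double beta = 0.0;
        std::vector<double> alphas, betas;     // entries of tridiagonal T

        for (int j = 0; j < m; ++j) {
            spmv(v, w);                        // w = A*v_j (communication-heavy)
            double alpha = 0.0;                // alpha_j = v_j' * w (reduction)
            for (int i = 0; i < n; ++i) alpha += v[i] * w[i];
            for (int i = 0; i < n; ++i) w[i] -= alpha * v[i] + beta * vprev[i];
            beta = 0.0;                        // beta_{j+1} = ||w|| (reduction)
            for (int i = 0; i < n; ++i) beta += w[i] * w[i];
            beta = std::sqrt(beta);
            alphas.push_back(alpha);
            betas.push_back(beta);
            if (beta < 1e-12) break;           // invariant subspace found
            vprev = v;
            for (int i = 0; i < n; ++i) v[i] = w[i] / beta;
        }
        // Eigenvalues of A are approximated by those of the small tridiagonal
        // matrix T built from (alphas, betas); solving it is omitted here.
        std::printf("built %zu Lanczos steps\n", alphas.size());
    }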
Oh, $#*@! Exascale! The Effect of Emerging Architectures on Scientific Discovery
"... Fig. 1: This 1988 rocket-sled test has nothing to do with exascale computing per se, but it makes for an effective metaphor for the “brick wall ” we anticipate our high-performance computing code to collide with. Abstract—The predictions for exascale computing are dire. Although we have benefited fr ..."
Fig. 1: This 1988 rocket-sled test has nothing to do with exascale computing per se, but it makes for an effective metaphor for the “brick wall” we anticipate our high-performance computing code to collide with.

Abstract—The predictions for exascale computing are dire. Although we have benefited from a consistent supercomputer architecture design, even across manufacturers, for well over a decade, recent trends indicate that future high-performance computers will have different hardware structure and programming models to which software must adapt. This paper provides an informal discussion of the ways in which changes in high-performance computing architecture will profoundly affect the scalability of our current generation of scientific visualization and analysis codes and how we must adapt our applications, workflows, and attitudes to continue our success at exascale computing.
A Classification of Scientific Visualization Algorithms for Massive Threading
"... As the number of cores in processors increase and accelerator architectures are becoming more common, an ever greater number of threads is required to achieve full processor utilization. Our current parallel scientific visualization codes rely on partitioning data to achieve parallel processing, but ..."
As the number of cores in processors increases and accelerator architectures become more common, an ever greater number of threads is required to achieve full processor utilization. Our current parallel scientific visualization codes rely on partitioning data to achieve parallel processing, but this approach will not scale as we approach massive threading, in which work is distributed at so fine a level that each thread is responsible for a minute portion of data. In this paper we characterize the challenges of refactoring our current visualization algorithms by considering the finest portion of work each performs and examining the domain of input data, overlaps of output domains, and interdependencies among work instances. We divide our visualization algorithms into eight categories, each containing algorithms with the same interdependencies. By focusing our research efforts on solving these categorical challenges rather than on this legion of individual algorithms, we can make attainable advances toward extreme computing.
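A minimal sketch of the finest-grained decomposition described above, using C++17 parallel algorithms rather than any framework from the paper: each work instance (the hypothetical ThresholdWorklet below) reads one field value and writes one flag, with no output overlap and no interdependencies among instances, the simplest case in such a classification.

    // One logical thread per element; the runtime maps these fine-grained
    // work instances onto however many hardware threads exist.
    // Build with a C++17 compiler; with GCC, link Intel TBB (-ltbb).
    #include <cstdio>
    #include <vector>
    #include <algorithm>
    #include <execution>

    // Per-element work instance: one input value in, one output value out,
    // no communication with other instances, so massive threading is safe.
    struct ThresholdWorklet {
        float iso;
        char operator()(float v) const { return v >= iso ? 1 : 0; }
    };

    int main() {
        std::vector<float> field = {0.1f, 0.7f, 0.4f, 0.9f, 0.2f, 0.6f};
        std::vector<char> flags(field.size());

        std::transform(std::execution::par_unseq,
                       field.begin(), field.end(), flags.begin(),
                       ThresholdWorklet{0.5f});

        for (size_t i = 0; i < flags.size(); ++i)
            std::printf("cell %zu: %d\n", i, (int)flags[i]);
    }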
Playa: High-performance programmable linear algebra, 2012
"... This paper introduces Playa, a high-level user interface layer for composing algorithms for complex multiphysics problems out of objects from other Trilinos packages. Among other features, Playa provides very high-performance overloaded operators implemented through an expression template mechanism. ..."
This paper introduces Playa, a high-level user interface layer for composing algorithms for complex multiphysics problems out of objects from other Trilinos packages. Among other features, Playa provides very high-performance overloaded operators implemented through an expression template mechanism. In this paper, we give an overview of the central Playa objects from a user’s perspective, show application to a sequence of increasingly complex solver algorithms, provide timing results for Playa’s overloaded operators and other functions, and briefly survey some of the implementation issues involved.
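The mechanism behind such overloaded operators can be shown with a minimal expression-template sketch (these are not Playa's actual classes): x + y builds a lightweight expression object instead of a temporary vector, and a single fused loop runs only when the expression is assigned.

    // Minimal expression templates: operator+ defers work; operator= fuses it.
    #include <cstdio>
    #include <cstddef>
    #include <vector>

    template <class L, class R>
    struct Sum {                   // deferred elementwise sum of two operands
        const L& l; const R& r;
        double operator[](std::size_t i) const { return l[i] + r[i]; }
        std::size_t size() const { return l.size(); }
    };

    struct Vector {
        std::vector<double> data;
        explicit Vector(std::size_t n, double v = 0.0) : data(n, v) {}
        double operator[](std::size_t i) const { return data[i]; }
        double& operator[](std::size_t i) { return data[i]; }
        std::size_t size() const { return data.size(); }

        template <class E>
        Vector& operator=(const E& e) {       // single fused loop, no temps
            for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
            return *this;
        }
    };

    // A real library would constrain this overload to its own expression
    // types; it is left unconstrained here for brevity.
    template <class L, class R>
    Sum<L, R> operator+(const L& l, const R& r) { return {l, r}; }

    int main() {
        Vector a(4, 1.0), b(4, 2.0), c(4, 3.0), x(4);
        x = a + b + c;   // one loop computes a[i] + b[i] + c[i] directly
        std::printf("x[0] = %f\n", x[0]);
    }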
Preliminary Implementation of PETSc Using GPUs
"... Abstract PETSc is a scalable solver library for the solution of algebraic equations arising from the discretization of partial differential equations and related problems. PETSc is organized as a class library with classes for vectors, sparse and dense ma-trices, Krylov methods, preconditioners, non ..."
PETSc is a scalable solver library for the solution of algebraic equations arising from the discretization of partial differential equations and related problems. PETSc is organized as a class library with classes for vectors, sparse and dense matrices, Krylov methods, preconditioners, nonlinear solvers, and differential equation integrators. A new subclass of the vector class has been introduced that performs its operations on NVIDIA GPU processors. In addition, a new sparse matrix subclass that performs matrix-vector products on the GPU was introduced. The Krylov methods, nonlinear solvers, and integrators in PETSc run unchanged in parallel using these new subclasses. These can be used transparently from existing PETSc application codes in C, C++, Fortran, or Python. The implementation is done with the Thrust and Cusp C++ packages from NVIDIA.
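A minimal sketch of the underlying approach using plain Thrust (not PETSc's actual vector classes): the data lives in a thrust::device_vector, and standard vector operations such as AXPY and dot products execute on the GPU while the calling code stays ordinary C++.

    // Vector operations on the GPU via Thrust. Compile with nvcc.
    #include <cstdio>
    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/inner_product.h>

    struct Axpy {
        double alpha;
        __host__ __device__
        double operator()(double x, double y) const { return alpha * x + y; }
    };

    int main() {
        const int n = 1 << 20;
        thrust::device_vector<double> x(n, 1.0), y(n, 2.0);  // GPU-resident

        // y = 2*x + y, executed on the device.
        thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), Axpy{2.0});

        // dot(x, y), also on the device; one scalar comes back to the host.
        double dot = thrust::inner_product(x.begin(), x.end(), y.begin(), 0.0);
        std::printf("x . y = %f\n", dot);
        return 0;
    }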