Results 11–20 of 26
Parallel Extensions to the Matrix Template Library
, 1997
Abstract

Cited by 2 (0 self)
We present the preliminary design for a C++ template library to enable the compositional construction of matrix classes suitable for high-performance numerical linear algebra computations. The library based on our interface definition, the Matrix Template Library (MTL), is written in C++ and consists of a small number of template classes that can be composed to represent commonly used matrix formats (both sparse and dense) and parallel data distributions. A comprehensive set of generic algorithms provides high performance for all MTL matrix objects without requiring specific functionality for particular types of matrices. We present performance data to demonstrate that there is little or no performance penalty caused by the powerful MTL abstractions.
1 Introduction
There is a common perception in scientific computing that abstraction is the enemy of performance. Although there is continual interest in using languages such as C or C++ and the powerful data abstractions th...
Writing Parallel Libraries with MPI: Common Practice, Issues, and Extensions
Abstract

Cited by 2 (1 self)
Abstract. Modular programming is an important software design concept. We discuss principles for programming parallel libraries, show several successful library implementations, and introduce a taxonomy for existing parallel libraries. We derive common requirements that parallel libraries pose on the programming framework. We then show how those requirements are supported in the Message Passing Interface (MPI) standard. We also note several potential pitfalls for library implementers using MPI. Finally, we conclude with a discussion of the state of the art of parallel library programming and we provide some guidelines for library designers.
An object-oriented model for developing parallel PDE software
, 1998
Abstract

Cited by 1 (1 self)
We study the application of object-oriented programming techniques in developing parallel software for the numerical solution of partial differential equations. In this context, we discuss a simulator-parallel programming model that has overlapping Schwarz methods as its numerical foundation. Using object-oriented programming techniques, a generic implementation framework is devised. This paper gives a detailed description of this framework and related issues. Concrete case studies show how the framework can be used to develop parallel simulators in an efficient and systematic way. Key words: partial differential equation, object orientation, overlapping domain decomposition, parallel computing.
1 Introduction and motivation
In recent years, the application of object-oriented (OO) programming techniques in developing software for the numerical solution of partial differential equations (PDEs) has received increasing attention. The C++ programming language, which is widely used in devel...
Iterative Data Partitioning Scheme of Parallel PDE Solver for Heterogeneous Computing Cluster
, 2002
Abstract

Cited by 1 (0 self)
This paper presents a static load balancing scheme for a parallel PDE solver targeting heterogeneous computing clusters. The proposed scheme adopts a mathematical programming approach and optimizes the execution time of the PDE solver, considering both computation and communication time. While traditional task graph scheduling algorithms only distribute loads to processors, the proposed scheme combines iterative data partitioning with load distribution to minimize total execution time. The approximation algorithm presented here shows good accuracy and runs in practical time.
ShyLU: A hybrid–hybrid solver for multicore platforms
 In Proc. of 26th IEEE Intl. Parallel and Distributed Processing Symp. (IPDPS’12), IEEE
, 2012
Abstract

Cited by 1 (1 self)
With the ubiquity of multicore processors, it is crucial that solvers adapt to the hierarchical structure of modern architectures. We present ShyLU, a “hybrid-hybrid” solver for general sparse linear systems that is hybrid in two ways: First, it combines direct and iterative methods. The iterative part is based on approximate Schur complements, where we compute the approximate Schur complement using a value-based dropping strategy or a structure-based probing strategy. Second, the solver uses two levels of parallelism via hybrid programming (MPI+threads). ShyLU is useful both in shared-memory environments and on large parallel computers with distributed memory. In the latter case, it should be used as a subdomain solver. We argue that with the increasing complexity of compute nodes, it is important to exploit multiple levels of parallelism even within a single compute node. We demonstrate the robustness of ShyLU compared with other algebraic preconditioners. ShyLU scales well up to 384 cores for a given problem size. We also study the MPI-only performance of ShyLU against a hybrid implementation and conclude that on present multicore nodes an MPI-only implementation is better. However, for future multicore machines (96 or more cores), hybrid/hierarchical algorithms and implementations are important for sustained performance.
Parallel Continuous Optimization
, 2000
Abstract

Cited by 1 (0 self)
Parallel continuous optimization methods are motivated here by applications in science and engineering. The key issues are addressed at different computational levels, including local and global optimization as well as strategies for large, sparse versus small but expensive problems. Topics covered include global optimization, direct search with and without surrogates, optimization of linked subsystems, and variable and constraint distribution. Finally, there is a discussion of future research directions. Key Words. Parallel optimization, local and global optimization, large-scale optimization, direct search methods, surrogate optimization, optimization of linked subsystems, design optimization, cluster simulation, macromolecular modeling
1 Introduction
Optimization has broad applications in engineering, science, and management. Many of these applications either have large numbers of variables or require expensive function evaluations. In some cases, there may be many local minimiz...
PARALLEL COMPOSITIONAL RESERVOIR SIMULATION ON CLUSTERS OF PCS
Abstract
The authors have ported a fully implicit equation-of-state (EOS) compositional parallel reservoir simulator to run on clusters of PCs. They report on the performance of the code on two clusters, both in terms of scalability and in absolute performance relative to an IBM SP. The simulator scales well through 16 processors on the clusters and is comparable in execution time with the SP. Address reprint requests to Kamy Sepehrnoori, CPGE, UT Austin,
GENERIC PROGRAMMING FOR HIGH-PERFORMANCE SCIENTIFIC COMPUTING
, 2002
Abstract
by Lie-Quan Lee
Generic programming is an important paradigm for software development, with an emphasis on reusability and performance, qualities that would seemingly make this paradigm especially suited for application to scientific computing. We apply generic programming to the development of a message passing framework (the Generic Message Passing library) for parallel computing in hybrid execution architectures (i.e., those having both shared and distributed memory). Although GMP supports both shared-memory and distributed-memory execution, it explicitly separates its programming and execution models, presenting a uniform message-based programming interface to enable source-code portability of parallel programs. At the same time, the implementation of GMP fully exploits the architectural characteristics of its execution target for maximum runtime performance. GMP is specifically designed to seamlessly integrate with modern generic C++ libraries such as the C++ Standard Library. C++ objects with complex data
Using ADIFOR and ADIC to Provide Jacobians for the SNES Component of PETSc
 Technical Memorandum ANL/MCS-TM-233, Mathematics and Computer Science Division, Argonne National Laboratory
, 1997
Abstract
1 Introduction 1
2 Outline of Procedure 1
3 Isolating ADFunction 3
4 Running ADIFOR/ADIC 3
5 Constructing FormJacobian 7
5.1 Jacobian 7
5.1.1 AD Setup 7
5.1.2 Call g$ADFunction 8
5.1.3 Transfer AD-generated Jacobian into PETSc Data Structures 8
5.2 Sparse Jacobian 9
5.2.1 AD Setup 9
5.2.2 Call g ADFunction 9
5.2.3 Transfer AD-generated Jacobian into PETSc Data Structures 9
5.3 Compressed Jacobian ...
Oslo Scientific Computing Archive
Abstract
This report describes software tools in Diffpack that make it easy to parallelize an existing sequential solver. The scope is limited to solvers that employ explicit finite difference methods. This class of problems allows parallelization via exact domain decomposition procedures. The main emphasis of the report is devoted to user-friendly abstractions for communicating field values between different processes. Both standard and staggered grids can be handled. The software setup is in principle general and not limited to explicit finite difference schemes.
1 Introduction
In this report we describe software tools that make it easy to take a standard sequential Diffpack simulation code and develop a version of it that can run effectively on parallel computers. The basic idea of this approach is to formulate the original mathematical problem as a set of subproblems over different parts of the domain. This strategy is usually referred to as domain decomposition. All the subproblems can ...