Assessing Performance of Hybrid MPI/OpenMP Programs on SMP Clusters
, 2001
Cited by 8 (0 self)
Computational experiences with hybrid message passing and multithreading techniques on SMP clusters generally show poorer performance than pure message passing approaches.
Hierarchical partitioning and dynamic load balancing for scientific computation
- Williams College Department of Computer Science
, 2004
Cited by 7 (5 self)
Cluster and grid computing has made hierarchical and heterogeneous computing systems increasingly common as target environments for large-scale scientific computation. A cluster may consist of a network of multiprocessors. A grid computation may involve communication across slow interfaces. Modern supercomputers are often large clusters with hierarchical network structures. For maximum efficiency, software must adapt to the computing environment. We focus on partitioning and dynamic load balancing, in particular on hierarchical procedures implemented within the Zoltan Toolkit, guided by DRUM, the Dynamic Resource Utilization Model. Here, different balancing procedures are used in different parts of the domain. Preliminary results show benefits to using hierarchical partitionings on hierarchical systems. Modern three-dimensional scientific computations must execute in parallel to achieve acceptable performance. Target parallel environments range from clusters of workstations to the largest tightly-coupled supercomputers. Hierarchical
A Distributed Computing Environment for Interdisciplinary Applications
- Concurrency and Computation: Practice and Experience
, 2002
Cited by 6 (0 self)
Practical applications are generally interdisciplinary in nature. The technology is well matured for addressing individual discipline applications and not for interdisciplinary applications. Hence, there is a need to couple the capabilities of several different computational disciplines to address these interdisciplinary practical applications. One approach is to use coupled or multi-physics software, which typically involves developing and validating the entire software spectrum for a specific application, which will be time consuming and may require more time to get to the end user. The other approach is to integrate individual well-matured computational technology disciplines software by taking advantage of the existing scalable software and validation investments, and tremendous developments in computer science and computational sciences. This integrated approach requires a consistent data model, data format, data management, seamless data movement, and robust modular scalable including coupling algorithms. To address these requirements, we developed a new flexible data exchange mechanism for HPC codes and tools, known as the eXtensible Data Model and Format (XDMF). XDMF provides computational engines with the tools necessary to exist in a modern computing environment with minimal modification. Instead of imposing a new programming paradigm on HPC codes, XDMF uses the existing concept of file I/O for distributed coordination. XDMF incorporates Network Distributed Global Memory (NDGM), Hierarchical Data Format version 5 (HDF5), and eXtensible Markup Language (XML) to provide a flexible yet efficient data exchange mechanism. This paper discusses the development and implementation of a distributed computing environment for interdisciplinary applications utilizing the concept...
DPS - Dynamic Parallel Schedules
- Proc. 8th Int’l Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2003), 17th International Parallel and Distributed Processing Symposium (IPDPS'03)
, 2003
Cited by 5 (4 self)
Dynamic Parallel Schedules (DPS) is a high-level framework for developing parallel applications on distributed memory computers (e.g. clusters of PCs). Its model relies on compositional customizable split-compute-merge graphs of operations (directed acyclic flow graphs). The graphs and the mapping of operations to processing nodes are specified dynamically at runtime. DPS applications are pipelined and multithreaded by construction, ensuring a maximal overlap of computations and communications. DPS applications can call parallel services exposed by other DPS applications, enabling the creation of reusable parallel components. The DPS framework relies on a C++ class library. Thanks to its dynamic nature, DPS offers new perspectives for the creation and deployment of parallel applications running on server clusters.
Library Support for Hierarchical Multi-Processor Tasks
- In Proc. of the Supercomputing 2002
, 2002
Cited by 5 (2 self)
The paper considers the modular programming with hierarchically structured multi-processor tasks on top of SPMD tasks for distributed memory machines. The parallel execution requires a corresponding decomposition of the set of processors into a hierarchical group structure onto which the tasks are mapped. This results in a multi-level group SPMD computation model with varying processor group structures. The advantage of this kind of mixed task and data parallelism is a potential to reduce the communication overhead and to increase scalability. We present a runtime library to support the coordination of hierarchically structured multi-processor tasks. The library exploits an extended parallel group SPMD programming model and manages the entire task execution including the dynamic hierarchy of processor groups. The library is built on top of MPI, has an easy-to-use interface, and leads to only a marginal overhead while allowing static planning and dynamic restructuring.
ORT - A Communication Library for Orthogonal Processor Groups
- In Proc. of the ACM/IEEE SC 2001
, 2001
Cited by 4 (4 self)
Many implementations on message-passing machines can benefit from an exploitation of mixed task and data parallelism. A suitable parallel programming model is a group-SPMD model, which requires a structuring of the processors into subsets and a partition of the program into multi-processor tasks. In this paper, we introduce a library support for the specification of message-passing programs in a group-SPMD style allowing different partitions in a single program. We describe the functionality and the implementation of the library functions and illustrate the library programming style with example programs. The examples show that the runtime on distributed memory machines can be considerably reduced by using the library.
Approaches to architecture-aware parallel scientific computation
- Williams College Department of Computer Science
, 2005
Cited by 1 (1 self)
Modern large-scale scientific computation problems must execute in a parallel computational environment to achieve acceptable performance. Target parallel environments range from the largest tightly-coupled supercomputers to heterogeneous clusters of workstations. Grid technologies make Internet execution more likely. Hierarchical and heterogeneous systems are increasingly common. Processing and communication capabilities can be nonuniform, non-dedicated, transient or unreliable. Even when targeting homogeneous computing environments, each environment may differ in the number of processors per node, the relative costs of computation, communication, and memory access, and the availability of programming paradigms and software tools. Architecture-aware computation requires knowledge of the computing environment and software performance characteristics, and tools to make use of this knowledge. These challenges may be addressed by compilers, low-level tools, dynamic load balancing or solution procedures, middleware layers, high-level software development techniques, and choice of programming languages and paradigms. Computation and communication may be reordered. Data or computation may be replicated or a load imbalance may be tolerated to avoid costly communication. This paper samples a variety of approaches to architecture-aware parallel computation.
A scalable parallel Poisson solver in three dimensions with infinite-domain boundary conditions
- In 7th International Workshop on High Performance Scientific and Engineering Computing (HPSEC-05)
, 2005
Cited by 1 (1 self)
We present an elliptic free space solver that offers vastly improved performance over a previous variant of the algorithm. We currently scale up to 1024 processors of an IBM SP system, and we are planning to port the solver to Blue Gene/L. The solver employs a method of local corrections that avoids the need for costly communication, while retaining parallel scalability of the method. Communication costs are generally small: 25 percent of the total running time or less for runs on up to 512 processors and 37 percent of the total time on 1024 processors. The numerical overheads incurred are independent of the number of processors for a wide range of problem sizes. The solver currently handles infinite-domain (free space) boundary conditions, but may be reformulated to accommodate other kinds of boundary conditions as well.
SCALLOP: A Terascale Poisson Solver in Three Dimensions
, 2003
SCALLOP is a scalable solver and library for elliptic partial differential equations on regular block-structured domains. SCALLOP hides the latency of communication algorithmically by taking advantage of the locality properties of the solution. Communication costs are small, on the order of a few percent of the total running time. Numerical overheads are independent of the number of processors for a wide range of problem sizes. SCALLOP is implicitly designed for infinite domain (free space) boundary conditions, but the algorithm can be reformulated to accommodate other boundary conditions. The SCALLOP library is built on top of the KeLP programming system and runs on a variety of platforms. We report results on up to 512 processors of NPACI's Blue Horizon and NERSC's Seaborg IBM SP systems, and demonstrate low communication overheads.

