A Programming Model for Block-Structured Scientific Calculations on SMP Clusters (1998)

by S. J. Fink

Results 1 - 8 of 8

Parallelization of Structured, Hierarchical Adaptive Mesh Refinement Algorithms

by Charles A. Rendleman, Vincent E. Beckner, Mike Lijewski, William Crutchfield, John B. Bell, 1999
Cited by 19 (5 self)
We describe an approach to parallelization of structured adaptive mesh refinement algorithms. This type of adaptive methodology is based on the use of local grids superimposed on a coarse grid to achieve sufficient resolution in the solution. The key elements of the approach to parallelization are a dynamic load-balancing technique to distribute work to processors and a software methodology for managing data distribution and communications. The methodology is based on a message-passing model that exploits the coarse-grained parallelism inherent in the algorithms. The approach is illustrated for an adaptive algorithm for hyperbolic systems of conservation laws in three space dimensions. A numerical example computing the interaction of a shock with a helium bubble is presented. We give timings to illustrate the performance of the method.
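The abstract mentions a dynamic load-balancing technique for distributing grids to processors without detailing it here. As a rough illustration only, the Python sketch below assigns grid patches to processors with a greedy largest-first heuristic: patches are taken in order of decreasing cost, and each one goes to the currently least-loaded processor. The `Box` type, the cells-as-work cost model, and the choice of heuristic are assumptions for illustration, not necessarily what the paper uses.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical 3D grid patch given by inclusive lower/upper cell indices."""
    lo: tuple
    hi: tuple

    def num_cells(self) -> int:
        # Assumed cost model: work is proportional to the number of cells.
        n = 1
        for a, b in zip(self.lo, self.hi):
            n *= b - a + 1
        return n


def distribute(boxes, nprocs):
    """Return owner[i], the processor assigned to boxes[i].

    Greedy largest-first heuristic: visit boxes by decreasing cost and give
    each one to the processor with the smallest accumulated load.
    """
    owner = [0] * len(boxes)
    heap = [(0, p) for p in range(nprocs)]  # min-heap of (load, processor)
    heapq.heapify(heap)
    order = sorted(range(len(boxes)), key=lambda i: boxes[i].num_cells(), reverse=True)
    for i in order:
        load, p = heapq.heappop(heap)
        owner[i] = p
        heapq.heappush(heap, (load + boxes[i].num_cells(), p))
    return owner


def loads(boxes, owner, nprocs):
    """Total number of cells assigned to each processor."""
    totals = [0] * nprocs
    for box, p in zip(boxes, owner):
        totals[p] += box.num_cells()
    return totals


if __name__ == "__main__":
    patches = [
        Box((0, 0, 0), (7, 7, 7)),  # 512 cells
        Box((0, 0, 0), (3, 3, 3)),  # 64 cells
        Box((4, 4, 4), (7, 7, 7)),  # 64 cells
        Box((0, 0, 0), (7, 7, 3)),  # 256 cells
    ]
    owner = distribute(patches, 2)
    print(owner, loads(patches, owner, 2))  # [0, 1, 1, 1] [512, 384]
```

Largest-first assignment tends to keep the maximum load close to the average when there are many more patches than processors. A production AMR code would also have to account for communication between neighboring patches and for the cost of moving data when the grid hierarchy changes, which this sketch ignores.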

Library Support for Hierarchical Multi-Processor Tasks

by Thomas Rauber, Gudula Rünger - In Proc. of Supercomputing 2002, 2002
Cited by 5 (2 self)
The paper considers modular programming with hierarchically structured multi-processor tasks on top of SPMD tasks for distributed-memory machines. The parallel execution requires a corresponding decomposition of the set of processors into a hierarchical group structure onto which the tasks are mapped. This results in a multi-level group SPMD computation model with varying processor group structures. The advantage of this kind of mixed task and data parallelism is its potential to reduce communication overhead and to increase scalability. We present a runtime library to support the coordination of hierarchically structured multi-processor tasks. The library exploits an extended parallel group SPMD programming model and manages the entire task execution, including the dynamic hierarchy of processor groups. The library is built on top of MPI, has an easy-to-use interface, and incurs only marginal overhead while allowing static planning and dynamic restructuring.
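The mapping of hierarchically structured tasks onto a hierarchical group structure can be pictured as a recursive split of the processor set. The sketch below divides a list of ranks among concurrently executing subtasks in proportion to their relative costs, then recurses into each subtask's subgroup. The `Task` tree, the proportional splitting rule, and all function names are hypothetical; they do not reproduce the interface of the library described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical multi-processor task: a name, a relative cost, and
    subtasks that run concurrently on disjoint processor subgroups."""
    name: str
    weight: float = 1.0
    children: list = field(default_factory=list)


def split_proportionally(ranks, weights):
    """Split `ranks` into len(weights) contiguous, non-empty groups whose
    sizes are roughly proportional to `weights`."""
    n, k = len(ranks), len(weights)
    if k > n:
        raise ValueError("need at least one processor per subtask")
    total = sum(weights)
    spare = n - k  # processors left after giving every subtask one
    ideal = [w / total * spare for w in weights]
    sizes = [1 + int(x) for x in ideal]
    # Hand the remaining processors to the largest fractional remainders.
    leftover = n - sum(sizes)
    order = sorted(range(k), key=lambda i: ideal[i] - int(ideal[i]), reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    groups, start = [], 0
    for size in sizes:
        groups.append(ranks[start:start + size])
        start += size
    return groups


def map_tasks(task, ranks, out=None):
    """Recursively assign a processor group to every task in the tree.
    Returns {task name: list of ranks}."""
    if out is None:
        out = {}
    out[task.name] = list(ranks)
    if task.children:
        groups = split_proportionally(ranks, [c.weight for c in task.children])
        for child, group in zip(task.children, groups):
            map_tasks(child, group, out)
    return out


if __name__ == "__main__":
    root = Task("root", children=[
        Task("A", 3.0, [Task("A1"), Task("A2")]),
        Task("B", 1.0),
    ])
    for name, group in map_tasks(root, list(range(8))).items():
        print(name, group)
```

In an MPI program, each level of such a mapping could be turned into communicators with `MPI_Comm_split`, using the subgroup index as the color, so that the ranks of each subtask get their own communicator for its SPMD computation.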

Mechanisms for Programming SMP Clusters

by Attila Gursoy, Ilker Cengiz - Proc. of Intl. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA'99), Las Vegas, June 28 - July 1, 1999, Vol. IV, 1999
Cited by 3 (1 self)
Clusters of symmetric multiprocessor systems (SMP clusters) are becoming increasingly attractive for cost-effective high-performance computing. Besides building such platforms, providing programming mechanisms, layers of abstraction, or libraries that let programmers harness the power of clusters is another challenging field of research. In this paper, we discuss mechanisms for taking advantage of SMP clusters in a parallel object-oriented programming environment. In particular, we discuss node-level replicated parallel objects (parallel objects with one representative per SMP node) as a reusable pattern for performing a set of common collective communications and computations.
Keywords: SMP cluster, parallel programming
1 Introduction
Symmetric multiprocessor (SMP) platforms are attracting growing research interest. As workstations with shared-memory multiprocessor architectures appear on the market, it becomes attractive to build larger multiprocessor systems by connecting ...
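The pattern of one representative per SMP node can be illustrated with a two-level reduction: processes on the same node first combine their values locally, and only the node representatives take part in the inter-node step. The simulation below models the process-to-node placement as a plain dictionary. Choosing the lowest rank on each node as its representative, and the function names, are assumptions for illustration rather than the paper's parallel-object interface.

```python
import operator
from collections import defaultdict
from functools import reduce


def node_groups(placement):
    """Group ranks by node. `placement` maps rank -> node name.
    Returns {node: sorted list of ranks on that node}."""
    groups = defaultdict(list)
    for rank, node in placement.items():
        groups[node].append(rank)
    return {node: sorted(ranks) for node, ranks in groups.items()}


def representatives(placement):
    """Assumed policy: the lowest rank on each node represents that node."""
    return {node: ranks[0] for node, ranks in node_groups(placement).items()}


def two_level_reduce(values, placement, op=operator.add):
    """Reduce `values` (rank -> value) in two phases.

    Phase 1: each node combines its local values (intra-node, shared memory).
    Phase 2: the node representatives combine the partial results.
    Returns (result, inter-node messages), counting one message from each
    non-root representative to a single root representative.
    """
    partial = {
        node: reduce(op, (values[r] for r in ranks))
        for node, ranks in node_groups(placement).items()
    }
    result = reduce(op, partial.values())
    return result, len(partial) - 1


if __name__ == "__main__":
    placement = {r: "n0" if r < 4 else "n1" for r in range(8)}
    values = {r: r for r in range(8)}
    print(representatives(placement), two_level_reduce(values, placement))
```

With eight processes spread over two nodes, only the two representatives communicate across the network in this model; the rest of the combining happens within each node, where the processes share memory.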

A Cache-Friendly Liquid Load Balancer

by Federico David Sacerdoti, 2002
Cited by 2 (0 self)

MOLD: A System for Breaking Down Large Visualization and Post-Processing Problems

by William Kerney, 2002

Table of Contents

by Federico David Sacerdoti

Optimizing MPI Collective . . .

by Matthias Kühnemann, Thomas Rauber, Gudula Rünger
Many parallel applications from scientific computing use MPI collective communication operations to collect or distribute data. Since the execution times of these communication operations increase with the number of participating processors, scalability problems might occur. In this article, we show for different MPI implementations how the execution time of collective communication operations can be significantly improved by a restructuring based on orthogonal processor structures with two or more levels. As platforms, we consider a dual Xeon cluster, a Beowulf cluster and a Cray T3E with different MPI implementations. We show that the execution time of operations like MPI_Bcast or MPI_Allgather can be reduced by 40% and 70% on the dual Xeon cluster and the Beowulf cluster. A significant improvement can also be obtained on the Cray T3E by a careful selection of the processor groups. We demonstrate that the optimized communication operations can be used to reduce the execution time of data-parallel implementations of complex application programs without any other change to the computation and communication structure. Furthermore, we investigate how the execution time of orthogonal realizations can be modeled using runtime functions. In particular, we consider the modeling of two-phase realizations of communication operations. We present runtime functions for the modeling and verify that these runtime functions can predict the execution time both for communication operations in isolation and in the context of application programs.
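The restructuring based on orthogonal processor structures can be illustrated for a broadcast. The p processes are arranged in a logical grid with r rows and c columns, and the global operation is replaced by two phases on smaller, orthogonal groups: the root first broadcasts within its column, then every process of that column broadcasts within its row. The simulation below only tracks where the data ends up; the row-major rank layout and the function names are assumptions. In an MPI program, the row and column groups would typically be created with `MPI_Comm_split`, and each phase would be an `MPI_Bcast` on the corresponding communicator.

```python
def grid_coords(rank, cols):
    """(row, column) of `rank` in an assumed row-major grid with `cols` columns."""
    return divmod(rank, cols)


def orthogonal_groups(p, cols):
    """Return (row_groups, col_groups) for p = rows * cols processes."""
    if p % cols:
        raise ValueError("p must be a multiple of cols")
    rows = p // cols
    row_groups = [[r * cols + c for c in range(cols)] for r in range(rows)]
    col_groups = [[r * cols + c for r in range(rows)] for c in range(cols)]
    return row_groups, col_groups


def two_phase_bcast(p, cols, root, data):
    """Simulate a broadcast of `data` from `root`; return every rank's buffer.

    Phase 1: broadcast within the root's column group.
    Phase 2: each member of that column broadcasts within its row group.
    """
    row_groups, col_groups = orthogonal_groups(p, cols)
    buf = [None] * p
    buf[root] = data
    _, root_col = grid_coords(root, cols)
    # Phase 1: column broadcast rooted at `root`.
    for rank in col_groups[root_col]:
        buf[rank] = buf[root]
    # Phase 2: row broadcasts rooted at the processes reached in phase 1.
    for row in row_groups:
        src = row[root_col]
        for rank in row:
            buf[rank] = buf[src]
    return buf


if __name__ == "__main__":
    print(orthogonal_groups(6, 3))
    print(two_phase_bcast(12, 4, root=5, data="x"))
```

Each phase involves at most max(r, c) processes instead of all p, which is where the potential savings come from; whether the two-phase version is actually faster depends on the MPI implementation and the platform, as the measurements summarized in the abstract indicate.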

Improving the Execution Time of Global Communication Operations

by Matthias Kühnemann, Thomas Rauber, Gudula Rünger, 2004
Many parallel applications from scientific computing use MPI global communication operations to collect or distribute data. Since the execution times of these communication operations increase with the number of participating processors, scalability problems might occur. In this article, we show for different MPI implementations how the execution time of global communication operations can be significantly improved by a restructuring based on orthogonal processor structures. As platforms, we consider a dual Xeon cluster, a Beowulf cluster and a Cray T3E with different MPI implementations. We show that the execution time of operations like MPI_Bcast() or MPI_Allgather() can be reduced by 40% and 70% on the dual Xeon cluster and the Beowulf cluster. A significant improvement can also be obtained on the Cray T3E by a careful selection of the processor groups. We demonstrate that the optimized communication operations can be used to reduce the execution time of data-parallel implementations of complex application programs without any other reordering of the computation and communication structure.
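A two-phase allgather on a logical grid of processes (the kind of orthogonal processor structure described in the abstract) can be sketched along the same lines: each process first gathers the blocks of its row, and the row results are then exchanged within each column, after which every process holds all p blocks. This is a simulation of the data movement only, under an assumed row-major rank layout with an illustrative function name; it makes no MPI calls and says nothing about timings.

```python
def two_phase_allgather(blocks, cols):
    """Simulate an allgather of `blocks` (one per rank) on an assumed
    row-major grid with `cols` columns. Returns, for each rank, the list
    of all blocks it holds at the end, in rank order."""
    p = len(blocks)
    if p % cols:
        raise ValueError("number of blocks must be a multiple of cols")
    rows = p // cols
    # Phase 1: allgather within each row; every member of row r ends up
    # holding the blocks of ranks r*cols .. r*cols + cols - 1.
    row_data = [blocks[r * cols:(r + 1) * cols] for r in range(rows)]
    after_phase1 = [list(row_data[rank // cols]) for rank in range(p)]
    # Phase 2: allgather within each column; process (r, c) collects what
    # processes (0, c), (1, c), ..., (rows - 1, c) hold, in row order.
    result = []
    for rank in range(p):
        c = rank % cols
        gathered = []
        for r in range(rows):
            gathered.extend(after_phase1[r * cols + c])
        result.append(gathered)
    return result


if __name__ == "__main__":
    for held in two_phase_allgather([f"b{i}" for i in range(6)], cols=3):
        print(held)
```

With row-major numbering, concatenating the row results in row order yields the blocks in rank order, so no final permutation is needed; other layouts would generally require one.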

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University