Modeling Application Performance by Convolving Machine Signatures with Application Profiles
, 2001
Cited by 34 (5 self)
This paper presents a performance modeling methodology that is faster than traditional cycle-accurate simulation, more sophisticated than performance estimation based on system peak-performance metrics, and is shown to be effective on a class of High Performance Computing benchmarks. The method yields insight into the factors that affect performance on single-processor and parallel computers.
A Framework for Performance Modeling and Prediction
- In SC 2002
, 2002
Cited by 30 (5 self)
Cycle-accurate simulation is far too slow for modeling the expected performance of full parallel applications on large HPC systems. And just running an application on a system and observing wallclock time tells you nothing about why the application performs as it does (and is anyway impossible on yet-to-be-built systems). Here we present a framework for performance modeling and prediction that is faster than cycle-accurate simulation, more informative than simple benchmarking, and is shown to be useful for performance investigations in several dimensions.
Design and Performance of a Scalable Parallel Community Climate Model
, 1995
Cited by 26 (13 self)
We describe the design of a parallel global atmospheric circulation model, PCCM2. This parallel model is functionally equivalent to the National Center for Atmospheric Research's Community Climate Model, CCM2, but is structured to exploit distributed memory multicomputers. PCCM2 incorporates parallel spectral transform, semi-Lagrangian transport, and load balancing algorithms. We present detailed performance results on the IBM SP2 and Intel Paragon. These results provide insights into the scalability of the individual parallel algorithms and of the parallel model as a whole.
1. Introduction. Computer models of the atmospheric circulation are used both to predict tomorrow's weather and to study the mechanisms of global climate change. Over the last several years, we have studied the numerical methods, algorithms, and programming techniques required to implement these models on so-called massively parallel processing (MPP) computers: that is, computers with hundreds or thousands of pro...
A Parallel Spectral Model for Atmospheric Transport Processes
, 1995
Cited by 24 (18 self)
This paper describes a parallel implementation of a grand challenge problem: global atmospheric modeling. The novel contributions of our work include: (1) a detailed investigation of opportunities for parallelism in atmospheric transport based on spectral solution methods, (2) the experimental evaluation of overheads arising from load imbalances and data movement for alternative parallelization methods, and (3) the development of a parallel code that can be monitored and steered interactively based on output data visualizations and animations of program functionality or performance. Code parallelization takes advantage of the relative independence of computations at different levels in the earth's atmosphere, resulting in parallelism of up to 40 processors, each independently performing computations for different atmospheric levels and requiring few communications between different levels across model time steps. Next, additional parallelism is attained within each level by taking adva...
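The level-based decomposition described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function name `assign_levels`, the block-wise assignment, and the assumption that the model has exactly 40 vertical levels (inferred from "up to 40 processors") are all hypothetical.

```python
def assign_levels(n_levels, n_procs):
    """Split level indices 0..n_levels-1 into contiguous blocks, one block per processor.

    Hypothetical helper: the abstract only says that levels are computed
    largely independently, not how they are mapped to processors.
    """
    if not 1 <= n_procs <= n_levels:
        raise ValueError("need between 1 and n_levels processors")
    base, extra = divmod(n_levels, n_procs)
    blocks, start = [], 0
    for rank in range(n_procs):
        # The first `extra` processors take one additional level each.
        size = base + (1 if rank < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


N_LEVELS = 40  # assumption: one unit of work per atmospheric level

one_each = assign_levels(N_LEVELS, 40)  # maximum level parallelism
eight = assign_levels(N_LEVELS, 8)      # fewer processors, several levels each
six = assign_levels(N_LEVELS, 6)        # uneven split
print([len(b) for b in eight], [len(b) for b in six])
```

With more processors than levels this scheme has nothing left to hand out, which is consistent with the abstract's remark that further parallelism has to come from within each level.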
A 26.58 Tflops Global Atmospheric Simulation with the Spectral Transform Method on the Earth Simulator
- In Proceedings of the ACM / IEEE Supercomputing SC’2002 conference
, 2002
Cited by 15 (1 self)
A spectral atmospheric general circulation model called AFES (AGCM for Earth Simulator) was developed and optimized for the architecture of the Earth Simulator (ES). The ES is a massively parallel vector supercomputer that consists of 640 processor nodes interconnected by a single-stage crossbar network, with a total peak performance of 40.96 Tflops. A sustained performance of 26.58 Tflops was achieved for a high resolution simulation (T1279L96) with AFES by utilizing the full 640-node configuration of the ES. The resulting computing efficiency is 64.9% of the peak performance, well surpassing that of conventional weather/climate applications having just 25-50% efficiency even on vector parallel computers. This remarkable performance proves the effectiveness of the ES as a viable means for practical applications.
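The efficiency figure quoted in this abstract follows directly from the two Tflops numbers. The short check below uses only values stated in the abstract; the function name `efficiency` and the derived per-node peak are illustrative additions, not figures reported by the authors.

```python
def efficiency(sustained_tflops, peak_tflops):
    """Fraction of peak performance that was actually sustained."""
    return sustained_tflops / peak_tflops


PEAK_TFLOPS = 40.96       # total peak of the 640-node system (from the abstract)
SUSTAINED_TFLOPS = 26.58  # sustained rate for the T1279L96 run (from the abstract)
NODES = 640

eff = efficiency(SUSTAINED_TFLOPS, PEAK_TFLOPS)
per_node_peak = PEAK_TFLOPS / NODES  # derived here: Tflops per node

print(f"efficiency: {eff:.1%}")
print(f"peak per node: {per_node_peak * 1000:.0f} Gflops")
```

The ratio 26.58 / 40.96 rounds to 64.9%, matching the abstract.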
Parallel Spectral Transform Shallow Water Model: A Runtime-Tunable Parallel Benchmark Code
- Proceedings of the Scalable High Performance Computing Conference
, 1994
Cited by 13 (4 self)
Fairness is an important issue when benchmarking parallel computers using application codes. The best parallel algorithm on one platform may not be the best on another. While it is not feasible to reevaluate parallel algorithms and reimplement large codes whenever new machines become available, it is possible to embed algorithmic options into codes that allow them to be "tuned" for a particular machine without requiring code modifications. In this paper, we describe a code in which such an approach was taken. PSTSWM was developed for evaluating parallel algorithms for the spectral transform method in atmospheric circulation models. Many levels of runtime-selectable algorithmic options are supported. We discuss these options and our evaluation methodology. We also provide empirical results from a number of parallel machines, indicating the importance of tuning for each platform before making a comparison.
1 Introduction. Benchmarking parallel (and sequential) computers is a varied activi...
Opportunities and Tools for Highly Interactive Distributed and Parallel Computing
- Proceedings of the Workshop
, 1996
Cited by 13 (3 self)
Advances in networking, visualization and parallel computing signal the end of the days of batch-mode processing for computationally intensive applications. The ability to control and interact with these applications in real-time offers both opportunities and challenges. This paper examines two computationally intensive scientific applications and discusses the ways in which more interactivity in their computations presents opportunities for gain. It briefly examines the requirements for systems trying to exploit these opportunities and discusses Falcon, a system that attempts to fulfill these requirements.
1 Introduction. The world of computationally intensive computing is moving away from the batch-oriented style of processing. Users accustomed to spreadsheets and WYSIWYG word processing are not satisfied with the traditional hands-off, you'll-get-your-data-when-the-batch-queue-empties mode of running parallel programs. At the same time, high-speed network interfaces and the prolifera...
Optimizing Collective I/O Performance on Parallel Computers: A Multisystem Study
- In Proceedings of the 11th ACM International Conference on Supercomputing
, 1997
Cited by 12 (8 self)
While individual parallel I/O systems can incorporate sophisticated techniques and achieve impressive performance in particular situations, researchers as yet have only limited understanding of the impact of various design decisions or of the techniques required for performance robustness. One remedy is to perform detailed comparative studies of different I/O libraries. In this paper, we describe such a study for the Disk Resident Array and Panda libraries, both designed to support high-performance I/O for arrays. While the two systems have many similarities, their designs and implementations are based on different assumptions and target different applications. We base our study on two I/O structures commonly encountered in scientific applications: the collective read/write of an entire array and the collective read/write of an arbitrary array section. Experiments are performed on two parallel file systems (IBM PIOFS and Intel PFS) and one commodity Unix file system (AIX JFS). Our resu...
A Users' Guide to PSTSWM
, 1995
Cited by 11 (7 self)
In this report, we describe how to obtain, compile, and use the code. We also discuss what is involved in porting the code to a new parallel platform.
1. Introduction. PSTSWM Version 4.0 is a message-passing benchmark code and parallel algorithm testbed that solves the nonlinear shallow water equations on a rotating sphere using the spectral transform method. PSTSWM was developed to evaluate parallel algorithms for the spectral transform method as it is used in global atmospheric circulation models [6]. Multiple parallel algorithms are embedded in the code and can be selected at run-time, as can the problem size, number of processors, and data decomposition. Six different problem test cases are also supported, each with associated solution and error analysis options. The extensive selection of run-time options is included to make a fair parallel algorithm comparison tractable. On each platform, each major algorithm is first tuned to achieve optimum performance before comparing between the algorithms. Developing, validating, maintaining, and executing separate versions of the code for each variant of each parallel algorithm would have been impossible. The algorithm comparison is also sensitive to problem specifics, motivating the run-time selection of the problem size and problem test case, and to the parallel platform. To avoid maintaining significantly different versions of the code for outwardly similar parallel architectures, PSTSWM has been structured to be easily ported. PSTSWM is written in Fortran 77 with VMS extensions and a small number of C preprocessor directives. Message passing is implemented using MPI [2], PICL [8], PVM [7], or native message passing libraries, with the choice being made at compile time. Additionally, all message passing is encapsulat...
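The idea of embedding several algorithm variants in one code and choosing among them at run time can be sketched as follows. This is a generic illustration in Python, not PSTSWM's Fortran 77 implementation: the registry `ALGORITHMS`, the two summation routines standing in for parallel algorithm variants, and the `tune` helper are all hypothetical.

```python
import time


# Hypothetical stand-ins for interchangeable algorithm variants.
def sum_builtin(values):
    return sum(values)


def sum_pairwise(values):
    # Recursive pairwise summation: same result on integers, different cost profile.
    if len(values) <= 2:
        return sum(values)
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


# One registry, so the variant is a run-time option rather than a separate code version.
ALGORITHMS = {"builtin": sum_builtin, "pairwise": sum_pairwise}


def run(name, values):
    """Run the variant selected by name (e.g. from an input file or command line)."""
    return ALGORITHMS[name](values)


def tune(values, repeats=3):
    """Time every variant on this machine and return (fastest name, best time per variant)."""
    best = {}
    for name, fn in ALGORITHMS.items():
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn(values)
            times.append(time.perf_counter() - start)
        best[name] = min(times)
    return min(best, key=best.get), best


data = list(range(1000))
fastest, timings = tune(data)
print(fastest, run(fastest, data))
```

Which variant wins depends on the machine, which is the point the report makes about tuning on each platform before comparing algorithms.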
Algorithm Comparison And Benchmarking Using A Parallel Spectral Transform Shallow Water Model
Cited by 9 (2 self)
this paper, presenting benchmark results for the Intel Paragon, the Cray T3D, and the IBM SP1.

