
Performance characteristics of a network of commodity multiprocessors for the NAS benchmarks using a hybrid memory model (1999)

by F Cappello, O Richard
Venue: In IFIP PACT 99
Results 1 - 8 of 8

Performance Evaluation of the Omni OpenMP Compiler

by Kazuhiro Kusano, Shigehisa Satoh, Mitsuhisa Sato - In Proceedings of International Workshop on OpenMP: Experiences and Implementations (WOMPEI), volume 1940 of LNCS, 2000
Abstract - Cited by 13 (0 self)
We developed an OpenMP compiler called Omni. This paper describes a performance evaluation of the Omni OpenMP compiler. We take two commercial OpenMP C compilers, the KAI GuideC and the PGI C compiler, for comparison. Microbenchmarks and a program in Parkbench are used for the evaluation. The results on a SUN Enterprise 450 with four processors show that the performance of Omni is comparable to that of a commercial OpenMP compiler, KAI GuideC. According to the results, parallelization using OpenMP directives is effective and scales well if the loop contains enough operations.

Investigating the performance of two programming models for clusters of SMP PCs

by Franck Cappello, Olivier Richard, Daniel Etiemble, 2000
Abstract - Cited by 12 (2 self)
Multiprocessors and high-performance networks make it possible to build CLUsters of MultiProcessors (CLUMPs). Their main distinctive feature compared with traditional parallel computers is their hybrid memory model (message passing between the nodes and shared memory inside the nodes). We evaluate the performance of a cluster of 2-way SMP PCs connected by a Myrinet network on the NAS benchmarks under two programming models: a Single Memory Model based on the MPICH-PM/CLUMP library of the RWCP, and a Hybrid Memory Model using MPICH-PM and OpenMP. We compare the speed-up of 2-way SMP configurations over single-CPU configurations for each model. We demonstrate that the better model depends on the features of the application. In particular, we explain the speed-up results using breakdowns of the benchmarks' execution times and measurements of hardware counters. Then, we show that these two models give PC-based CLUMPs performance close to that of scalable high-end multicomputers, up to large configurations (36 nodes).

Effective Cross-Platform, Multilevel Parallelism via Dynamic Adaptive Execution

by Walden Ko, Mark Yankelevsky, Dimitrios S. Nikolopoulos, Constantine D. Polychronopoulos - In 7th International Workshop on High-Level Parallel Programming Models and Supportive , 2002
Abstract - Cited by 2 (0 self)
This paper presents preliminary efforts to develop compilation and execution environments that achieve performance portability of multilevel parallelization on hierarchical architectures. Using the NAS parallel benchmarks, we first illustrate the lack of portable performance on state-of-the-art scalable parallel systems despite the use of two portable programming models, MPI and OpenMP. Then we present a dynamic compilation and execution framework that provides the desired portability through the use of program slices. These slices are used to select the optimal program decomposition on each architecture. Currently, our framework uses a simple incremental algorithm, which effectively identifies single or multi-level program decompositions that maximize performance. This algorithm can be used as a rule of thumb for automatic multilevel parallelization. The effectiveness of the approach is demonstrated on the NAS benchmarks running on two architectural platforms.

Understanding performance of SMP clusters running MPI programs

by Franck Cappello, Olivier Richard, Daniel Etiemble - Future Generation Computer Systems, 2001
Abstract
CLUsters of MultiProcessors (CLUMPs) have a hybrid memory model, with message passing between nodes and shared memory inside nodes. We examine the performance of Myrinet clusters of SMP PCs when using a Single Memory Model (SMM) based on the MPICH-PM/CLUMP library of the RWCP, which can directly use MPI programs written for a cluster of uniprocessors. The specificities of the communication patterns under the SMM approach are detailed. PC clusters with 2-way and 4-way nodes are considered and compared.

Exploiting Clusters of Shared Memory Multiprocessors with BIP-SMP: the Parallel Simulation Application

by P. Geoffray, C.D. Pham, B. Tourancheau, Bernard Lyon, 1999
Abstract
Parallel simulation of small-grain models has traditionally been performed on powerful parallel machines because the communication cost between the processors must be kept small in order to obtain significant speedups. As we approach the next century, parallel machines are gradually and incrementally being replaced by clusters of commodity workstations. Clusters of SMPs (CLUMPs) appear very attractive because of their high performance/price ratio and will certainly be one of the most in-demand architectures. To achieve maximum performance, the communication layer needs to provide multi-protocol support, thereby allowing several processes per node to simultaneously send a message to another process on the same physical node and to send a message via the network. In this paper, we present the BIP-SMP software for exploiting CLUMPs and show how parallel simulation applications with challenging communication properties can benefit from BIP-SMP.

BIP-SMP : High Performance Message Passing over a Cluster of Commodity SMPs

by Patrick Geoffray, Loïc Prylli, Bernard Tourancheau - In Supercomputing (SC’99), 1999
Abstract
[Figure 1: The architecture of MPI-BIP.] ... implemented with one or several messages of the underlying communication system (BIP in our case). The cost of MPI-BIP is approximately an overhead of 2 µs (mainly CPU) over BIP for the latency on our cluster. Thus, the latency of the non-SMP MPI-BIP is very good, 7 µs, and the bandwidth reaches 110 MB/s. 3 Related Work. Efficient management of CLUMPs is a new research topic. We have investigated issues related to a multi-protocol message passing interface using both shared memory and the network within a CLUMP. Several projects have proposed solutions for this problem in the last few years, and BIP-SMP is in this research line. Projects like MPI-StarT [8] or Sta...

Evaluation des performances de Little TiPI: le reseau de PCs multiprocesseurs du LRI (Performance evaluation of Little TiPI: the LRI network of multiprocessor PCs)

by Franck Cappello, Olivier Richard, 1999
Abstract
The availability of standard multiprocessors and fast interconnection networks offers the opportunity to build low-cost CLUsters of MultiProcessors (CLUMPs). These architectures can be used as platforms for parallel computing. The main difference between CLUMPs and traditional parallel architectures is their hybrid memory model (message passing between the nodes and shared memory inside each multiprocessor). In this article, we present the results of some of the experiments we conducted on Little TiPI, the LRI network of multiprocessors. We present three results: 1) a study of the intrinsic performance of multiprocessor PCs, to evaluate their potential value as nodes of a parallel platform; 2) a performance comparison between Little TiPI and high-performance parallel computers; 3) a study of intra-node speed-up, to determine the real contribution of multiprocessor nodes. This last study is carried out for two programming models: the unified message-passing model and the hybrid model mixing shared memory and message passing. We describe how we implemented the two models: the first on top of BIP and the second by parallelizing the NAS benchmarks.

Parallel Hierarchical Architectures

by Martin Schmollinger, 2002
Abstract
Abstract not found