Intra node parallelization of MPI programs with OpenMP (1998)

by Franck Cappello, Olivier Richard

Results 1 - 3 of 3

Performance Enhancement for Matrix Multiplication on a SMP PC Cluster, IPSJ SIG Technical Report

by Ta Quoc Viet, Tsutomu Yoshinaga, Ben A. Abderazek, 2005
"... Our study proposes a Reducing-size Task Assignation technique (RTA), which is a novel approach to solve the grain-size problem for the hybrid MPI-OpenMP thread-to-thread (hybrid TC) programming model in performing distributed matrix mulitplication on SMP PC clusters. Applying RTA, hybrid TC achieves ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
Our study proposes a Reducing-size Task Assignation technique (RTA), a novel approach to the grain-size problem for the hybrid MPI-OpenMP thread-to-thread (hybrid TC) programming model when performing distributed matrix multiplication on SMP PC clusters. Applying RTA, hybrid TC achieves acceptable computation performance while retaining its dynamic task scheduling capability, yielding a 22% performance improvement over the pure MPI model on a 16-node cluster of Xeon dual-processor SMPs. Moreover, we provide formulas to predict hybrid TC performance in different circumstances.
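
The RTA technique itself is not described in this listing, but the grain-size trade-off the abstract refers to can be illustrated with a generic hybrid MPI+OpenMP block matrix multiplication. The C sketch below is an assumption-laden illustration, not RTA: the matrix order N, block size BS, strip decomposition, and all identifiers are invented here. The point is simply that a smaller block (task) size gives OpenMP's dynamic scheduler finer-grained work to balance across the threads inside each MPI process, at the cost of more scheduling overhead.

    #include <mpi.h>
    #include <omp.h>
    #include <stdlib.h>

    #define N  1024   /* global matrix order (assumed for illustration)       */
    #define BS 64     /* block/task size: smaller BS means a finer task grain */

    /* Each rank multiplies its (rows x N) strip of A by the full B into C.
     * OpenMP hands out BS-row blocks dynamically, so threads that finish a
     * block early pick up another one -- the load-balancing benefit that a
     * smaller grain size buys. */
    static void local_matmul(int rows, const double *A, const double *B, double *C)
    {
        #pragma omp parallel for schedule(dynamic, 1)
        for (int ib = 0; ib < rows; ib += BS) {
            int iend = ib + BS < rows ? ib + BS : rows;
            for (int i = ib; i < iend; i++)
                for (int k = 0; k < N; k++)
                    for (int j = 0; j < N; j++)
                        C[i * N + j] += A[i * N + k] * B[k * N + j];
        }
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int rows = N / size;                    /* assumes N is divisible by size */
        double *A = calloc((size_t)rows * N, sizeof *A);
        double *B = calloc((size_t)N * N, sizeof *B);
        double *C = calloc((size_t)rows * N, sizeof *C);

        local_matmul(rows, A, B, C);            /* strips gathered with MPI as needed */

        free(A); free(B); free(C);
        MPI_Finalize();
        return 0;
    }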

Parallel Homologous Search with Hirschberg Algorithm: A Hybrid MPI-Pthreads Solution

by Nuraini Abdul Rashid, Rosni Abdullah, Abdullah Zawawi Hj. Talib
"... Abstract:- In this paper, we apply two different parallel programming model, the message passing model using Message Passing Interface (MPI) and the multithreaded model using Pthreads, to protein sequence homologous search. The protein sequence homologous search uses Hirschberg algorithm for the pai ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
Abstract: In this paper, we apply two different parallel programming models, the message-passing model using the Message Passing Interface (MPI) and the multithreaded model using Pthreads, to protein sequence homologous search. The homologous search uses the Hirschberg algorithm for pairwise sequence alignment. The performance of the search using MPI-Pthreads is compared to an implementation using the pure message-passing model (MPI). The evaluation results show a 50% decrease in computing time when the parallel homologous search is implemented with MPI-Pthreads rather than with MPI alone.
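
As a rough picture of the MPI + Pthreads split the abstract describes, the sketch below is not the authors' code: the database size, thread count, and the placeholder hirschberg_score() are all assumptions. Each MPI rank takes a contiguous share of the database and spawns worker threads that score their slice against the query, with MPI_Reduce combining the per-rank results.

    #include <mpi.h>
    #include <pthread.h>
    #include <stdio.h>

    #define NSEQ     1000   /* database size (assumed)          */
    #define NTHREADS 2      /* threads per SMP node (assumed)   */

    /* placeholder for the actual Hirschberg pairwise alignment score */
    static int hirschberg_score(int seq_id) { return seq_id % 97; }

    struct slice { int begin, end, best; };

    static void *worker(void *arg)              /* thread-level parallelism */
    {
        struct slice *s = arg;
        s->best = 0;
        for (int i = s->begin; i < s->end; i++) {
            int score = hirschberg_score(i);
            if (score > s->best) s->best = score;
        }
        return NULL;
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* process-level parallelism: each rank owns a contiguous chunk */
        int chunk = NSEQ / size;
        int lo = rank * chunk;
        int hi = (rank == size - 1) ? NSEQ : lo + chunk;

        pthread_t tid[NTHREADS];
        struct slice sl[NTHREADS];
        int per = (hi - lo + NTHREADS - 1) / NTHREADS;
        for (int t = 0; t < NTHREADS; t++) {
            sl[t].begin = lo + t * per;
            sl[t].end   = sl[t].begin + per < hi ? sl[t].begin + per : hi;
            pthread_create(&tid[t], NULL, worker, &sl[t]);
        }
        int local_best = 0;
        for (int t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            if (sl[t].best > local_best) local_best = sl[t].best;
        }

        int global_best;                        /* combine per-rank results */
        MPI_Reduce(&local_best, &global_best, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("best score: %d\n", global_best);
        MPI_Finalize();
        return 0;
    }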

Citation Context

...s with the introduction of cheap multiprocessor personal computers. These low-cost multiprocessors can be clustered together to create a new parallel computing platform called CLUMPs (Clusters of SMPs) [1], which is a hybrid of the shared-memory and distributed computing platforms. The hybrid parallel computing platform allows users to implement data parallelism at the large and medium grain levels. The large grai...

Optimization for Hybrid MPI-OpenMP Programs on a Cluster of SMP PCs

by Tsutomu Yoshinaga, Ta Quoc Viet
"... This paper applies a Hybrid MPI-OpenMP program-ming model with a thread-to-thread communication method on a cluster of Dual Intel Xeon Processor SMPs connected by a Gigabit Ethernet network. The experiments include the well-known HPL and CG benchmarks. We also describe optimization tech-niques to ge ..."
Abstract - Add to MetaCart
This paper applies a hybrid MPI-OpenMP programming model with a thread-to-thread communication method on a cluster of dual Intel Xeon processor SMPs connected by a Gigabit Ethernet network. The experiments include the well-known HPL and CG benchmarks. We also describe optimization techniques to achieve a high cache hit ratio on the given architecture. As a result, the hybrid model outperforms the pure MPI model by about 27% for CG and 12% for HPL. Moreover, with a relatively small programming effort, we succeeded in reducing the cache miss ratio and thus significantly raised performance for the CG benchmark, by as much as 4.5 times in some cases.
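
A minimal sketch of what a thread-to-thread communication scheme can look like in practice, assuming an MPI library that supports MPI_THREAD_MULTIPLE (this is an illustration, not the paper's implementation): every OpenMP thread is allowed to call MPI directly, and the message tag carries the thread id so a thread on one rank exchanges data with the thread of the same id on a peer rank.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int peer = rank ^ 1;                    /* pair ranks 0<->1, 2<->3, ... */

        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            double buf = rank * 100 + tid;      /* payload owned by this thread */

            if (peer < size) {
                /* tag = thread id, so thread t talks to thread t on the peer rank */
                MPI_Sendrecv_replace(&buf, 1, MPI_DOUBLE,
                                     peer, tid,     /* destination, send tag */
                                     peer, tid,     /* source, receive tag   */
                                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
            printf("rank %d thread %d now holds %.0f\n", rank, tid, buf);
        }

        MPI_Finalize();
        return 0;
    }

Built with an MPI compiler wrapper and OpenMP enabled (for example, mpicc -fopenmp), each thread ends up with its own send/receive stream, which is the kind of per-thread pairing a thread-to-thread model relies on.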

Citation Context

...by n/npcols sparse sub-matrix A. Computation volume is equally distributed among processes. Performance analysis shows that more than 90% of the computation cost is spent on a matrix-vector multiplication [9]. Figure 3 shows the part of the original Fortran source code that executes the multiplication between matrix A and vector p and stores the result in a vector w: do j=1, lastrow - firstrow +...
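
The Fortran loop referenced above is cut off in the snippet, but the operation it describes is a compressed-sparse-row matrix-vector product w = A * p. The C sketch below is a generic rendering of that kind of loop, not the benchmark source: rowstr and colidx are assumed CSR index arrays, and the OpenMP directive marks where a hybrid version would typically parallelize the row loop.

    /* w = A * p for a matrix stored in compressed sparse row (CSR) form:
     * rowstr[j]..rowstr[j+1]-1 index the nonzeros of row j in a[] and
     * colidx[] holds their column positions. */
    void spmv(int nrows, const int *rowstr, const int *colidx,
              const double *a, const double *p, double *w)
    {
        #pragma omp parallel for schedule(static)
        for (int j = 0; j < nrows; j++) {       /* one output row per iteration */
            double sum = 0.0;
            for (int k = rowstr[j]; k < rowstr[j + 1]; k++)
                sum += a[k] * p[colidx[k]];
            w[j] = sum;
        }
    }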
