Results 1–10 of 17
A high-performance, portable implementation of the MPI message passing interface standard
 Parallel Computing
, 1996
Cited by 719 (43 self)
Abstract
MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum.
SIMPLE: A methodology for programming high performance algorithms on clusters of symmetric multiprocessors (SMPs)
 Journal of Parallel and Distributed Computing
, 1999
Cited by 53 (13 self)
Abstract
We describe a methodology for developing high performance programs running on clusters of SMP nodes. Our methodology is based on a small kernel (SIMPLE) of collective communication primitives that make efficient use of the hybrid shared and message passing environment. We illustrate the power of our methodology by presenting experimental results for sorting integers, two-dimensional fast Fourier transforms (FFT), and constraint-satisfied searching. Our testbed is a cluster of DEC AlphaServer 2100 4/275 nodes interconnected by an ATM switch.
MPI-2: Extending the message-passing interface
 In Euro-Par ’96: Proceedings of the Second International Euro-Par Conference on Parallel Processing
, 1996
Cited by 36 (15 self)
Abstract
The MPI-2 Forum is a group of parallel computer vendors, library writers, and application specialists working together to define a set of extensions to MPI (Message Passing Interface). MPI was defined by the same process and now has many implementations, both vendor-proprietary and publicly available, for a wide variety of parallel computing environments. In this paper we present the salient aspects of the evolving MPI-2 document as it now stands. We discuss proposed extensions and enhancements to MPI in the areas of dynamic process management, one-sided operations, collective operations, new language bindings, real-time computing, external interfaces, and miscellaneous topics.
Parallelization of the Vehicle Routing Problem with Time Windows
, 2001
Cited by 25 (1 self)
Abstract
The vehicle routing problem with time windows (VRPTW) has been an area of research that has attracted many researchers within the last 10–15 years. In this period a number of papers and technical reports have been published on the exact solution of the VRPTW.
The VRPTW is a generalization of the well-known capacitated vehicle routing problem (VRP or CVRP). In the VRP a fleet of vehicles must visit (service) a number of customers. All vehicles start and end at the depot. For each pair of customers, or customer and depot, there is a cost denoting how much it costs a vehicle to drive from one customer to another. Every customer must be visited exactly once. Additionally, each customer demands a certain quantity of goods delivered (known as the customer demand). For the vehicles we have an upper limit on the amount of goods that can be carried (known as the capacity). In the most basic case all vehicles are of the same type and hence have the same capacity. The problem is, for a given scenario, to plan routes for the vehicles in accordance with the mentioned constraints such that the cost accumulated on the routes, the fixed costs (how much it costs to maintain a vehicle), or a combination thereof is minimized.
In the more general VRPTW each customer has a time window, and between all pairs of customers, or a customer and the depot, we have a travel time. The vehicles now have to comply with the additional constraint that service at a customer can only be started within that customer's time window. It is legal to arrive before a time window "opens", but the vehicle must wait, and service will not start until the time window of the customer actually opens.
For solving the problem exactly, four general types of solution methods have evolved in the literature: dynamic programming, Dantzig-Wolfe decomposition (column generation), Lagrange decomposition, and solving the classical model formulation directly.
Presently the algorithms that use Dantzig-Wolfe decomposition give the best results (Desrochers, Desrosiers and Solomon, and Kohl), but the Ph.D. thesis of Kontoravdis shows promising results for using the classical model formulation directly.
In this Ph.D. project we have used the Dantzig-Wolfe method, in which the problem is split into two problems: a "master problem" and a "subproblem". The master problem is a relaxed set partitioning problem that guarantees that each customer is visited exactly once, while the subproblem is a shortest path problem with additional constraints (capacity and time windows). Using the master problem, the reduced costs are computed for each arc, and these costs are then used in the subproblem in order to generate routes from the depot and back to the depot again. The best (improving) routes are then returned to the master problem and entered into the relaxed set partitioning problem. As the set partitioning problem is relaxed by removing the integer constraints, the solution is seldom integral; therefore the Dantzig-Wolfe method is embedded in a separation-based solution technique.
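The subproblem in the scheme above, a shortest path with time-window and capacity constraints, is typically attacked with a label-setting search. The following is only a minimal sketch of that idea, not the thesis's implementation: the graph, windows, demands, and the name `shortest_path_tw` are all illustrative, and a real pricing problem would use (possibly negative) reduced costs from the master problem's dual variables rather than the nonnegative costs assumed here.

```python
# Toy label-setting search for a shortest path with time windows and
# capacity (the column-generation subproblem, in sketch form only).
# Assumes nonnegative arc costs so the first label to reach the sink
# is optimal; real reduced costs can be negative and need more care.
import heapq

def shortest_path_tw(arcs, windows, demand, capacity, source, sink):
    """arcs: {(u, v): (cost, travel_time)}; windows: {node: (open, close)}.
    Returns (cost, path) of a cheapest feasible path, or None."""
    # A label is (cost, arrival_time, load, node, path).
    heap = [(0.0, windows[source][0], demand[source], source, (source,))]
    best = {}  # (node, load) -> (cost, time) of a label seen there (simple dominance)
    while heap:
        cost, time, load, node, path = heapq.heappop(heap)
        if node == sink:
            return cost, list(path)
        for (u, v), (c, t) in arcs.items():
            if u != node or v in path:            # elementary path: no revisits
                continue
            arrive = max(time + t, windows[v][0])  # wait if window not yet open
            new_load = load + demand[v]
            if arrive > windows[v][1] or new_load > capacity:
                continue                           # infeasible extension
            dom = best.get((v, new_load))
            if dom and dom[0] <= cost + c and dom[1] <= arrive:
                continue                           # dominated label, prune
            best[(v, new_load)] = (cost + c, arrive)
            heapq.heappush(heap, (cost + c, arrive, new_load, v, path + (v,)))
    return None

# Minimal made-up instance: depot 0 -> customers 1, 2 -> depot copy 3.
arcs = {(0, 1): (2.0, 1.0), (0, 2): (5.0, 1.0), (1, 2): (1.0, 1.0),
        (1, 3): (4.0, 1.0), (2, 3): (1.0, 1.0)}
windows = {0: (0, 0), 1: (0, 10), 2: (0, 10), 3: (0, 20)}
demand = {0: 0, 1: 1, 2: 1, 3: 0}
print(shortest_path_tw(arcs, windows, demand, capacity=2, source=0, sink=3))
# -> (4.0, [0, 1, 2, 3])
```

In the full Dantzig-Wolfe loop this routine would be called repeatedly with updated reduced costs, and the returned routes entered as new columns of the relaxed set partitioning master problem.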
In this Ph.D. project we have tried to exploit structural properties in order to speed up execution, and we have used parallel computers in order to solve problems faster or to solve larger problems.
The thesis starts with a review of previous work within the field of VRPTW, both with respect to heuristic solution methods and exact (optimal) methods. Through a series of experimental tests we seek to define and examine a number of structural characteristics.
The first series of tests examines the use of dividing time windows as the branching principle in the separation-based solution technique. Instead of using the methods previously described in the literature for dividing a problem into smaller problems, we use a method developed for a variant of the VRPTW. The results are unfortunately not positive.
Instead of dividing a problem into two smaller problems and trying to solve these, we can try to obtain an integer solution without having to branch. A cut is an inequality that separates the (non-integral) optimal solution from all the integer solutions. By finding and inserting cuts we can try to avoid branching. For the VRPTW, Kohl has developed the 2-path cuts. In the separation algorithm for detecting 2-path cuts a number of tests are made. By structuring the order in which we try to generate cuts we achieved very positive results.
In the Dantzig-Wolfe process a large number of columns may be generated, but a significant fraction of the columns introduced will not be interesting with respect to the master problem. It is a priori not possible to determine which columns are attractive and which are not, but if a column does not become part of the basis of the relaxed set partitioning problem we consider it to be of no benefit to the solution process. These columns are subsequently removed from the master problem. Experiments demonstrate a significant reduction in running time.
Positive results were also achieved by stopping the route-generation process prematurely in the case of time-consuming shortest path computations. Often this leads to stopping the shortest path subroutine in cases where the information (from the dual variables) leads to "bad" routes. The premature exit from the shortest path subroutine restricts the generation of "bad" routes significantly. This produces very good results and has made it possible to solve problem instances not solved to optimality before.
The parallel algorithm is based upon the sequential Dantzig-Wolfe based algorithm developed earlier in the project. In an initial (sequential) phase, unsolved problems are generated, and when there are enough unsolved problems to start work on every processor, the parallel solution phase is initiated. In the parallel phase each processor runs the sequential algorithm. To obtain a good workload, a strategy based on balancing the load between neighbouring processors is implemented. The resulting algorithm is efficient and capable of attaining good speedup values. The load-balancing strategy shows an even distribution of work among the processors. Due to the large demand for the IBM SP2 parallel computer at UNI-C it has unfortunately not been possible to run as many tests as we would have liked. We have nevertheless managed to solve one problem not solved before using our parallel algorithm.
CRI/EPCC MPI for CRAY T3D
Cited by 6 (0 self)
Abstract
MPI is the standard interface for message passing in parallel systems. The standard was defined in an open collaborative forum involving about 60 people from 40 different organisations, and the finished version was completed in May 1994. EPCC and CRI have established a collaboration to develop and support a robust, high performance implementation of MPI for the T3D. The product was released from EPCC in May 1995. In this paper we describe the architecture of MPI for the T3D and present performance measurements of the implementation. It is shown that a significant part of the T3D capability is available through MPI, demonstrating that MPI is capable of delivering both performance and portability to applications.
Performance Modeling and Evaluation of MPI
 Journal of Parallel and Distributed Computing
, 2001
Cited by 5 (0 self)
Abstract
Users of parallel machines need to have a good grasp of how different communication patterns and styles affect the performance of message-passing applications. LogGP is a simple performance model that reflects the most important parameters required to estimate the communication performance of parallel computers. The message passing interface (MPI) standard provides new opportunities for developing high performance parallel and distributed applications. In this paper, we use LogGP as a conceptual framework for evaluating the performance of MPI communications on three platforms: Cray Research T3D, Convex Exemplar 1600SP, and a network of workstations (NOW). Our objective is to identify a performance model suitable for MPI performance characterization and to compare the performance of MPI communications on several platforms.
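Under LogGP, the time for a single k-byte message is commonly estimated from the latency L, the per-message CPU overhead o, and the per-byte gap G (inverse bandwidth). A toy calculation under that common reading of the model (the parameter values below are invented for illustration, not measurements from the paper):

```python
# Illustrative LogGP estimate of one k-byte point-to-point message.
# L = wire latency, o = per-message overhead paid on send and receive,
# G = gap per byte. All parameter values here are made up.
def loggp_send_time(k, L, o, G):
    """End-to-end time for a k-byte message under a simple LogGP reading."""
    # send overhead + latency + per-byte cost for the remaining bytes + recv overhead
    return o + L + (k - 1) * G + o

# Example: L = 10 us, o = 2 us, G = 0.01 us/byte, 1 KiB message.
t = loggp_send_time(1024, L=10.0, o=2.0, G=0.01)
print(round(t, 2))   # time in microseconds
```

Fitting L, o, and G per platform from ping-pong measurements is what allows the kind of cross-platform comparison the paper describes.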
An Exact Parallel Algorithm For The Maximum Clique Problem
 In High Performance and Software in Nonlinear Optimization
, 1998
Cited by 4 (0 self)
Abstract
In this paper we present a portable exact parallel algorithm for the maximum clique problem on general graphs. Computational results with random graphs and some test graphs from applications are presented. The algorithm is parallelized using the Message Passing Interface (MPI) standard. The algorithm is based on the Carraghan-Pardalos exact algorithm (for unweighted graphs) and incorporates a variant of the greedy randomized adaptive search procedure (GRASP) for maximum independent set of Feo, Resende, and Smith (1994) to obtain good starting solutions. 1. INTRODUCTION Let G = (V, E) be an undirected weighted graph where V = {v_1, v_2, ..., v_n} is the set of vertices in G, and E ⊆ V × V is the set of edges in G. Each vertex v_i ∈ V is associated with a positive weight w_i. For a subset S ⊆ V, we define the weight of S to be W(S) = ∑_{i∈S} w_i and G(S) = (S, E ∩ (S × S)) as the subgraph induced by S. The size of the vertex set is throughout this paper denoted by n. The adjacenc...
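The Carraghan-Pardalos algorithm mentioned in this abstract is, at heart, a depth-first enumeration that grows a clique vertex by vertex and prunes any branch that cannot beat the incumbent. A minimal unweighted sketch of that scheme (illustrative only, not the authors' parallel implementation; the graph below is made up):

```python
# Minimal exact maximum-clique search in the spirit of the
# Carraghan-Pardalos algorithm (unweighted case). Illustrative sketch,
# not the paper's code; adjacency is a dict of vertex -> set of neighbours.
def max_clique(adj):
    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) + len(candidates) <= len(best):
            return                          # prune: cannot beat incumbent
        if not candidates:
            if len(clique) > len(best):
                best = list(clique)         # new incumbent clique
            return
        for v in sorted(candidates):
            # Only common neighbours of the clique and v can extend it.
            expand(clique + [v], candidates & adj[v])
            candidates = candidates - {v}   # avoid re-enumerating v

    expand([], set(adj))
    return best

# Example: a 5-vertex graph whose unique largest clique is {1, 2, 3}.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2, 5}, 5: {4}}
print(max_clique(adj))   # a maximum clique, e.g. [1, 2, 3]
```

In a parallel version of this kind of search, disjoint subtrees of the enumeration (different starting vertices or prefixes) can be distributed to MPI processes, with the incumbent size shared to improve pruning.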
Performance evaluation of MPI implementations and MPI-based parallel ELLPACK solvers
 In 2nd MPI Developers Conference
, 1996
Cited by 3 (0 self)
Abstract
In this study, we are concerned with the parallelization of finite element mesh generation and its decomposition, and the parallel solution of sparse algebraic equations which are obtained from the parallel discretization of second order elliptic partial differential equations (PDEs) using finite difference and finite element techniques. For this we use the Parallel ELLPACK (//ELLPACK) problem solving environment (PSE) which supports PDE computations on several MIMD platforms. We have considered the ITPACK library of stationary iterative solvers which we have parallelized and integrated into the //ELLPACK PSE. This Parallel ITPACK package has been implemented using the MPI, PVM, PICL, PARMACS, nCUBE Vertex and Intel NX message passing communication libraries. It performs very efficiently on a variety of hardware and communication platforms. To study the efficiency of three MPI library implementations, the performance of the Parallel ITPACK solvers was measured on several distributed memory architectures and on clusters of workstations for a testbed of elliptic boundary value PDE problems. We present a comparison of these MPI library implementations with PVM and the native communication libraries, based on their performance on these tests. Moreover we have implemented in MPI a parallel mesh generator that concurrently produces a semi-optimal partitioning of the mesh to support various domain decomposition solution strategies across the above platforms. The results indicate that the MPI overhead varies among the various implementations without significantly affecting the algorithmic speedup even on clusters of workstations.
Implementing Parallel Algorithms based on Prototype Evaluation and Transformation
 Department of Computer Science, University of Dortmund
, 1997
Cited by 3 (1 self)
Abstract
Combining parallel programming with prototyping is aimed at alleviating parallel programming by enabling the programmer to make practical experiments with ideas for parallel algorithms at a high level, neglecting low-level considerations of specific parallel architectures in the beginning of program development. Therefore, prototyping parallel algorithms is aimed at bridging the gap between conceptual design of parallel algorithms and practical implementation on specific parallel systems. The essential prototyping activities are programming, evaluation and transformation of prototypes. This paper gives a report on some experience with implementing parallel algorithms based on prototype evaluation and transformation employing the ProSet-Linda approach. 1 Introduction Parallel programming is conceptually harder to undertake and to understand than sequential programming, because a programmer often has to cope with the coexistence and coordination of multiple parallel activities....
Goals Guiding Design: PVM and MPI
 In the
, 2002
Cited by 3 (0 self)
Abstract
PVM and MPI, two systems for programming clusters, are often compared. The comparisons usually start with the unspoken assumption that PVM and MPI represent different solutions to the same problem. In this paper we show that, in fact, the two systems often are solving different problems. In cases where the problems do match but the solutions chosen by PVM and MPI are different, we explain the reasons for the differences. Usually such differences can be traced to explicit differences in the goals of the two systems, their origins, or the relationship between their specifications and their implementations. For example, we show that the requirement for portability and performance across many platforms caused MPI to choose approaches different from those made by PVM, which is able to exploit the similarities of network-connected systems.