Portals 3.0: Protocol Building Blocks for Low Overhead Communication
- in Proceedings of the 2002 Workshop on Communication Architecture for Clusters
, 2002
Cited by 38 (17 self)
This paper describes the evolution of the Portals message passing architecture and programming interface from its initial development on tightly-coupled massively parallel platforms to the current implementation running on a 1792-node commodity PC Linux cluster. Portals provides the basic building blocks needed for higher-level protocols to implement scalable, low-overhead communication. Portals has several unique characteristics that differentiate it from other high-performance system-area data movement layers. This paper discusses several of these features and illustrates how they can impact the scalability and performance of higher-level message passing protocols.
The Hyperion system: Compiling multithreaded Java bytecode for distributed execution
, 2001
Cited by 18 (1 self)
Our work combines Java compilation to native code with a run-time library that executes Java threads in a distributed-memory environment. This allows a Java programmer to view a cluster of processors as executing a single Java virtual machine. The separate processors are simply resources for executing Java threads with true parallelism, and the run-time system provides the illusion of a shared memory on top of the private memories of the processors. The environment we present is available on top of several UNIX systems and can use a large variety of communication interfaces thanks to the high portability of its run-time system. To evaluate our approach, we compare serial C, serial Java, and multithreaded Java implementations of a branch-and-bound solution to the minimal-cost map-coloring problem. All measurements have been carried out on two platforms using two different communication interfaces: SISCI/SCI and MPI-BIP/Myrinet. Key words: Java, compiling, distributed shared memory, Java consistency, multithreading, Hyperion, PM2
MPICH/Madeleine: a True Multi-Protocol MPI for High Performance Networks
- In Proc. 15th International Parallel and Distributed Processing Symposium (IPDPS 2001)
, 2001
Cited by 17 (2 self)
The Abstract Device Interface (ADI) allows different network support modules (aka devices) to be plugged into the layered structure of MPICH. It is then theoretically possible to support network heterogeneity in MPICH, since the ADI data structures are to some extent multi-device-ready. In practice, however, taking advantage of such support in MPICH turns out to be difficult. Indeed, substantial integration work is needed each time a new device has to be supported, in order to preserve inter-device coexistence. An alternative solution is to obtain a multi-protocol version of MPICH through the use of a generic multi-protocol communication library such as Madeleine [3], the communication subsystem of the PM2 programming environment. There are two key points in this approach: software reusability (since Madeleine does not have to be modified) and extensibility (avoiding ADI modifications prevents future incompatibilities due to MPICH changes). Nevertheless, one may wonder whether such an approach can really be efficient. This paper answers in the affirmative by reporting several results obtained on a number of high-performance networks. Comparisons with other MPI implementations show that there is no significant loss of performance with our proposal (we even obtained improvements in some cases). This is possible because Madeleine was designed to be multi-protocol from the very beginning.
Tripwire: A Synchronisation Primitive for Virtual Memory Mapped Communication
, 2000
Cited by 4 (4 self)
Existing user-level network interfaces deliver high bandwidth, low latency performance to applications, but are typically unable to support diverse styles of communication and are unsuitable for use in multiprogrammed environments. Often this is because the network abstraction is presented at too high a level, and support for synchronisation is inflexible. In this
Distributed Computing with the CLAN Network
- In Proceedings of the 27th Conference on Local Computer Networks
, 2002
Cited by 1 (1 self)
CLAN (Collapsed LAN) is a high-performance user-level network targeted at the server room. It presents a simple low-level interface to applications: connection-oriented non-coherent shared memory for data transfer, and Tripwire, a user-level programmable CAM for synchronisation. This simple interface is implemented using only hardware state machines on the NIC, yet is flexible enough to support many different applications and communications paradigms.
Improving Reactivity to I/O Events in Multithreaded . . .
, 2002
Cited by 1 (0 self)
Reactivity to I/O events is a crucial factor for the performance of modern multithreaded distributed systems. In our scheduler-centric approach, an application detects I/O events by requesting a service from a detection server, through a simple, uniform API. We show that a good choice for this detection server is the thread scheduler. This approach simplifies application programming, significantly improves performance, and provides a much tighter control on reactivity.
The MPC Parallel Computer: Hardware, Low-level Protocols and Performances
- in Proc of IASTED Parallel and Distributed Computing and Systems (PDCS 2000), Las Vegas
, 2000
Cited by 1 (1 self)
This paper presents the MPC parallel computer and its MPI implementation, developed at the Laboratoire LIP6 of Univ. Pierre and Marie Curie, Paris. MPC is a low-cost, high-performance parallel computer that uses standard PC motherboards as processing nodes, connected through the specific FastHSL board to a high-speed communication network built on HSL 1 Gbit/s serial links that are IEEE 1355 compliant. Two ASICs are presented: RCUBE, the HSL network router, and PCI-DDC, the network controller implementing the Direct Deposit State Less receiver protocol.
Design and Implementation of MPI on Portals 3.0
Cited by 1 (0 self)
This paper describes an implementation of the Message Passing Interface (MPI) on the Portals 3.0 data movement layer. Portals 3.0 provides low-level building blocks that are flexible enough to support higher-level message passing layers such as MPI very efficiently. Portals 3.0 is also designed to allow programmable network interface cards to offload message processing from the host processor. We will describe the basic building blocks in Portals 3.0, show how they can be put together to implement MPI, and describe the protocols of an MPI implementation. We will look at several key operations within an MPI implementation and describe the effects that a Portals 3.0 implementation has on scalability and performance.

