Active Ports: A Performance-oriented Operating System Support to Fast LAN Communications
In Proc. Euro-Par'98, LNCS 1470, pp. 620-624, 1998
Cited by 7 (5 self)
The Genoa Active Message MAchine (GAMMA) is an efficient communication layer for 100base-T clusters of Personal Computers running Linux. It is based on Active Ports, a communication mechanism derived from Active Messages. GAMMA Active Ports deliver excellent communication performance at user level (latency 12.7 µs, maximum throughput 12.2 MByte/s, half-power point reached with 192-byte messages), thus enabling cost-effective cluster computing on 100base-T. Despite being implemented at kernel level in the Linux OS, GAMMA Active Ports perform much better than many other LAN-oriented communication layers, including so-called "user-level" ones (e.g. U-Net). 1 Introduction Networks of workstations (NOWs) or, even better, clusters of Personal Computers (PCs) networked by inexpensive commodity interconnects, potentially offer cost-effective support for parallel processing. The only obstacle to overcome is the inefficiency caused by the OS layers that sedimented...
Design of a VIA based communication protocol for LAM/MPI Suite
In 9th Euromicro Workshop on Parallel and Distributed Processing, 2001
Cited by 5 (0 self)
The increasing use of System Area Networks (SANs) demands efficient communication that benefits from SAN features through direct access to network resources, avoiding kernel intervention in the communication path. Recently, a consortium composed of Microsoft, Compaq and Intel authored a new standard, the Virtual Interface Architecture (VIA), designed to reduce software overhead in data transfers. This paper describes a communication protocol proposed to allow a complete implementation of MPI on VIA. This protocol is needed because the two VIA data transfer models alone cannot support the large number of MPI communication flavors. To validate the proposed protocol, a new communication layer based on VIA has been introduced in the LAM/MPI suite. The reported results, referring to a software VIA implementation for Fast Ethernet networks, exhibit a significant reduction in latency time of LAM/MPI based...
A Performance-oriented Operating System Approach to Fast Communications in a Low-cost Network of Workstations
In Proc. 1998 International Conference on Parallel and Distributed Processing, Techniques and Applications (PDPTA'98), volume I, 1998
Cited by 4 (2 self)
The use of workstations connected by a fast Local Area Network (LAN) to form a so-called Network of Workstations (NOW) is a very appealing way to implement a low-cost parallel processing platform. Interprocess communication is the most difficult feature for such a system to implement with an acceptable level of performance. Standard protocols and mechanisms implemented at Operating System (OS) level usually do not provide satisfactory performance in a NOW architecture, especially with respect to the communication performance offered by the raw interconnection hardware. Two main solutions to this efficiency issue have been proposed so far, namely: standard OS mechanisms relying on simplified communication protocols, and user-level protected access to the raw communication hardware. We show that a third way, namely efficient OS mechanisms supporting an Active Message communication layer, can not only offer higher level communication primitives in a multiprogrammed environment but also o...
Improving the Communication Subsystem Performance of WARPED
1998
Cited by 2 (0 self)
With the advent of cheap and powerful hardware for workstations and networks, a new cluster-based architecture for Time Warp simulations has been envisioned. However, fine-grained Time Warp applications that communicate frequently are not the ideal candidates for such architectures due to their high-latency communication costs. Hence, designers of fine-grained Time Warp applications on clusters are faced with the problem of reducing the high communication latency of the communication subsystem in such architectures. An efficient communication subsystem consumes a lower fraction of the processing cycles for communication operations and allows the majority of the processing cycles to be used by the application. This increases the performance of Time Warp applications. This thesis reduces the latency of the communication subsystem by selecting one of the following approaches: (i) reducing network latency by employing higher-performance network hardware (i.e., Fast Ethernet versus Myrine...
Efficient Molecular Dynamics on a Network of Personal Computers
In Proc. VecPar'98, 1998
Cited by 1 (1 self)
The Genoa Active Message Machine (GAMMA) is a high-performance Active Messages-like communication layer implemented at kernel level as an extension of the Linux Operating System, and made available to user applications through a programming library. On low-cost clusters of Personal Computers (PCs) connected by Fast Ethernet, GAMMA achieves much better communication performance than public domain implementations of MPI and PVM. We have considered an existing PVM Molecular Dynamics (MD) parallel application, designed to be portable across various MPP as well as NOW platforms. The goal of our work is to show how worthwhile it is to migrate such a complex application from PVM to GAMMA, in terms of absolute performance improvement as well as price/performance ratio, with a view to running MD on a low-cost cluster of PCs. The "migration" approach is then compared to two other alternatives, namely: running the PVM version of MD "as is" on a cluster of PCs and trying tuning the P...
A Cluster Operating System Based on Software COMA Memory Management
Cited by 1 (0 self)
Clusters of SMPs are attractive for executing shared memory parallel applications but reconciling high performance and ease of programming remains an open issue. A possible approach is to provide an efficient Single System Image operating system giving the illusion of an SMP machine. In this paper, we present such a system focusing on global management of the memory resource. We introduce the concept of a container at the lowest operating system level to build a COMA-like memory management subsystem. Higher level operating system services such as the virtual memory system and file cache can be easily implemented based on containers and transparently benefit from the whole memory resource available in the cluster.
Tender to III/97/31 Lot 5, Deliverable 1.1 - DISCO Report on the state-of-the-art of PC Cluster Computing
1998
This report surveys the state of the art of Cluster Computing based mainly on low-cost PC or workstation technology. Real industrial applications as well as EU funded and international University/Research projects are taken into account in order to provide an overall (although necessarily not exhaustive) view of the current situation and trends. Cluster computing is mainly obtained by providing a set of PCs or workstations connected by a more or less sophisticated local area network (LAN) with appropriate communications libraries and/or operating system (OS) primitives, and by adopting appropriate application-level languages, libraries, and/or environments to support parallel processing. After a first chapter reviewing already existing industrial applications (that should be regarded as main practical motivations to pursue this approach), we present in subsequent chapters the enabling technologies that have been adopted at the various application, LAN, OS and system library levels. An...
A Communication System for Efficient Parallel Processing on Clusters of Personal Computers
1999
Current trends indicate that multiprocessor platforms will eventually replace uniprocessors in every application field, as the performance improvement exhibited by uniprocessor architectures during the last decade will soon become unable to satisfy the ever growing demand for higher and higher application performance. Modern high-end Personal Computers (PCs) provide computation speed as well as storage capacity at the best price/performance ever. Therefore an obvious way to obtain higher-performance parallel systems is to build a distributed-memory platform out of a pool of PCs interconnected by fast Local Area Network (LAN) hardware, to obtain what is commonly called a cluster of PCs. Clusters are potentially able to deliver high performance at the unbeatable price/performance typical of their building blocks, namely commodity PCs and LAN interconnects, thus providing a low-cost alternative to both shared-memory multiprocessors and distributed-memory Massively Parallel Processors (MPP...
GAMMA on DEC 2114x with Efficient Flow Control
In Proc. 1999 International Conference on Parallel and Distributed Processing, Techniques and Applications (PDPTA'99), Las Vegas, 1999
GAMMA is a prototype light-weight communication system based on the Active Ports paradigm, designed for efficient implementation over Fast Ethernet interconnects. The original implementation started in 1996 based on 3Com 3C595 cards. The optimizations obtained on those NICs allowed us to obtain the lowest latency and highest throughput results ever published in the literature for Fast Ethernet. Technology evolved, however, and now all low-cost NICs available are based on Descriptor Based DMA (DBDMA) transfers, originally introduced by the DEC chipset 21140. In this paper we report on the re-implementation of the GAMMA prototype for the DEC NICs exploiting the new transfer modes and achieving substantially equivalent performance figures. We also describe the addition of an efficient flow-control algorithm that allows loss-free communications without seriously affecting performance. Keywords: Active Ports; Fast Ethernet; Low Latency; Descriptor Based DMA; Flow-control. 1 Introduction Li...

