Efficiency vs. Portability in Cluster-Based Network Servers
Cited by 47 (21 self)
Efficiency and portability are usually conflicting objectives for cluster-based network servers that distribute the clients' requests across the cluster based on the actual content requested. Our work is based on the observation that this efficiency vs. portability tradeoff has not been discussed before in the literature. To fill this gap, in this paper we study this tradeoff in the context of an interesting class of content-based network servers, the locality-conscious servers, using modeling and experimentation. Our analytical model gauges the potential performance benefits of portable and non-portable locality-conscious request distribution with respect to a traditional, locality-oblivious server, as a function of multiple parameters. Based on our experience with the model, we design and evaluate a portable, locality-conscious server. Experiments with our server, a non-portable server, and a traditional server validate and confirm our modeling results under several real workloads. Based on our modeling and experimental results, our main conclusion is that portability should be promoted in cluster-based network servers with low processor overhead communication, given its relatively low cost (15%) in terms of efficiency. For clusters with high processor overhead communication, efficiency should be the overriding concern, as the cost of portability can be very high (as high as 98% on 32 nodes). We also conclude that user-level communication can be useful even for non-scientific applications such as network servers.
LoGPC: Modeling Network Contention in Message-Passing Programs
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1998
Cited by 35 (4 self)
In many real applications, for example those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources can be a significant part of the total execution time. This paper presents a new cost model, called LoGPC, that extends the LogP [9] and LogGP [4] models to account for the impact of network contention and network interface DMA behavior on the performance of message-passing programs. We validate LoGPC by analyzing three applications implemented with Active Messages [11, 18] on the MIT Alewife multiprocessor. Our analysis shows that network contention accounts for up to 50% of the total execution time. In addition, we show that the impact of communication locality on the communication costs is at most a factor of two on Alewife. Finally, we use the model to identify tradeoffs between synchronous and asynchronous message passing styles.
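For orientation, the two baseline models that LoGPC extends can be written down in a few lines. The sketch below uses the standard LogP and LogGP point-to-point costs (latency L, per-message processor overhead o, per-byte gap G for long messages); the function names are illustrative, and the contention and DMA terms that LoGPC adds are not modeled here.

```python
def logp_message_time(L, o):
    """End-to-end time of one short message under LogP.

    The sender pays overhead o, the message spends L in the network,
    and the receiver pays overhead o again.
    """
    return o + L + o


def loggp_message_time(L, o, G, k):
    """End-to-end time of one k-byte message under LogGP.

    After the first byte, each of the remaining k - 1 bytes costs G
    (the gap per byte for long messages).
    """
    if k < 1:
        raise ValueError("message must contain at least one byte")
    return o + (k - 1) * G + L + o


if __name__ == "__main__":
    # Hypothetical parameter values, in arbitrary time units.
    print(logp_message_time(L=5.0, o=2.0))
    print(loggp_message_time(L=5.0, o=2.0, G=0.5, k=101))
```

Neither formula has a term for queueing in the network, which is the gap the abstract says LoGPC is meant to close.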
Challenging Applications on Fast Networks
- Fourth International Symposium on High-Performance Computer Architecture (HPCA-4)
, 1998
Cited by 7 (0 self)
Parallel computing on clusters of workstations is attractive because of the low costs in comparison to MPPs, but the speed of the local area network limits the class of applications that can be run efficiently. Fortunately, faster network technology is becoming available for the next generation of workstation clusters. This paper studies the effect of running challenging applications that communicate heavily on three types of modern interconnects: 100 Mbit/s Fast Ethernet, 155 Mbit/s ATM, and 1.28 Gbit/s Myrinet. Experimental results show that even challenging communication-intensive applications can achieve acceptable performance on workstation clusters, but only if the communication software has been designed and tuned for high performance. 1. Introduction Parallel computing on clusters of workstations is attractive because of the low costs in comparison to commercial Massively Parallel Processors (MPPs). A disadvantage of workstation clusters is that MPPs contain high-performance pro...
Responsiveness without Interrupts
, 1999
Cited by 7 (0 self)
this paper is a characterization of the delays actually observed in a suite of applications. We show that the majority of notification delays result from a small number of large delays. These delays can dominate any gains achieved through use of new network technologies. The impact of these delays can be considerable. Our applications averaged more than 31% slower without interrupts than with them. This result argues that the problem is serious, and needs to be addressed either by including interrupts in emerging standards, or through use of the techniques discussed below.
An Efficient Virtual Network Interface in the FUGU Scalable Workstation
, 1998
Cited by 2 (0 self)
A scalable workstation is one vision of a mainstream parallel computer: a machine that combines scalable, fine-grain communication facilities for parallel applications with virtual memory and preemptive multiprogramming to support general-purpose workloads. A key challenge in a scalable workstation is the Virtual Network Interface (VNI) problem. The problem is that high performance communication for parallel programming depends on a tight coupling between the application and the network while multiprogramming and virtual memory effects disrupt such coupling. This thesis
Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs
Abstract. This paper proposes a novel approach for the parallel execution of tiled Iteration Spaces onto a cluster of SMP PC nodes. Each SMP node has multiple CPUs and a single memory mapped PCI-SCI Network Interface Card. We apply a hyperplane-based grouping transformation to the tiled space, so as to group together independent neighboring tiles and assign them to the same SMP node. In this way, intranode (intragroup) communication is annihilated. Groups are atomically executed inside each node. Nodes exchange data between successive group computations. We schedule groups much more efficiently by exploiting the inherent overlapping between communication and computation phases among successive atomic group executions. The applied non-blocking schedule resembles a pipelined datapath, where group computation phases are overlapped with communication ones, instead of being interleaved with them. Our experimental results illustrate that the proposed method outperforms previous approaches involving blocking communication or conventional grouping schemes.
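The benefit of overlapping communication with computation can be illustrated with a back-of-the-envelope model. The sketch below is not the paper's schedule: it assumes every group takes the same compute time and the same communication time, treats the two phases as a two-stage pipeline, and uses hypothetical function names.

```python
def blocking_schedule_time(n_groups, t_comp, t_comm):
    """Total time when each group computes, then communicates, with no overlap."""
    return n_groups * (t_comp + t_comm)


def overlapped_schedule_time(n_groups, t_comp, t_comm):
    """Total time for an idealized two-stage pipeline.

    The communication of group i runs while group i + 1 computes, so after
    the pipeline fills, each extra group costs only the slower of the two phases.
    """
    if n_groups == 0:
        return 0
    return t_comp + t_comm + (n_groups - 1) * max(t_comp, t_comm)


if __name__ == "__main__":
    # Hypothetical phase times, in arbitrary units.
    print(blocking_schedule_time(10, 4, 3))
    print(overlapped_schedule_time(10, 4, 3))
```

Under these assumptions the overlapped schedule approaches the cost of the slower phase alone as the number of groups grows, which is the intuition behind the pipelined datapath analogy in the abstract.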
SHIFT+M: Software-Hardware Information Flow Tracking on Multi-core
We designed, implemented and analyzed three distributed protocols for information-flow tracking on a multi-core message-passing architecture. In each we used Asbestos style labels to provide protection from unauthorized communication. The protocols remove the reliance on a central repository for taint checking by adding a trusted library and hardware mechanisms at each core. We modeled the hardware and software of each protocol using Simics, a multi-core full system simulator and used micro-benchmarks to capture their respective performance with different communication patterns. We present the protocols, hardware design, and results that inform an evaluation of the three protocols.

