DPF: Fast, Flexible Message Demultiplexing using Dynamic Code Generation
- In ACM Communication Architectures, Protocols, and Applications (SIGCOMM), 1996
Cited by 116 (10 self)
Fast and flexible message demultiplexing are well-established goals in the networking community [1, 18, 22]. Currently, however, network architects have had to sacrifice one for the other. We present a new packet-filter system, DPF (Dynamic Packet Filters), that provides both the traditional flexibility of packet filters [18] and the speed of hand-crafted demultiplexing routines [3]. DPF filters run 10 to 50 times faster than the fastest packet filters reported in the literature [1, 17, 18, 27]. DPF's performance is either equivalent to or, when it can exploit runtime information, superior to hand-coded demultiplexors. DPF achieves high performance by using a carefully designed declarative packet-filter language that is aggressively optimized using dynamic code generation. The contributions of this work are: (1) a detailed description of the DPF design, (2) discussion of the use of dynamic code generation and quantitative results on its performance impact, (3) quantitative results on how ...
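The idea of specializing a declarative filter at run time can be illustrated with a minimal Python sketch. This is not DPF's filter language or its code generator: the `(offset, length, value)` atom format and the name `compile_filter` are assumptions made for illustration, and Python's `exec` stands in for native code generation. The sketch turns a list of byte comparisons into the source of a single straight-line predicate, so that the constants are baked into the generated function instead of being interpreted per packet.

```python
def compile_filter(atoms):
    """Build a predicate from declarative atoms (hypothetical format).

    Each atom is (offset, length, expected_int): the `length` bytes at
    `offset` must equal `expected_int` encoded big-endian.
    """
    if not atoms:
        return lambda packet: True
    # One up-front bounds check covers every atom.
    min_len = max(off + n for off, n, _ in atoms)
    checks = [f"len(packet) >= {min_len}"]
    for off, n, value in atoms:
        expected = value.to_bytes(n, "big")
        # The expected bytes become a literal in the generated source.
        checks.append(f"packet[{off}:{off + n}] == {expected!r}")
    source = "def matches(packet):\n    return " + " and ".join(checks) + "\n"
    namespace = {}
    exec(source, namespace)  # generate the specialized function at run time
    return namespace["matches"]


# Usage: match UDP over IPv4 in an Ethernet frame without IP options.
# Offset 12 is the EtherType field, offset 23 is the IPv4 protocol field.
is_udp = compile_filter([(12, 2, 0x0800), (23, 1, 17)])
frame = bytearray(64)
frame[12:14] = b"\x08\x00"
frame[23] = 17
print(is_udp(bytes(frame)))
```

A real system in this style would emit machine code and merge overlapping filters; the sketch only shows the specialization step.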
Distributed Network Computing over Local ATM Networks
- IEEE Journal on Selected Areas in Communications, 1994
Cited by 75 (5 self)
Communication between processors has long been the bottleneck of distributed network computing. However, recent progress in switch-based high-speed Local Area Networks (LANs) may be changing this situation. Asynchronous Transfer Mode (ATM) is one of the most widely-accepted and emerging high-speed network standards which can potentially satisfy the communication needs of distributed network computing. In this paper, we investigate distributed network computing over local ATM networks. We first study the performance characteristics involving end-to-end communication in an environment that includes several types of workstations interconnected via a Fore Systems' ASX-100 ATM Switch. We then compare the communication performance of four different Application Programming Interfaces (APIs). The four APIs were Fore Systems ATM API, BSD socket programming interface, Sun's Remote Procedure Call (RPC), and the Parallel Virtual Machine (PVM) message passing library. Each API represents distribute...
Separating Data and Control Transfer in Distributed Operating Systems
- In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, 1994
Cited by 69 (2 self)
Advances in processor architecture and technology have resulted in workstations in the 100+ MIPS range. As well, newer local-area networks such as ATM promise a ten- to hundred-fold increase in throughput, much reduced latency, greater scalability, and greatly increased reliability, when compared to current LANs such as Ethernet. We believe that these new network and processor technologies will permit tighter coupling of distributed systems at the hardware level, and that distributed systems software should be designed to benefit from that tighter coupling. In this paper, we propose an alternative way of structuring distributed systems that takes advantage of a communication model based on remote network access (reads and writes) to protected memory segments. A key feature of the new structure, directly supported by the communication model, is the separation of data transfer and control transfer. This is in contrast to the structure of traditional distributed systems, which are typical...
Low-Latency Communication over ATM Networks using Active Messages
- IEEE Micro, 1995
Cited by 68 (0 self)
Recent developments in communication architectures for parallel machines have made significant progress and reduced the communication overheads and latencies by over an order of magnitude as compared to earlier proposals. This paper examines whether these techniques can carry over to clusters of workstations connected by an ATM network even though clusters use standard operating system software, are equipped with network interfaces optimized for stream communication, do not allow direct protected user-level access to the network, and use networks without reliable transmission or flow control. In a first part, this paper describes the differences in communication characteristics between clusters of workstations built from standard hardware and software components and state-of-the-art multiprocessors. The lack of flow control and of operating system coordination affects the communication layer design significantly and requires larger buffers at each end than on multiprocessors. A second ...
Implementation of a Reliable Remote Memory Pager
- In USENIX Annual Technical Conference, 1996
Cited by 52 (8 self)
Traditional operating systems use magnetic disks as paging devices, even though the cost of a disk transfer measured in processor cycles continues to increase. In this paper we explore the use of remote main memory for paging. We describe the design, implementation and evaluation of a pager that uses main memory of remote workstations as a faster-than-disk paging device and provides reliability in case of single workstation failures. Our pager has been implemented as a block device driver linked to the DEC OSF/1 operating system, without any modifications to the kernel code. Using several test applications we measure the performance of remote memory paging over an Ethernet interconnection network and find it to be faster than traditional disk paging. We evaluate the performance of various reliability policies and prove their feasibility even over low bandwidth networks, like Ethernet. We conclude that the benefits of reliable remote memory paging in workstation clusters are significant...
Experience with Active Messages on the Meiko CS-2
- In 9th International Parallel Processing Symposium, 1995
Cited by 45 (10 self)
Active messages provide a low latency communication architecture which on modern parallel machines achieves more than an order of magnitude performance improvement over more traditional communication libraries. This paper discusses the experience we gained while implementing active messages on the Meiko CS-2, and discusses implementations for similar architectures. During our work we have identified that architectures which only support efficient remote write operations (or DMA transfers as in the case of the CS-2) make it difficult to transfer both data and control as required by active messages. Traditional network interfaces avoid this problem because they have a single point of entry which essentially acts as a queue. To efficiently support active messages on modern network communication co-processors, hardware primitives are required which support this queue behavior. We overcame this problem by producing specialized code which runs on the communications co-processor and supports ...
Performance Evaluation of a Cluster-Based Multiprocessor Built from ATM Switches and Bus-Based Multiprocessor Servers
- In the 2nd IEEE Symposium on High-Performance Computer Architecture, 1996
Cited by 31 (1 self)
We consider a network of workstations (NOW) organization consisting of a number of bus-based multiprocessor servers interconnected by an ATM switch. A shared-memory model is supported by distributed virtual shared memory (DVSM) and this paper focuses on the access penalties incurred by (1) ATM and (2) the DVSM software. First, through detailed architectural simulations we find that while the bandwidth and the latency of the ATM switch fabrics are found to be acceptable, the latency incurred by commercially available ATM interfaces has a first order effect on the performance. We also study the effects of various scheduling policies for the coherence handlers. Our data suggest that since the probability of finding an idle processor within a cluster is high, a good policy is to schedule it there instead of letting an extra compute processor execute coherence handlers. Overall, by adjusting the adaptation layer of ATM to a DVSM system we find that ATM is a promising technology for these kinds of systems.
Multicast on Irregular Switch-based Networks with Wormhole Routing
- In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA-3), 1997
Cited by 31 (10 self)
This paper presents efficient multicasting with reduced contention on irregular networks with switch-based wormhole interconnection and unicast message passing. First, it is proved that for an arbitrary irregular network with a typical deadlock-free, adaptive routing, it may not be possible to create an ordered list of nodes to implement an arbitrary multicast in a contention-free manner with minimal number of communication steps. Next, three different multicast algorithms are proposed with their respective node orderings to reduce contention: switch-based ordering (SO), switch-based hierarchical ordering (SHO), and chain concatenation ordering (CCO). A variation of a binomial tree-based communication pattern with unicast message passing is used on the above ordered lists to implement multicast. The proposed multicast algorithms are compared with each other as well as with the naive random ordering (RO) algorithm for a range of system sizes, switch sizes, message lengths, degrees of co...
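A binomial tree over an ordered node list can be sketched as follows. The abstract does not spell out the paper's exact variation, so this is the textbook recursive-halving pattern, offered as an assumption: the list is taken as already ordered (by SO, SHO, CCO, or any other scheme), the source is the first element, and the function name `binomial_multicast_schedule` is hypothetical. In every step each node that already holds the message sends one unicast to the head of the far half of the segment it is responsible for.

```python
def binomial_multicast_schedule(ordered_nodes):
    """Return a list of steps; each step is a list of (sender, receiver)."""
    steps = []
    # A segment (lo, hi) means: the node at index lo holds the message and
    # must still get it to the nodes at indices lo+1 .. hi-1.
    segments = [(0, len(ordered_nodes))]
    while any(hi - lo > 1 for lo, hi in segments):
        step = []
        next_segments = []
        for lo, hi in segments:
            if hi - lo <= 1:
                next_segments.append((lo, hi))  # nothing left to deliver
                continue
            mid = lo + (hi - lo + 1) // 2       # head of the far half
            step.append((ordered_nodes[lo], ordered_nodes[mid]))
            next_segments.append((lo, mid))     # sender keeps the near half
            next_segments.append((mid, hi))     # receiver takes the far half
        steps.append(step)
        segments = next_segments
    return steps


# Usage: eight nodes, source first in the ordering.
for number, step in enumerate(binomial_multicast_schedule(list("ABCDEFGH")), 1):
    print(number, step)
```

With n nodes the schedule has ceil(log2 n) steps and n - 1 unicasts in total. Which sends contend for links in a given step depends on the node ordering and the network's routing, which this sketch does not model.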
Early Experience with Message-Passing on the SHRIMP Multicomputer
- In Proceedings of the 23rd Annual Symposium on Computer Architecture, 1996
Cited by 27 (13 self)
The SHRIMP multicomputer provides virtual memory-mapped communication (VMMC), which supports protected, user-level message passing, allows user programs to perform their own buffer management, and separates data transfers from control transfers so that a data transfer can be done without the intervention of the receiving node CPU. An important question is whether such a mechanism can indeed deliver all of the available hardware performance to applications which use conventional message-passing libraries. This paper
Exploiting Two-Case Delivery for Fast Protected Messaging
- In HPCA, 1998
Cited by 17 (3 self)
We propose and evaluate two complementary techniques to protect and virtualize a tightly-coupled network interface in a multicomputer. The techniques allow efficient, direct application access to network hardware in a multiprogrammed environment while gaining most of the benefits of a memory-based network interface. First, two-case delivery allows an application to receive a message directly from the network hardware in ordinary circumstances, but provides buffering transparently when required for protection. Second, virtual buffering stores messages in virtual memory on demand, providing the convenience of effectively unlimited buffer capacity while keeping actual physical memory consumption low. The evaluation is based on workloads of real and synthetic applications running on a simulator and partly on emulated hardware. The results show that the direct path is also the common path, justifying the use of software buffering. Further results show that physical buffering requirements ...
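The two-case idea can be sketched in a few lines of Python. The abstract does not say precisely when buffering is "required for protection", so the sketch assumes one plausible condition: a message is delivered directly only when its destination application is the one currently running and no earlier messages are still buffered. The class `TwoCaseEndpoint` and its methods are hypothetical names, and a `deque` stands in for the buffer that the paper keeps in virtual memory.

```python
from collections import deque


class TwoCaseEndpoint:
    """Toy model of one application's receive endpoint (assumed design)."""

    def __init__(self, owner):
        self.owner = owner
        self.buffer = deque()    # software buffer used by the second case
        self.direct_count = 0
        self.buffered_count = 0

    def deliver(self, message, running_app, handler):
        # Case 1: the owner is running and nothing is queued ahead of this
        # message, so hand it straight to the application's handler.
        if running_app == self.owner and not self.buffer:
            self.direct_count += 1
            handler(message)
        # Case 2: buffer the message so another application never sees it
        # and so that arrival order is preserved.
        else:
            self.buffered_count += 1
            self.buffer.append(message)

    def drain(self, handler):
        # Called when the owner is scheduled again: replay buffered messages.
        while self.buffer:
            handler(self.buffer.popleft())


# Usage: application "A" is descheduled for the middle message.
received = []
endpoint = TwoCaseEndpoint("A")
endpoint.deliver("m1", running_app="A", handler=received.append)
endpoint.deliver("m2", running_app="B", handler=received.append)
endpoint.drain(received.append)
endpoint.deliver("m3", running_app="A", handler=received.append)
print(received, endpoint.direct_count, endpoint.buffered_count)
```

Checking that the buffer is empty before taking the direct path keeps delivery in order; whether the paper's mechanism enforces ordering this way is not stated in the abstract.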

