Coherent network interfaces for fine-grain communication
In Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996
Cited by 48 (14 self)
Abstract
Historically, processor accesses to memory-mapped device registers have been marked uncachable to ensure their visibility to the device. The ubiquity of snooping cache coherence, however, makes it possible for processors and devices to interact with cachable, coherent memory operations. Using coherence can improve performance by facilitating burst transfers of whole cache blocks and reducing control overheads (e.g., for polling). This paper begins an exploration of network interfaces (NIs) that use coherence, called coherent network interfaces (CNIs), to improve communication performance. We restrict this study to NIs/CNIs that reside on coherent memory or I/O buses, to NIs/CNIs that are much simpler than processors, and to the performance of fine-grain messaging from user process to user process. Our first contribution is to develop and optimize two mechanisms that CNIs use to communicate with processors. A cachable device register, derived from cachable control registers [39, 40], is a coherent, cachable block of memory used to transfer status, control, or data between a device and a processor. Cachable queues generalize cachable device registers from one cachable, coherent memory block to a contiguous region of cachable, coherent blocks managed as a circular queue. Our second contribution is a taxonomy and comparison of four CNIs with a more conventional NI. Microbenchmark results show that CNIs can improve the round-trip latency and achievable bandwidth of a small 64-byte message by 37% and 125%, respectively, on the memory bus and by 74% and 123%, respectively, on a coherent I/O bus. Experiments with five macrobenchmarks show that CNIs can improve performance by 17-53% on the memory bus and 30-88% on the I/O bus.
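To make the cachable-queue idea concrete, here is a minimal software model of a circular queue whose entries are cache-block-sized. It is a sketch under stated assumptions, not the paper's hardware design: the 64-byte block size, the per-block valid flag (set by the producer, cleared by the consumer), and the names `CachableQueue`, `enqueue`, and `dequeue` are all hypothetical. The point it illustrates is that each message moves as one whole block and that the consumer can poll a block that, in a coherent system, stays in its cache until the producer writes it.

```python
from typing import List, Optional

BLOCK_SIZE = 64  # assumed cache-block size in bytes (hypothetical)


class CachableQueue:
    """Hypothetical model of a circular queue of cache-block-sized entries."""

    def __init__(self, num_blocks: int) -> None:
        if num_blocks <= 0:
            raise ValueError("num_blocks must be positive")
        self.num_blocks = num_blocks
        # One payload and one valid flag per block.
        self.payload: List[bytes] = [bytes(BLOCK_SIZE)] * num_blocks
        self.valid: List[bool] = [False] * num_blocks
        self.tail = 0  # next block the producer writes (producer-private)
        self.head = 0  # next block the consumer reads (consumer-private)

    def enqueue(self, message: bytes) -> bool:
        """Producer side: write one message into the tail block; False if full."""
        if len(message) > BLOCK_SIZE:
            raise ValueError("message does not fit in one block")
        if self.valid[self.tail]:
            return False  # block not yet consumed, so the queue is full
        # Pad to a whole block so the transfer is a single block-sized write.
        self.payload[self.tail] = message.ljust(BLOCK_SIZE, b"\x00")
        self.valid[self.tail] = True  # publish only after the payload is written
        self.tail = (self.tail + 1) % self.num_blocks
        return True

    def dequeue(self) -> Optional[bytes]:
        """Consumer side: poll the head block; None if no message is ready."""
        if not self.valid[self.head]:
            # With coherent caching, repeated polls of an unchanged block
            # can be served from the consumer's cache.
            return None
        block = self.payload[self.head]
        self.valid[self.head] = False  # hand the block back to the producer
        self.head = (self.head + 1) % self.num_blocks
        return block
```

For example, a two-block queue accepts two messages, rejects a third until the consumer drains one, and returns each message padded to a full block. Keeping `head` and `tail` private to each side means neither party has to read the other's index, so the only shared state per message is the block itself.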
Schematic: A Concurrent Object-Oriented Extension to Scheme
In Proceedings of the Workshop on Object-Based Parallel and Distributed Computation, number 1107 in Lecture Notes in Computer Science, 1996
Cited by 18 (12 self)
Abstract
A concurrent object-oriented extension to the programming language Scheme, called Schematic, is described. Schematic supports familiar constructs often used in typical parallel programs (future and higher-level macros such as plet and pbegin), which are actually defined atop a very small number of fundamental primitives. In this way, Schematic achieves both convenience for typical concurrent programming and a simple, flexible language kernel. Schematic also supports concurrent objects, which exhibit more natural and intuitive behavior than "bare" (unprotected) shared memory and permit intra-object concurrency. Schematic will be useful for intensive parallel applications on parallel machines or networks of workstations, concurrent graphical user interface programming, distributed programming over networks, and even concurrent shell programming.
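The idea of defining plet and pbegin atop a small set of primitives can be illustrated outside Scheme. The Python sketch below is only an analogy: `future` and `touch` stand in for the kernel primitives, and `plet` and `pbegin` are built purely from them. The names are borrowed from the abstract, but their exact semantics here are assumptions, not Schematic's definitions, and the thread pool from `concurrent.futures` is a swapped-in mechanism.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict

# Shared worker pool (assumed size). Deeply nested plet calls could exhaust it;
# a real runtime would schedule futures more carefully.
_pool = ThreadPoolExecutor(max_workers=4)


def future(thunk: Callable[[], Any]) -> Future:
    """Primitive: start evaluating thunk concurrently, return a placeholder."""
    return _pool.submit(thunk)


def touch(f: Future) -> Any:
    """Primitive: block until the placeholder's value is available."""
    return f.result()


def plet(bindings: Dict[str, Callable[[], Any]], body: Callable[..., Any]) -> Any:
    """Parallel let built from the primitives: evaluate bindings concurrently."""
    futures = {name: future(thunk) for name, thunk in bindings.items()}
    return body(**{name: touch(f) for name, f in futures.items()})


def pbegin(*thunks: Callable[[], Any]) -> Any:
    """Parallel begin built from the primitives: return the last value."""
    futures = [future(t) for t in thunks]
    values = [touch(f) for f in futures]
    return values[-1] if values else None
```

For instance, `plet({"a": lambda: 2 + 3, "b": lambda: 10 * 4}, lambda a, b: a + b)` evaluates both bindings concurrently and then adds them. Because only `future` and `touch` touch the pool, swapping the underlying scheduler would not change the higher-level constructs.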
Incorporating Locality Management into Garbage Collection in Massively Parallel Object-Oriented Languages
In Joint Symposium on Parallel Processing (JSPP), 1993
Cited by 1 (0 self)
Abstract
This paper discusses how locality between objects affects performance and proposes a software architecture that enhances locality while keeping load balance reasonable, at minimal runtime overhead. Objects are created locally by default, and long-lived objects are selectively migrated during garbage collection. With enhanced locality, message passing is likely to be local and objects are likely to be referenced only by local objects, so they are quickly reclaimed when they become garbage. By integrating migration into garbage collection, load balance is achieved, and information useful for migration (e.g., reference counts) is collected at low cost during garbage collection.
1 Introduction
1.1 Why Locality is Important
When we spawn a new concurrent object (task), where should the new object be located? Should the object be created on the local node, i.e., on the same node where the creator object resides, or on a remote node, i.e., which is some o...
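One way to picture migration piggybacked on garbage collection is a mark-and-sweep pass that, while tracing, tallies which node each live object is referenced from, then moves long-lived objects toward the node that references them most. This is a hypothetical policy for illustration, not the paper's algorithm: `Obj`, `collect`, and `AGE_THRESHOLD` are invented names, and load balancing is omitted for brevity.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

AGE_THRESHOLD = 2  # assumed: collections survived before an object is "long-lived"


@dataclass
class Obj:
    name: str
    node: int  # node where the object currently resides
    age: int = 0  # number of collections survived
    refs: List[str] = field(default_factory=list)  # names of referenced objects


def collect(heap: Dict[str, Obj], roots: List[str]) -> Dict[str, int]:
    """Mark, sweep, then migrate long-lived objects; return {name: new node}."""
    live = set()
    stack = list(roots)
    incoming: Dict[str, Counter] = {name: Counter() for name in heap}
    # Mark phase: each live object is visited once, so each edge is counted once.
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        src = heap[name]
        for dst in src.refs:
            incoming[dst][src.node] += 1  # reference info gathered during tracing
            stack.append(dst)
    # Sweep phase: unreachable objects are reclaimed.
    for name in list(heap):
        if name not in live:
            del heap[name]
    # Migration: long-lived objects move to the node that references them most.
    moved: Dict[str, int] = {}
    for name in live:
        obj = heap[name]
        obj.age += 1
        if obj.age >= AGE_THRESHOLD and incoming[name]:
            best_node, _ = incoming[name].most_common(1)[0]
            if best_node != obj.node:
                obj.node = best_node
                moved[name] = best_node
    return moved
```

Young objects stay where they were created, so short-lived garbage is reclaimed locally without ever being moved. Only survivors pay the migration cost, and the reference tallies come for free from the trace the collector performs anyway.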
Design and Evaluation of Network Interfaces for System Area Networks
1998
Cited by 1 (0 self)
Abstract
Much of a computer's communication performance is determined by how well it interacts with networks. Such interaction is critical for latency-sensitive applications, such as parallel programs that send frequent, short messages. Fortunately, networks have improved dramatically, especially System Area Networks (SANs). SANs provide submicrosecond latency, gigabytes-per-second bandwidth, and very high reliability to 10-100 hosts. Unfortunately, this dramatic improvement in network performance is seldom delivered to applications. A key bottleneck is the host network interface (NI), which connects a network to a host computer. For example, conventional NIs are usually accessed via direct memory access or uncached, memory-mapped device registers, which can incur latencies between ten and hundreds of microseconds. This thesis investigates novel techniques to improve interactions between a processor and a SAN NI. A key principle underlies these techniques: treat NI access as regular, sideeffec...
Efficient Implementations of Concurrent Object-Oriented Languages on Multicomputers
1992
Cited by 1 (0 self)
Abstract
Novel software technologies for implementing concurrent object-oriented languages on different types of multicomputers (including stock multicomputers) are presented. Performance numbers suggest that concurrent object-oriented programming on currently available multicomputers is highly viable and promising in performance, thus allowing us to exploit the computing power and modeling power provided by the concurrent object-oriented paradigm.
1 Introduction
The trend toward object-oriented (OO) software construction is becoming more and more prevalent. Important software concepts such as encapsulation promote a high degree of code reuse and clean architectural structuring of large software. High-performance parallel programming, previously performed in the context of more conventional programming languages, could also benefit from OO technology given appropriate OO languages and systems. Although many OO languages in use today (such as C++ [15] and Smallt...
The Impact of Message Traffic on Multicomputer Memory
Concurrent Systems Architecture Group Memo, University of Illinois at Urbana-Champaign, 1994
Abstract
Multicomputer cache performance is highly sensitive to interprocessor message traffic. The widening gap between microprocessor speeds and primary memory latencies means slight increases in cache miss rate can have a severe impact on application performance. It is therefore critical to reduce cache misses. While there are a number of factors that may contribute to an increase in cache misses, of particular concern is multicomputer message traffic. In this paper, we examine the extent to which handling message traffic increases cache misses.
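Why a slight rise in miss rate matters can be seen from the standard average-memory-access-time relation, AMAT = hit time + miss rate × miss penalty. The sketch below simply evaluates that relation; the 1-cycle hit time, 100-cycle miss penalty, and the 2% and 3% miss rates are illustrative assumptions, not measurements from the memo.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time in cycles: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Assumed parameters: 1-cycle hits and a 100-cycle miss penalty.
baseline = amat(1.0, 0.02, 100.0)  # 2% miss rate
with_messages = amat(1.0, 0.03, 100.0)  # miss rate after message handling (assumed)
slowdown = with_messages / baseline
```

Under these assumptions, raising the miss rate from 2% to 3% moves the average access time from 3 to 4 cycles, roughly a 33% increase from a one-point change in miss rate. The larger the miss penalty relative to the hit time, which is exactly the widening processor-memory gap the memo describes, the more this ratio grows.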

