Building reliable, high-performance communication systems from components
Operating Systems Review, 1999
Cited by 68 (25 self)
Although building systems from components has attractions, this approach also has problems. Can we be sure that a certain configuration of components is correct? Can it perform as well as a monolithic system? Our paper answers these questions for the Ensemble communication architecture by showing how, with the help of the Nuprl formal system, configurations may be checked against specifications, and how optimized code can be synthesized from these configurations. The performance results show that we can substantially reduce end-to-end latency in the already optimized Ensemble system. Finally, we discuss whether the techniques we used are general enough for systems other than communication systems.
A Stream Compiler for Communication-Exposed Architectures
In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, 2002
Cited by 61 (16 self)
With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, iWarp, SmartMemories). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures.
ASHs: Application-Specific Handlers for High-Performance Messaging
In ACM Communication Architectures, Protocols, and Applications (SIGCOMM ’96), 1996
Cited by 59 (11 self)
Application-specific safe message handlers (ASHs) are designed to provide applications with hardware-level network performance. ASHs are user-written code fragments that safely and efficiently execute in the kernel in response to message arrival. ASHs can direct message transfers (thereby eliminating copies) and send messages (thereby reducing send-response latency). In addition, the ASH system provides support for dynamic integrated layer processing (thereby eliminating duplicate message traversals) and dynamic protocol composition (thereby supporting modularity). ASHs provide this high degree of flexibility while still providing network performance as good as, or (if they exploit application-specific knowledge) even better than, hard-wired in-kernel implementations. A combination of user-level microbenchmarks and end-to-end system measurements using TCP demonstrates the benefits of the ASH system.
Experience with a Language for Writing Coherence Protocols
1997
Cited by 24 (7 self)
In this paper we describe our experience with Teapot [7], a domain-specific language for addressing the cache coherence problem. The cache coherence problem arises when parallel and distributed computing systems make local replicas of shared data for reasons of scalability and performance.
StreamIt: A Compiler for Streaming Applications
2001
Cited by 14 (5 self)
Streaming programs represent an increasingly important and widespread class of applications that holds unprecedented opportunities for high-impact compiler technology. Unlike sequential programs with obscured dependence information and complex communication patterns, a stream program is naturally written as a set of concurrent filters with regular steady-state communication. The StreamIt language aims to provide a natural, high-level syntax that improves programmer productivity in the streaming domain. At the same time, the language imposes a hierarchical structure on the stream graph that enables novel representations and optimizations within the StreamIt compiler. We define the "stream dependence function", a fundamental relationship between the input channels of two filters in a stream graph. We also describe a suite of stream optimizations, a denotational semantics for validating these optimizations, and a novel phased scheduling algorithm for stream graphs. In addition, we have implemented a prototype of the StreamIt optimizing compiler that is showing promising results.
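The "regular steady-state communication" mentioned in this abstract can be illustrated with the balance equations of synchronous dataflow. The following is a minimal sketch for a linear pipeline only, assuming each filter has fixed, positive pop and push rates; it is not the phased scheduling algorithm the abstract refers to, and the function name `steady_state_repetitions` is hypothetical.

```python
from fractions import Fraction
from math import lcm  # requires Python 3.9+


def steady_state_repetitions(rates):
    """Firing counts for one steady-state iteration of a linear pipeline.

    `rates` is a list of (pop, push) pairs, one per filter, in pipeline
    order. The result is the smallest list of positive integers such that
    every channel receives exactly as many items as its consumer removes.
    Assumption: every rate that sits on a channel is positive.
    """
    reps = [Fraction(1)]
    for (_, push_prev), (pop_next, _) in zip(rates, rates[1:]):
        # Balance equation: reps[i] * push[i] == reps[i + 1] * pop[i + 1]
        reps.append(reps[-1] * push_prev / pop_next)
    # Scale the rational solution up to the smallest integer solution.
    scale = lcm(*(r.denominator for r in reps))
    return [int(r * scale) for r in reps]


# Hypothetical pipeline: a source pushing 1 item per firing, a filter that
# pops 2 and pushes 3, and a sink that pops 2.
schedule = steady_state_repetitions([(0, 1), (2, 3), (2, 0)])
print(schedule)  # [4, 2, 3]
```

With these counts the source produces 4 items that the middle filter consumes in 2 firings, and the middle filter produces 6 items that the sink consumes in 3 firings, so buffer occupancy is unchanged after each iteration.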
Cache Aware Optimization of Stream Programs
In LCTES, 2005
Cited by 13 (8 self)
Effective use of the memory hierarchy is critical for achieving high performance on embedded systems. We focus on the class of ...
Design and Implementation of a Modular, Flexible, and Fast System for Dynamic Protocol Composition
1996
Cited by 9 (5 self)
Distributed systems must communicate. To communicate at all requires that high-level protocols be built with manageable complexity. To communicate well requires protocols that are efficient in both design and implementation. The ASH system provides mechanisms to address both of these needs. To manage complexity, it provides a simple interface that allows protocols to be dynamically composed in a modular manner. As a result, the complexity of building a high-level messaging service is reduced, since it can be built from multiple, independent pieces at runtime. To provide efficiency, the ASH interface is designed so that the message processing steps of each protocol (such as checksumming, byteswapping, encryption, etc.) can be dynamically integrated into a single specialized loop that touches each byte of the message at most once. The ASH system is the first to dynamically integrate the data processing elements of each protocol. This ability is crucial since without it dynamic protocol composition cann...
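The idea of fusing per-protocol processing steps into one loop can be sketched as follows. This is an illustration only, assuming 16-bit words and three toy steps; the names `layered`, `integrated`, `Checksum`, `byteswap16`, and `xor_with` are hypothetical and are not the ASH interface, which performs this integration dynamically in the kernel.

```python
def byteswap16(word):
    """Swap the two bytes of a 16-bit word."""
    return ((word & 0xFF) << 8) | (word >> 8)


def xor_with(key):
    """Toy stand-in for an encryption step (not secure)."""
    return lambda word: word ^ key


class Checksum:
    """Accumulates a 16-bit additive checksum and passes each word through."""

    def __init__(self):
        self.total = 0

    def __call__(self, word):
        self.total = (self.total + word) & 0xFFFF
        return word


def layered(words, steps):
    """Baseline: one full traversal of the message per protocol step."""
    for step in steps:
        words = [step(w) for w in words]
    return words


def integrated(words, steps):
    """Single traversal: all steps are applied to a word before moving on."""
    out = []
    for w in words:
        for step in steps:
            w = step(w)
        out.append(w)
    return out


message = [0x1234, 0xABCD, 0x00FF]
sum_a, sum_b = Checksum(), Checksum()
result_a = layered(message, [sum_a, byteswap16, xor_with(0x5A5A)])
result_b = integrated(message, [sum_b, byteswap16, xor_with(0x5A5A)])
# Both orderings produce the same output and checksum; the integrated
# version reads each word of the message only once.
assert result_a == result_b and sum_a.total == sum_b.total
```

The two functions are interchangeable here because each step works on one word at a time; a step that needed to see the whole message before emitting anything could not be fused this way.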
Automatic Data Mapping of Signal Processing Applications
1997
Cited by 7 (2 self)
This paper presents a technique to automatically map a complete digital signal processing (DSP) application onto a parallel machine with distributed memory. Unlike other applications, where coarse- or medium-grain scheduling techniques can be used, DSP applications integrate several thousand tasks and hence necessitate fine-grain considerations. Moreover, finding an effective mapping requires taking into account both architectural resource constraints and real-time constraints. The main contribution of this paper is to show how data partitioning and fine-grain scheduling can be handled and solved under the above operational constraints using Concurrent Constraint Logic Programming (CCLP) languages. Our concurrent resolution technique, which handles linear and nonlinear constraints, takes advantage of the special features of signal processing applications and provides a solution equivalent to a manual solution for the representative Panoramic Analysis (PA) appli...
Continuations and transducer composition
In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2006
Cited by 7 (5 self)
On-line transducers are an important class of computational agent; we construct and compose together many software systems using them, such as stream processors, layered network protocols, DSP networks and graphics pipelines. We show an interesting use of continuations that, when taken in a CPS setting, exposes the control flow of these systems. This enables a CPS-based compiler to optimise systems composed of these transducers, using only standard, known analyses and optimisations. Critically, the analysis permits optimisation across the composition of these transducers, allowing efficient construction of systems in a hierarchical way.
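To give a feel for how passing continuations exposes the control flow between composed transducers, here is a simplified push-style sketch. It assumes each transducer is a function of an input item and an `emit` continuation for the downstream stage; this is not the construction from the paper, and `mapping`, `keeping`, `repeating`, `compose`, and `run` are hypothetical names.

```python
def mapping(f):
    """Transducer that emits f(x) for each input x."""
    return lambda x, emit: emit(f(x))


def keeping(pred):
    """Transducer that forwards x only when pred(x) holds."""
    return lambda x, emit: emit(x) if pred(x) else None


def repeating(n):
    """Transducer that emits each input n times."""
    def step(x, emit):
        for _ in range(n):
            emit(x)
    return step


def compose(first, second):
    """Feed everything `first` emits directly into `second`."""
    return lambda x, emit: first(x, lambda y: second(y, emit))


def run(transducer, items):
    """Drive a transducer over `items`, collecting what it emits."""
    out = []
    for x in items:
        transducer(x, out.append)
    return out


pipeline = compose(
    compose(mapping(lambda x: x + 1), keeping(lambda x: x % 2 == 0)),
    repeating(2),
)
print(run(pipeline, [1, 2, 3]))  # [2, 2, 4, 4]
```

Because each stage calls its continuation directly, the composed pipeline is an ordinary chain of function calls with no intermediate buffers, which is the kind of structure an inlining compiler can optimise across stage boundaries.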

