Results 1 - 10
of
82
Dynamo: A Transparent Dynamic Optimization System
- ACM SIGPLAN Notices
, 2000
"... We describe the design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor. The input native instruction stream to Dynamo can be dynamically generated (by a JIT ..."
Abstract
-
Cited by 347 (1 self)
- Add to MetaCart
We describe the design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor. The input native instruction stream to Dynamo can be dynamically generated (by a JIT for example), or it can come from the execution of a statically compiled native binary. This paper evaluates the Dynamo system in the latter, more challenging situation, in order to emphasize the limits, rather than the potential, of the system. Our experiments demonstrate that even statically optimized native binaries can be accelerated Dynamo, and often by a significant degree. For example, the average performance of --O optimized SpecInt95 benchmark binaries created by the HP product C compiler is improved to a level comparable to their --O4 optimized version running without Dynamo. Dynamo achieves this by focusing its efforts on optimization opportunities that tend to manifest only at runtime, and hence opportunities that might be difficult for a static compiler to exploit. Dynamo's operation is transparent in the sense that it does not depend on any user annotations or binary instrumentation, and does not require multiple runs, or any special compiler, operating system or hardware support. The Dynamo prototype presented here is a realistic implementation running on an HP PA-8000 workstation under the HPUX 10.20 operating system.
DPF: Fast, Flexible Message Demultiplexing using Dynamic Code Generation
- In ACM Communication Architectures, Protocols, and Applications (SIGCOMM
, 1996
"... Fast and flexible message demultiplexing are well-established goals in the networking community [1, 18, 22]. Currently, however, network architects have had to sacrifice one for the other. We present a new packet-filter system, DPF (Dynamic Packet Filters), that provides both the traditional flexibi ..."
Abstract
-
Cited by 116 (10 self)
- Add to MetaCart
Fast and flexible message demultiplexing are well-established goals in the networking community [1, 18, 22]. Currently, however, network architects have had to sacrifice one for the other. We present a new packet-filter system, DPF (Dynamic Packet Filters), that provides both the traditional flexibility of packet filters [18] and the speed of hand-crafted demultiplexing routines [3]. DPF filters run 10--50 times faster than the fastest packet filters reported in the literature [1, 17, 18, 27]. DPF's performance is either equivalent to or, when it can exploit runtime information, superior to handcoded demultiplexors. DPF achieves high performance by using a carefully-designed declarative packet-filter language that is aggressively optimized using dynamic code generation. The contributions of this work are: (1) a detailed description of the DPF design, (2) discussion of the use of dynamic code generation and quantitative results on its performance impact, (3) quantitative results on how ...
Fine-grained dynamic instrumentation of commodity operating system kernels
, 1999
"... We have developed a technology, fine-grained dynamic instrumentation of commodity kernels, which can splice (insert) dynamically generated code before almost any machine code instruction of a completely unmodified running commodity operating system kernel. This technology is well-suited to performan ..."
Abstract
-
Cited by 107 (5 self)
- Add to MetaCart
We have developed a technology, fine-grained dynamic instrumentation of commodity kernels, which can splice (insert) dynamically generated code before almost any machine code instruction of a completely unmodified running commodity operating system kernel. This technology is well-suited to performance profiling, debugging, code coverage, security auditing, runtime code optimizations, and kernel extensions. We have designed and implemented a tool called KernInst that performs dynamic instrumentation on a stock production Solaris kernel running on an UltraSPARC. On top of KernInst, we have implemented a kernel performance profiling tool, and used it to understand kernel and application performance under a Web proxy server workload. We used this information to make two changes (one to the kernel, one to the proxy) that cumulatively reduce the percentage of elapsed time that the proxy spends opening disk cache files from 40 % to 7%. 1
Optimizing Direct Threaded Code By Selective Inlining
- In SIGPLAN ’98 Conference on Programming Language Design and Implementation
, 1998
"... Achieving good performance in bytecoded language interpreters is difficult without sacrificing both simplicity and portability. This is due to the complexity of dynamic translation ("just-in-time compilation") of bytecodes into native code, which is the mechanism employed universally by highperforma ..."
Abstract
-
Cited by 69 (1 self)
- Add to MetaCart
Achieving good performance in bytecoded language interpreters is difficult without sacrificing both simplicity and portability. This is due to the complexity of dynamic translation ("just-in-time compilation") of bytecodes into native code, which is the mechanism employed universally by highperformance interpreters. We demonstrate that a few simple techniques make it possible to create highly-portable dynamic translators that can attain as much as 70% the performance of optimized C for certain numerical computations. Translators based on such techniques can offer respectable performance without sacrificing either the simplicity or portability of much slower "pure" bytecode interpreters. Keywords: bytecode interpretation, threaded code, inlining, dynamic translation, just-in-time compilation. 1 Introduction Bytecoded languages such as Smalltalk [Gol83], Caml [Ler97] and Java [Arn96, Lin97] offer significant engineering advantages over more conventional languages: higher levels of abst...
Flexible representation analysis
- IN ACM SIGPLAN INTERNATIONAL CONFERENCE ON FUNCTIONAL PROGRAMMING
, 1997
"... Statically typed languages with Hindley-Milner polymorphism have long been compiled using inefficient and fully boxed data representations. Recently, several new compilation methods have been proposed to support more efficient and unboxed multi-word representations. Unfortunately, none of these tech ..."
Abstract
-
Cited by 63 (14 self)
- Add to MetaCart
Statically typed languages with Hindley-Milner polymorphism have long been compiled using inefficient and fully boxed data representations. Recently, several new compilation methods have been proposed to support more efficient and unboxed multi-word representations. Unfortunately, none of these techniques is fully satisfactory. For example, Leroy's coercion-based approach does not handle recursive data types and mutable types well. The type-passing approach (proposed by Harper and Morrisett) handles all data objects, but it involves extensive runtime type analysis and code manipulations. This paper presents a new flexible representation analysis technique that combines the best of both approaches. Our new scheme supports unboxed representations for recursive and mutable types, yet it only requires little runtime type analysis. In fact, we show that there is a continuum of possibilities between the coercion-based approach and the type-passing approach. By varying the amount of boxing an...
ASHs: Application-Specific Handlers for High-Performance Messaging
- IN ACM COMMUNICATION ARCHITECTURES, PROTOCOLS, AND APPLICATIONS (SIGCOMM ’96
, 1996
"... Application-specific safe message handlers (ASHs) are designed to provide applications with hardware-level network performance. ASHs are user-written code fragments that safely and efficiently execute in the kernel in response to message arrival. ASHs can direct message transfers (thereby eliminatin ..."
Abstract
-
Cited by 59 (11 self)
- Add to MetaCart
Application-specific safe message handlers (ASHs) are designed to provide applications with hardware-level network performance. ASHs are user-written code fragments that safely and efficiently execute in the kernel in response to message arrival. ASHs can direct message transfers (thereby eliminating copies) and send messages (thereby reducing send-response latency). In addition, the ASH system provides support for dynamic integrated layer processing (thereby eliminating duplicate message traversals) and dynamic protocol composition (thereby supporting modularity). ASHs provide this high degree of flexibility while still providing network performance as good as, or (if they exploit application-specific knowledge) even better than, hard-wired in-kernel implementations. A combination of user-level microbenchmarks and end-to-end system measurements using TCP demonstrate the benefits of the ASH system.
tcc: A System for Fast, Flexible, and High-level Dynamic Code Generation
- IN PROCEEDINGS OF THE ACM SIGPLAN '97 CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION
, 1997
"... tcc is a compiler that provides efficient and high-level access to dynamic code generation. It implements the `C ("Tick-C") programming language, an extension of ANSI C that supports dynamic code generation [15]. `C gives power and flexibility in specifying dynamically generated code: whereas most o ..."
Abstract
-
Cited by 55 (3 self)
- Add to MetaCart
tcc is a compiler that provides efficient and high-level access to dynamic code generation. It implements the `C ("Tick-C") programming language, an extension of ANSI C that supports dynamic code generation [15]. `C gives power and flexibility in specifying dynamically generated code: whereas most other systems use annotations to denote run-time invariants, `C allows the programmer to specify and compose arbitrary expressions and statements at run time. This degree of control is needed to efficiently implement some of the most important applications of dynamic code generation, such as "just in time" compilers [17] and efficient simulators [10, 48, 46]. The paper focuses on the techniques that allow tcc to provide `C's flexibility and expressiveness without sacrificing run-time code generation efficiency. These techniques include fast register allocation, efficient creation and composition of dynamic code specifications, and link-time analysis to reduce the size of dynamic code generato...
Dynamic Feedback: An Effective Technique for Adaptive Computing
, 1997
"... This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated co ..."
Abstract
-
Cited by 51 (3 self)
- Add to MetaCart
This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment.
High-Level Adaptive Program Optimization with ADAPT
, 2001
"... Compile-time optimization is often limited by a lack of target machine and input data set knowledge. Without this information, compilers may be forced to make conservative assumptions to preserve correctness and to avoid performance degradation. In order to cope with this lack of information at comp ..."
Abstract
-
Cited by 50 (7 self)
- Add to MetaCart
Compile-time optimization is often limited by a lack of target machine and input data set knowledge. Without this information, compilers may be forced to make conservative assumptions to preserve correctness and to avoid performance degradation. In order to cope with this lack of information at compile-time, adaptive and dynamic systems can be used to perform optimization at runtime when complete knowledge of input and machine parameters is available. This paper presents a compiler-supported high-level adaptive optimization system. Users describe, in a domain specific language, optimizations performed by stand-alone optimization tools and backend compiler flags, as well as heuristics for applying these optimizations dynamically at runtime. The ADAPT compiler reads these descriptions and generates application-specific runtime systems to apply the heuristics. To facilitate the usage of existing tools and compilers, overheads are minimized by decoupling optimization from execution. Our system, ADAPT, supports a range of paradigms proposed recently, including dynamic compilation, parameterization and runtime sampling. We demonstrate our system by applying several optimization techniques to a suite of benchmarks on two target machines. ADAPT is shown to consistently outperform statically generated executables, improving performance by as much as 70%.
Transparent dynamic optimization: The design and implementation of Dynamo
, 1999
"... dynamic optimization, compiler, trace selection, binary translation © Copyright Hewlett-Packard Company 1999 Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, a prototype dynamic optimizer that is capabl ..."
Abstract
-
Cited by 49 (4 self)
- Add to MetaCart
dynamic optimization, compiler, trace selection, binary translation © Copyright Hewlett-Packard Company 1999 Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, a prototype dynamic optimizer that is capable of optimizing a native program binary at runtime. Dynamo is a realistic implementation, not a simulation, that is written entirely in user-level software, and runs on a PA-RISC machine under the HPUX operating system. Dynamo does not depend on any special programming language,

