• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Shade: A Fast Instruction-Set Simulator for Execution Profiling (1994)

by Bob Cmelik , David Keppel
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 237
Next 10 →

Dynamic storage allocation: A survey and critical review

by Paul R. Wilson, Mark S. Johnstone, Michael Neely, David Boles , 1995
"... Dynamic memory allocation has been a fundamental part of most computer systems since roughly 1960, and memory allocation is widely considered to be either a solved problem or an insoluble one. In this survey, we describe a variety of memory allocator designs and point out issues relevant to their de ..."
Abstract - Cited by 187 (6 self) - Add to MetaCart
Dynamic memory allocation has been a fundamental part of most computer systems since roughly 1960, and memory allocation is widely considered to be either a solved problem or an insoluble one. In this survey, we describe a variety of memory allocator designs and point out issues relevant to their design and evaluation. We then chronologically survey most of the literature on allocators between 1961 and 1995. (Scores of papers are discussed, in varying detail, and over 150 references are given.) We argue that allocator designs have been unduly restricted by an emphasis on mechanism, rather than policy, while the latter is more important; higher-level strategic issues are still more important, but have not been given much attention. Most theoretical analyses and empirical allocator evaluations to date have relied on very strong assumptions of randomness and independence, but real program behavior exhibits important regularities that must be exploited if allocators are to perform well in practice.

Valgrind: A program supervision framework

by Nicholas Nethercote, Julian Seward - In Third Workshop on Runtime Verification (RV’03 , 2003
"... a;1 ..."
Abstract - Cited by 183 (6 self) - Add to MetaCart
Abstract not found

DAISY: Dynamic Compilation for 100% Architectural Compatibility

by Kemal Ebcioglu, Erik R. Altman , 1997
"... Although VLIW architectures offer the advantages of simplicity of design and high issue rates, a major impediment to their use is that they are not compatible with the existing software base. We describe new simple hardware features for a VLIW machine we call DAISY (Dynamically Architected Instructi ..."
Abstract - Cited by 173 (12 self) - Add to MetaCart
Although VLIW architectures offer the advantages of simplicity of design and high issue rates, a major impediment to their use is that they are not compatible with the existing software base. We describe new simple hardware features for a VLIW machine we call DAISY (Dynamically Architected Instruction Set from Yorlaown). DAISY is specifically intended to emulate existing architectures, so that all existing software for an old architecture (including operating system kernel code) runs without changes on the VLIW. Each time a new fragment of code is executed for the first time, the code is translated to VLIW primitives, parallelized and saved in a portion of main memory not visible to the old architecture, by a Firtual Machine Monitor (software) residing in read only memory. Subsequent executions of the same fragment do not require a translation (unless cast out). We discuss the architectural requirements for such a VLIW, to deal with issues including self-modifying code, precise exceptions, and aggressive reordedng of memory references in the presence of strong MP consistency and memory mapped I/O. We have implemented the dynamic parallelization algorithms for the PowerPC architecture. The initial results show high degrees of instruction level parallelism with reasonable translation overhead and memory usage.

Optimizing ML with Run-Time Code Generation

by Peter Lee, Mark Leone - In Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation
"... We describe the design and implementation of a compiler that automatically translates ordinary programs written in a subset of ML into code that generates native code at run time. Run-time code generation can make use of values and invariants that cannot be exploited at compile time, yielding code t ..."
Abstract - Cited by 148 (11 self) - Add to MetaCart
We describe the design and implementation of a compiler that automatically translates ordinary programs written in a subset of ML into code that generates native code at run time. Run-time code generation can make use of values and invariants that cannot be exploited at compile time, yielding code that is often superior to statically optimal code. But the cost of optimizing and generating code at run time can be prohibitive. We demonstrate how compile-time specialization can reduce the cost of run-time code generation by an order of magnitude without greatly affecting code quality. Several benchmark programs are examined, which exhibit an average cost of only six cycles per instruction generated at run time. 1 Introduction In this paper, we describe our experience with a prototype system for run-time code generation. Our system, called Fabius, is a compiler that takes ordinary programs written in a subset of ML and automatically compiles them into native code that generates native c...

Trace-Driven Memory Simulation: A Survey

by Richard A. Uhlig, Trevor N. Mudge - ACM Computing Surveys , 2004
"... This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show t ..."
Abstract - Cited by 133 (0 self) - Add to MetaCart
This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use are considered. In a concluding section, we examine fundamental limitations to trace-driven simulation, and survey some recent developments in memory simulation that may overcome these bottlenecks

An Infrastructure for Adaptive Dynamic Optimization

by Derek Bruening, Timothy Garnett, Saman Amarasinghe , 2003
"... Dynamic optimization is emerging as a promising approach to overcome many of the obstacles of traditional static compilation. But while there are a number of compiler infrastructures for developing static optimizations, there are very few for developing dynamic optimizations. We present a framework ..."
Abstract - Cited by 130 (5 self) - Add to MetaCart
Dynamic optimization is emerging as a promising approach to overcome many of the obstacles of traditional static compilation. But while there are a number of compiler infrastructures for developing static optimizations, there are very few for developing dynamic optimizations. We present a framework for implementing dynamic analyses and optimizations. We provide an interface for building external modules, or clients, for the DynamoRIO dynamic code modification system. This interface abstracts away many low-level details of the DynamoRIO runtime system while exposing a simple and powerful, yet efficient and lightweight, API. This is achieved by restricting optimization units to linear streams of code and using adaptive levels of detail for representing instructions. The interface is not restricted to optimization and can be used for instrumentation, profiling, dynamic translation, etc.. To demonstrate

Understanding data lifetime via whole system simulation

by Jim Chow, Ben Pfaff, Tal Garfinkel, Kevin Christopher, Mendel Rosenblum - In USENIX Security Symposium , 2004
"... Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. ..."
Abstract - Cited by 124 (4 self) - Add to MetaCart
Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.

VCODE: A retargetable, extensible, very fast dynamic code generation system

by Dawson R. Engler - IN PLDI ’96: PROCEEDINGS OF THE ACM SIGPLAN 1996 CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION , 1996
"... Dynamic code generation is the creation of executable code at runtime. Such “on-the-fly” code generation is a powerful technique, enabling applications to use runtime information to improve performance by up to an order of magnitude [4, 8, 20, 22, 23]. Unfortunately, previous general-purpose dynamic ..."
Abstract - Cited by 111 (7 self) - Add to MetaCart
Dynamic code generation is the creation of executable code at runtime. Such “on-the-fly” code generation is a powerful technique, enabling applications to use runtime information to improve performance by up to an order of magnitude [4, 8, 20, 22, 23]. Unfortunately, previous general-purpose dynamic code generation systems have been either inefficient or non-portable. We present VCODE, a retargetable, extensible, very fast dynamic code generation system. An important feature of VCODE is that it generates machine code “in-place ” without the use of intermediate data structures. Eliminating the need to construct and consume an intermediate representation at runtime makes VCODE both efficient and extensible. VCODE dynamically generates code at an approximate cost of six to ten instructions per generated instruction, making it over an order of magnitude faster than the most efficient general-purpose code generation system in the literature [10]. Dynamic code generation is relatively well known within the compiler community. However, due in large part to the lack of a publicly available dynamic code generation system, it has remained a curiosity rather than a widely used technique. A practical contribution of this work is the free, unrestricted distribution of the VCODE system, which currently runs on the MIPS, SPARC, and Alpha architectures.

CommBench - A Telecommunications Benchmark for Network Processors

by Tilman Wolf, Mark Franklin - IN PROC. OF IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS , 2000
"... This paper presents a benchmark, CommBench, for use in evaluating and designing telecommunications network processors. The benchmark applications focus on small, computationally intense program kernels typical of the network processor environment. The benchmark ..."
Abstract - Cited by 102 (17 self) - Add to MetaCart
This paper presents a benchmark, CommBench, for use in evaluating and designing telecommunications network processors. The benchmark applications focus on small, computationally intense program kernels typical of the network processor environment. The benchmark

ADAPTIVE OPTIMIZATION FOR SELF: RECONCILING HIGH PERFORMANCE WITH EXPLORATORY PROGRAMMING

by Urs Hölzle , 1994
"... Object-oriented programming languages confer many benefits, including abstraction, which lets the programmer hide the details of an object’s implementation from the object’s clients. Unfortunately, crossing abstraction boundaries often incurs a substantial run-time overhead in the form of frequent p ..."
Abstract - Cited by 95 (6 self) - Add to MetaCart
Object-oriented programming languages confer many benefits, including abstraction, which lets the programmer hide the details of an object’s implementation from the object’s clients. Unfortunately, crossing abstraction boundaries often incurs a substantial run-time overhead in the form of frequent procedure calls. Thus, pervasive use of abstraction, while desirable from a design standpoint, may be impractical when it leads to inefficient programs. Aggressive compiler optimizations can reduce the overhead of abstraction. However, the long compilation times introduced by optimizing compilers delay the programming environment‘s responses to changes in the program. Furthermore, optimization also conflicts with source-level debugging. Thus, programmers are caught on the horns of two dilemmas: they have to choose between abstraction and efficiency, and between responsive programming environments and efficiency. This dissertation shows how to reconcile these seemingly contradictory goals by performing optimizations lazily. Four new techniques work together to achieve high performance and high responsiveness: • Type feedback achieves high performance by allowing the compiler to inline message sends based on information extracted from the runtime system. On average, programs run 1.5 times faster than the previous SELF system; compared to a commercial Smalltalk implementation, two medium-sized benchmarks run about three times faster. This level of performance is obtained with a compiler that is both simpler and faster than previous SELF compilers. • Adaptive optimization achieves high responsiveness without sacrificing performance by using a fast nonoptimizing compiler to generate initial code while automatically recompiling heavily used parts of the program with an optimizing compiler. On a previous-generation workstation like the SPARCstation-2, fewer than 200 pauses exceeded 200 ms during a 50-minute interaction, and 21 pauses exceeded one second. On a currentgeneration workstation, only 13 pauses exceed 400 ms. • Dynamic deoptimization shields the programmer from the complexity of debugging optimized code by transparently recreating non-optimized code as needed. No matter whether a program is optimized or not, it can always be stopped, inspected, and single-stepped. Compared to previous approaches, deoptimization allows more debugging while placing fewer restrictions on the optimizations that can be performed. • Polymorphic inline caching generates type-case sequences on-the-fly to speed up messages sent from the same call site to several different types of object. More significantly, they collect concrete type information for the optimizing compiler. With better performance yet good interactive behavior, these techniques make exploratory programming possible both for pure object-oriented languages and for application domains requiring higher ultimate performance, reconciling exploratory programming, ubiquitous abstraction, and high performance.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University