Results 1 - 10
of
25
Optimally Profiling and Tracing Programs
- ACM Transactions on Programming Languages and Systems
, 1994
"... copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others ..."
Abstract
-
Cited by 256 (17 self)
- Add to MetaCart
copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications
Branch Prediction For Free
, 1993
"... Many compilers rely on branch prediction to improve program performance by identifying frequently executed regions and by aiding in scheduling instructions. Profile-based predictors require a time-consuming and inconvenient compile-profile-compile cycle in order to make predictions. We present a pro ..."
Abstract
-
Cited by 144 (8 self)
- Add to MetaCart
Many compilers rely on branch prediction to improve program performance by identifying frequently executed regions and by aiding in scheduling instructions. Profile-based predictors require a time-consuming and inconvenient compile-profile-compile cycle in order to make predictions. We present a program-based branch predictor that performs well for a large and diverse set of programs written in C and Fortran. In addition to using natural loop analysis to predict branches that control the iteration of loops, we focus on heuristics for predicting non-loop branches, which dominate the dynamic branch count of many programs. The heuristics are simple and require little program analysis, yet they are effective in terms of coverage and miss rate. Although program-based prediction does not equal the accuracy of profile-based prediction, we believe it reaches a sufficiently high level to be useful. Additional type and semantic information available to a compiler would enhance our heuristics. #...
ADAPTIVE OPTIMIZATION FOR SELF: RECONCILING HIGH PERFORMANCE WITH EXPLORATORY PROGRAMMING
, 1994
"... Object-oriented programming languages confer many benefits, including abstraction, which lets the programmer hide
the details of an object’s implementation from the object’s clients. Unfortunately, crossing abstraction boundaries
often incurs a substantial run-time overhead in the form of frequent p ..."
Abstract
-
Cited by 95 (6 self)
- Add to MetaCart
Object-oriented programming languages confer many benefits, including abstraction, which lets the programmer hide
the details of an object’s implementation from the object’s clients. Unfortunately, crossing abstraction boundaries
often incurs a substantial run-time overhead in the form of frequent procedure calls. Thus, pervasive use of abstraction,
while desirable from a design standpoint, may be impractical when it leads to inefficient programs.
Aggressive compiler optimizations can reduce the overhead of abstraction. However, the long compilation times
introduced by optimizing compilers delay the programming environment‘s responses to changes in the program.
Furthermore, optimization also conflicts with source-level debugging. Thus, programmers are caught on the horns of
two dilemmas: they have to choose between abstraction and efficiency, and between responsive programming environments
and efficiency. This dissertation shows how to reconcile these seemingly contradictory goals by performing
optimizations lazily.
Four new techniques work together to achieve high performance and high responsiveness:
• Type feedback achieves high performance by allowing the compiler to inline message sends based on information
extracted from the runtime system. On average, programs run 1.5 times faster than the previous SELF system;
compared to a commercial Smalltalk implementation, two medium-sized benchmarks run about three times faster.
This level of performance is obtained with a compiler that is both simpler and faster than previous SELF compilers.
• Adaptive optimization achieves high responsiveness without sacrificing performance by using a fast nonoptimizing
compiler to generate initial code while automatically recompiling heavily used parts of the program
with an optimizing compiler. On a previous-generation workstation like the SPARCstation-2, fewer than 200
pauses exceeded 200 ms during a 50-minute interaction, and 21 pauses exceeded one second. On a currentgeneration
workstation, only 13 pauses exceed 400 ms.
• Dynamic deoptimization shields the programmer from the complexity of debugging optimized code by
transparently recreating non-optimized code as needed. No matter whether a program is optimized or not, it can
always be stopped, inspected, and single-stepped. Compared to previous approaches, deoptimization allows more
debugging while placing fewer restrictions on the optimizations that can be performed.
• Polymorphic inline caching generates type-case sequences on-the-fly to speed up messages sent from the same
call site to several different types of object. More significantly, they collect concrete type information for the
optimizing compiler.
With better performance yet good interactive behavior, these techniques make exploratory programming possible
both for pure object-oriented languages and for application domains requiring higher ultimate performance, reconciling
exploratory programming, ubiquitous abstraction, and high performance.
Compiler optimization-space exploration
- In Proceedings of the international symposium on Code generation and optimization
, 2003
"... To meet the demands of modern architectures, optimizing compilers must incorporate an ever larger number of increasingly complex transformation algorithms. Since code transformations may often degrade performance or interfere with subsequent transformations, compilers employ predictive heuristics to ..."
Abstract
-
Cited by 87 (1 self)
- Add to MetaCart
To meet the demands of modern architectures, optimizing compilers must incorporate an ever larger number of increasingly complex transformation algorithms. Since code transformations may often degrade performance or interfere with subsequent transformations, compilers employ predictive heuristics to guide optimizations by predicting their effects a priori. Unfortunately, the unpredictability of optimization interaction and the irregularity of today’s wide-issue machines severely limit the accuracy of these heuristics. As a result, compiler writers may temper high variance optimizations with overly conservative heuristics or may exclude these optimizations entirely. While this process results in a compiler capable of generating good average code quality across the target benchmark set, it is at the cost of missed optimization opportunities in individual code segments. To replace predictive heuristics, researchers have proposed compilers which explore many optimization options, selecting the best one a posteriori. Unfortunately, these existing iterative compilation techniques are not practical for reasons of compile time and applicability. In this paper, we present the Optimization-Space Exploration (OSE) compiler organization, the first practical iterative compilation strategy applicable to optimizations in general-purpose compilers. Instead of replacing predictive heuristics, OSE uses the compiler writer’s knowledge encoded in the heuristics to select a small number of promising optimization alternatives for a given code segment. Compile time is limited by evaluating only these alternatives for hot code segments using a general compiletime performance estimator. An OSE-enhanced version of Intel’s highly-tuned, aggressively optimizing production compiler for IA-64 yields a significant performance improvement, more than 20 % in some cases, on Itanium for SPEC codes. 1.
Dynamic Feedback: An Effective Technique for Adaptive Computing
, 1997
"... This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated co ..."
Abstract
-
Cited by 51 (3 self)
- Add to MetaCart
This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment.
Code-Generation On-the-Fly: A Key to Portable Software
, 1994
"... A technique for representing programs abstractly and independently of the eventual target architecture is presented that yields a file representation twice as compact as machine code for a CISC processor. It forms the basis of an implementation, in which the process of code generation is deferred ..."
Abstract
-
Cited by 48 (19 self)
- Add to MetaCart
A technique for representing programs abstractly and independently of the eventual target architecture is presented that yields a file representation twice as compact as machine code for a CISC processor. It forms the basis of an implementation, in which the process of code generation is deferred until the time of loading. At that point, native code is created on_the_fly by a code_generating loader. The process of loading with dynamic code_generation is so fast that it requires little more time than the input of equivalent native code from a disk storage medium. This is predominantly due to the compactness of the abstract program representation, which allows to counterbalance the ad...
VISTA: A System for Interactive Code Improvement
- in "Proceedings of the joint conference on Languages, compilers
, 2002
"... Software designers face many challenges when developing applications for embedded systems. A major challenge is meeting the conflicting constraints of speed, code density, and power consumption. Traditional optimizing compiler technology is usually of little help in addressing this challenge. To mee ..."
Abstract
-
Cited by 18 (7 self)
- Add to MetaCart
Software designers face many challenges when developing applications for embedded systems. A major challenge is meeting the conflicting constraints of speed, code density, and power consumption. Traditional optimizing compiler technology is usually of little help in addressing this challenge. To meet speed, power, and size constraints, application developers typically resort to hand-coded assembly language. The results are software systems that are not portable, less robust, and more costly to develop and maintain. This paper describes a new code improvement paradigm implemented in a system called vista that can help achieve the cost/ performance trade-offs that embedded applications demand. Unlike traditional compilation systems where the smallest unit of compilation is typically a function and the programmer has no control over the code improvement process other than what types of code improvements to perform, the vista system opens the code improvement process and gives the application programmer, when necessary, the ability to finely control it. In particular, vista allows the application developer to (1) direct the order and scope in which the code improvement phases are applied, (2) manually specify code transformations, (3) undo previously applied transformations, and (4) view the low-level program representation graphically. vista can be used by embedded systems developers to produce applications, by compiler writers to prototype and debug new lowlevel code transformations, and by instructors to illustrate code transformations (e.g., in a compilers course).
Implementation of Stack-Based Languages on Register Machines
, 1996
"... Languages with programmer-visible stacks (stack-based languages) are used widely, as intermediate languages (e.g., JavaVM, FCode), and as languages for human programmers (e.g., Forth, PostScript). However, the prevalent computer architecture is the register machine. This poses the problem of efficie ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
Languages with programmer-visible stacks (stack-based languages) are used widely, as intermediate languages (e.g., JavaVM, FCode), and as languages for human programmers (e.g., Forth, PostScript). However, the prevalent computer architecture is the register machine. This poses the problem of efficiently implementing stack-based languages on register machines. A straight-forward implementation of the stack consists of a memory area that contains the stack items, and a pointer to the top-of-stack item. The basic
Optimally Profiling and Tracing
- In Proceedings of the Conference on Principles of Programming Languages
, 1992
"... This paper describes algorithms for inserting monitoring code to profile and trace programs. These algorithms greatly reduce the cost of measuring programs with respect to the commonly used technique of placing code in each basic block. Program profiling counts the number of times each basic block i ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
This paper describes algorithms for inserting monitoring code to profile and trace programs. These algorithms greatly reduce the cost of measuring programs with respect to the commonly used technique of placing code in each basic block. Program profiling counts the number of times each basic block in a program executes. Instruction tracing records the sequence of basic blocks traversed in a program execution. The algorithms optimize the placement of counting/tracing code with respect to the expected or measured frequency of each block or edge in a program’s control-flow graph. We have implemented the algorithms in a profiling/tracing tool, and they substantially reduce the overhead of profiling and tracing. We also define and study the hierarchy of profiling problems. These problems have two dimensions: what is profiled (i.e., vertices (basic blocks) or edges in a control-flow graph) and where the instrumentation code is placed (in blocks or along edges). We compare the optimal solutions to the profiling problems and describe a new profiling problem: basic-block profiling with edge counters. This problem is important because an optimal solution to any other profiling problem (for a given control-flow graph) is never better than an optimal solution to this problem. Unfortunately, finding an optimal placement of edge counters for vertex profiling appears to be a
Technological Steps toward a Software Component Industry
- in J. Gutknecht (Ed.), Programming Languages and System Architectures, Springer Lecture Notes in Computer Science
, 1994
"... . A machine_independent abstract program representation is presented that is twice as compact as machine code for a CISC processor. It forms the basis of an implementation, in which the process of code generation is deferred until the time of loading. Separate compilation of program modules with typ ..."
Abstract
-
Cited by 9 (5 self)
- Add to MetaCart
. A machine_independent abstract program representation is presented that is twice as compact as machine code for a CISC processor. It forms the basis of an implementation, in which the process of code generation is deferred until the time of loading. Separate compilation of program modules with type_safe interfaces, and dynamic loading (with code generation) on a per_module basis are both supported. To users of the implemented system, working with modules in the abstract representation is as convenient as working with native object_files, although it leads to several new capabilities. The combination of portability with practicality denotes a step toward a software component industry. 1. Introduction The rapid evolution of hardware technology is constantly influencing software development, for better as well as for worse. On the downside, faster hardware can conceal the complexity and cost of badly_designed programs; Reiser [Rei89] is not far off the mark in observing that s...

