The Structure and Performance of Interpreters
- In Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), 1996
Cited by 45 (1 self)
Interpreted languages have become increasingly popular due to demands for rapid program development, ease of use, portability, and safety. Beyond the general impression that they are "slow," however, little has been documented about the performance of interpreters as a class of applications. This paper examines interpreter performance by measuring and analyzing interpreters from both software and hardware perspectives. As examples, we measure the MIPSI, Java, Perl, and Tcl interpreters running an array of micro and macro benchmarks on a DEC Alpha platform. Our measurements of these interpreters relate performance to the complexity of the interpreter's virtual machine and demonstrate that native runtime libraries can play a key role in providing good performance. From an architectural perspective, we show that interpreter performance is primarily a function of the interpreter itself and is relatively independent of the application being interpreted. We also demonstrate that high-level i...
Register File Design Considerations in Dynamically Scheduled Processors
- In Proceedings of the Second IEEE Symposium on High-Performance Computer Architecture, 1995
Cited by 40 (1 self)
We have investigated the register file requirements of dynamically scheduled processors using register renaming and dispatch queues running the SPEC92 benchmarks. We looked at processors capable of issuing either four or eight instructions per cycle and found that in most cases implementing precise exceptions requires a relatively small number of additional registers compared to imprecise exceptions. Systems with aggressive non-blocking load support were able to achieve performance similar to processors with perfect memory systems at the cost of some additional registers. Given our machine assumptions, we found that the performance of a four-issue machine with a 32-entry dispatch queue tends to saturate around 80 registers. For an eight-issue machine with a 64-entry dispatch queue performance does not saturate until about 128 registers. Assuming the machine cycle time is proportional to the register file cycle time, the 8-issue machine yields only 20% higher performance than the 4-issue machine due in part...
Checking Program Profiles
- In Proceedings of the Third IEEE International Workshop on Source Code Analysis and Manipulation, 2003
Cited by 5 (0 self)
Execution profiles have become increasingly important for guiding code optimization. However, little has been done to develop ways to check automatically that a profile does, in fact, reflect the actual execution behavior of a program. This paper describes a framework that uses program monitoring techniques in a way that allows the automatic checking of a wide variety of profile data. We also describe our experiences with using an instance of this framework to check edge profiles. The profile checker uncovered profiling anomalies that were previously unknown and that would have been very difficult to identify using existing techniques.
Automatic derivation of compiler machine descriptions
- ACM Transactions on Programming Languages and Systems (TOPLAS), 2002
Cited by 4 (1 self)
We describe a method designed to significantly reduce the effort required to retarget a compiler to a new architecture, while at the same time producing fast and effective compilers. The basic idea is to use the native C compiler at compiler construction time to discover architectural features of the new architecture. From this information a formal machine description is produced. Given this machine description, a native code-generator can be generated by a back-end generator such as BEG or burg. A prototype automatic Architecture Discovery Tool (called ADT) has been implemented. This tool is completely automatic and requires minimal input from the user. Given the Internet address of the target machine and the command-lines by which the native C compiler, assembler, and linker are invoked, ADT will generate a BEG machine specification containing the register set, addressing modes, instruction set, and instruction timings for the architecture. The current version of ADT is general enough to produce machine descriptions for the integer instruction sets of common RISC and CISC architectures such as the Sun SPARC, Digital Alpha, MIPS, DEC VAX, and
TFP: Time-sensitive, Flow-specific Profiling at Runtime
- In LCPC, 2003
Cited by 3 (2 self)
Program profiling can help performance prediction and compiler optimization. This paper describes the initial work behind TFP, a new profiling strategy that can gather and verify a range of flow-specific information at runtime. While TFP can collect more refined information than block, edge or path profiling, it is only 5.75% slower than a very fast runtime path-profiling technique. Statistics collected using TFP over the SPEC2000 benchmarks reveal possibilities for further flow-specific runtime optimizations. We also show how TFP can improve the overall performance of a real application.
DERIVE: A Tool That Automatically Reverse-Engineers Instruction Encodings
- In Proceedings of the ACM SIGPLAN Workshop on Dynamic and Adaptive Compilation and Optimization (Dynamo)
Cited by 2 (0 self)
Many binary tools, such as disassemblers, dynamic code generation systems, and executable code rewriters, need to understand how machine instructions are encoded. Unfortunately, specifying such encodings is tedious and error-prone. Users must typically specify thousands of details of instruction layout, such as opcode and field locations values, legal operands, and jump offset encodings. We have built a tool called derive that extracts these details from existing software: the system assembler. Users need only provide the assembly syntax for the instructions for which they want encodings. Derive automatically reverse-engineers instruction encoding knowledge from the assembler by feeding it permutations of instructions and doing equation solving on the output. Derive is robust and general. It derives instruction encodings for SPARC, MIPS, Alpha, PowerPC, ARM, and x86. In the last case, it handles variable-sized instructions, large instructions, instruction encodings determined by operan...
Profile-guided specialization of an operating system kernel
- In Proc. Workshop on Binary Instrumentation and Applications, 2006
Cited by 1 (1 self)
General-purpose operating systems such as Linux are increasingly replacing custom embedded counterparts on a wide variety of devices. Despite their convenience and flexibility, however, such operating systems may be overly general and thus incur unnecessary performance overheads in these contexts. This paper describes a new approach to mitigating these overheads by automatically specializing the OS kernel for particular execution environments. We use value profiling to identify targets for specialization such as frequent system call parameters. A novel profiling technique is used to identify frequently invoked procedure call sequences within the kernel. This information is used to sidestep the problems arising from indirect function calls when carrying out interprocedural compiler optimization. It drives a variety of compiler optimizations such as function inlining and code specialization that reduce the execution overheads along frequent paths. A prototype implementation that uses the PLTO binary rewriting system to specialize the Linux kernel is described. While overall performance data are mixed, the improvements we see argue for the potential of this approach.
Stack Analysis of x86 Executables
Binary rewriting is becoming increasingly popular for a variety of low-level code manipulation purposes. One of the difficulties encountered in this context is that machine-language programs typically have much less semantic information compared to source code, which makes it harder to reason about the program’s runtime behavior. This problem is especially acute in the widely used Intel x86 architecture, where the paucity of registers often makes it necessary to store values on the runtime stack. The use of memory in this manner affects many analyses and optimizations because of the possibility of indirect memory references, which are difficult to reason about. This paper describes a simple analysis of some basic aspects of the way in which programs manipulate the runtime stack. The information so obtained can be very helpful in enhancing and improving a variety of other dataflow analyses that reason about and manipulate values stored on the runtime stack. Experiments indicate that the analyses are efficient and useful for improving optimizations that need to reason about the runtime stack.

