Results 1 - 10
of
36
Mondrian Memory Protection
, 2002
"... Mondrian memory protection (MMP) is a fine-grained protection scheme that allows multiple protection domains to flexibly share memory and export protected services. In contrast to earlier pagebased systems, MMP allows arbitrary permissions control at the granularity of individual words. We use a com ..."
Abstract
-
Cited by 82 (1 self)
- Add to MetaCart
Mondrian memory protection (MMP) is a fine-grained protection scheme that allows multiple protection domains to flexibly share memory and export protected services. In contrast to earlier pagebased systems, MMP allows arbitrary permissions control at the granularity of individual words. We use a compressed permissions table to reduce space overheads and employ two levels of permissions caching to reduce run-time overheads. The protection tables in our implementation add less than 9% overhead to the memory space used by the application. Accessing the protection tables adds less than 8% additional memory references to the accesses made by the application. Although it can be layered on top of demandpaged virtual memory, MMP is also well-suited to embedded systems with a single physical address space. We extend MMP to support segment translation which allows a memory segment to appear at another location in the address space. We use this translation to implement zero-copy networking underneath the standard read system call interface, where packet payload fragments are connected together by the translation system to avoid data copying. This saves 52% of the memory references used by a traditional copying network stack.
Complete Removal of Redundant Expressions
, 1998
"... Partial redundancy elimination (PRE), the most important component of global optimizers, generalizes the removal of common subexpressions and loop-invariant computations. Because existing PRE implementations are based on code motion, they fail to completely remove the redundancies. In fact, we obser ..."
Abstract
-
Cited by 64 (13 self)
- Add to MetaCart
Partial redundancy elimination (PRE), the most important component of global optimizers, generalizes the removal of common subexpressions and loop-invariant computations. Because existing PRE implementations are based on code motion, they fail to completely remove the redundancies. In fact, we observed that 73% of loop-invariant statements cannot be eliminated from loops by code motion alone. In dynamic terms, traditional PRE eliminates only half of redundancies that are strictly partial. To achieve a complete PRE, control flow restructuring must be applied. However, the resulting code duplication may cause code size explosion. This paper focuses on achieving a complete PRE while incurring an acceptable code growth. First, we present an algorithm for complete removal of partial redundancies, based on the integration of code motion and control flow restructuring. In contrast to existing complete techniques, we resort to restructuring merely to remove obstacles to code motion, rather th...
PLTO: A Link-Time Optimizer for the Intel IA-32 Architecture
- In Proc. 2001 Workshop on Binary Translation (WBT-2001
, 2001
"... tool we have developed for the Intel IA-32 architecture. A number of characteristics of this architecture complicate the task of link-time optimization. These include a large number of op-codes and addressing modes, which increases the complexity of program analysis; variable-length instructions, wh ..."
Abstract
-
Cited by 50 (15 self)
- Add to MetaCart
tool we have developed for the Intel IA-32 architecture. A number of characteristics of this architecture complicate the task of link-time optimization. These include a large number of op-codes and addressing modes, which increases the complexity of program analysis; variable-length instructions, which complicates disassembly of machine code; a paucity of available registers, which limits the extent of some optimizations; and a reliance on using memory locations for holding values and for parameter passing, which complicates program analysis and optimization. We describe how PLTO addresses these problems and the resulting performance improvements it is able to achieve.
Online Feedback-Directed Optimization of Java
, 2002
"... This paper describes the implementation of an online feedback-directed optimization system. The system is fully automatic; it requires no prior... ..."
Abstract
-
Cited by 45 (3 self)
- Add to MetaCart
This paper describes the implementation of an online feedback-directed optimization system. The system is fully automatic; it requires no prior...
Eliminating Stack Overflow by Abstract Interpretation
- ACM Trans. Embed. Comput. Syst
"... Abstract. An important correctness criterion for software running on embedded microcontrollers is stack safety: a guarantee that the call stack does not overflow. We address two aspects of the problem of creating stack-safe embedded software that also makes efficient use of memory: statically boundi ..."
Abstract
-
Cited by 43 (9 self)
- Add to MetaCart
Abstract. An important correctness criterion for software running on embedded microcontrollers is stack safety: a guarantee that the call stack does not overflow. We address two aspects of the problem of creating stack-safe embedded software that also makes efficient use of memory: statically bounding worst-case stack depth, and automatically reducing stack memory requirements. Our first contribution is a method for statically guaranteeing stack safety by performing interpretation of machine code. Abstract interpretation permits our analysis to accurately model when interrupts are enabled and disabled, which is essential for accurately bounding the stack depth of typical embedded systems. We have implemented a stack analysis tool that targets Atmel AVR microcontrollers, and tested it on embedded applications compiled from up to 30,000 lines of C. We experimentally validate the accuracy of the tool, which runs in a few seconds on the largest programs that we tested. The second contribution of this paper is a novel framework for automatically reducing stack memory requirements. We show that goal-directed global function inlining can be used to reduce the stack memory requirements of component-based embedded software, on average, to 40 % of the requirement of a system compiled without inlining, and to 68 % of the requirement of a system compiled with aggressive whole-program inlining that is not directed towards reducing stack usage. 1
An Automatic Object Inlining Optimization and its Evaluation
- In PLDI 2000
, 2000
"... Automatic object inlining [19, 20] transforms heap data structures by fusing parent and child objects together. It can improve runtime by reducing object allocation and pointer dereference costs. We report continuing work studying object inlining optimizations. In particular, we present a new semant ..."
Abstract
-
Cited by 33 (0 self)
- Add to MetaCart
Automatic object inlining [19, 20] transforms heap data structures by fusing parent and child objects together. It can improve runtime by reducing object allocation and pointer dereference costs. We report continuing work studying object inlining optimizations. In particular, we present a new semantic derivation of the correctness conditions for object inlining, and program analysis which extends our previous work. And we present an object inlining transformation, focusing on a new algorithm which optimizes class field layout to minimize code expansion. Finally, we detail a fuller evaluation on eleven programs and libraries (including Xpdf, the 25,000 line Portable Document Format (PDF) le browser) that utilizes hardware measures of impact on the memory system. We show that our analysis scales effectively to large programs, nding many inlinable elds (45 in xpdf) at acceptable cost, and we show that, on some programs, it finds nearly all fields for which object inlining is correct, and a...
A Survey of Adaptive Optimization in Virtual Machines
- PROCEEDINGS OF THE IEEE, 93(2), 2005. SPECIAL ISSUE ON PROGRAM GENERATION, OPTIMIZATION, AND ADAPTATION
, 2004
"... Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimiza ..."
Abstract
-
Cited by 26 (5 self)
- Add to MetaCart
Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimization overhead. Second, modular
Achieving High Performance via Co-Designed Virtual Machines
- In International Workshop on Innovative Architecture
, 1999
"... Introduction Today's virtual machines use a layer of software that allows programs compiled in one instruction set to be executed on a processor executing a (different) native instruction set. Virtual machines have become popular in recent years for providing platform independence; however, virtual ..."
Abstract
-
Cited by 25 (6 self)
- Add to MetaCart
Introduction Today's virtual machines use a layer of software that allows programs compiled in one instruction set to be executed on a processor executing a (different) native instruction set. Virtual machines have become popular in recent years for providing platform independence; however, virtual machines also open many new opportunities for enhancing performance. The co-design of virtual machine software and the underlying hardware microarchitecture will enable enhanced instruction level parallelism and more adaptable performance mechanisms than are possible when hardware and application software are separated by instruction set architectures as is traditionally done. In future high performance computers, a virtual instruction set architecture (V-ISA) will be the level for maintaining architectural compatibility. The V-ISA will be implemented with a virtual machine that blends software and hardware in a symbiotic manner via co-design. The hardware will support an implementationdep
A framework for unrestricted whole-program optimization
- In ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation
, 2006
"... Procedures have long been the basic units of compilation in conventional optimization frameworks. However, procedures are typically formed to serve software engineering rather than optimization goals, arbitrarily constraining code transformations. Techniques, such as aggressive inlining and interpro ..."
Abstract
-
Cited by 24 (8 self)
- Add to MetaCart
Procedures have long been the basic units of compilation in conventional optimization frameworks. However, procedures are typically formed to serve software engineering rather than optimization goals, arbitrarily constraining code transformations. Techniques, such as aggressive inlining and interprocedural optimization, have been developed to alleviate this problem, but, due to code growth and compile time issues, these can be applied only sparingly. This paper introduces the Procedure Boundary Elimination (PBE) compilation framework, which allows unrestricted whole-program optimization. PBE allows all intra-procedural optimizations and analyses to operate on arbitrary subgraphs of the program, regardless of the original procedure boundaries and without resorting to inlining. In order to control compilation time, PBE also introduces novel extensions of region formation and encapsulation. PBE enables targeted code specialization, which recovers the specialization benefits of inlining while keeping code growth in check. This paper shows that PBE attains better performance than inlining with half the code growth.
Adaptive Online Context-Sensitive Inlining
- In First Annual IEEE/ACM Interational Conference on Code Generation and Optimization
, 2003
"... As current trends in software development move toward more complex object-oriented programming, inlining has become a vital optimization that provides substantial performance improvements to C++ and Java programs. Yet, the aggressiveness of the inlining algorithm must be carefully monitored to effec ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
As current trends in software development move toward more complex object-oriented programming, inlining has become a vital optimization that provides substantial performance improvements to C++ and Java programs. Yet, the aggressiveness of the inlining algorithm must be carefully monitored to effectively balance performance and code size. The state-of-the-art is to use profile information (associated with call edges) to guide inlining decisions. In the presence of virtual method calls, profile information for one call edge may not be sufficient for making effectual inlining decisions. Therefore, we explore the use of profiling data with additional levels of context sensitivity. In addition to exploring fixed levels of context sensitivity, we explore several adaptive schemes that attempt to find the ideal degree of context sensitivity for each call site. Our techniques are evaluated on the basis of runtime performance, code size and dynamic compilation time. On average, we found that with minimal impact on performance (+/-1%) context sensitivity can enable 10% reductions in compiled code space and compile time. Performance on individual programs varied from -4.2% to 5.3% while reductions in compile time and code space of up to 33.0% and 56.7% respectively were obtained. 1.

