Results 1 -
9 of
9
Efficient Software-Based Fault Isolation
, 1993
"... One way to provide fault isolation among cooperating software modules is to place each in its own address space. However, for tightly-coupled modules, this solution incurs prohibitive context switch overhead. In this paper, we present a software approach to implementing fault isolation within a sing ..."
Abstract
-
Cited by 627 (11 self)
- Add to MetaCart
One way to provide fault isolation among cooperating software modules is to place each in its own address space. However, for tightly-coupled modules, this solution incurs prohibitive context switch overhead. In this paper, we present a software approach to implementing fault isolation within a single address space. Our approach has two parts. First, we load the code and data for a distrusted module into its own fault domain, a logically separate portion of the application's address space. Second, we modify the object code of a distrusted module to prevent it from writing or jumping to an address outside its fault domain. Both these software operations are portable and programming language independent. Our approach poses a tradeo relative to hardware fault isolation: substantially faster communication between fault domains, at a cost of slightly increased execution time for distrusted modules. We demonstrate that for frequently communicating modules, implementing fault isolation in software rather than hardware can substantially improve end-to-end application performance.
Program Optimization for Instruction Caches
, 1989
"... This paper presents an optimization algorithm for reducing instruction cache misses. The algorithm uses profile information to reposition programs in memory so that a direct-mapped cache behaves much like an optimal cache with full associativity and full knowledge of the future. For best results, th ..."
Abstract
-
Cited by 138 (2 self)
- Add to MetaCart
This paper presents an optimization algorithm for reducing instruction cache misses. The algorithm uses profile information to reposition programs in memory so that a direct-mapped cache behaves much like an optimal cache with full associativity and full knowledge of the future. For best results, the cache should have a mechanism for excluding certain instructions designated by the compiler. This paper first presents a reduced form of the algorithm. This form is shown to produce an optimal miss rate for programs without conditionals and with a tree call graph, assuming basic blocks can be reordered at will. If conditionals are allowed, but there are no loops within conditionals, the algorithm does as well as an optimal cache for the worst case execution of the program consistent with the profile information. Next, the algorithm is extended with heuristics for general programs. The effectiveness of these heuristics are demonstrated with empirical results for a set of 10 programs for various cache sizes. The improvement depends on cache size. For a 512 word cache, miss rates for a direct-mapped instruction cache are halved. For an 8K word cache, miss rates fall by over 75%. Over a wide range of cache sizes the algorithm is as effective as increasing the cache size by a factor of 3 times. For 512 words, the algorithm generates only 32 % more misses than an optimal cache. Optimized programs on a direct-mapped cache have lower miss rates than unoptimized programs on set-associative caches of the same size.
Efficient Procedure Mapping using Cache Line Coloring
- IN PROCEEDINGS OF THE SIGPLAN'97 CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION
, 1997
"... As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replace ..."
Abstract
-
Cited by 67 (12 self)
- Add to MetaCart
As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replacement policy, associativity, line size and the resulting cache access time. Software writers use various optimization techniques, including software prefetching, data scheduling and code reordering. Our focus is on improving memory usage through code reordering compiler techniques. In this
Optimizing Instruction Cache Performance for Operating System Intensive Workloads
- IEEE Transactions on Computers
, 1995
"... High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layout of the code. This technique, however, has been applied to application code only, even though ther ..."
Abstract
-
Cited by 61 (11 self)
- Add to MetaCart
High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layout of the code. This technique, however, has been applied to application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. Therefore, it is unknown how well existing optimizations perform for systems code and whether better optimizations can be found. We address this problem in this paper. This paper characterizes in detail the locality patterns of the operating system code and shows that there is substantial locality. Unfortunately, caches are not able to extract much of it: rarely-executed special-case code disrupts spatial locality, loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. As a result, interference...
Efficient Dynamic Procedure Placement
, 1998
"... Commercial applications such as database servers often have very large instruction footprints and consequently are frequently stalled due to instruction cache misses. A large fraction of the i-cache misses are typically due to conflicts in the relatively small direct-mapped on-chip instruction ca ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
Commercial applications such as database servers often have very large instruction footprints and consequently are frequently stalled due to instruction cache misses. A large fraction of the i-cache misses are typically due to conflicts in the relatively small direct-mapped on-chip instruction caches. A variety of tools have been developed to try to order the procedures of an application to minimize these conflicts. Such tools often make use of profile information to place procedures so that procedures that frequently call each other do not conflict in the i-cache. However, users often avoid using any kind of tool that requires them to do extra profiling and linking steps to optimize their application. In addition, any tool that does a static layout of procedures (whether using profiling information or not) cannot adapt to varying application workloads that cause very different application behavior. We have developed a method called DPP (dynamic procedure placement) for pl...
Cache Line Coloring Using Real and Estimated Profiles
"... Efficient exploitation of the available cache memory space can have a significant improvement in program performance. By carefully restructuring a program such that temporally local sequences of instructions are mapped to different portions of the cache, fewer cache conflicts will result. We present ..."
Abstract
- Add to MetaCart
Efficient exploitation of the available cache memory space can have a significant improvement in program performance. By carefully restructuring a program such that temporally local sequences of instructions are mapped to different portions of the cache, fewer cache conflicts will result. We present a link-time procedure mapping algorithm which views the cache as a colored address space, each color representing a different cache line. The idea of coloring is borrowed from the register allocation problem. We consider the size of the cache and the size of a cache line in this work, when coloring procedures to indicate the current mapping of the procedure in the cache space. In this work we also describe a methodology of weighting a program call graph, without the aid of dynamic profiles, and use this to guide our coloring algorithm. We refer to this strategy as statically generating a call graph. We employ static branch prediction to estimate program behavior, which in turn is used to we...
A First Look at the Interplay of Code Reordering and Configurable Caches
, 2005
"... The instruction cache is a popular target for optimizations of microprocessor-based systems because of the cache's high impact on system performance and power, and because of the cache's predictable temporal and spatial locality. Optimization techniques can be designed based on this predictability. ..."
Abstract
- Add to MetaCart
The instruction cache is a popular target for optimizations of microprocessor-based systems because of the cache's high impact on system performance and power, and because of the cache's predictable temporal and spatial locality. Optimization techniques can be designed based on this predictability. We explore for the first time the interplay of two popular instruction cache optimization techniques: the long-known technique of code reordering and the relatively-new technique of cache configuration. We address the question of whether those two optimizations complement each other or if one optimization dominates the other. Through experiments using embedded system benchmarks, we show that cache configuration dominates a particular category of code reordering techniques with respect to optimizing performance and energy, obviating the need for reordering. We also examine the modern scenario of synthesized custom caches, and show that combining cache configuration with code reordering results in cache size reductions of 13% on average, and up to 89% in some benchmarks, beyond just cache configuration alone.
Cache Line Coloring Procedure Placement Using Real and Estimated Profiles
"... Efficient exploitation of the available cache memory space can have a significant improvement in program performance. By carefully restructuring a program such that temporally local sequences of instructions are mapped to different portions of the cache, fewer cache conflicts will result. In this pa ..."
Abstract
- Add to MetaCart
Efficient exploitation of the available cache memory space can have a significant improvement in program performance. By carefully restructuring a program such that temporally local sequences of instructions are mapped to different portions of the cache, fewer cache conflicts will result. In this paper we present a link-time procedure mapping algorithm which views the cache as a colored address space, each color representing a different cache line. The idea of coloring is borrowed from the register allocation problem. We consider the size of the cache and the size of a cache line in this work, when coloring procedures to indicate the current mapping of the procedure in the cache space. This algorithm takes as input a weighted program call graph, with nodes representing procedures and edge weights representing call frequencies. This graph is used as input to our color-based mapping algorithm to provide an improved program layout. The results in this paper are presented for reducing the ...
11th International Workshop on Software & Compilers for Embedded Systems (SCOPES) 2008 WCET-Driven, Code-Size Critical Procedure Cloning ∗†
"... In the domain of the worst-case execution time (WCET) analysis, loops are an inherent source of unpredictability and loss of precision since the determination of tight and safe information on the number of loop iterations is a difficult task. In particular, data-dependent loops whose iteration count ..."
Abstract
- Add to MetaCart
In the domain of the worst-case execution time (WCET) analysis, loops are an inherent source of unpredictability and loss of precision since the determination of tight and safe information on the number of loop iterations is a difficult task. In particular, data-dependent loops whose iteration counts depend on function parameters can not be precisely handled by a timing analysis. Procedure Cloning can be exploited to make these loops explicit within the source code allowing a highly precise WCET analysis. In this paper we extend the standard Procedure Cloning optimization by WCET-aware concepts with the objective to improve the tightness of the WCET estimation. Our novel approach is driven by WCET information which successively eliminates code structures leading to overestimated timing results, thus making the code more suitable for the analysis. In addition, the code size increase during the optimization is monitored and large increases are avoided. The effectiveness of our optimization is shown by tests on real-world benchmarks. After performing our optimization, the estimated WCET is reduced by up to 64.2 % while the employed code transformations yield an additional code size increase of 22.6 % on average. In contrast, the averagecase performance being the original objective of Procedure Cloning showed a slight decrease. 1.

