Results 1 - 10
of
33
Bounding worst-case instruction cache performance
- In IEEE Real-Time Systems Symposium
, 1994
"... The use of caches poses a difficult tradeoff for architects of real-time systems. While caches provide significant performance advantages, they have also been viewed as inherently unpredictable since the behavior of a cache reference depends upon the history of the previous references. The use of ca ..."
Abstract
-
Cited by 108 (35 self)
- Add to MetaCart
The use of caches poses a difficult tradeoff for architects of real-time systems. While caches provide significant performance advantages, they have also been viewed as inherently unpredictable since the behavior of a cache reference depends upon the history of the previous references. The use of caches will only be suitable for realtime systems if a reasonably tight bound on the performance of programs using cache memory can be predicted. This paper describes an approach for bounding the worstcase instruction cache performance of large code segments. First, a new method called Static Cache Simulation is used to analyze a program’s control flow to statically categorize the caching behavior of each instruction. A timing analyzer, which uses the categorization information, then estimates the worst-case instruction cache performance for each loop and function in the program. 1.
Integrating the timing analysis of pipelining and instruction caching
- In IEEE Real-Time Systems Symposium
, 1995
"... Recently designed machines contain pipelines and caches. While both features provide significant performance advantages, they also pose problems for predicting execution time of code segments in real-time systems. Pipeline hazards may result in multicycle delays. Instruction or data memory reference ..."
Abstract
-
Cited by 92 (16 self)
- Add to MetaCart
Recently designed machines contain pipelines and caches. While both features provide significant performance advantages, they also pose problems for predicting execution time of code segments in real-time systems. Pipeline hazards may result in multicycle delays. Instruction or data memory references may not be found in cache and these misses typically require several cycles to resolve. Whether an instruction will stall due to a pipeline hazard or a cache miss depends on the dynamic sequence of previous instructions executed and memory references performed. Furthermore, these penalties are not independent since delays due to pipeline stalls and cache miss penalties may overlap. This paper describes an approach for bounding the worst-case performance of large code segments on machines that exploit both pipelining and instruction caching. First, a method is used to analyze a program’s control flow to statically categorize the caching behavior of each instruction. Next, these categorizations are used in the pipeline analysis of sequences of instructions representing paths within the program. A timing analyzer uses the pipeline path analysis to estimate the worst-case execution performance of each loop and function in the program. Finally, agraphical user interface is invoked that allows a user to request timing predictions on portions of the program. 1.
Timing Analysis for Data Caches and Set-Associative Caches
, 1997
"... The contributions of this paper are twofold. First, an automatic tool-based approach is described to bound worst-case data cache performance. The given approach works on fully optimized code, performs the analysis over the entire control flow of a program, detects and exploits both spatial and tempo ..."
Abstract
-
Cited by 63 (11 self)
- Add to MetaCart
The contributions of this paper are twofold. First, an automatic tool-based approach is described to bound worst-case data cache performance. The given approach works on fully optimized code, performs the analysis over the entire control flow of a program, detects and exploits both spatial and temporal locality within data references, produces results typically within a few seconds, and estimates, on average, 30% tighter WCET bounds than can be predicted without analyzing data cache behavior. Results obtained by running the system on representative programs are presented and indicate that timing analysis of data cache behavior can result in significantly tighter worst-case performance predictions. Second, a framework to bound worst-case instruction cache performance for set-associative caches is formally introduced and operationally described. Results of incorporating instruction cache predictions within pipeline simulation show that timing predictions for set-associative caches remain...
Finding effective optimization phase sequences
- In Proceedings of the 2003 ACM SIGPLAN conference on Language, Compiler, and Tool for Embedded Systems
, 2003
"... It has long been known that a single ordering of optimization phases will not produce the best code for every application. This phase ordering problem can be more severe when generating code for embedded systems due to the need to meet conflicting constraints on time, code size, and power consumptio ..."
Abstract
-
Cited by 45 (11 self)
- Add to MetaCart
It has long been known that a single ordering of optimization phases will not produce the best code for every application. This phase ordering problem can be more severe when generating code for embedded systems due to the need to meet conflicting constraints on time, code size, and power consumption. Given that many embedded application developers are willing to spend time tuning an application, we believe a viable approach is to allow the developer to steer the process of optimizing a function. In this paper, we describe support in VISTA, an interactive compilation system, for finding effective sequences of optimization phases. VISTA provides the user with dynamic and static performance information that can be used during an interactive compilation session to gauge the progress of improving the code. In addition, VISTA provides support for automatically using performance information to select the best optimization sequence among several attempted. One such feature is the use of a genetic algorithm to search for the most efficient sequence based on specified fitness criteria. We hav e included a number of experimental results that evaluate the effectiveness of using a genetic algorithm in VISTA to find effective optimization phase sequences.
The Advantages of Machine-Dependent Global Optimization
- In Proceedings of the 1994 Conference on Programming Languages and Systems Architectures
, 1994
"... machine designers have long recognized this dilemma. In 1972, Newey, Poole, and Waite [Newe79] observed that `Most problems will suggest a number of specialized operations which could possibly be implemented quite efficiently on certain hardware. The designer must balance the convenience and utility ..."
Abstract
-
Cited by 41 (17 self)
- Add to MetaCart
machine designers have long recognized this dilemma. In 1972, Newey, Poole, and Waite [Newe79] observed that `Most problems will suggest a number of specialized operations which could possibly be implemented quite efficiently on certain hardware. The designer must balance the convenience and utility of these operations against the increased difficulty of implementing an abstract machine with a rich and varied instruction set.' Fortunately, applying all code improvements to the LIL removes efficiency considerations as HIL design issue. The abstract machine need only contain a set of features roughly equivalent to the intersection of the operations included in typical target machines. The result is a small, simple abstract machine. Such abstract machines are termed `intersection' machines. The analogy between union/intersection abstract machines and CISC/RISC architectures is obvious. void daxpy(n, da, dx, incx, dy, incy) int n, incx, incy; double da, dx[], dy[]; { if (n <= 0) return; i...
Static Cache Simulation and its Applications
, 1994
"... This work takes a fresh look at the simulation of cache memories. It introduces the technique of static cache simulation that statically predicts a large portion of cache references. To efficiently utilize this technique, a method to perform efficient on-the-fly analysis of programs in general is de ..."
Abstract
-
Cited by 41 (13 self)
- Add to MetaCart
This work takes a fresh look at the simulation of cache memories. It introduces the technique of static cache simulation that statically predicts a large portion of cache references. To efficiently utilize this technique, a method to perform efficient on-the-fly analysis of programs in general is developed and proved correct. This method is combined with static cache simulation for a number of applications. The application of fast instruction cache analysis provides a new framework to evaluate instruction cache memories that outperforms even the fastest techniques published. Static cache simulation is shown to address the issue of predicting cache behavior, contrary to the belief that cache memories introduce unpredictability to real-time systems that cannot be efficiently analyzed. Static cache simulation for instruction caches provides a large degree of predictability for real-time systems. In addition, an architectural modification through bit-encoding is introduced that provides fu...
Timing Analysis for Data and Wrap-Around Fill Caches
- REAL-TIME SYSTEMS
, 1999
"... The contributions of this paper are twofold. First, an automatic tool-based approach is described to bound worst-case data cache performance. The approach works on fully optimized code, performs the analysis over the entire control flow of a program, detects and exploits both spatial and temporal ..."
Abstract
-
Cited by 26 (13 self)
- Add to MetaCart
The contributions of this paper are twofold. First, an automatic tool-based approach is described to bound worst-case data cache performance. The approach works on fully optimized code, performs the analysis over the entire control flow of a program, detects and exploits both spatial and temporal locality within data references, and produces results typically within a few seconds. Results obtained by running the system on representative programs are presented and indicate that timing analysis of data cache behavior usually results in significantly tighter worstcase performance predictions. Second, a method to deal with realistic cache filling approaches, namely wrap-around-filling for cache misses, is presented as an extension to pipeline analysis. Wrap-around fill analysis is more challenging than traditional cache analysis since the words within a program line are loaded into cache in different cycles according to a predetermined sequence, rather than all at the same time. Results ...
Tighter Timing Predictions by Automatic Detection and Exploitation of Value-Dependent Constraints
- Proceedings of the IEEE Real-Time Technology and Applications Symposium
, 1999
"... Predicting the worst-case execution time (WCET) of a real-time program isachallenging task. Though much progress has been made in obtaining tighter timing predictions by using techniques that model the architectural features of a machine, significant overestimations of WCET can still occur. Even wit ..."
Abstract
-
Cited by 19 (6 self)
- Add to MetaCart
Predicting the worst-case execution time (WCET) of a real-time program isachallenging task. Though much progress has been made in obtaining tighter timing predictions by using techniques that model the architectural features of a machine, significant overestimations of WCET can still occur. Even with perfect architectural modeling, dependencies on data values can constrain the outcome of conditional branches and the corresponding set of paths that can be taken in a program. While value-dependent constraint information has been used in the past by some timing analyzers, it has typically been specified manually, which is both tedious and error prone. This paper describes efficient techniques for automatically detecting value-dependent constraints by a compiler and automatically exploiting these constraints within a timing analyzer. The result is tighter timing analysis predictions without requiring additional interaction with a user. 1.
VISTA: A System for Interactive Code Improvement
- in "Proceedings of the joint conference on Languages, compilers
, 2002
"... Software designers face many challenges when developing applications for embedded systems. A major challenge is meeting the conflicting constraints of speed, code density, and power consumption. Traditional optimizing compiler technology is usually of little help in addressing this challenge. To mee ..."
Abstract
-
Cited by 18 (7 self)
- Add to MetaCart
Software designers face many challenges when developing applications for embedded systems. A major challenge is meeting the conflicting constraints of speed, code density, and power consumption. Traditional optimizing compiler technology is usually of little help in addressing this challenge. To meet speed, power, and size constraints, application developers typically resort to hand-coded assembly language. The results are software systems that are not portable, less robust, and more costly to develop and maintain. This paper describes a new code improvement paradigm implemented in a system called vista that can help achieve the cost/ performance trade-offs that embedded applications demand. Unlike traditional compilation systems where the smallest unit of compilation is typically a function and the programmer has no control over the code improvement process other than what types of code improvements to perform, the vista system opens the code improvement process and gives the application programmer, when necessary, the ability to finely control it. In particular, vista allows the application developer to (1) direct the order and scope in which the code improvement phases are applied, (2) manually specify code transformations, (3) undo previously applied transformations, and (4) view the low-level program representation graphically. vista can be used by embedded systems developers to produce applications, by compiler writers to prototype and debug new lowlevel code transformations, and by instructors to illustrate code transformations (e.g., in a compilers course).
Compiled Simulation of Programmable DSP Architectures
- in Proc. of IEEE Workshop on VLSI in Signal Processing
, 1995
"... This paper presents a technique for simulating processors based on the principle of compiled simulation. Unlike existing, commercially available instruction set simulators for DSPs, which are of interpretive character, the proposed technique performs instruction decoding and simulation scheduling at ..."
Abstract
-
Cited by 17 (7 self)
- Add to MetaCart
This paper presents a technique for simulating processors based on the principle of compiled simulation. Unlike existing, commercially available instruction set simulators for DSPs, which are of interpretive character, the proposed technique performs instruction decoding and simulation scheduling at compile time. The technique offers up to three orders of magnitude faster simulation. The high speed allows the user to explore algorithms and hardware/software trade-offs before any hardware implementation. Moreover, the user can tailor the compiled simulation to trade speed for more accuracy. In this paper, the sources of the speedup and the limitations of the technique are analyzed and the realization of the simulation compiler is presented. Introduction Designers of today's DSP systems---such as digital cellular phones and multimedia systems---face increasing complexity and quickly changing system requirements. To deal with these challenges, designers have increasingly turned to progr...

