Results 1 - 10
of
77
Bounding Pipeline and Instruction Cache Performance
- IEEE Transactions on Computers
, 1999
"... Predicting the execution time of code segments in real-time systems is challenging. Most recently designed machines contain pipelines and caches. Pipeline hazards may result in multicycle delays. Instruction or data memory references may not be found in cache and these misses typically require sever ..."
Abstract
-
Cited by 104 (22 self)
- Add to MetaCart
Predicting the execution time of code segments in real-time systems is challenging. Most recently designed machines contain pipelines and caches. Pipeline hazards may result in multicycle delays. Instruction or data memory references may not be found in cache and these misses typically require several cycles to resolve. Whether an instruction will stall due to apipeline hazard oracache miss depends on the dynamic sequence of previous instructions executed and memory references performed. Furthermore, these penalties are not independent since delays due to pipeline stalls and cache miss penalties may overlap. This paper describes an approach for bounding the worst and best-case performance of large code segments on machines that exploit both pipelining and instruction caching. First, a method is used to analyze a program’s control flow to statically categorize the caching behavior of each instruction. Next, these categorizations are used in the pipeline analysis of sequences of instructions representing paths within the program. A timing analyzer uses the pipeline path analysis to estimate the worst and best-case execution performance of each loop and function in the program. Finally, agraphical user interface is invoked that allows a user to request timing predictions on portions of the program. The results indicate that the timing analyzer efficiently produces tight predictions of worst and best-case performance for pipelining and instruction caching. Index terms: real-time systems, worst-case execution time, best-case execution time, timing analysis, instruction cache, pipelining
Timing Anomalies in Dynamically Scheduled Microprocessors
"... Previous timing analysis methods have assumed that the worst-case instruction execution time necessarily corresponds to the worst-case behavior. We show that this assumption is wrong in dynamically scheduled processors. A cache miss, for example, can in some cases result in a shorter execution time ..."
Abstract
-
Cited by 83 (0 self)
- Add to MetaCart
Previous timing analysis methods have assumed that the worst-case instruction execution time necessarily corresponds to the worst-case behavior. We show that this assumption is wrong in dynamically scheduled processors. A cache miss, for example, can in some cases result in a shorter execution time than a cache hit. Many examples of such timing anomalies are provided. We first provide necessary conditions when timing anomalies can show up and identify what architectural features that may cause such anomalies. We also show that analyzing the effect of these anomalies with known techniques results in prohibitive computational complexities. Instead, we propose some simple code modification techniques to make it impossible for any anomalies to occur. These modifications make it possible to estimate WCET by known techniques. Our evaluation shows that the pessimism imposed by these techniques is fairly limited; it is less than 27 % for the programs in our benchmark suite.
A Comparison of Static Analysis and Evolutionary Testing for the Verification of Timing Constraints
- Real-Time Systems
, 1998
"... This paper contrasts two methods to verify timing constraints of real-time applications. The method of static analysis predicts the worst-case and best-case execution times of a task's code by analyzing execution paths and simulating processor characteristics without ever executing the program or re ..."
Abstract
-
Cited by 75 (30 self)
- Add to MetaCart
This paper contrasts two methods to verify timing constraints of real-time applications. The method of static analysis predicts the worst-case and best-case execution times of a task's code by analyzing execution paths and simulating processor characteristics without ever executing the program or requiring the program's input. Evolutionary testing is an iterative testing procedure, which approximates the extreme execution times within several generations. By executing the test object dynamically and measuring the execution times the inputs are guided yielding gradually tighter predictions of the extreme execution times. We examined both approaches on a number of real world examples. The results show that static analysis and evolutionary testing are complementary methods, which together provide upper and lower bounds for both worst-case and best-case execution times. 1. Introduction For real-time systems the correct system functionality depends on their logical correctness as well as o...
An integrated path and timing analysis method based on cycle-level symbolic execution
- Journal of Real-Time Systems
, 1999
"... Abstract. Previously published methods for estimation of the worst-case execution time on high-performance processors with complex pipelines and multi-level memory hierarchies result in overestimations owing to insu cient path and/or timing analysis. This does not only give rise to poor utilization ..."
Abstract
-
Cited by 64 (1 self)
- Add to MetaCart
Abstract. Previously published methods for estimation of the worst-case execution time on high-performance processors with complex pipelines and multi-level memory hierarchies result in overestimations owing to insu cient path and/or timing analysis. This does not only give rise to poor utilization of processing resources but also reduces the schedulability in real-time systems. This paper presents a method that integrates path and timing analysis to accurately predict the worst-case execution time for real-time programs on high-performance processors. The unique feature of the method is that it extends cycle-level architectural simulation techniques to enable symbolic execution with unknown input data values; it uses alternative instruction semantics to handle unknown operands. We show that the method can exclude many infeasible (or non-executable) program paths and can calculate path information, such as bounds on number of loop iterations, without the need for manual annotations of programs. Moreover, the method is shown to accurately analyze timing properties of complex features in high-performance processors using multiple-issue pipelines and instruction and data caches. The combined path and timing analysis capability is shown to derive exact estimates of the worst-case execution time for six out of seven programs in our benchmark suite. Keywords: Real-time systems, worst-case execution time, timing analysis, path analysis, symbolic execution, multiple-issue processor, caches, architecture simulation. 1.
Timing Analysis for Data Caches and Set-Associative Caches
, 1997
"... The contributions of this paper are twofold. First, an automatic tool-based approach is described to bound worst-case data cache performance. The given approach works on fully optimized code, performs the analysis over the entire control flow of a program, detects and exploits both spatial and tempo ..."
Abstract
-
Cited by 63 (11 self)
- Add to MetaCart
The contributions of this paper are twofold. First, an automatic tool-based approach is described to bound worst-case data cache performance. The given approach works on fully optimized code, performs the analysis over the entire control flow of a program, detects and exploits both spatial and temporal locality within data references, produces results typically within a few seconds, and estimates, on average, 30% tighter WCET bounds than can be predicted without analyzing data cache behavior. Results obtained by running the system on representative programs are presented and indicate that timing analysis of data cache behavior can result in significantly tighter worst-case performance predictions. Second, a framework to bound worst-case instruction cache performance for set-associative caches is formally introduced and operationally described. Results of incorporating instruction cache predictions within pipeline simulation show that timing predictions for set-associative caches remain...
Timing Analysis for Instruction Caches
- REAL-TIME SYSTEMS
, 2000
"... This paper contributes a comprehensive study of a framework to bound worst-case instruction cache performance for caches with arbitrary levels of associativity. The framework is formally introduced, operationally described and its correctness is shown. Results of incorporating instruction cache pred ..."
Abstract
-
Cited by 55 (22 self)
- Add to MetaCart
This paper contributes a comprehensive study of a framework to bound worst-case instruction cache performance for caches with arbitrary levels of associativity. The framework is formally introduced, operationally described and its correctness is shown. Results of incorporating instruction cache predictions within pipeline simulation show that timing predictions for set-associative caches remain just as tight as predictions for direct-mapped caches. The low cache simulation overhead allows interactive use of the analysis tool and scales well with increasing associativity. The approach taken is based on a data-ow specication of the problem and provides another step toward worst-case execution time prediction of contemporary architectures and its use in schedulability analysis for hard real-time systems.
Analysis of Cache-related Preemption Delay in Fixed-priority Preemptive Scheduling
, 1996
"... We propose a technique for analyzing cache-related preemption delays of tasks that cause unpredictable variation in task execution time in the context of fixed-priority preemptive scheduling. The proposed technique consists of two steps. The first step performs a per-task analysis to estimate cache- ..."
Abstract
-
Cited by 53 (4 self)
- Add to MetaCart
We propose a technique for analyzing cache-related preemption delays of tasks that cause unpredictable variation in task execution time in the context of fixed-priority preemptive scheduling. The proposed technique consists of two steps. The first step performs a per-task analysis to estimate cache-related preemption cost for each execution point in a given task. The second step computes the worst case response time of each task that includes the cache-related preemption delay using a response time equation and a linear programming technique. This step takes as its input the preemption cost information of tasks obtained in the first step. This paper also compares the proposed approach with previous approaches. The results show that the proposed approach gives a prediction of the worst case cache-related preemption delay that is up to 60% tighter than the best of predictions obtained from the previous approaches. Index Terms--- real-time system, fixed-priority scheduling, cache memory,...
Bounding Loop Iterations for Timing Analysis
- In Proceedings of the IEEE Real-Time Applications Symposium. IEEE CS Press, Los Alamitos, Calif
, 1998
"... Static timing analyzers need to know the minimum and maximum number of iterations associated with each loop in a real-time program so accurate timing predictions can be obtained. This paper describes three complementary methods to support timing analysis by bounding the number of loop iterations. Fi ..."
Abstract
-
Cited by 51 (8 self)
- Add to MetaCart
Static timing analyzers need to know the minimum and maximum number of iterations associated with each loop in a real-time program so accurate timing predictions can be obtained. This paper describes three complementary methods to support timing analysis by bounding the number of loop iterations. First, an algorithm is presented that determines the minimum and maximum number of iterations of loops with multiple exits. Second, the loopinvariant variables on which the number of loop iterations depends are identified for which the user can provide minimum and maximum values. Finally, a method is given to tightly predict the execution time of loops whose number of iterations is dependent on counter variables of outer level loops. These methods have been successfully integrated in an existing timing analyzer that predicts the performance for optimized code on a machine that exploits caching and pipelining. The result is tighter timing analysis predictions and less work for the user.
Complete Worst-Case Execution Time Analysis of Straight-line Hard Real-Time Programs
, 2000
"... In this article, the problem of finding a tight estimate on the worst-case execution time (WCET) of a real-time program is addressed. The analysis is focused on straight-line code (i.e. code without loops and recursive function calls) which is quite commonly found in synthesised code of hard real-ti ..."
Abstract
-
Cited by 41 (8 self)
- Add to MetaCart
In this article, the problem of finding a tight estimate on the worst-case execution time (WCET) of a real-time program is addressed. The analysis is focused on straight-line code (i.e. code without loops and recursive function calls) which is quite commonly found in synthesised code of hard real-time embedded systems. The analysis exploits the very simple structure of these programs, resulting in a considerable processing time improvement compared to general-case analysis techniques. A comprehensive timing analysis system, called the Program Timing Analyser (PTA), covering low-level aspects (on the assembler instruction level) as well as high-level aspects (on the programming language level) is presented.The concepts of PTA are demonstrated with a detailed example. Also some experimental results are given. On one hand the low-level analysis covers all speed-up mechanisms used for modern superscalar processors: pipelining, instruction-level parallelism and caching. It can handle a unified cache as well as separate caches for data and instructions. The pipelined and parallel execution of assembler instructions is analysed a-priori. Also, the analysis predicts whether memory accesses will hit the caches at run-time. On the other hand the high-level analysis addresses the problem of using the results from the low-level to compute the final estimate on the WCET. This is done by a heuristic for searching the longest really executable path in the control flow, i.e. by taking into account functional dependencies between various program parts. The heuristic represents a reasonable trade-off between accuracy and effort. By exploiting the simple structure of the input code, no user-annotations are necessary, resulting in a safe and efficient analysis.

