Data Cache Locking for Higher Program Predictability
, 2003
Cited by 32 (2 self)
Caches have become increasingly important with the widening gap between main memory and processor speeds. However, they are a source of unpredictability due to their characteristics, resulting in programs behaving in a different way than expected. Cache locking
Let’s Study Whole-Program Cache Behaviour Analytically
- In Proceedings of International Symposium on High-Performance Computer Architecture (HPCA 8
, 2002
Data caches in multitasking hard real-time systems
- IN IEEE REAL-TIME SYSTEMS SYMPOSIUM
, 2003
Cited by 20 (2 self)
Data caches are essential in modern processors, bridging the widening gap between main memory and processor speeds. However, they yield very complex performance models, which makes it hard to bound execution times tightly. This paper contributes a new technique to obtain predictability in preemptive multitasking systems in the presence of data caches. We explore the use of cache partitioning, dynamic cache locking and static cache analysis to provide worst-case performance estimates in a safe and tight way. Cache partitioning divides the cache among tasks to eliminate inter-task cache interferences. We combine static cache analysis and cache locking mechanisms to ensure that all intra-task conflicts, and consequently, memory access times, are exactly predictable. To minimize the performance degradation due to cache partitioning and locking, two strategies are employed. First, the cache is loaded with data likely to be accessed so that their cache utilization is maximized. Second, compiler optimizations such as tiling and padding are applied in order to reduce cache replacement misses. Experimental results show that this scheme is fully predictable, without compromising the performance of the transformed programs. Our method outperforms static cache locking for all analyzed task sets under various cache architectures, with a CPU utilization reduction ranging between 3.8 and 20.0 times for a high performance system.
Bounding preemption delay within data cache reference patterns for real-time tasks
- In IEEE Real-Time Embedded Technology and Applications Symposium
, 2006
Cited by 15 (6 self)
Caches have become invaluable for higher-end architectures to hide, in part, the increasing gap between processor speed and memory access times. While the effect of caches on timing predictability of single real-time tasks has been the focus of much research, bounding the overhead of cache warm-ups after preemptions remains a challenging problem, particularly for data caches. In this paper, we bound the penalty of cache interference for real-time tasks by providing accurate predictions of the data cache behavior across preemptions. For every task, we derive data cache reference patterns for all scalar and non-scalar references. Partial timing of a task is performed up to a preemption point using these patterns. The effects of cache interference are then analyzed using a set-theoretic approach, which identifies the number and location of additional misses due to preemption. A feedback mechanism provides the means to interact with the timing analyzer, which subsequently times another interval of a task bounded by the next preemption. Our experimental results demonstrate that it is sufficient to consider the n most expensive preemption points, where n is the maximum possible number of preemptions. Further, it is shown that such accurate modeling of data cache behavior in preemptive systems significantly improves the WCET predictions for a task. To the best of our knowledge, our work of bounding preemption delay for data caches is unprecedented.
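The set-theoretic idea in this abstract can be illustrated with a small sketch. It is not the authors' analysis: it assumes a direct-mapped cache, and the function names, the cache geometry and the two input address lists are hypothetical. A line the preempted task still needs is counted as lost when the preempting task touches the cache set that holds it.

```python
def cache_set(address, line_size=32, num_sets=256):
    """Map a byte address to its set index in a direct-mapped cache."""
    return (address // line_size) % num_sets


def extra_misses_bound(reused_addrs, preempting_addrs, line_size=32, num_sets=256):
    """Upper-bound the additional misses caused by one preemption.

    reused_addrs: addresses the preempted task has in cache at the
        preemption point and accesses again afterwards (assumed input).
    preempting_addrs: addresses touched by the preempting task (assumed input).
    Returns the sorted memory-line numbers that may have to be reloaded.
    """
    # Distinct memory lines the preempted task still needs.
    reused_lines = {a // line_size for a in reused_addrs}
    # Cache sets overwritten by the preempting task.
    evicting_sets = {cache_set(a, line_size, num_sets) for a in preempting_addrs}
    # A needed line is lost only if the preempting task touched its set.
    return sorted(line for line in reused_lines if line % num_sets in evicting_sets)


# Tiny 4-set cache: lines 0, 1, 2 are needed; the preempting task hits sets 1 and 0.
lost = extra_misses_bound([0, 32, 64], [160, 256], line_size=32, num_sets=4)
print(len(lost), lost)
```

The length of the returned list is the bound on extra misses for that preemption point; the line numbers indicate where the reloads would occur.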
Bounding worst-case data cache behavior by analytically deriving cache reference patterns
- In IEEE Real-Time Embedded Technology and Applications Symposium
, 2005
Cited by 14 (5 self)
While caches have become invaluable for higher-end architectures due to their ability to hide, in part, the gap between processor speed and memory access times, caches (and particularly data caches) limit the timing predictability for data accesses that may reside in memory or in cache. This is a significant problem for real-time systems. The objective of our work is to provide accurate predictions of data cache behavior of scalar and non-scalar references whose reference patterns are known at compile time. Such knowledge about cache behavior provides the basis for significant improvements in bounding the worst-case execution time (WCET) of real-time programs, particularly for hard-to-analyze data caches. We exploit the power of the Cache Miss Equations (CME) framework but lift a number of limitations of traditional CME to generalize the analysis to more arbitrary programs. We further devised a transformation, coined “forced” loop fusion, which facilitates the analysis across sequential loops. Our contributions result in exact data cache reference patterns, in contrast to the approximate cache miss behavior of prior work. Experimental results indicate improvements in the accuracy of worst-case data cache behavior of up to two orders of magnitude over the original approach. In fact, our results closely bound and sometimes even exactly match those obtained by trace-driven simulation for worst-case inputs. The resulting WCET bounds of timing analysis confirm these findings in terms of providing tight bounds. Overall, our contributions lift analytical approaches to predict data cache behavior to a level suitable for efficient static timing analysis and, subsequently, real-time schedulability of tasks with predictable WCET.
Tightening the bounds on feasible preemption points
- In Proc. of the 27th IEEE International Real-Time Systems Symposium (RTSS
, 2006
Cited by 12 (4 self)
Caches have become invaluable for higher-end architectures to hide, in part, the increasing gap between processor speed and memory access times. While the effect of caches on timing predictability of single real-time tasks has been the focus of much research, bounding the overhead of cache warm-ups after preemptions remains a challenging problem, particularly for data caches. This paper makes multiple contributions. First, we bound the penalty of cache interference for real-time tasks by providing accurate predictions of the data cache behavior across preemptions, including instruction cache and pipeline effects. For every task, we derive data cache reference patterns for all scalar and non-scalar references. We show that, when considering cache preemption, the critical instant does not occur upon simultaneous release of all tasks. Second, we develop analysis methods to calculate tight upper bounds on the number of possible preemption points for each job of a task and consider the worst-case placement of these preemption points. Partial timing of a job is performed up to a preemption point using the cache reference patterns. The effects of cache interference are then analyzed using a set-theoretic approach, which identifies the number and location of additional misses due to preemption. A feedback mechanism provides the means to interact with the timing analyzer, which subsequently times another interval of a job bounded by the next preemption. Experiments show significant improvements in tightening bounds, of up to an order of magnitude over two prior methods and up to half an order of magnitude over a third prior method, for (a) the number of preemptions, (b) the WCET and (c) the response time of a task.
Overall, this work contributes (1) by formulating a new critical instant under cache preemption, (2) by proving a new analysis method to derive bounds on the number of preemptions and (3) by determining actual preemption points when calculating the preemption delay under consideration of data caches.
Static Analysis of Parameterized Loop Nests for Energy Efficient Use of Data Caches
- In Proceedings of Workshop on Compilers and Operating Systems for Low Power (COLP
, 2001
Cited by 10 (0 self)
Caches are an important... In this paper, we examine efficient utilization of data caches for low power in an adaptive memory hierarchy. We focus on the optimization of data reuse through the static analysis of line size adaptivity. We present an approach that enables the quantification of data misses w.r.t. cache line size at compile-time. This analysis is implemented in a software package STAMINA. Experimental results demonstrate effectiveness and accuracy of the analytical results compared to alternative simulation based methods.
Optimizing Program Locality through CMEs and GAs
- IN PROC. PACT
, 2003
Cited by 9 (2 self)
Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data locality. Performance of
Tightening the Bounds on Feasible Preemptions
Cited by 5 (2 self)
Data Caches are an increasingly important architectural feature in most modern computer systems. They help bridge the gap between processor speeds and memory access times. One inherent difficulty of using data caches in a real-time system is the unpredictability of memory accesses, which makes it difficult to calculate worst-case execution times (WCETs) of real-time tasks. While cache analysis for single real-time tasks has been the focus of much research in the past, bounding the preemption delay in a multi-task preemptive environment is a challenging problem, particularly for data caches. This paper makes multiple contributions in the context of independent, periodic tasks with deadlines less than or equal to their periods executing on a single processor. 1) For every task, we derive data cache reference patterns for all scalar and non-scalar references. These patterns are used to derive an upper bound on the WCET of real-time tasks. 2) We show that, when considering cache preemption effects, the critical instant does not occur upon simultaneous release of all tasks. We provide results for task sets with phase differences to prove our claim. 3) We develop a method to calculate tight upper bounds on the maximum number of possible preemptions for each job of a task and, considering the worst-case placement of these preemption points, derive a much tighter bound on its WCET. We provide results using both static and dynamic priority schemes. Our results show significant improvements in the bounds derived. We achieve up to an order of magnitude improvement over two prior methods and up to half an order of magnitude over a third prior method for the number of preemptions, the WCET and the response time of a task. Consideration of the best-case and worst-case execution times of higher priority jobs enables these improvements.
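For context, the kind of baseline that such bounds tighten can be sketched as follows. This is the classic fixed-priority response-time recurrence with a constant per-preemption cache delay, not the method of the paper above; it charges one preemption for every higher-priority release. The function name, the `delay` parameter and the task values are assumptions chosen for illustration, and all times are integers in the same unit.

```python
def response_time(wcet, deadline, higher_priority, delay=0):
    """Iterate R = C + sum(ceil(R / T_j) * (C_j + delay)) to a fixed point.

    higher_priority: list of (wcet_j, period_j) pairs for higher-priority tasks.
    delay: assumed constant cache-reload cost charged per preemption.
    Returns (response_time, preemption_count), or None if the deadline is exceeded.
    """
    r = wcet
    while True:
        # Integer ceiling of r / period for every higher-priority task.
        releases = [-(-r // period) for _, period in higher_priority]
        nxt = wcet + sum(n * (c + delay)
                         for n, (c, _) in zip(releases, higher_priority))
        if nxt > deadline:
            return None
        if nxt == r:
            return r, sum(releases)
        r = nxt


print(response_time(3, 20, [(1, 4), (2, 6)]))           # no cache delay
print(response_time(3, 20, [(1, 4), (2, 6)], delay=1))  # one unit per preemption
```

In this example a delay of a single time unit per preemption is enough to push the task past its deadline, which is why a tighter count of feasible preemptions matters.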
Coyote Project: Documentation
, 2000
Cited by 2 (1 self)
this paper describes the cache behavior by means of a set of equations. These equations accurately describe the relationship among loop indices, array sizes, base addresses and the cache parameters for a loop nest. Some statistics-based methods have been reported to reduce the execution time of solving the equations. We have followed the ideas presented by Vera et al. [2, 6]. Their method is based on two facts. First, the equations describe convex bounded polyhedra. The integer points inside them represent the iteration points where a potential miss occurs. By exploiting some intrinsic properties of the particular types of polyhedra generated by the equations, they reduce the complexity of the algorithm. Second, one of the advantages of using equations over simulators is that they allow studying each reference at a particular iteration point independently of all other memory references. They estimate the miss ratio by means of sampling techniques.
Emails: {nerina.bermudo, xavier.vera}@mdh.se
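The sampling idea can be sketched as follows, under stated assumptions. The cache geometry, array shape and function names are hypothetical, and `is_miss` stands in for solving the equations at one iteration point: it classifies that point on its own by looking for the most recent earlier access to the same set of a direct-mapped cache.

```python
import random

LINE, SETS = 32, 64   # hypothetical direct-mapped cache: 64 sets of 32-byte lines
N, ELEM = 64, 8       # hypothetical N x N row-major array of 8-byte elements


def line_of(k):
    """Memory line touched by iteration k of: for i: for j: A[i][j]."""
    i, j = divmod(k, N)
    return (i * N + j) * ELEM // LINE


def is_miss(k):
    """Classify iteration k independently of any running cache state."""
    line = line_of(k)
    for prev in range(k - 1, -1, -1):
        other = line_of(prev)
        if other % SETS == line % SETS:
            # Hit only if the last access to this set loaded the same line.
            return other != line
    return True  # no earlier access to this set: cold miss


def exact_miss_ratio():
    """Classify every iteration point (what sampling avoids)."""
    return sum(is_miss(k) for k in range(N * N)) / (N * N)


def sampled_miss_ratio(samples, seed=0):
    """Estimate the miss ratio from randomly chosen iteration points."""
    rng = random.Random(seed)
    points = [rng.randrange(N * N) for _ in range(samples)]
    return sum(is_miss(k) for k in points) / samples


print(exact_miss_ratio(), sampled_miss_ratio(1000))
```

Because each point is classified independently, the sample size can be chosen to trade accuracy against analysis time, which is the advantage over a simulator that the abstract mentions.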

