Cache-Aware Scheduling and Analysis for Multicores
Citations: 35 (5 self)
Citations
314 | An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
- Kim, Burger, et al.
- 2002
Citation Context ...es could be non-uniform in terms of accessing speed: data residing in the part of a large cache close to the core could be accessed much faster than data residing physically farther from the core. In [22], it was shown in an example that in a 16-megabyte on-chip L2 cache built in a 50-nanometer processor technology, the closest bank could be accessed in 4 cycles while an access to the farthest bank mi...
165 | The worst-case execution-time problem—overview of methods and survey of tools, Trans.
- Wilhelm, Engblom, et al.
- 2008
Citation Context ...urces on multicores, which severely degrade the timing predictability of multicore systems due to the cache contention between cores. For single-processor systems, there are well-developed techniques [30] for timing analysis of embedded software. Using these techniques, the worst-case execution time (WCET) of real-time tasks may be estimated, and then used for system-level timing analyses like schedula...
138 | Static-priority scheduling on multiprocessors.
- Andersson, Baruah, et al.
- 2001
Citation Context ...bounds of a task’s WCET¹, with which we can do safe schedulability analysis for the task system. The schedulability analysis problem of global multiprocessor scheduling has been intensively studied [3, 4, 9, 6, 21, 22]. These analysis techniques are also extended to deal with more general cases, e.g., the global scheduling on 1-D FPGAs [15, 19], where a task may occupy multiple resources (columns on FPGAs) during e...
127 | Multiprocessor EDF and deadline monotonic schedulability analysis, in RTSS
- Baker
- 2003
Citation Context ...r bounds of a task’s WCET¹, with which we can do safe schedulability analysis for the task system. The schedulability analysis problem of global multiprocessor scheduling has been intensively studied [3, 4, 9, 6, 21, 23]. These analysis techniques are also extended to deal with more general cases, e.g., the global scheduling on 1-D FPGAs [16, 20], where a task may occupy multiple resources (columns on FPGAs) during e...
108 | Avoiding conflict misses dynamically in large direct-mapped caches
- Bershad, Lee, et al.
- 1994
Citation Context ... the contention among multiple cores in a shared cache” [27]. The goal of this paper is not to solve the above challenging problem. Instead, we use cache partitioning techniques such as page-coloring [8] combined with scheduling to isolate the cache spaces of hard real-time tasks running simultaneously to avoid the interference between them. This yields an efficient method – cache space isolation – t...
96 | Techniques for multiprocessor global schedulability analysis.
- Baruah
- 2007
Citation Context ...bounds of a task’s WCET¹, with which we can do safe schedulability analysis for the task system. The schedulability analysis problem of global multiprocessor scheduling has been intensively studied [3, 4, 9, 6, 21, 22]. These analysis techniques are also extended to deal with more general cases, e.g., the global scheduling on 1-D FPGAs [15, 19], where a task may occupy multiple resources (columns on FPGAs) during e...
96 | OS-Controlled Cache Predictability for Real-Time Systems
- Liedtke, Härtig, et al.
- 1997
Citation Context ...e of physical addresses. This restricts the memory size available to each task, as well as flexibility for recoloring. These problems can be compensated for by a simple rewiring trick as described in [23]. Therefore it is reasonable for our model to assume a cache with equally sized cache partitions that can be assigned and reassigned arbitrarily during the lifetimes of the tasks in question. 3.2 Task...
87 | Improved schedulability analysis of EDF on multiprocessor platforms
- Bertogna, Cirinei, et al.
- 2005
Citation Context ...bounds of a task’s WCET¹, with which we can do safe schedulability analysis for the task system. The schedulability analysis problem of global multiprocessor scheduling has been intensively studied [3, 4, 9, 6, 21, 22]. These analysis techniques are also extended to deal with more general cases, e.g., the global scheduling on 1-D FPGAs [15, 19], where a task may occupy multiple resources (columns on FPGAs) during e...
64 | Bus access optimization for predictable implementation of real-time applications on multiprocessor systems-on-chip
- Rosen, Andrei, et al.
- 2007
Citation Context ...nalysis, we need isolation techniques for all the shared resources. For the on-chip shared bus bandwidth, techniques such as time-slicing, round-robin and prioritized access have been studied in, e.g., [25, 24]. In this paper, we shall focus on shared caches only, and study the scheduling and analysis problem for hard real-time tasks with timing and cache space constraints, on multicores with shared L2 cach...
60 | A comparison of global and partitioned EDF schedulability tests for multiprocessors.
- Baker
- 2005
Citation Context ... proof is omitted due to the page limitation. 7. PERFORMANCE EVALUATION First, we evaluate the performance of the proposed schedulability tests in terms of acceptance ratio. We follow the method in [5] to generate task sets: A task set of M + 1 tasks is generated and tested. Then we iteratively increase the number of tasks by 1 to generate a new task set, and all the schedulability tests are run on...
59 | Software-based cache partitioning for real-time applications
- Wolfe
- 1994
Citation Context ...ime is a concept which has already been used, most notably, for reducing interference in order to improve average-case performance or to increase predictability in single-core settings with preemption [31, 16, 10]. Different approaches may be used to achieve cache partitioning. Assuming a k-associative cache that consists of l cache sets with k cache lines each, one can distinguish set-based [8] and associativi...
56 | Hardware support for WCET analysis of hard real-time multicore systems
- Paolieri, Quiñones, et al.
- 2009
Citation Context ...nalysis, we need isolation techniques for all the shared resources. For the on-chip shared bus bandwidth, techniques such as time-slicing, round-robin and prioritized access have been studied in, e.g., [25, 24]. In this paper, we shall focus on shared caches only, and study the scheduling and analysis problem for hard real-time tasks with timing and cache space constraints, on multicores with shared L2 cach...
53 | WCET analysis for multi-core processors with shared L2 instruction caches.
- Yan, Zhang
- 2008
Citation Context ...grams (not sequential programs as for the case of single-processor systems) running on different cores. To the best of our knowledge, the only known work on WCET analysis for multicores with shared cache is [32], which is only applicable to a special scenario and very simple hardware architecture (we will discuss its limitation in Section 2). Researchers in the WCET analysis community agree that “it will be ...
49 | Managing shared L2 caches on multicore systems in software.
- Tam, Azimi, et al.
- 2007
Citation Context ...directly) mapped on a particular subset of all cache sets. The number of available page colors by that method is therefore $2^{(m_1+m_2)-n}$. An example system supporting cache partitioning is reported in [29], where the authors modified the Linux kernel to support page-coloring based cache space isolation, in which 16 colors are supported, and conducted intensive experiments on a Power 5 dual-core process...
45 | Dynamic cache partitioning via columnization.
- Chiou, Jain, et al.
- 2000
Citation Context ...nt approaches may be used to achieve cache partitioning. Assuming a k-associative cache that consists of l cache sets with k cache lines each, one can distinguish set-based [8] and associativity-based [14] partitioning. The first one is also called row-based partitioning and assigns different cache sets to different partitions. It therefore enables up to l partitions and is thus quite fine-grained for ...
45 | Exploring locking & partitioning for predictable shared caches on multi-cores
- Suhendra, Mitra
Citation Context ...e WCET analysis community agree that “it will be extremely difficult, if not impossible, to develop analysis methods that can accurately capture the contention among multiple cores in a shared cache” [27]. The goal of this paper is not to solve the above challenging problem. Instead, we use cache partitioning techniques such as page-coloring [8] combined with scheduling to isolate the cache spaces of ...
42 | Real-time scheduling on multicore platforms
- Anderson, Calandrino, et al.
- 2006
Citation Context ...a high-miss-rate co-runner, as opposed to a low-miss-rate co-runner. L2 contention can be reduced by discouraging threads with heavy memory-to-L2 traffic from being co-scheduled [17]. Anderson et al. [2, 1, 12] applied the policy of encouraging or discouraging the co-scheduling of tasks (or jobs), to improve the cache performance and also to meet the real-time constraints. All these works assumed that the WC...
31 | lp_solve: a mixed integer linear program solver.
- Berkelaar
- 1999
Citation Context ...mentioned earlier, the second test is of $O(N^2)$ complexity. The scalability of the first test is of special concern to us since it employs the LP formulation. We use the open source LP solver lpsolve [7] to solve the LP formulation of the first test. Table 3 shows the running time and maximal peak memory usage of lpsolve with different task set scales. The experiment is conducted on a normal desktop ...
25 | Cache-aware real-time scheduling on multicore platforms: Heuristics and a case study
- Calandrino, Anderson
- 2008
Citation Context ...a high-miss-rate co-runner, as opposed to a low-miss-rate co-runner. L2 contention can be reduced by discouraging threads with heavy memory-to-L2 traffic from being co-scheduled [17]. Anderson et al. [2, 1, 12] applied the policy of encouraging or discouraging the co-scheduling of tasks (or jobs), to improve the cache performance and also to meet the real-time constraints. All these works assumed that the WC...
24 | Integrated scratchpad memory optimization and task scheduling for MPSoC architecture, in CASES
- Suhendra, Raghavan, et al.
- 2006
Citation Context ...h processor, the total utilization of the allocated tasks is no larger than 1, and the total memory size of the allocated tasks does not exceed the processor’s memory capacity. Suhendra et al. [28] and Salamy et al. [26] studied the problem of how to statically allocate and schedule a task graph onto an MPSoC, in which each processor ... Footnote 1: In this paper we focus on the interference caused by the sha...
22 | Parallel real-time task scheduling on multicore platforms
- Anderson, Calandrino
Citation Context ...a high-miss-rate co-runner, as opposed to a low-miss-rate co-runner. L2 contention can be reduced by discouraging threads with heavy memory-to-L2 traffic from being co-scheduled [17]. Anderson et al. [2, 1, 12] applied the policy of encouraging or discouraging the co-scheduling of tasks (or jobs), to improve the cache performance and also to meet the real-time constraints. All these works assumed that the WC...
22 | Impact of cache partitioning on multi-tasking real time embedded systems.
- Bui, Caccamo, et al.
- 2008
Citation Context ...ime is a concept which has already been used, most notably, for reducing interference in order to improve average-case performance or to increase predictability in single-core settings with preemption [31, 16, 10]. Different approaches may be used to achieve cache partitioning. Assuming a k-associative cache that consists of l cache sets with k cache lines each, one can distinguish set-based [8] and associativi...
17 | Throughput-oriented scheduling on chip multithreading systems
- Fedorova, Seltzer, et al.
- 2004
Citation Context ...he cache-aware scheduling, and finally, conclusions are given in Section 9. 2. RELATED WORK Since L2 misses affect the system performance to a much greater extent than L1 misses or pipeline conflicts [17], the shared cache contention may dramatically degrade the system performance and predictability. Chandra et al. [13] showed that a thread’s execution time may be up to 65% longer when it runs with a ...
13 | Hybrid instruction cache partitioning for preemptive real-time systems
- Busquets-Mataix, Serrano, et al.
- 1997
Citation Context ...ion. For single-processor multi-tasking systems, cache space isolation allows compositional timing analysis where the WCET of tasks can be estimated separately using existing WCET analysis techniques [11]. For multicores, to enable compositional timing analysis, we need isolation techniques for all the shared resources. For the on-chip shared bus bandwidth, techniques such as time-slicing, round-robin...
11 | A unified hard/soft real-time schedulability test for global EDF multiprocessor scheduling
- Leontyev, Anderson
- 2008
Citation Context ...bounds of a task’s WCET¹, with which we can do safe schedulability analysis for the task system. The schedulability analysis problem of global multiprocessor scheduling has been intensively studied [3, 4, 9, 6, 21, 22]. These analysis techniques are also extended to deal with more general cases, e.g., the global scheduling on 1-D FPGAs [15, 19], where a task may occupy multiple resources (columns on FPGAs) during e...
10 | An EDF Schedulability Test for Periodic Tasks on Reconfigurable Hardware Devices
- Danne, Platzner
- 2006
Citation Context ...obal multiprocessor scheduling has been intensively studied [3, 4, 9, 6, 21, 22]. These analysis techniques are also extended to deal with more general cases, e.g., the global scheduling on 1-D FPGAs [15, 19], where a task may occupy multiple resources (columns on FPGAs) during execution. However, none of these techniques is applicable to our problem, since with cache space isolation, tasks are actually ...
9 | Predicting Inter-Thread Cache Contention on a Multi-Processor Architecture
- Chandra, Guo, et al.
Citation Context ...e system performance to a much greater extent than L1 misses or pipeline conflicts [17], the shared cache contention may dramatically degrade the system performance and predictability. Chandra et al. [13] showed that a thread’s execution time may be up to 65% longer when it runs with a high-miss-rate co-runner than with a low-miss-rate co-runner. Such dramatic slowdowns were due to significant increas...
9 | New schedulability test conditions for non-preemptive scheduling on multiprocessor platforms
- Guan, Yi, et al.
- 2008
Citation Context ...bounds of a task’s WCET¹, with which we can do safe schedulability analysis for the task system. The schedulability analysis problem of global multiprocessor scheduling has been intensively studied [3, 4, 9, 6, 21, 22]. These analysis techniques are also extended to deal with more general cases, e.g., the global scheduling on 1-D FPGAs [15, 19], where a task may occupy multiple resources (columns on FPGAs) during e...
9 | A categorization of real-time multiprocessor scheduling problems and algorithms
- Carpenter, Funk, et al. (including Sanjoy Baruah)
- 2004
Citation Context ...ar to those introduced in this paper. The detailed analysis derivation is shown in the appendix. One can also apply cache space isolation in a way similar to the partitioned multiprocessor scheduling [13]: each task is assigned to a core and a set of cache partitions in advance. One reason for us to be interested in the partitioned scheduling is that the shared cache on multicores could be non-uniform...
8 | Task partitioning upon memory-constrained multiprocessors
- Fisher, Anderson, et al.
- 2005
Citation Context ... execution. However, none of these techniques is applicable to our problem, since with cache space isolation, tasks are actually scheduled on two resources: cores and the shared cache. Fisher et al. [18] studied the problem of static allocation of periodic tasks onto a multiprocessor platform such that on each processor, the total utilization of the allocated tasks is no larger than 1, as well as the...
4 | Comparing caching techniques for multitasking real-time systems
- Dropsho, Weems
- 1997
Citation Context ...ime is a concept which has already been used, most notably, for reducing interference in order to improve average-case performance or to increase predictability in single-core settings with preemption [31, 16, 10]. Different approaches may be used to achieve cache partitioning. Assuming a k-associative cache that consists of l cache sets with k cache lines each, one can distinguish set-based [8] and associativi...
4 | Schedulability analysis of preemptive and nonpreemptive EDF on partial runtime-reconfigurable FPGAs
- Guan
- 2008
Citation Context ...obal multiprocessor scheduling has been intensively studied [3, 4, 9, 6, 21, 22]. These analysis techniques are also extended to deal with more general cases, e.g., the global scheduling on 1-D FPGAs [15, 19], where a task may occupy multiple resources (columns on FPGAs) during execution. However, none of these techniques is applicable to our problem, since with cache space isolation, tasks are actually ...
1 | A framework for task scheduling and memory partitioning for multi-processor system-on-chip
- Salamy, Ramanujam
- 2009
Citation Context ...utilization of the allocated tasks is no larger than 1, and the total memory size of the allocated tasks does not exceed the processor’s memory capacity. Suhendra et al. [28] and Salamy et al. [26] studied the problem of how to statically allocate and schedule a task graph onto an MPSoC, in which each processor ... Footnote 1: In this paper we focus on the interference caused by the shared L2 cache, and there...
1 | Task partitioning upon memory-constrained multiprocessors
- Fisher, Anderson, Baruah
- 2005
Citation Context ... execution. However, none of these techniques is applicable to our problem, since with cache space isolation, tasks are actually scheduled on two resources: cores and the shared cache. Fisher et al. [19] studied the problem of static allocation of periodic tasks onto a multiprocessor platform such that on each processor, the total utilization of the allocated tasks is no larger than 1, as well as the...