Results 1 - 10 of 14
Predator: a predictable SDRAM memory controller
- In CODES+ISSS, 2007
Cited by 68 (4 self)
Memory requirements of intellectual property components (IPs) in contemporary multi-processor systems-on-chip are increasing. Large high-speed external memories, such as DDR2 SDRAMs, are shared between a multitude of IPs to satisfy these requirements at a low cost per bit. However, SDRAMs have highly variable access times that depend on previous requests. This makes it difficult to accurately and analytically determine latencies and the useful bandwidth at design time, and hence to guarantee that hard real-time requirements are met. The main contribution of this paper is a memory controller design that provides a guaranteed minimum bandwidth and a maximum latency bound to the IPs. This is accomplished using a novel two-step approach to predictable SDRAM sharing. First, we define memory access groups, corresponding to precomputed sequences of SDRAM commands, with known efficiency and latency. Second, a predictable arbiter is used to schedule these groups dynamically at run-time, such that an allocated bandwidth and a maximum latency bound are guaranteed to the IPs. The approach is general and covers all generations of SDRAM. We present a modular implementation of our memory controller that is efficiently integrated into the network interface of a network-on-chip. The implementation is cheap in area, which scales linearly with the number of IPs. An instance with six ports runs at 200 MHz and requires 0.042 mm² in 0.13 μm CMOS technology.
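One way to read the two-step approach is that a known worst-case efficiency for the access groups lets net bandwidth be budgeted at design time. The Python sketch below illustrates that arithmetic only. The function name, its parameters, and the multiplicative model are assumptions made for illustration and are not taken from the paper, which derives its own bounds.

```python
def guaranteed_bandwidth(peak_mb_s, worst_case_efficiency, allocated_share):
    """Return a net bandwidth budget in MB/s for one IP.

    Assumed model (not the paper's analysis): the memory delivers at least
    peak * efficiency in the worst case, and the arbiter hands a fixed
    share of that to each IP.
    """
    if not 0.0 < worst_case_efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if not 0.0 <= allocated_share <= 1.0:
        raise ValueError("share must be in [0, 1]")
    return peak_mb_s * worst_case_efficiency * allocated_share


# Hypothetical numbers: 800 MB/s peak, 50% worst-case efficiency, 25% share.
budget = guaranteed_bandwidth(800.0, 0.5, 0.25)
```

With these hypothetical inputs the budget is a quarter of the 400 MB/s that survives the worst-case efficiency loss.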
Real-time scheduling using credit-controlled static-priority arbitration
- In Proc. Int’l Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), IEEE Computer Society, 2008
Cited by 23 (3 self)
The convergence of application domains in new systems-on-chip (SoC)
Bounding WCET of Applications Using SDRAM with Priority Based Budget Scheduling in MPSoCs
- In Proc. DATE, 2012
Cited by 8 (5 self)
SDRAM is a popular off-chip memory that provides large data storage and high data rates, and is in general significantly cheaper than SRAM. There is a growing interest in using SDRAMs in safety-critical application domains like aerospace, automotive and industrial automation. Some of these applications have hard real-time requirements where missing a deadline can have devastating consequences. Before integrating any hardware or software in this type of system, it needs to be proven that deadlines will always be met. In practice, this is done by analyzing the application’s timing behavior and calculating its Worst Case Execution Time (WCET). SDRAMs have variable access latencies depending on the refresh operation and the previous accesses. This paper builds on hardware techniques such as bank interleaving and applies Priority Based Budget Scheduling (PBS) to share the SDRAM among multiple masters. Its main contribution is a technique to bound the WCET of an application accessing a shared SDRAM of a multicore architecture using the worst-case access pattern. We implemented and tested an overall memory system on an Altera Cyclone III FPGA and applied the proposed WCET estimation technique. The results show that our technique produces safe and low WCET bounds.
Application-specific workload shaping in multimedia-enabled personal mobile devices
- In Proc. of the 4th International Conference on Hardware Software Codesign, 2006
Cited by 7 (4 self)
Today, most personal mobile devices (e.g. cell phones and PDAs) are multimedia-enabled and support a variety of concurrently running applications such as audio/video players, word processors and web browsers. Media-processing applications are often computationally expensive and most of these devices typically have 100-400 MHz processors. As a result, the user-perceived application response times are often poor when multiple applications are launched concurrently. In this paper we show that by using application-specific dynamic buffering techniques, the workload of these applications can be suitably “shaped” to fit the available processor bandwidth. Our techniques are analogous to traffic shaping, which is widely used in communication networks to optimally utilize network bandwidth. Such shaping techniques have recently attracted a lot of attention in the context of embedded systems design (e.g. for dynamic voltage scaling). However, they have not been exploited for enhanced schedulability of multiple applications, as we do in this paper.
An SDRAM-aware router for networks-on-chip
- DAC'09, 2009
Cited by 5 (1 self)
In this paper, we present an NoC (Network-on-Chip) router with SDRAM-aware flow control. Using priority-based arbitration, it schedules packets to improve memory utilization and reduce memory latency. Moreover, our multi-scheduling scheme, performed by multiple SDRAM-aware routers, helps to achieve better SDRAM performance and reduce the hardware cost of the NoC platform. Experimental results show that our SDRAM-aware router improves memory latency by 18% and memory utilization by 4.9% on average, while saving over 42% of the gate count of the NoC platform with a dual memory subsystem.
Conservative open-page policy for mixed time-criticality memory controllers
- In Proceedings of the Conference on Design, Automation and Test in Europe, DATE ’13, 2013
Cited by 4 (0 self)
Complex Systems-on-Chips (SoC) are mixed time-criticality systems that have to support firm real-time (FRT) and soft real-time (SRT) applications running in parallel. This is challenging for critical SoC components, such as memory controllers. Existing memory controllers focus on either firm real-time or soft real-time applications. FRT controllers use a close-page policy that maximizes worst-case performance and ignore opportunities to exploit locality, since it cannot be guaranteed. Conversely, SRT controllers try to reduce latency, and consequently processor stalling, by speculating on locality. They often use an open-page policy that sacrifices guaranteed performance, but is beneficial in the average case. This paper proposes a conservative open-page policy that improves the average-case performance of an FRT controller in terms of bandwidth and latency without sacrificing real-time guarantees. As a result, the memory controller efficiently handles both FRT and SRT applications. The policy keeps pages open as long as possible without sacrificing guarantees and captures locality in this window. Experimental results show that on average 70% of the locality is captured for applications in the CHStone benchmark, reducing the execution time by 17% compared to a close-page policy. The effectiveness of the policy is also evaluated in a multi-application use-case, and we show that the overall average-case performance improves if there is at least one FRT or SRT application that exploits locality.
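The idea of capturing locality inside a bounded window can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the policy from the paper: the fixed `window` length, the `(time, row)` access format, and the rule that every access restarts the window are all hypothetical simplifications, and the model covers a single bank only.

```python
def count_row_hits(accesses, window):
    """Count row-buffer hits in a toy single-bank model.

    accesses: list of (time, row) tuples sorted by time.
    window:   assumed number of cycles a row stays open after an access
              before it is closed to preserve the worst-case guarantee.
    """
    hits = 0
    open_row = None
    close_time = 0
    for time, row in accesses:
        # A hit needs the same row and an arrival before the window expires.
        if open_row == row and time <= close_time:
            hits += 1
        # Every access (hit or miss) leaves its row open for a new window.
        open_row = row
        close_time = time + window
    return hits


# Hypothetical trace: two accesses arrive while their row is still open.
trace = [(0, 1), (3, 1), (20, 1), (22, 2), (23, 2)]
hits_with_window = count_row_hits(trace, window=5)
hits_close_page = count_row_hits(trace, window=0)
```

Setting `window=0` in this model behaves like a close-page policy for any trace with distinct timestamps, which gives a baseline against which captured locality can be compared.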
Classification and Analysis of Predictable Memory Patterns
Cited by 4 (1 self)
The verification complexity of real-time requirements in embedded systems grows exponentially with the number of applications, as resource sharing prevents independent verification using simulation-based approaches. Formal verification is a promising alternative, although its applicability is limited to systems with predictable hardware and software. SDRAM memories are common examples of essential hardware components with unpredictable timing behavior, typically preventing use of formal approaches. A predictable SDRAM controller has been proposed that provides guarantees on bandwidth and latency by dynamically scheduling memory patterns, which are statically computed sequences of SDRAM commands. However, the proposed patterns become increasingly inefficient as memories become faster, making them unsuitable for DDR3 SDRAM. This paper extends the memory pattern concept in two ways. Firstly, we introduce a burst count parameter that enables patterns to have multiple SDRAM bursts per bank, which is required for DDR3 memories to be used efficiently. Secondly, we present a classification of memory pattern sets into four categories based on the combination of patterns that cause worst-case bandwidth and latency to be provided. Bounds on bandwidth and latency are derived that apply to all pattern types and burst counts, as opposed to the single case covered by earlier work. Experimental results show that these extensions are required to support the most efficient pattern sets for many use-cases. We also demonstrate that the burst count parameter increases efficiency in presence of large requests and enables a wider range of real-time requirements to be satisfied.
Index Terms: predictability; SDRAM; memory controller; memory patterns; burst count; classification
Automatic Generation of Efficient Predictable Memory Patterns
Cited by 2 (0 self)
Verifying firm real-time requirements gets increasingly complex, as the number of applications in embedded systems grows. Predictable systems reduce the complexity by enabling formal verification. However, these systems require predictable software and hardware components, which is problematic for resources with highly variable execution times, such as SDRAM controllers. A predictable SDRAM controller has been proposed that addresses this problem using predictable memory patterns, which are precomputed sequences of SDRAM commands. However, the memory patterns are derived manually, which is a time-consuming and error-prone process that must be repeated for every memory device, and may result in inefficient use of scarce and expensive bandwidth. This paper addresses this issue by proposing three algorithms for automatic generation of efficient memory patterns that provide different trade-offs between run-time of the algorithm and the bandwidth guaranteed by the controller. We experimentally evaluate the algorithms for a number of DDR2/DDR3 memories and show that an appropriate choice of algorithm reduces run-time to less than a second and increases the guaranteed bandwidth by up to 10.2%.
Index Terms: predictability; real-time; SDRAM; memory controller; memory patterns; pattern generation; memory efficiency
Real-Time Scheduling Using Credit-Controlled Static-Priority Arbitration
- In The 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications
The convergence of application domains in new systems-on-chip (SoC) results in systems with many applications with a mix of soft and hard real-time requirements. To reduce cost, resources, such as memories and interconnect, are shared between applications. However, resource sharing introduces interference between the sharing applications, making it difficult to satisfy their real-time requirements. Existing arbiters do not efficiently satisfy the requirements of applications in SoCs, as they either couple rate or allocation granularity to latency, or cannot run at high speeds in hardware with a low-cost implementation. The contribution of this paper is an arbiter called Credit-Controlled Static-Priority (CCSP), consisting of a rate regulator and a static-priority scheduler. The rate regulator isolates applications by regulating the amount of provided service in a way that decouples allocation granularity and latency. The static-priority scheduler decouples latency and rate, such that low latency can be provided to any application, regardless of the allocated rate. We show that CCSP belongs to the class of latency-rate servers and guarantees the allocated rate within a maximum latency, as required by hard real-time applications. We present a hardware implementation of the arbiter in the context of a DDR2 SDRAM controller. An instance with six ports running at 200 MHz requires an area of 0.0223 mm² in a 90 nm CMOS process.
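The two components named in this abstract, a rate regulator and a static-priority scheduler, can be illustrated with a minimal Python sketch. It is not the paper's hardware design or its exact credit accounting: the `Requestor` fields, the per-cycle replenishment rule, the `burst` cap on saved credits, and the cost of one credit per grant are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requestor:
    name: str
    priority: int         # lower value means higher priority
    rate: float           # allocated fraction of service per cycle
    burst: float          # assumed cap on accumulated credits
    credits: float = 0.0
    backlog: int = 0      # number of pending requests


def arbitrate(requestors):
    """Run one arbitration cycle and return the granted name, or None."""
    # Rate regulator: every requestor earns credits at its allocated rate.
    for r in requestors:
        r.credits = min(r.credits + r.rate, r.burst)
    # Only backlogged requestors holding a full credit are eligible.
    eligible = [r for r in requestors if r.backlog > 0 and r.credits >= 1.0]
    if not eligible:
        return None
    # Static-priority scheduler: the highest-priority eligible one wins.
    winner = min(eligible, key=lambda r: r.priority)
    winner.credits -= 1.0
    winner.backlog -= 1
    return winner.name


# Hypothetical setup: the high-priority requestor has the smaller rate.
reqs = [
    Requestor("hi", priority=0, rate=0.25, burst=1.0, backlog=100),
    Requestor("lo", priority=1, rate=0.5, burst=1.0, backlog=100),
]
grants = [arbitrate(reqs) for _ in range(8)]
```

In this sketch priority only decides who goes first among eligible requestors, while the credit balance limits how much service each one can take, which mirrors the decoupling of latency from rate that the abstract describes.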
A Scalable Processor with Embedded Software for Large-Scale Scientific Applications
We present a scalable, dynamically reconfigurable FPGA-based processor design that encompasses both reconfigurable circuitry and software programmability for supercomputing applications. Advanced FPGA chips contain both reconfigurable logic blocks and embedded processor cores, providing the developer with an environment for embedded system design. Since the reconfigurable fabric and the embedded processor cores can be programmed and used independently of each other or in any combination with each other, the designer has a flexible platform in which new avenues of research are possible.
II. Related Work
The necessity of simulation to measure performance and validate computer architectures is widely accepted; however, the expense with respect to simulation time and computer