Results 1-10 of 22
Enabling Application-Level Performance Guarantees in Network-Based Systems on Chip by Applying Dataflow Analysis
Cited by 15 (4 self)
A growing number of applications, often with real-time requirements, are integrated on the same System on Chip (SoC), in the form of hardware and software Intellectual Property (IP). To facilitate real-time applications, Networks on Chip (NoC) guarantee bounds on latency and throughput. These bounds, however, only extend to the Network Interfaces (NI), between the IP and the NoC. To give performance guarantees on the application level, the buffers in the NIs must be sufficiently large for the particular application. At the same time, it is imperative to minimise the size of the NI buffers, as they are major contributors to the area and power consumption of the NoC. Existing buffer-sizing methods use coarse-grained application models, based on linear traffic bounds or periodic producers and consumers, thus severely limiting their applicability. In this work, we propose to capture the behaviour of the NoC and the applications using a dataflow model. This allows us to verify the temporal behaviour and to compute buffer sizes using existing dataflow-analysis techniques. We show what is required from the NoC architecture and demonstrate how to construct a NoC model, with multiple levels of detail. Using the proposed model, buffer sizes are determined for a range of SoC designs with a run
Hard-real-time scheduling of data-dependent tasks in embedded streaming applications
In Proceedings of the 9th ACM International Conference on Embedded Software, ser. EMSOFT ’11
Cited by 9 (4 self)
Most of the hard-real-time scheduling theory for multiprocessor systems assumes independent periodic or sporadic tasks. Such a simple task model is not directly applicable to modern embedded streaming applications. This is because a modern streaming application is typically modeled as a directed graph where nodes represent actors (i.e. tasks) and edges represent data dependencies. The actors in such graphs have data-dependency constraints and do not necessarily conform to the periodic or sporadic task models. Therefore, in this paper we investigate the applicability of hard-real-time scheduling theory for periodic tasks to streaming applications modeled as acyclic Cyclo-Static Dataflow (CSDF) graphs. In such graphs, the actors are data-dependent; however, we analytically prove that they can be scheduled as implicit-deadline periodic tasks. As a result, a variety of hard-real-time scheduling algorithms for periodic tasks can be applied to schedule such applications with a certain guaranteed throughput. We compare the throughput resulting from this scheduling approach to the maximum achievable throughput of an application for a set of 19 real streaming applications. We find that in more than 80% of the cases, the throughput resulting from our approach is equal to the maximum achievable throughput.
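As a rough illustration of why a periodic-task view of the actors is useful, the sketch below applies the classical utilization test for preemptive EDF on a single processor (an implicit-deadline periodic task set is schedulable if and only if its total utilization is at most 1), together with the trivial lower bound on the number of processors. The (WCET, period) pairs and all function names are hypothetical; how the paper derives actor periods from a CSDF graph is not reproduced here.

```python
from fractions import Fraction
from math import ceil


def total_utilization(tasks):
    """Sum of WCET/period over all tasks; tasks is a list of (wcet, period) integer pairs."""
    return sum(Fraction(wcet, period) for wcet, period in tasks)


def edf_schedulable_uniprocessor(tasks):
    """Exact EDF test for implicit-deadline periodic tasks on one processor: U <= 1."""
    return total_utilization(tasks) <= 1


def min_processors_lower_bound(tasks):
    """No scheduler can use fewer than ceil(U) processors (a necessary condition only)."""
    return ceil(total_utilization(tasks))


if __name__ == "__main__":
    # Hypothetical actors expressed as (WCET, period) in the same time unit.
    actors = [(1, 4), (2, 8), (3, 6)]
    print(total_utilization(actors))
    print(edf_schedulable_uniprocessor(actors))
    print(min_processors_lower_bound(actors))
```

Exact rational arithmetic is used so that a task set with utilization exactly 1 is not misclassified by floating-point rounding.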
The Earlier the Better: A Theory of Timed Actor Interfaces
Cited by 7 (3 self)
Programming embedded and cyber-physical systems requires attention not only to functional behavior and correctness, but also to non-functional aspects and specifically timing and performance constraints. A structured, compositional, model-based approach based on stepwise refinement and abstraction techniques can support the development process, increase its quality and reduce development time through automation of synthesis, analysis or verification. For this purpose, we introduce in this paper a general theory of timed actor interfaces. Our theory supports a notion of refinement that is based on the principle of worst-case design that permeates the world of performance-critical systems. This is in contrast with the classical behavioral and functional refinements based on restricting or enlarging sets of behaviors. An important feature of our refinement is that it allows time-deterministic abstractions to be made of time-nondeterministic systems, improving efficiency and reducing complexity of formal analysis. We also show how our theory relates to, and can be used to reconcile, a number of existing time and performance models and how their established theories can be exploited to represent and analyze interface specifications and refinement steps.
Managing latency in embedded streaming applications under hard-real-time scheduling
In Proceedings of the 8th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2012
Cited by 5 (1 self)
In this paper, we consider the problem of hard-real-time scheduling of embedded streaming applications, modeled using dataflow graphs, while minimizing the application latency. Recently, it has been shown that the actors in an acyclic Cyclo-Static Dataflow (CSDF) graph can be scheduled as a set of implicit-deadline periodic tasks. This scheduling approach has been shown to yield the maximum achievable throughput for a large set of graphs, called matched I/O rates graphs. We show that scheduling the graph actors as implicit-deadline periodic tasks increases the latency significantly for a class of graphs called unbalanced graphs. To alleviate this problem, we propose a new task-set representation for the actors in which the actors are scheduled as a set of constrained-deadline periodic tasks. We prove that scheduling the actors as constrained-deadline periodic tasks delivers optimal throughput (i.e., rate) and latency for graphs with repetition vector equal to $\vec{1}$. Furthermore, we evaluate the constrained-deadline representation using a set of 19 real-life applications and show that it is capable of achieving the minimum achievable latency for more than 70% of the applications, even when the application has a repetition vector not equal to $\vec{1}$. We show that choosing the task deadline involves a trade-off between the latency and the resource requirements. Finally, we propose a decision tree to assist the designer in choosing the appropriate real-time periodic task model for scheduling acyclic CSDF graphs.
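The repetition vector mentioned in this abstract gives the number of firings of each actor in one graph iteration. For the simpler single-rate-per-edge (SDF) case, it is the smallest positive integer solution of the balance equations q[src] * prod = q[dst] * cons. The sketch below is a minimal illustration under that assumption; the function name and the edge format (src, dst, prod, cons) are hypothetical, and the cyclo-static case treated in the paper is not covered.

```python
from fractions import Fraction
from math import lcm  # variadic lcm requires Python 3.9+


def repetition_vector(actors, edges):
    """Solve the SDF balance equations for a connected graph.

    actors: list of actor names.
    edges:  list of (src, dst, prod, cons) with positive integer token rates.
    Returns a dict actor -> firings per iteration (smallest integer solution).
    Raises ValueError if the graph is inconsistent or not connected.
    """
    # Undirected adjacency: firing ratio of the neighbour relative to this actor.
    adjacent = {a: [] for a in actors}
    for src, dst, prod, cons in edges:
        adjacent[src].append((dst, Fraction(prod, cons)))
        adjacent[dst].append((src, Fraction(cons, prod)))

    # Propagate rational firing rates from an arbitrary start actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adjacent[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                stack.append(b)
            elif rate[b] != expected:
                raise ValueError("inconsistent rates: no repetition vector exists")

    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the rational rates to the smallest all-integer vector.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}


if __name__ == "__main__":
    print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
    print(repetition_vector(["A", "B"], [("A", "B", 1, 1)]))
```

A graph whose edges all produce and consume the same number of tokens yields the all-ones vector, which is the case for which the abstract states optimal throughput and latency.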
Process-Variation Aware Mapping of Real-Time Streaming Applications to MPSoCs for Improved Yield
Cited by 4 (2 self)
As technology scales, the impact of process variation on the maximum supported frequency (FMAX) of individual cores in an MPSoC becomes more pronounced. Task allocation without variation-aware performance analysis can result in a significant loss in yield, defined as the number of manufactured chips satisfying the application timing requirement. We propose variation-aware task allocation for real-time streaming applications modeled as task graphs. Our solutions are primarily based on the throughput requirement, which is the most important timing requirement in many real-time streaming applications. The three main contributions of this paper are: 1) Using data flow graphs that are well-suited for modeling and analysis of real-time streaming applications, we explicitly model task execution both in terms of clock cycles (which is independent of variation) and seconds (which does depend on the variation of the resource), which we connect by an explicit binding. 2) We present two approaches for optimizing the yield. The approaches give different results at different costs. 3) We present exhaustive and heuristic algorithms that implement the optimization approaches. Our variation-aware mapping algorithms are tested on models of real applications, and are compared to the mapping methods that are unaware of hardware variation. Our results demonstrate yield improvements of up to 50% with an average of 31%, showing the effectiveness of our approaches. Index Terms: Process variation, Multiprocessor System-on
Dynamic Voltage and Frequency Scaling and Power Management for Dataflow Applications
Cited by 4 (0 self)
Composability means that the behaviour of an application, including its timing, is not affected by the absence or presence of other applications. It is required to be able to design, test, and verify applications independently. In this paper we define composable dynamic voltage and frequency scaling (DVFS) hardware, and composable power management. We ensure that the functional and temporal behaviours of an application are not affected by other applications, even when they are power managed. For dataflow applications with worst-case execution times per task, our power management is also predictable, i.e. guarantees end-to-end real-time requirements, even when the application is mapped on multiple processors that are power managed independently. Our method can be used with various DVFS architectures, such as on-chip and off-chip VF regulators. Our FPGA implementation models a system with multiple tiles, each containing a processor with local memory running a real-time operating system (RTOS) and power management. Tiles are interconnected by a network on chip, and communicate using shared memories. Experiments indicate energy savings of 68% w.r.t. no power management, and 40% w.r.t. power gating only. We also demonstrate composability and predictability on the platform in the presence of power management.
Lightweight Modeling of Complex State Dependencies in Stream Processing Systems
Cited by 3 (0 self)
Over the last few years, Real-Time Calculus has been used extensively to model and analyze embedded systems processing continuous data/event streams. Towards this, bounds on the arrival process of streams and bounds on the processing capacity of resources serve as inputs to the model, which are used to calculate end-to-end delays suffered by streams, maximum backlog, utilization of resources, etc. This “functional” model, although amenable to computationally inexpensive analysis methods, has limited modeling capability. In particular, “state-based” processing, e.g. blocking write, where the processing depends on the “state” or fill-level of the buffer, cannot be modeled in a straightforward manner. This has led to a number of recent proposals on using automata-theoretic models for stream processing systems (e.g. Event Count Automata [RTSS 2005]). Although such models offer better modeling flexibility, they suffer from the usual state-space explosion problem. In this paper we show that a number of complex state dependencies can be modeled in a lightweight manner, using a feedback control technique. This avoids explicit state modeling, and hence the state-space explosion problem. Our proposed modeling and analysis therefore extend the original Real-Time Calculus-based functional modeling in a very useful way, and cover a much larger problem domain compared to what was previously possible without explicit state modeling. We illustrate its utility through two case studies and also compare our analysis results with those obtained from detailed system simulations (which are significantly more time consuming).
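To make the "bounds in, delay and backlog out" idea concrete, the sketch below evaluates the simplest closed-form case from network calculus, on which Real-Time Calculus builds. It assumes a token-bucket arrival curve alpha(t) = burst + rate * t and a rate-latency service curve beta(t) = service_rate * max(0, t - latency); these curve shapes and all parameter names are illustrative assumptions, not taken from the paper. Under them, the backlog bound is the maximum vertical distance between the curves and the delay bound is the maximum horizontal distance.

```python
def backlog_and_delay_bounds(burst, rate, service_rate, latency):
    """Worst-case backlog and delay for one stream on one resource.

    Arrival curve (assumed):  alpha(t) = burst + rate * t
    Service curve (assumed):  beta(t)  = service_rate * max(0, t - latency)
    Returns (backlog_bound, delay_bound).
    """
    if service_rate <= 0:
        raise ValueError("service rate must be positive")
    if rate > service_rate:
        # Long-run demand exceeds long-run service, so no finite bound exists.
        raise ValueError("arrival rate exceeds service rate: bounds are unbounded")

    # Vertical deviation: events that pile up during the initial latency, plus the burst.
    backlog_bound = burst + rate * latency
    # Horizontal deviation: wait out the latency, then drain the burst at the service rate.
    delay_bound = latency + burst / service_rate
    return backlog_bound, delay_bound


if __name__ == "__main__":
    # Hypothetical stream: burst of 4 events, 1 event per time unit,
    # served at 2 events per time unit after a latency of 3 time units.
    print(backlog_and_delay_bounds(burst=4, rate=1, service_rate=2, latency=3))
```

This purely functional calculation has no notion of buffer state, which is the limitation the abstract points to: a blocking write would make the arrivals depend on the backlog itself.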
Exploring trade-offs between performance and resource requirements for synchronous dataflow graphs
In Proc. ESTIMedia ’09, IEEE, 2009
Cited by 3 (2 self)
Synchronous dataflow graphs (SDFGs) are widely used to model streaming applications such as signal processing and multimedia applications. These are often implemented on resource-constrained embedded platforms ranging from PDAs and cell phones to automobile equipment and printing systems. Trade-off analysis between resource usage and performance is critical in the life cycle of those products, from tailoring platforms to target applications at design time to resource management at runtime. We present a trade-off analysis method for SDFGs based on model-checking techniques and leveraging knowledge from the dataflow domain. We develop results to prune the state space of an SDFG for multi-objective model checking without losing optimality. To achieve scalability to large state spaces, we combine these pruning techniques with pragmatic heuristics. We evaluate our techniques with two sets of experiments. One set shows we can now do throughput-storage trade-off analysis for shared memory architectures, showing reductions in memory usage of 10-50% compared to existing distributed-memory-based analysis. A second set of experiments shows how our techniques support design-space exploration for the digital datapath of a professional printer system. Analysis times range from less than a second to at most several minutes.
Efficient Design, Analysis, and Implementation of Complex Multiprocessor Real-Time Systems
2013
Cited by 1 (0 self)
The advent of multicore technologies is a fundamental development that is impacting software design processes across a wide range of application domains, including an important category of such applications, namely, those that have real-time constraints. This development has led to much recent work on multicore-oriented resource management frameworks for real-time applications. Unfortunately, most of this work focuses on simple task models where complex but practical runtime behaviors among tasks do not arise. In practice, however, many factors such as programming methodologies, interactions with external devices, and resource sharing often result in complex runtime behaviors that can negatively impact timing correctness. The goal of this dissertation is to support such more realistic and complex applications in multicore-based real-time systems. The thesis of this dissertation is: Capacity loss (i.e., over-provisioning) can be significantly reduced on multiprocessors while providing soft and hard real-time guarantees for real-time applications that exhibit complex runtime behaviors such as self-suspensions, graph-based precedence constraints, non-preemptive sections, and parallel execution segments by designing new real-time scheduling algorithms and developing new schedulability tests.
Mathematical Formalisms for Performance Evaluation of Networks-on-Chip
Cited by 1 (0 self)
This article reviews four popular mathematical formalisms (queueing theory, network calculus, schedulability analysis, and dataflow analysis) and how they have been applied to the analysis of on-chip communication performance in Systems-on-Chip. The article discusses the basic concepts and results of each formalism and provides examples of how they have been used in Networks-on-Chip (NoCs) performance analysis. Also, the respective strengths and weaknesses of each technique and its suitability for a specific purpose are investigated. An open research issue is a unified analytical model for a comprehensive performance evaluation of NoCs. To this end, this article reviews the attempts that have been made to bridge these formalisms.