Exploring trade-offs in buffer requirements and throughput constraints for synchronous dataflow graphs
- DESIGN AUTOMATION CONFERENCE, PROC. ACM
, 2006
Cited by 24 (7 self)
Multimedia applications usually have throughput constraints. An implementation must meet these constraints while minimizing resource usage and energy consumption. The compute-intensive kernels of these applications are often specified as Synchronous Dataflow Graphs. Communication between nodes in these graphs requires storage space, which influences throughput. We present exact techniques to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint. The feasibility of the approach is demonstrated with a number of examples.
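The throughput/storage trade-off described above can be illustrated with a minimal Python sketch. It simulates self-timed execution of a hypothetical two-actor chain A -> B whose single channel has a fixed capacity, and reports how many B firings complete per time unit. The storage model (A claims space when it starts, B frees space when it finishes), the name `chain_throughput` and the positive integer execution times are assumptions for illustration; this is not the exact state-space technique of the paper.

```python
def chain_throughput(p: int, c: int, t_a: int, t_b: int, cap: int,
                     horizon: int = 1000) -> float:
    """Self-timed execution of a two-actor SDF chain A -> B with a bounded buffer.

    Assumed (hypothetical) storage model: A claims p free slots when it starts
    and makes p tokens available when it finishes; B takes c tokens when it
    starts and frees c slots when it finishes. Each actor runs at most one
    firing at a time; t_a and t_b are positive integers.
    Returns completed B firings per time unit over the horizon.
    """
    space, tokens = cap, 0
    a_end = b_end = None      # finish time of the ongoing firing, if any
    completed = 0
    for t in range(horizon):
        # Complete firings that end at time t.
        if a_end == t:
            tokens += p
            a_end = None
        if b_end == t:
            space += c
            completed += 1
            b_end = None
        # Start every enabled firing as soon as possible (self-timed).
        if a_end is None and space >= p:
            space -= p
            a_end = t + t_a
        if b_end is None and tokens >= c:
            tokens -= c
            b_end = t + t_b
    return completed / horizon


if __name__ == "__main__":
    # Sweep the buffer capacity to sample the throughput/storage trade-off.
    for cap in range(3, 8):
        print(cap, chain_throughput(p=2, c=3, t_a=1, t_b=1, cap=cap))
```

For p = 2, c = 3 and unit execution times, a 3-slot buffer deadlocks, and throughput is expected to grow with capacity until the producer's token rate becomes the bottleneck.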
Minimising buffer requirements of synchronous dataflow graphs with model checking
- IN PROCEEDINGS OF THE DESIGN AUTOMATION CONFERENCE
, 2005
Task-level Timing Models for Guaranteed Performance in Multiprocessor Networks-on-Chip
- In CASES, Proc
, 2003
Cited by 17 (12 self)
We consider a dynamic application running on a multiprocessor network-on-chip as a set of independent jobs, each job possibly running on multiple processors. To provide guaranteed quality and performance, the scheduling of jobs, jobs themselves and the hardware must be amenable to timing analysis. For a certain class of applications and multiprocessor architectures, we propose exact timing models that effectively co-model both the computation and communication of a job. The models are based on interprocessor communication (IPC) graphs [4]. Our main contribution is a precise model of network-on-chip communication, including buffer models. We use a JPEG-decoder job as an example to demonstrate that our models can be used in practice to derive upper bounds on the job execution time and to reason about optimal buffer sizes.
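Models built on IPC (homogeneous dataflow) graphs are commonly analysed through their maximum cycle ratio: in self-timed execution, the iteration period equals the largest ratio, over all cycles, of total execution time to the number of initial tokens on the cycle. The brute-force Python sketch below assumes a small graph given as hypothetical `exec_time` and `edges` structures; it is not the network-on-chip model of the paper. As is common, a buffer of k slots between two tasks is modeled by a back edge carrying k tokens, and a self-loop with one token prevents a task from overlapping with itself.

```python
from fractions import Fraction


def max_cycle_ratio(exec_time: dict[str, int],
                    edges: list[tuple[str, str, int]]) -> Fraction:
    """Iteration period of a self-timed HSDF/IPC graph (brute force).

    exec_time: task -> execution time; edges: (src, dst, initial_tokens).
    Each simple cycle is enumerated once, starting from its lowest-ranked
    node; suitable only for small graphs.
    """
    succ: dict[str, list[tuple[str, int]]] = {v: [] for v in exec_time}
    for u, v, d in edges:
        succ[u].append((v, d))
    order = sorted(exec_time)
    rank = {v: i for i, v in enumerate(order)}
    best = Fraction(0)

    def dfs(start, node, visited, time, tokens):
        nonlocal best
        for nxt, d in succ[node]:
            if nxt == start:  # closed a cycle back to the start node
                if tokens + d == 0:
                    raise ValueError("cycle without tokens: graph deadlocks")
                best = max(best, Fraction(time, tokens + d))
            elif rank[nxt] > rank[start] and nxt not in visited:
                visited.add(nxt)
                dfs(start, nxt, visited, time + exec_time[nxt], tokens + d)
                visited.remove(nxt)

    for s in order:
        dfs(s, s, {s}, exec_time[s], 0)
    return best
```

With a producer of execution time 2 and a consumer of execution time 3, a two-slot buffer gives a period of 3 (the consumer is the bottleneck), while a one-slot buffer stretches the period to 5.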
Throughput-Buffering Trade-Off Exploration for Cyclo-Static and Synchronous Dataflow Graphs
Cited by 10 (4 self)
Multimedia applications usually have throughput constraints. An implementation must meet these constraints while minimizing resource usage and energy consumption. The compute-intensive kernels of these applications are often specified as Cyclo-Static or Synchronous Dataflow Graphs. Communication between nodes in these graphs requires storage space, which influences throughput. We present an exact technique to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal buffer space needed to execute a graph under a given throughput constraint. The feasibility of the exact technique is demonstrated with experiments on a set of realistic DSP and multimedia applications. To increase the scalability of the approach, a fast approximation technique is developed that guarantees both throughput and a tight bound on the maximal overestimation of buffer requirements. The approximation technique allows trading off worst-case overestimation against run-time.
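At the low end of the trade-off space lies the smallest buffer that still permits execution without deadlock. For a single channel it can be found by simulating one graph iteration under increasing capacities, as in the Python sketch below; cyclo-static phases are given as lists of rates, and a plain SDF edge uses one-element lists. The function name, the atomic-firing model and the assumption of no initial tokens are illustrative choices, not the paper's method. Because firing one actor never disables the other on a single channel, a greedy firing order suffices here.

```python
from math import gcd


def min_buffer(prod: list[int], cons: list[int]) -> int:
    """Smallest capacity of one (C)SDF edge A -> B that lets a full graph
    iteration complete without deadlock (no initial tokens, atomic firings).

    prod / cons: per-phase production rates of A and consumption rates of B.
    """
    P, C = sum(prod), sum(cons)
    g = gcd(P, C)
    # Firings per iteration so that produced == consumed tokens.
    q_a, q_b = (C // g) * len(prod), (P // g) * len(cons)

    def completes(cap: int) -> bool:
        tokens = fired_a = fired_b = 0
        while fired_a < q_a or fired_b < q_b:
            need = cons[fired_b % len(cons)]
            make = prod[fired_a % len(prod)]
            if fired_b < q_b and tokens >= need:   # consuming never blocks A
                tokens -= need
                fired_b += 1
            elif fired_a < q_a and tokens + make <= cap:
                tokens += make
                fired_a += 1
            else:
                return False                       # deadlock at this capacity
        return True

    cap = max(prod)
    while not completes(cap):
        cap += 1
    return cap
```

For SDF edges without initial tokens the result agrees with the well-known bound p + c - gcd(p, c); for example, production 2 and consumption 3 need 4 slots.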
Language and Compiler Support for Stream Programs
, 2009
Cited by 9 (2 self)
Stream programs represent an important class of high-performance computations. Defined by their regular processing of sequences of data, stream programs appear most commonly in the context of audio, video, and digital signal processing, though also in networking, encryption, and other areas. Stream programs can be naturally represented as a graph of independent actors that communicate explicitly over data channels. In this work we focus on programs where the input and output rates of actors are known at compile time, enabling aggressive transformations by the compiler; this model is known as synchronous dataflow. We develop a new programming language, StreamIt, that empowers both programmers and compiler writers to leverage the unique properties of the streaming domain. StreamIt offers several new abstractions, including hierarchical single-input single-output streams, composable primitives for data reordering, and a mechanism called teleport messaging that enables precise event handling
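Because actor rates are known at compile time, a compiler can solve the SDF balance equations `q[src] * produced == q[dst] * consumed` for the repetition vector: the smallest number of firings of each actor that returns every channel to its initial state. The Python sketch below is a generic version of this standard computation, not StreamIt's implementation; `repetition_vector` and the edge tuple layout are hypothetical, and the graph is assumed to be connected.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors: list[str],
                      edges: list[tuple[str, str, int, int]]) -> dict[str, int]:
    """edges: (src, dst, produced_per_firing, consumed_per_firing).

    Returns the smallest positive integer firing counts satisfying every
    balance equation; raises ValueError if the rates are inconsistent.
    """
    rate: dict[str, Fraction] = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates along edges in both directions
        changed = False
        for u, v, p, c in edges:
            if u in rate and v not in rate:
                rate[v] = rate[u] * p / c
                changed = True
            elif v in rate and u not in rate:
                rate[u] = rate[v] * c / p
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for u, v, p, c in edges:  # every balance equation must hold
        if rate[u] * p != rate[v] * c:
            raise ValueError(f"inconsistent rates on edge {u}->{v}")
    # Scale by the LCM of denominators, then divide by the GCD of the results.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    ints = {a: int(r * scale) for a, r in rate.items()}
    g = reduce(gcd, ints.values())
    return {a: ints[a] // g for a in actors}
```

For a chain A -> B -> C with rates (2, 3) and (1, 2) it returns A: 3, B: 2, C: 1. Adding an edge that contradicts these ratios raises `ValueError`, signalling that the graph cannot run forever in bounded memory.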
Constrained and phased scheduling of synchronous data flow graphs for StreamIT language
- MASTER’S THESIS, MIT
, 2002
Online Resource Management in a Multiprocessor with a Network on Chip
- In ACM Symposium on Applied Computing
, 2007
Cited by 8 (4 self)
We propose an online resource allocation solution for multiprocessor systems-on-chip that execute several real-time streaming media jobs simultaneously. The system consists of up to 24 processors connected by an Æthereal [7] Network-on-Chip (NoC) of 4 to 12 routers. A job is a set of processing tasks connected by FIFO channels. Each job can be started or stopped independently by the user. Each job is annotated with resource budgets per computation task and per communication channel, computed at compile time. When a job is requested to start, resources that meet its budgets have to be found. Because this happens online, allocation must use low-complexity algorithms. We perform the allocation in two steps. First, tasks are assigned to virtual tiles (VTs), while trying to minimise the total number of VTs and the total bandwidth used. In the second step, these VTs are mapped to real tiles, and network bandwidth allocation and routing are performed simultaneously. We show with simulations that introducing randomisation in the processing order yields a significant improvement in the percentage of mapping successes. In combination, these techniques allow 95% of the processor resources to be allocated while handling a large number of job arrivals and departures.
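The first allocation step (tasks to virtual tiles) resembles bin packing. The Python sketch below uses first-fit with randomised retries of the processing order. It is only an illustration under assumed simplifications: each task has a single integer processor budget, a virtual tile offers a fixed `capacity`, bandwidth minimisation and the second step (tile mapping and routing) are ignored, and `first_fit` and `assign_virtual_tiles` are hypothetical names.

```python
import random
from typing import Optional


def first_fit(order: list[int], loads: list[int],
              capacity: int) -> list[list[int]]:
    """Place tasks (by index, in the given order) on the first tile with room."""
    tiles: list[list[int]] = []
    free: list[int] = []
    for task in order:
        for i, room in enumerate(free):
            if loads[task] <= room:
                tiles[i].append(task)
                free[i] -= loads[task]
                break
        else:  # no existing virtual tile has room: open a new one
            tiles.append([task])
            free.append(capacity - loads[task])
    return tiles


def assign_virtual_tiles(loads: list[int], capacity: int, max_tiles: int,
                         attempts: int = 20,
                         seed: int = 0) -> Optional[list[list[int]]]:
    """Try the input order first, then random orders; None if nothing fits."""
    if any(load > capacity for load in loads):
        return None
    rng = random.Random(seed)
    order = list(range(len(loads)))
    for _ in range(attempts):
        tiles = first_fit(order, loads, capacity)
        if len(tiles) <= max_tiles:
            return tiles
        rng.shuffle(order)  # randomise the processing order and retry
    return None
```

With budgets 40, 40, 60 and 60 on tiles of capacity 100, the input order needs three tiles, whereas most shuffled orders fit into two; in miniature, this shows why randomising the processing order can raise the mapping success rate.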
Performance Analysis of Reconfiguration in Adaptive Real-Time Streaming Applications
Cited by 3 (3 self)
We propose a design optimization framework for adaptive real-time streaming applications. The main contribution is a hybrid approach to performance analysis that combines formal analysis and simulation in a two-phase framework. We formulate the scheduling problem of adaptive streaming applications using ILP, and use simulation based on the synchronous model of computation to ensure throughput guarantees. We finally illustrate the capabilities of our methodology by experiments.
Multithreaded Simulation for Synchronous Dataflow Graphs
Cited by 3 (2 self)
Synchronous dataflow (SDF) has been successfully used in design tools for system-level simulation of wireless communication systems. Modern wireless communication standards are highly complex and strongly multirate, and typically result in long simulation times. The traditional approach for simulating SDF graphs is to compute and execute static single-processor schedules. Nowadays, multi-core processors are increasingly popular for their potential performance improvements through on-chip, thread-level parallelism. However, without novel scheduling and simulation techniques that explicitly exploit multithreading capability, current design tools gain only minimal performance improvements. In this paper, we present a new multithreaded simulation scheduler, called MSS, to provide simulation runtime speed-up for executing SDF graphs on multi-core processors. We have implemented MSS in the Advanced Design System (ADS) from Agilent Technologies. On an Intel dual-core, hyper-threading (4 processing units) processor, our results from this implementation demonstrate up to 3.5 times speed-up in simulating modern wireless communication systems.
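The traditional single-processor approach mentioned above can be sketched as follows: given the repetition vector, repeatedly fire an enabled actor until every actor has reached its firing count for one iteration. This Python version tries downstream actors first to keep buffers small and records each channel's peak occupancy. The names, the edge layout, and the assumptions that `actors` is listed in topological order and that the graph is acyclic without initial tokens are illustrative; it is not the scheduler used in ADS or MSS.

```python
def sequential_schedule(actors: list[str],
                        edges: list[tuple[str, str, int, int]],
                        reps: dict[str, int]) -> tuple[list[str], list[int]]:
    """Build one iteration of a static single-processor SDF schedule.

    edges: (src, dst, produced_per_firing, consumed_per_firing), no initial
    tokens. Returns the firing sequence and the peak token count per edge.
    """
    tokens = [0] * len(edges)
    peak = [0] * len(edges)
    fired = {a: 0 for a in actors}
    schedule: list[str] = []
    while len(schedule) < sum(reps.values()):
        for a in reversed(actors):  # try downstream actors first
            enabled = all(tokens[i] >= c
                          for i, (_, dst, _, c) in enumerate(edges) if dst == a)
            if fired[a] < reps[a] and enabled:
                for i, (src, dst, p, c) in enumerate(edges):
                    if dst == a:
                        tokens[i] -= c
                    if src == a:
                        tokens[i] += p
                        peak[i] = max(peak[i], tokens[i])
                fired[a] += 1
                schedule.append(a)
                break
        else:  # no actor could fire in this pass
            raise ValueError("deadlock: no actor can fire")
    return schedule, peak
```

For A -> B with rates (2, 3) and repetition vector A: 3, B: 2, it produces the schedule A A B A B with a peak of 4 tokens on the channel.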
Buffer minimization of real-time streaming applications scheduling on hybrid CPU/FPGA architectures
- in Proceedings of Design Automation and Test in Europe (DATE ’09)
, 2009
Cited by 2 (1 self)
We address the problem of scheduling real-time streaming applications on hybrid CPU/FPGA architectures. The main contribution is a two-step approach to minimize the buffer requirements of streaming applications with throughput guarantees. A novel declarative, constraint-based scheduling method for real-time hybrid SW/HW systems is proposed, while the application throughput is guaranteed by periodic phases in execution. We use a voice-band modem application to exemplify the scheduling capabilities of our method. The experimental results show the advantages of our techniques in both lower buffer requirements and higher throughput guarantees compared to the traditional PAPS method.

