Results 1 - 9 of 9
Exploring trade-offs in buffer requirements and throughput constraints for synchronous dataflow graphs
- DESIGN AUTOMATION CONFERENCE, PROC. ACM, 2006
Cited by 24 (7 self)
Multimedia applications usually have throughput constraints. An implementation must meet these constraints, while it minimizes resource usage and energy consumption. The compute intensive kernels of these applications are often specified as Synchronous Dataflow Graphs. Communication between nodes in these graphs requires storage space which influences throughput. We present exact techniques to chart the Pareto space of throughput and storage tradeoffs, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint. The feasibility of the approach is demonstrated with a number of examples.
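As a rough illustration of what a Pareto space of throughput and storage trade-offs looks like, and of how it answers the "minimal storage under a throughput constraint" question, the sketch below filters a set of candidate points. It is not the exact exploration technique of the paper; the function names `pareto_front` and `min_storage_for` and the sample numbers are hypothetical, chosen only for illustration.

```python
def pareto_front(points):
    """Return the non-dominated (storage, throughput) points.

    A point is dominated if another point needs no more storage
    and achieves at least the same throughput.
    """
    front = []
    best_throughput = None
    # Sort by storage ascending; for equal storage, best throughput first.
    for storage, throughput in sorted(set(points), key=lambda p: (p[0], -p[1])):
        if best_throughput is None or throughput > best_throughput:
            front.append((storage, throughput))
            best_throughput = throughput
    return front


def min_storage_for(front, required_throughput):
    """Smallest storage on the front meeting the constraint, or None."""
    for storage, throughput in front:  # front is sorted by storage
        if throughput >= required_throughput:
            return storage
    return None


# Hypothetical (storage in tokens, throughput in firings per time unit) samples.
samples = [(4, 0.10), (5, 0.10), (6, 0.20), (7, 0.15), (9, 0.25)]
front = pareto_front(samples)
print(front)
print(min_storage_for(front, 0.15))
```

The front is kept sorted by storage, so the first point that meets the constraint is also the cheapest one.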
Minimising buffer requirements of synchronous dataflow graphs with model checking
- IN PROCEEDINGS OF THE DESIGN AUTOMATION CONFERENCE, 2005
Throughput-Buffering Trade-Off Exploration for Cyclo-Static and Synchronous Dataflow Graphs
Cited by 10 (4 self)
Multimedia applications usually have throughput constraints. An implementation must meet these constraints, while it minimizes resource usage and energy consumption. The compute intensive kernels of these applications are often specified as Cyclo-Static or Synchronous Dataflow Graphs. Communication between nodes in these graphs requires storage space which influences throughput. We present an exact technique to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal buffer space needed to execute a graph under a given throughput constraint. The feasibility of the exact technique is demonstrated with experiments on a set of realistic DSP and multimedia applications. To increase scalability of the approach, a fast approximation technique is developed that guarantees both throughput and a tight bound on the maximal overestimation of buffer requirements. The approximation technique makes it possible to trade off worst-case overestimation against run-time.
A Case Study of System Level Specification and Software Synthesis of Multi-mode Multimedia Terminal
- COMPUT. PROGRAM
Cited by 3 (2 self)
In this paper, we specify the behavior of a multi-mode multimedia terminal (MMMT) using block diagram representation and automate the software implementation from the behavioral level specification. The MMMT system consists of several real-time tasks for signal processing and control tasks to manage task executions. We use a dataflow model and an FSM model to specify the internal behavior of a signal processing task and a control task respectively. At the top level, we introduce a novel task-level specification model to represent diverse task execution semantics and communication protocols. The synthesized software is directly executed on a PC and downloaded into a Compaq iPAQ, and run to demonstrate the viability of the proposed system-level design methodology from behavior level specification to real software implementation. This paper explains the key techniques used in the proposed methodology and some lessons we learned during this preliminary experiment.
Buffer Optimization and Dispatching Scheme for Embedded Systems with Behavioral Transparency
Cited by 2 (1 self)
Software components are modular and can enable post-deployment update, but their high overhead in runtime and memory is prohibitive for many embedded systems. This paper proposes to minimize such overhead by exploiting behavioral transparency in models of computation. In such a model (e.g., synchronous dataflow), the state of buffer requirements is determined completely by the firing sequence of the actors without requiring functional simulation of the actors. Instead of dedicating space to each channel or actor statically, our dispatcher passes buffer pointers to an actor upon firing. Straightforward implementations are counterproductive, as fine-grained allocation incurs high pointer overhead while coarse-grained allocation suffers from fragmentation. To address this problem, we propose a medium-grained, “access-contiguous” buffer allocation scheme. We formulate the problem as 2-D tiles that represent the lifetime of the buffers over time and define operators for their translation and transformations to minimize their memory occupation spatially and temporally. Experimental results on real-life applications show up to 70% data memory reduction compared to existing techniques. Our technique retains code modularity for dynamic configuration and, more importantly, enables many more applications that otherwise would not fit if implemented using previous state-of-the-art techniques.
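The remark that buffer state in synchronous dataflow follows from the firing sequence alone can be illustrated with a small token-counting sketch. This is not the paper's dispatcher or its tile-based allocation; the function `max_channel_occupancy`, the channel dictionary layout, and the two-actor graph are assumptions made up for the example. It also assumes an actor releases its input tokens before writing its outputs.

```python
def max_channel_occupancy(channels, schedule):
    """Peak token count per channel for a given firing sequence.

    channels: name -> dict with keys src, dst, prod, cons, init
              (fixed production/consumption rates, initial tokens).
    schedule: list of actor names in firing order.
    Only token counts are tracked; actor functions are never executed.
    """
    tokens = {name: ch["init"] for name, ch in channels.items()}
    peak = dict(tokens)
    for actor in schedule:
        # Consume input tokens first (assumption: inputs freed before output).
        for name, ch in channels.items():
            if ch["dst"] == actor:
                if tokens[name] < ch["cons"]:
                    raise ValueError(f"{actor} cannot fire: {name} lacks tokens")
                tokens[name] -= ch["cons"]
        # Then produce output tokens and record the new peak.
        for name, ch in channels.items():
            if ch["src"] == actor:
                tokens[name] += ch["prod"]
                peak[name] = max(peak[name], tokens[name])
    return peak


# Hypothetical graph: A produces 2 tokens per firing, B consumes 3.
graph = {"ab": {"src": "A", "dst": "B", "prod": 2, "cons": 3, "init": 0}}
print(max_channel_occupancy(graph, ["A", "A", "B", "A", "B"]))
print(max_channel_occupancy(graph, ["A", "A", "A", "B", "B"]))
```

Both schedules complete one iteration of the graph, yet the interleaved one needs less channel space, which is why the choice of firing sequence matters for buffer sizing.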
Buffer minimization in RTL synthesis from coarse-grained dataflow specification
- In SASMI, 2006
Cited by 1 (0 self)
This paper concerns area-efficient automatic hardware architecture synthesis and its optimization from dataflow graph (DFG) specification for fast HW/SW cosynthesis. A node in a DFG represents a coarse-grain computation block such as FIR and DCT, and a port in a block may consume multiple data samples per invocation, which distinguishes our approach from conventional behavioral synthesis and complicates the problem. In the proposed design methodology, arcs in the DFG are synthesized into intermediate buffers that store the transient data samples between nodes, using either registers or memory. Since the buffer size is the major factor of hardware overhead in the synthesized architecture, we aim to reduce the buffer size by applying a shift buffering technique and a buffer sharing technique. Experiments with an H.263 decoder subsystem demonstrate that the proposed techniques reduce the buffer requirement by around 44%, bringing the resultant hardware close to the hand-optimized hardware.
DESIGN METHODOLOGY FOR EMBEDDED COMPUTER VISION SYSTEMS
Cited by 1 (1 self)
Computer vision has emerged as one of the most popular domains of embedded applications. The applications in this domain are characterized by complex, intensive computations along with very large memory requirements. Parallelization and multiprocessor implementations have become increasingly important for this domain, and various powerful new embedded platforms to support these applications have emerged in recent years. However, the problem of efficient design methodology for optimized implementation of such systems remains vastly unexplored. In this chapter, we look into the main research problems faced in this area and how they vary from other embedded design methodologies in light of key application characteristics in the embedded computer vision domain. We also discuss emerging solutions to these problems.
SYSTEMATIC EXPLORATION OF TRADE-OFFS BETWEEN APPLICATION THROUGHPUT AND HARDWARE RESOURCE REQUIREMENTS IN DSP SYSTEMS
, 2010
Dataflow has been used extensively as an efficient model-of-computation to analyze performance and resource requirements in implementing DSP algorithms on various target architectures. Although various software synthesis techniques have been widely studied in recent years, there is a distinct lack of efficient synthesis techniques in the literature for systematically mapping dataflow models into efficient hardware implementations. In this thesis, we explore three different aspects that contribute to the development of a powerful dataflow-based hardware synthesis framework: 1. Systematic generation of 1D/2D FFT implementation on field programmable gate arrays (FPGAs). The fast Fourier transform (FFT) is one of the most widely-used and important signal processing functions. However, FFT computation generally becomes a major bottleneck for overall system performance due to its high computational requirements. We propose a systematic approach for synthesizing FPGA implementations of one- and two-dimensional (1D and 2D) FFT computations, and rigorously exploring trade-offs between cost (in terms of FPGA resource requirements) and performance (in terms of throughput). Our approach provides an efficient hardware synthesis framework that can be customized to specific design
with Behavioral Transparency
This manuscript contains over 30% new material in terms of new algorithms, experimental results, and in-depth discussion.

