Functional DIF for rapid prototyping
- in Proceedings of the International Symposium on Rapid System Prototyping, 2008
Cited by 28 (23 self)
Dataflow formalisms have provided designers of digital signal processing systems with optimizations and guarantees to arrive at quality prototypes quickly. As system complexity increases, designers are expressing more types of behavior in dataflow languages to retain these implementation benefits. While the semantic range of DSP-oriented dataflow models has expanded to cover quasi-static and dynamic applications, efficient functional simulation of such applications has not. Complexity in scheduling and modeling has impeded efforts towards functional simulation that matches the final implementation. We provide this functionality by introducing a new dataflow model of computation, called enable-invoke dataflow (EIDF), that supports flexible and efficient prototyping of dataflow-based application representations. EIDF permits the natural description of actors for dynamic and static dataflow models. We integrate EIDF into the dataflow interchange format (DIF) package and demonstrate the approach on the design of a polynomial evaluation accelerator targeting an FPGA implementation. Our experiments show that a design environment based on EIDF can achieve functionally-correct simulation compared to Verilog, allowing the application designer to arrive at a verified functional simulation faster, and therefore at a functional prototype much more quickly than traditional design practices.
Parameterized looped schedules for compact representation of execution sequences
- in Proc. of the Intl. Conf. on Application Specific Systems, Architectures, and Processors, Steamboat, 2006
Cited by 22 (16 self)
This paper is concerned with the compact representation of execution sequences in terms of efficient looping constructs. Here, by a looping construct, we mean a compact way of specifying a finite repetition of a set of execution primitives. Such compaction, which can be viewed as a form of hierarchical run-length encoding (RLE), has application in many DSP system synthesis contexts, including efficient control generation for Kahn processes on FPGAs, and software synthesis for static dataflow models of computation. In this paper, we significantly generalize previous models for loop-based code compaction of DSP programs to yield a configurable code compression methodology that exhibits a broad range of achievable trade-offs. Specifically, we formally develop and apply to DSP hardware and software implementation a parameterizable loop scheduling approach with compact format, dynamic reconfigurability, and low-overhead decompression. In our experiments, this new approach demonstrates up to 99% storage savings (versus RLE) and up to 46% frequency enhancement (versus another parameterized approach) in FPGA synthesis, and an average of 11% code size reduction in software synthesis compared to existing methods for code size reduction.
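To make the idea of a looping construct concrete, the following is a minimal sketch of a fixed-count looped schedule and its expansion into the firing sequence it encodes. The nested-list representation and the function name `expand` are assumptions made for illustration; the paper's parameterized loop counts and its decompression scheme are not reproduced here.

```python
def expand(schedule):
    """Flatten a nested looped schedule into the firing sequence it encodes.

    A schedule is a list whose items are either an actor name (str)
    or a loop written as a tuple (count, body), where body is a schedule.
    This representation is hypothetical, chosen only for illustration.
    """
    sequence = []
    for item in schedule:
        if isinstance(item, str):
            sequence.append(item)
        else:
            count, body = item
            # Repeat the expanded loop body `count` times.
            sequence.extend(expand(body) * count)
    return sequence


# The looped schedule (2 A (3 B C)) stores 3 actor names and 2 loop counts,
# yet encodes a sequence of 14 firings.
looped = [(2, ["A", (3, ["B", "C"])])]
print("".join(expand(looped)))
```

Nesting is what makes this a hierarchical form of run-length encoding: a flat RLE could compress a run of identical firings, but not the repeated pattern B C inside a repeated outer pattern.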
Fractional Rate Dataflow Model and Efficient Code Synthesis for Multimedia Applications
- ACM SIGPLAN Notices, 2002
Cited by 15 (5 self)
In this paper, we propose a new dataflow extension called fractional rate dataflow (FRDF), in which a fractional number of samples can be produced and consumed. In the proposed FRDF model, a constituent data type is considered as a fraction of the composite data type. Existing integer rate dataflow models can be easily extended to incorporate fractional rates without losing their analytical properties. In this paper, the SDF model is extended to include FRDF, which can reduce buffer memory requirements significantly, by up to 70%, for some multimedia applications.
Efficient Code Synthesis from Extended Dataflow Graphs for Multimedia Applications
- In Proc. 39th DAC, 2002
Cited by 11 (1 self)
This paper presents efficient automatic code synthesis techniques from dataflow graphs for multimedia applications. Since multimedia applications require large buffers containing composite-type data, we aim to reduce the buffer sizes with a fractional rate dataflow (FRDF) extension and a buffer sharing technique. In an H.263 encoder experiment, the FRDF extension and the buffer sharing technique enable us to reduce the buffer size by 67%. The final buffer size is no larger than that of a manually written reference code.
Language and Compiler Support for Stream Programs
- 2009
Cited by 9 (2 self)
Stream programs represent an important class of high-performance computations. Defined by their regular processing of sequences of data, stream programs appear most commonly in the context of audio, video, and digital signal processing, though also in networking, encryption, and other areas. Stream programs can be naturally represented as a graph of independent actors that communicate explicitly over data channels. In this work we focus on programs where the input and output rates of actors are known at compile time, enabling aggressive transformations by the compiler; this model is known as synchronous dataflow. We develop a new programming language, StreamIt, that empowers both programmers and compiler writers to leverage the unique properties of the streaming domain. StreamIt offers several new abstractions, including hierarchical single-input single-output streams, composable primitives for data reordering, and a mechanism called teleport messaging that enables precise event handling
Heterogeneous Design in Functional DIF
Cited by 8 (6 self)
Dataflow formalisms have provided designers of digital signal processing systems with analysis and optimizations for many years. As system complexity increases, designers are relying on more types of dataflow models to describe applications while retaining these implementation benefits. The semantic range of DSP-oriented dataflow models has expanded to cover heterogeneous models and dynamic applications, but efficient design, simulation, and scheduling of such applications have not. To facilitate implementing heterogeneous applications, we utilize a new dataflow model of computation and show how actors designed in other dataflow models are directly supported by this framework, allowing system designers to immediately compose and simulate actors from different models. Using an example, we show how this approach can be applied to quickly describe and functionally simulate a heterogeneous dataflow-based application such that a designer may analyze and tune trade-offs among different models and schedules for simulation time, memory consumption, and schedule size. Keywords: Dataflow, Heterogeneous, Signal Processing.
The CBP parameter — a module characterization approach for DSP software optimization
- Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 2004
Cited by 3 (1 self)
Memory consumption is an important metric for DSP software implementation. In this paper, we develop a module characterization technique that promotes more economical use of memory resources at the system level. Our work is developed in the context of software synthesis from signal/video/image processing applications expressed as synchronous dataflow (SDF) graphs. SDF is a restricted form of dataflow where each computational module (actor) consumes and produces a fixed number of data values (tokens) on each execution. Usually, no assumption is made about when, during the execution of an actor, the tokens are actually consumed and produced; the firing of an actor is treated as an atomic event for most purposes. However, we show in this paper that it is possible to concisely and precisely capture key properties pertaining to the relative times at which tokens are produced and consumed by an actor. We show this by introducing the consumed-before-produced (CBP) parameter, which provides a general method for characterizing the token transfer of an SDF actor. Good bounds on the CBP parameter can aid an SDF compiler in performing more aggressive optimizations for reducing buffer sizes on the edges between actors. We formally define the CBP parameter; derive some useful properties of this parameter; illustrate how the value of the parameter is derived by examining in detail the multirate FIR filter, which is a fundamental actor in multirate signal processing applications; and examine CBP parameterizations for several other practical SDF actors.
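Because every SDF actor produces and consumes a fixed number of tokens per firing, a periodic schedule must fire each actor a number of times that balances production and consumption on every edge. The sketch below solves these balance equations for a connected graph; it illustrates the SDF definition given above and is not the CBP analysis itself. The function name `repetition_vector` and the edge tuple layout `(src, dst, produced, consumed)` are assumptions made for this example, and `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Return the smallest positive integer firing counts q such that
    q[src] * produced == q[dst] * consumed holds on every edge.

    edges is a list of (src, dst, produced, consumed) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Fix the first actor's rate at 1 and propagate relative rates.
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, produced, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Every edge must balance, including edges that close a cycle.
    for src, dst, produced, consumed in edges:
        if rate[src] * produced != rate[dst] * consumed:
            raise ValueError("inconsistent rates: no periodic schedule exists")
    # Scale the fractional rates to the smallest integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {actor: int(rate[actor] * scale) for actor in actors}


# A produces 2 tokens per firing and B consumes 3;
# B produces 1 token per firing and C consumes 2.
chain = [("A", "B", 2, 3), ("B", "C", 1, 2)]
print(repetition_vector(["A", "B", "C"], chain))
```

For the chain above the result is 3 firings of A, 2 of B, and 1 of C: A produces 6 tokens that B consumes, and B produces 2 tokens that C consumes.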
Memory optimal single appearance schedule with dynamic loop count for synchronous dataflow graphs
- In Proceedings of the Asia and South Pacific Design Automation Conference
Cited by 3 (0 self)
In this paper, we propose a new single appearance schedule for synchronous dataflow programs to minimize data memory and code memory size simultaneously. While a single appearance schedule promises only one appearance of each node definition in the generated code, it requires a significant amount of data memory overhead compared with a buffer optimal schedule allowing multiple appearances. The key idea of the proposed technique is to make a dynamic decision of loop count to make a schedule quasi-static. The proposed quasi-static schedule produces a single appearance schedule code with minimum data memory requirement. We prove that every buffer optimal schedule can be transformed to our single appearance schedule, which requires optimal buffer size for arbitrary synchronous dataflow graphs. The only penalty for the proposed technique is a slight performance overhead of computing loop counts dynamically. In order to minimize the overhead, we propose optimization techniques. Experimental results show that the proposed algorithm reduces total memory by 20% with less than 1% performance overhead compared with the previous single appearance schedule algorithms.
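The data memory overhead of a single appearance schedule can be seen on a two-actor example. The sketch below simulates token counts on each edge for a given firing sequence and records the peak; it only illustrates the trade-off described above and does not implement the paper's dynamic loop count technique. The function name `peak_buffers` and the edge tuple layout `(src, dst, produced, consumed)` are assumptions made for this example.

```python
def peak_buffers(firings, edges):
    """Simulate a firing sequence and return the peak token count per edge.

    edges is a list of (src, dst, produced, consumed) tuples
    (a hypothetical layout chosen for this sketch).
    """
    tokens = {(src, dst): 0 for src, dst, _, _ in edges}
    peak = dict(tokens)
    for actor in firings:
        # Consume input tokens first; the firing is invalid if any are missing.
        for src, dst, _, consumed in edges:
            if dst == actor:
                if tokens[(src, dst)] < consumed:
                    raise ValueError(f"{actor} fired without enough input tokens")
                tokens[(src, dst)] -= consumed
        # Then produce output tokens and update the recorded peaks.
        for src, dst, produced, _ in edges:
            if src == actor:
                tokens[(src, dst)] += produced
                peak[(src, dst)] = max(peak[(src, dst)], tokens[(src, dst)])
    return peak


# A produces 2 tokens per firing and B consumes 3,
# so one period fires A three times and B twice.
edge = [("A", "B", 2, 3)]
single_appearance = list("AAABB")  # (3 A)(2 B): each actor appears once in code
interleaved = list("AABAB")        # A and B each appear more than once
print(peak_buffers(single_appearance, edge))
print(peak_buffers(interleaved, edge))
```

The single appearance order needs 6 token slots on the edge, while the interleaved order needs only 4, at the cost of A and B each appearing more than once in the generated code.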
Memory-constrained Block Processing Optimization for Synthesis of DSP Software
Cited by 3 (1 self)
Digital signal processing (DSP) applications involve processing long streams of input data. It is important to take into account this form of processing when implementing embedded software for DSP systems. Task-level vectorization, or block processing, is a useful dataflow graph transformation that can significantly improve execution performance by allowing subsequences of data items to be processed through individual task invocations. In this way, several benefits can be obtained, including reduced context switch overhead, increased memory locality, improved utilization of processor pipelines, and use of more efficient DSP-oriented addressing modes. On the other hand, block processing generally results in increased memory requirements since it effectively increases the sizes of the input and output values associated with processing tasks. In this paper, we investigate the memory-performance tradeoff associated with block processing. We develop novel block processing algorithms that carefully take into account memory constraints to achieve efficient block processing configurations within given memory space limitations. Our experimental results indicate that these methods derive optimal memory-constrained block processing solutions most of the time. We demonstrate the advantages of our block processing techniques on practical kernel functions and applications in the DSP domain.
A Step Towards Unifying Schedule and Storage Optimization
Cited by 3 (0 self)
We present a unified mathematical framework for analyzing the tradeoffs between parallelism and storage allocation within a parallelizing compiler. Using this framework, we show how to find a good storage mapping for a given schedule, a good schedule for a given storage mapping, and a good storage mapping that is valid for all legal (one-dimensional affine) schedules. We consider storage mappings that collapse one dimension of a multidimensional array, and programs that are in a single assignment form and accept a one-dimensional affine schedule. Our method combines affine scheduling techniques with occupancy vector analysis and incorporates general affine dependences across statements and loop nests. We formulate the constraints imposed by the data dependences and storage mappings as a set of linear inequalities, and apply numerical programming techniques to solve for the shortest occupancy vector. We consider our method to be a first step towards automating a procedure that finds the optimal tradeoff between parallelism and storage space.

