Exploring trade-offs in buffer requirements and throughput constraints for synchronous dataflow graphs
- DESIGN AUTOMATION CONFERENCE, PROC. ACM
, 2006
Cited by 24 (7 self)
Multimedia applications usually have throughput constraints. An implementation must meet these constraints, while it minimizes resource usage and energy consumption. The compute intensive kernels of these applications are often specified as Synchronous Dataflow Graphs. Communication between nodes in these graphs requires storage space which influences throughput. We present exact techniques to chart the Pareto space of throughput and storage tradeoffs, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint. The feasibility of the approach is demonstrated with a number of examples.
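The Pareto space mentioned in this abstract can be illustrated with a small sketch. It assumes a set of (storage, throughput) design points has already been evaluated by some other means; the function names `pareto_front` and `min_storage_for` and the sample numbers are hypothetical, and the code only filters given points rather than reproducing the paper's exact exploration technique.

```python
from fractions import Fraction


def pareto_front(points):
    """Return the non-dominated (storage, throughput) points.

    A point is dominated if another point needs no more storage and
    reaches at least the same throughput (and differs in at least one).
    """
    front = []
    best = None  # highest throughput seen so far at smaller-or-equal storage
    # Sort by storage ascending; for equal storage, highest throughput first.
    for storage, throughput in sorted(points, key=lambda p: (p[0], -p[1])):
        if best is None or throughput > best:
            front.append((storage, throughput))
            best = throughput
    return front


def min_storage_for(front, required):
    """Smallest storage on the front that meets the throughput constraint."""
    for storage, throughput in front:  # front is sorted by storage
        if throughput >= required:
            return storage
    return None  # constraint cannot be met by any evaluated point


# Made-up sample points: (storage in tokens, throughput in firings per time unit).
points = [
    (4, Fraction(1, 4)),
    (6, Fraction(1, 3)),
    (5, Fraction(1, 4)),
    (7, Fraction(1, 3)),
    (8, Fraction(1, 2)),
]
front = pareto_front(points)
```

Given such a front, the minimal storage for a throughput constraint is a single lookup, for example `min_storage_for(front, Fraction(1, 3))`.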
Minimising buffer requirements of synchronous dataflow graphs with model checking
- IN PROCEEDINGS OF THE DESIGN AUTOMATION CONFERENCE
, 2005
Compiling Concurrent Languages for Sequential Processors
- ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS
, 2001
Cited by 19 (2 self)
... This paper surveys a variety of techniques for translating these concurrent specifications into sequential code. The techniques address compiling a wide variety of languages, ranging from dataflow to Petri nets. Each uses a different technique, to some degree chosen to match the semantics of the concurrent language. Each technique is considered to consist of a partial evaluator operating on an interpreter. This combination provides a clearer picture of how parts of each technique could be used in a different setting.
Software Synthesis and Code Generation for Signal Processing Systems
- PHILOSOPHY OF SCIENCE
, 1999
Cited by 19 (4 self)
The role of software is becoming increasingly important in the implementation of DSP applications. As this trend intensifies, and the complexity of applications escalates, we are seeing an increased need for automated tools to aid in the development of DSP software. This paper reviews the state of the art in programming language and compiler technology for DSP software implementation. In particular, we review techniques for high level, block-diagram-based modeling of DSP applications; the translation of block diagram specifications into efficient C programs using global, target-independent optimization techniques; and the compilation of C programs into streamlined machine code for programmable DSP processors, using architecture-specific and retargetable back-end optimizations. In our review, we also point out some important directions for further investigation.
Phased Scheduling of Stream Programs
- In LCTES
, 2003
Cited by 16 (6 self)
As embedded DSP applications become more complex, it is increasingly important to provide high-level stream abstractions that can be compiled without sacrificing efficiency. In this paper, we describe scheduler support for StreamIt, a high-level language for signal processing applications. A StreamIt program consists of a set of autonomous filters that communicate with each other via FIFO queues. As in Synchronous Dataflow (SDF), the input and output rates of each filter are known at compile time. However, unlike SDF, the stream graph is represented using hierarchical structures, each of which has a single input and a single output.
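Because the rates are known at compile time, the number of firings per steady-state iteration follows from the standard SDF balance equations: for every edge, firings of the producer times tokens produced must equal firings of the consumer times tokens consumed. The sketch below solves these equations for a connected graph; the function name `repetition_vector`, the edge tuple layout, and the example graph are assumptions made for illustration, and this is the generic SDF computation rather than the phased scheduling algorithm of the paper.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Solve q[src] * produced == q[dst] * consumed for every edge.

    actors: list of actor names (graph assumed connected).
    edges:  list of (src, dst, produced, consumed) tuples.
    Returns the smallest positive integer solution as a dict.
    """
    # Undirected adjacency with the firing ratio neighbour/actor on each link.
    adj = {a: [] for a in actors}
    for src, dst, produced, consumed in edges:
        adj[src].append((dst, Fraction(produced, consumed)))
        adj[dst].append((src, Fraction(consumed, produced)))

    # Propagate relative firing rates from an arbitrary start actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adj[a]:
            r = rate[a] * ratio
            if b not in rate:
                rate[b] = r
                stack.append(b)
            elif rate[b] != r:
                raise ValueError("inconsistent rates: no periodic schedule")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the fractions to the smallest integer vector.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: n // common for a, n in counts.items()}


# Hypothetical pipeline: A produces 2 per firing, B consumes 3 and produces 1,
# C consumes 2.
q = repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])
```

In this example A must fire three times and B twice for every firing of C, which is the kind of steady-state information a compile-time scheduler can build on.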
Throughput analysis of synchronous data flow graphs
- In ACSD’06, Proc. (2006), IEEE
, 2006
Cited by 15 (10 self)
Synchronous Data Flow Graphs (SDFGs) are a useful tool for modeling and analyzing embedded data flow applications, both in a single processor and a multiprocessing context or for application mapping on platforms. Throughput analysis of these SDFGs is an important step for verifying throughput requirements of concurrent real-time applications, for instance within design-space exploration activities. Analysis of SDFGs can be hard, since the worst-case complexity of analysis algorithms is often high. This is also true for throughput analysis. In particular, many algorithms involve a conversion to another kind of data flow graph, the size of which can be exponentially larger than the size of the original graph. In this paper, we present a method for throughput analysis of SDFGs, based on explicit state-space exploration, and we show that the method, despite its worst-case complexity, works well in practice, while existing methods often fail. We demonstrate this by comparing the method with state-of-the-art cycle mean computation algorithms. Moreover, since the state-space exploration method is essentially the same as simulation of the graph, the results of this paper can be easily obtained as a byproduct in existing simulation tools.
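The idea of obtaining throughput by simulating the graph until a state repeats can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from the paper: time advances in integer steps, execution times are positive integers, each actor runs at most one firing at a time, and the graph is assumed to have bounded token counts (for instance because it is strongly connected). The function name `self_timed_throughput` and the edge tuple layout are hypothetical.

```python
from fractions import Fraction


def self_timed_throughput(exec_time, edges, ref, max_steps=100_000):
    """Firings of `ref` per time unit in the periodic phase.

    exec_time: dict actor -> positive integer execution time.
    edges:     list of (src, dst, produced, consumed, initial_tokens).
    """
    tokens = [e[4] for e in edges]
    remaining = {a: 0 for a in exec_time}  # 0 means the actor is idle
    fired = 0                              # firings of `ref` started so far
    seen = {}                              # state -> (time, fired)

    for now in range(max_steps):
        # Start every idle actor whose input edges hold enough tokens.
        for a in exec_time:
            inputs = [i for i, e in enumerate(edges) if e[1] == a]
            if remaining[a] == 0 and all(tokens[i] >= edges[i][3] for i in inputs):
                for i in inputs:
                    tokens[i] -= edges[i][3]
                remaining[a] = exec_time[a]
                if a == ref:
                    fired += 1

        # A repeated state means the execution has become periodic.
        state = (tuple(tokens), tuple(remaining[a] for a in exec_time))
        if state in seen:
            then, fired_then = seen[state]
            return Fraction(fired - fired_then, now - then)
        seen[state] = (now, fired)

        # Advance one time unit; actors that finish produce their tokens.
        for a in exec_time:
            if remaining[a] > 0:
                remaining[a] -= 1
                if remaining[a] == 0:
                    for i, e in enumerate(edges):
                        if e[0] == a:
                            tokens[i] += e[2]
    raise RuntimeError("no recurrent state found within max_steps")


# Hypothetical two-actor cycle: A takes 1 time unit, B takes 2.
times = {"A": 1, "B": 2}
one_token = [("A", "B", 1, 1, 0), ("B", "A", 1, 1, 1)]
two_tokens = [("A", "B", 1, 1, 0), ("B", "A", 1, 1, 2)]
```

With one initial token the two actors alternate, so `self_timed_throughput(times, one_token, "A")` reflects a cycle of three time units; a second token lets the actors overlap, and the slower actor B becomes the bottleneck.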
Synchroscalar: A multiple clock domain, power-aware, tile-based embedded processor
- in Proceedings of the International Symposium on Computer Architecture
, 2004
Cited by 15 (3 self)
We present Synchroscalar, a tile-based architecture for embedded processing that is designed to provide the flexibility of DSPs while approaching the power efficiency of ASICs. We achieve this goal by providing high parallelism and voltage scaling while minimizing control and communication costs. Specifically, Synchroscalar uses columns of processor tiles organized into statically-assigned frequency-voltage domains to minimize power consumption. Furthermore, while columns use SIMD control to minimize overhead, data-dependent computations can be supported by extremely flexible statically-scheduled communication between columns. We provide a detailed evaluation of Synchroscalar including SPICE simulation, wire and device models, synthesis of key components, cycle-level simulation, and compiler- and hand-optimized signal processing applications. We find that the goal of meeting, not exceeding, performance targets with data-parallel applications leads to designs that depart significantly from our intuitions derived from general-purpose microprocessor design. In particular, synchronous design and substantial global interconnect are desirable in the low-frequency, low-power domain. This global interconnect supports parallelization and reduces processor idle time, which are critical to energy efficient implementations of high bandwidth signal processing. Overall, Synchroscalar provides programmability while achieving power efficiencies within 8-30X of known ASIC implementations, which is 10-60X better than conventional DSPs. In addition, frequency-voltage scaling in Synchroscalar provides between 3-32% power savings in our application suite.
DIF: An interchange format for dataflow-based design tools
- in Proceedings of the International Workshop on Systems, Architectures, Modeling, and Simulation, Samos
, 2004
Cited by 14 (10 self)
The dataflow interchange format (DIF) is a textual language that is geared towards capturing the semantics of graphical design tools for DSP system design. A key objective of DIF is to facilitate technology transfer across dataflow-based DSP design tools by providing a common, extensible semantics for representing coarse-grain dataflow graphs, and recognizing useful sub-classes of dataflow models. DIF captures essential modeling information that is required in dataflow-based analysis and optimization techniques, such as algorithms for consistency analysis, scheduling, memory management, and block processing, while optionally hiding proprietary details such as the actual code that implements the dataflow blocks. Accompanying DIF is a software package of intermediate representations and algorithms that operate on application models that are captured through DIF. This paper describes the structure of the DIF language together with several implementation and usage examples.
3D Exploration of Software Schedules for DSP Algorithms
- IN PROCEEDINGS OF INTERNATIONAL SYMPOSIUM ON HARDWARE/SOFTWARE CODESIGN (CODES). SIGDA, ACM
, 1999
Cited by 14 (6 self)
This paper addresses the problem of exploring tradeoffs between program memory, data memory and execution time requirements (3D) for DSP algorithms specified by data flow graphs. Such an exploration is of utmost importance for being able to analyze the feasibility and range of possible software solutions as part of a hardware/software codesign methodology where the target processor and the code generation style may lead to completely different solutions of the same specification. For solving this multi-objective optimization problem, an Evolutionary Algorithm approach is applied. In particular, a new Pareto-optimization algorithm is introduced. For different well-known target DSP processors, the Pareto-fronts are analyzed and compared.
Extended Synchronous Dataflow for Efficient DSP System Prototyping
- in 10th IEEE International Workshop on Rapid System Prototyping
, 1999
Cited by 14 (3 self)
Though the dataflow graph has been a successful input specification language for DSP system prototyping, its lack of support for global states makes it unsuitable for some important applications that need global states for efficient implementation. In this paper, we propose an extension of the synchronous dataflow graph to accommodate global states without side effects. Global states are accessed by a special block that piggybacks the state update request on data samples. Such an extension enlarges the domain of applications where dataflow representation can be used for rapid system prototyping. The only penalty it incurs is scheduling complexity, since the scheduler now considers control dependency as well as data dependency. We show experimental results with real-life examples such as an MPEG audio decoder and a 3D graphics pipeline to present the novelty and usefulness of our approach.
1 Introduction. Dataflow graph (DFG) has been a successful representation for DSP algorithms since dataflow semantics ...

