Synchronous data flow
, 1987
Abstract

Cited by 483 (44 self)
Data flow is a natural paradigm for describing DSP applications for concurrent implementation on parallel hardware. Data flow programs for signal processing are directed graphs where each node represents a function and each arc represents a signal path. Synchronous data flow (SDF) is a special case of data flow (either atomic or large-grain) in which the number of data samples produced or consumed by each node on each invocation is specified a priori. Nodes can be scheduled statically (at compile time) onto single or parallel programmable processors, so the run-time overhead usually associated with data flow evaporates. Multiple sample rates within the same system are easily and naturally handled. Conditions for correctness of SDF graphs are explained, and scheduling algorithms are described for homogeneous parallel processors sharing memory. A preliminary SDF software system for automatically generating assembly language code for DSP microcomputers is described. Two new efficiency techniques are introduced: static buffering and an extension to SDF to efficiently implement conditionals.
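The fixed production/consumption rates this abstract describes give rise to the SDF balance equations, whose smallest integer solution (the repetition vector) underlies static scheduling. The sketch below is my own illustration, not code from the paper; the function name and the edge-tuple encoding `(src, dst, prod, cons)` are assumptions. It assumes a connected graph over nodes 0..n-1.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetition_vector(edges, n):
    """Smallest positive integer firing counts q solving the balance
    equations q[u]*prod == q[v]*cons for every edge (u, v, prod, cons);
    returns None if the rates are inconsistent (no static schedule)."""
    q = [None] * n
    q[0] = Fraction(1)              # pin one node; the rest follow
    changed = True
    while changed:                  # propagate rates along edges
        changed = False
        for u, v, prod, cons in edges:
            if q[u] is not None and q[v] is None:
                q[v] = q[u] * prod / cons
                changed = True
            elif q[v] is not None and q[u] is None:
                q[u] = q[v] * cons / prod
                changed = True
            elif q[u] is not None and q[u] * prod != q[v] * cons:
                return None         # sample rates are inconsistent
    # scale the rational rates to the smallest integer vector
    lcm = reduce(lambda a, b: a * b // gcd(a, b),
                 (f.denominator for f in q), 1)
    ints = [int(f * lcm) for f in q]
    g = reduce(gcd, ints)
    return [x // g for x in ints]

# A produces 2 tokens per firing; B consumes 3:
# fire A three times and B twice per iteration.
print(repetition_vector([(0, 1, 2, 3)], 2))   # -> [3, 2]
```

A static schedule repeats each node according to this vector, which is how multiple sample rates are "easily and naturally handled" within one graph.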
A Hierarchical Multiprocessor Scheduling Framework For Synchronous Dataflow Graphs
 Laboratory, University of California at Berkeley
, 1995
Abstract

Cited by 21 (7 self)
This paper discusses a hierarchical scheduling framework to reduce the complexity of scheduling synchronous dataflow (SDF) graphs onto multiple processors. The core of this framework is a clustering algorithm that reduces the number of nodes before expanding the SDF graph into a precedence DAG (directed acyclic graph). The internals of the clusters are then scheduled with uniprocessor SDF schedulers, which can optimize for memory usage. The clustering is done in such a manner as to leave ample parallelism exposed for the multiprocessor scheduler. The advantages of this framework are demonstrated with several practical, real-time examples.
Low Power Architectural Design Methodologies
 Ph.D. Thesis, Memorandum No. UCB/ERL M94/62
, 1994
Abstract

Cited by 17 (0 self)
In recent years, power consumption has become a critical design concern for many VLSI systems. Nowhere is this more true than for portable, battery-operated applications, where power consumption has perhaps superseded speed and area as the overriding implementation constraint. This adds another degree of freedom, and complexity, to the design process and mandates the need for design techniques and CAD tools that address power as well as area and speed. This thesis presents a methodology and a set of tools that support low-power system design. Low-power techniques at levels ranging from technology to architecture are presented and their relative merits are compared. Several case studies demonstrate that architecture- and system-level optimizations offer the greatest opportunities for power reduction. A survey of existing power analysis tools, however, reveals a marked lack of power-conscious tools at these levels. Addressing this issue, a collection of techniques for modeling power at the register-transfer (RT) level of abstraction is described. These techniques model the impact of design complexity and signal activity on datapath, memory, control, and interconnect power consumption. Several VLSI design examples are used to verify the proposed tools, which exhibit near switch-level accuracy at RT-level speeds. Finally, an integrated design space exploration environment is described that spans several levels of abstraction and embodies many of the power optimization and analysis strategies presented in this thesis.
Determining the order of processor transactions in statically scheduled multiprocessors
 VLSI Signal Processing
, 1997
Abstract

Cited by 16 (0 self)
This paper addresses embedded multiprocessor implementation of iterative, real-time applications, such as digital signal and image processing, that are specified as dataflow graphs. Scheduling dataflow graphs on multiple processors involves assigning tasks to processors (processor assignment), ordering the execution of tasks within each processor (task ordering), and determining when each task must commence execution. We consider three scheduling strategies: fully-static, self-timed, and ordered transactions, all of which perform the assignment and ordering steps at compile time. Run-time costs are small for the fully-static strategy; however, it is not robust with respect to changes or uncertainty in task execution times. The self-timed approach is tolerant of variations in task execution times, but pays the penalty of high run-time costs, because processors need to explicitly synchronize whenever they communicate. The ordered-transactions approach lies between the fully-static and self-timed strategies; in this approach the order in which processors communicate is determined at compile time and enforced at run time. The ordered-transactions strategy retains some of the flexibility of self-timed schedules and at the same time has lower run-time costs than the self-timed approach. In this paper we determine an order of processor transactions that is nearly optimal given information about task execution times at compile time, for a given processor assignment and task ordering. The criterion for optimality is the average throughput achieved by the schedule. Our main result is that it is possible to choose a transaction order such that the resulting ordered-transactions schedule incurs no performance penalty compared to the more flexible self-timed strategy, even when the higher run-time costs implied by the self-timed strategy are ignored.
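The self-timed execution model that a transaction order is evaluated against can be sketched with a toy makespan computation: each task starts once its processor is free and all predecessors (possibly on other processors) have finished. This is my own illustration of that evaluation criterion, not the paper's algorithm; it ignores communication and synchronization costs, and the function name and data encoding are hypothetical.

```python
def finish_times(exec_time, proc, deps, order):
    """Self-timed completion times for a fixed processor assignment and
    task ordering. exec_time/proc: dicts keyed by task; deps: (pred,
    succ) pairs; order: a global task list consistent with deps and
    with each processor's compile-time ordering."""
    finish, free = {}, {}
    for t in order:
        start = free.get(proc[t], 0)          # wait for the processor
        for p, s in deps:
            if s == t:
                start = max(start, finish[p])  # wait for predecessors
        finish[t] = start + exec_time[t]
        free[proc[t]] = finish[t]
    return finish

# Two processors: A (on P0) feeds C (on P1); B runs first on P1.
times = finish_times(
    exec_time={"A": 2, "B": 3, "C": 1},
    proc={"A": 0, "B": 1, "C": 1},
    deps=[("A", "C")],
    order=["A", "B", "C"])
print(times)   # -> {'A': 2, 'B': 3, 'C': 4}
```

Under the ordered-transactions strategy, the cross-processor communications in such a schedule would additionally be forced to occur in a single compile-time order; the paper's result is that this order can be chosen so the makespan matches the self-timed one.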
On Retiming of Multirate DSP Algorithms
 In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
, 1996
Abstract

Cited by 14 (3 self)
In this paper, retiming of DSP algorithms exhibiting multirate behavior is treated. Using the non-ordinary marked graph model and reachability theory, we provide a new condition for valid retiming of multirate graphs. We show that for a graph with n nodes the reachability condition can be split into the reachability condition for the topologically equivalent unit-rate graph (all rates set to one) and (n^2 − n)/2 rate-dependent conditions. Using this property, a class of reduced-complexity graphs is introduced which are equivalent in terms of retiming. Additionally, the circuit-based necessary condition for valid retiming of multirate graphs is extended to be sufficient as well.
1. INTRODUCTION
Retiming was introduced as a technique to optimize hardware circuits by redistributing registers without affecting functionality [1]. Retiming is also useful for DSP software design. It changes precedence constraints among instructions or tasks, and can improve single-pro...
Statically Scheduling Communication Resources in Multiprocessor DSP Architectures
, 1994
Abstract

Cited by 8 (7 self)
In statically scheduled multiprocessors, interprocessor communication resources can be scheduled by determining, at compile time, the order in which processors require access to shared resources and enforcing this order at run time. We show how to choose an access order such that, under certain assumptions, imposing that order incurs no performance penalty.
Latency Minimization for Synchronous Data Flow Graphs
, 2007
Abstract

Cited by 7 (3 self)
Synchronous Data Flow Graphs (SDFGs) are a very useful means for modeling and analyzing streaming applications. Some performance indicators, such as throughput, have been studied before. Although throughput is a very useful performance indicator for concurrent real-time applications, another important metric is latency. Especially for applications such as video conferencing, telephony, and games, latency beyond a certain limit cannot be tolerated. This paper proposes an algorithm to determine the minimal achievable latency, providing an execution scheme for executing an SDFG with this latency. In addition, a heuristic is proposed for optimizing latency under a throughput constraint. Experimental results show that latency computations are efficient despite the theoretical complexity of the problem. Substantial latency improvements are obtained, of 24–54% on average for a synthetic benchmark of 900 models, and up to 37% for a ...
Retiming synchronous dataflow graphs to reduce execution time
 IEEE TRANSACTIONS ON SIGNAL PROCESSING
, 2001
Abstract

Cited by 6 (1 self)
Many common iterative or recursive DSP applications can be represented by synchronous dataflow graphs (SDFGs). A great deal of research has been done attempting to optimize such applications through retiming. However, despite its proven effectiveness in transforming single-rate dataflow graphs to equivalent DFGs with smaller clock periods, the use of retiming for reducing the execution time of synchronous DFGs has never been explored. In this paper, we do just this. We develop the basic definitions and results necessary for expressing and studying SDFGs. We review the problems faced when attempting to retime an SDFG in order to minimize clock period, then present algorithms for doing this. Finally, we demonstrate the effectiveness of our methods on several examples.
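The single-rate retiming transformation this abstract builds on follows a standard rule: a retiming r assigns an integer to each node, and edge (u, v) with weight w (its register/delay count) gets new weight w + r(v) − r(u), valid only if no weight goes negative. The sketch below illustrates that classical rule with my own encoding; it is not the paper's SDFG algorithm.

```python
def retime(edges, r):
    """Apply a retiming r[v] to a single-rate DFG given as a list of
    (u, v, w) edges. Edge (u, v, w) becomes (u, v, w + r[v] - r[u]);
    returns None if any new weight (register count) would be negative,
    i.e., the retiming is invalid."""
    new = [(u, v, w + r[v] - r[u]) for u, v, w in edges]
    return None if any(w < 0 for _, _, w in new) else new

# A three-node cycle: pushing one register back across node 2 moves
# the delay from edge 2->0 onto edge 1->2, preserving total delay
# on the cycle.
print(retime([(0, 1, 1), (1, 2, 0), (2, 0, 1)], [0, 0, 1]))
# -> [(0, 1, 1), (1, 2, 1), (2, 0, 0)]
```

The point of the paper is that this picture does not transfer directly to multirate SDFGs, where firing counts differ per node, which is why new definitions and algorithms are needed.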
Liveness and boundedness of synchronous data flow graphs
 In Formal Methods in Computer Aided Design (FMCAD '06), Proceedings. IEEE, 2006
, 2006
Abstract

Cited by 5 (2 self)
Synchronous Data Flow Graphs (SDFGs) have proven to be suitable for specifying and analyzing streaming applications that run on single- or multiprocessor platforms. Streaming applications essentially continue their execution indefinitely. Therefore, one of the key properties of an SDFG is liveness, i.e., whether all parts of the SDFG can run infinitely often. Another elementary requirement is whether an implementation of an SDFG is feasible using a limited amount of memory. In this paper, we study two interpretations of this property, called boundedness and strict boundedness, that were either already introduced in the SDFG literature or studied for other models. A third and new definition is introduced, namely self-timed boundedness, which is very important to SDFGs, because self-timed execution results in the maximal throughput of an SDFG. Necessary and sufficient conditions for liveness in combination with all variants of boundedness are given, as well as algorithms for checking those conditions. As a byproduct, we obtain an algorithm to compute the maximal achievable throughput of an SDFG that relaxes the requirement of strong connectedness in earlier work on throughput analysis.
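Liveness can be illustrated with a small simulation: for a rate-consistent SDFG, symbolically executing one iteration (each node firing its repetition count) either completes, or stalls in a deadlock from which the graph can never recover. The sketch below relies on that property; it is my own illustration, not the paper's algorithm, and the function name and edge encoding `(src, dst, prod, cons, initial_tokens)` are assumptions.

```python
def is_live(edges, reps):
    """Deadlock check for a consistent SDFG: symbolically execute one
    iteration, firing any node whose input edges all hold enough
    tokens, until every node n has fired reps[n] times (live) or no
    node can fire (deadlock)."""
    tokens = [e[4] for e in edges]   # current token count per edge
    left = list(reps)                # firings remaining per node
    progress = True
    while progress and any(left):
        progress = False
        for n in range(len(left)):
            if left[n] == 0:
                continue
            # node n is enabled if every incoming edge has >= cons tokens
            if all(tokens[i] >= e[3]
                   for i, e in enumerate(edges) if e[1] == n):
                for i, e in enumerate(edges):
                    if e[1] == n:
                        tokens[i] -= e[3]   # consume from inputs
                    if e[0] == n:
                        tokens[i] += e[2]   # produce on outputs
                left[n] -= 1
                progress = True
    return not any(left)

# A two-node cycle with no initial tokens deadlocks immediately;
# placing one initial token on the back edge makes it live.
print(is_live([(0, 1, 1, 1, 0), (1, 0, 1, 1, 0)], [1, 1]))   # -> False
print(is_live([(0, 1, 1, 1, 0), (1, 0, 1, 1, 1)], [1, 1]))   # -> True
```

The initial tokens play the role of the delays/registers on cycles: liveness hinges on every cycle carrying enough of them, which is the kind of condition the paper characterizes exactly, together with the boundedness variants.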
Buffer-space efficient and deadlock-free scheduling of stream applications on multicore architectures
 in Proc. of the 22nd ACM Symp. on Parallelism in Algorithms and Architectures
, 2010
Abstract

Cited by 3 (0 self)
We present a scheduling algorithm for stream programs on multicore architectures called team scheduling. Compared to previous multicore stream scheduling algorithms, team scheduling achieves 1) similar synchronization overhead, 2) coverage of a larger class of applications, 3) better control over buffer space, 4) deadlock-free feedback loops, and 5) lower latency. We compare team scheduling to the latest stream scheduling algorithm, SGMS, by evaluating 14 applications on a multicore architecture with 16 cores. Team scheduling successfully targets applications that cannot be validly scheduled by SGMS due to excessive buffer requirements or deadlocks in feedback loops (e.g., GSM and WCDMA). For applications that can be validly scheduled by SGMS, team scheduling shows on average 37% higher throughput within the same buffer space constraints.