Results 1-10 of 74
Synchronous data flow
, 1987
Cited by 483 (44 self)
Data flow is a natural paradigm for describing DSP applications for concurrent implementation on parallel hardware. Data flow programs for signal processing are directed graphs where each node represents a function and each arc represents a signal path. Synchronous data flow (SDF) is a special case of data flow (either atomic or large grain) in which the number of data samples produced or consumed by each node on each invocation is specified a priori. Nodes can be scheduled statically (at compile time) onto single or parallel programmable processors so the runtime overhead usually associated with data flow evaporates. Multiple sample rates within the same system are easily and naturally handled. Conditions for correctness of SDF graphs are explained and scheduling algorithms are described for homogeneous parallel processors sharing memory. A preliminary SDF software system for automatically generating assembly language code for DSP microcomputers is described. Two new efficiency techniques are introduced: static buffering and an extension to SDF to efficiently implement conditionals.
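Because the abstract above fixes the number of samples each node produces and consumes per invocation, one commonly used correctness check is that the sample rates balance on every arc: if node u fires q[u] times per period and produces p samples per firing onto an arc from which v consumes c per firing, then q[u] * p must equal q[v] * c. The following is a minimal sketch of that check, not the paper's own code. The function name `repetition_vector` and the edge encoding `(src, dst, produced, consumed)` are hypothetical, and the graph is assumed to be connected.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def repetition_vector(nodes, edges):
    """Return the smallest positive integer firing counts that balance every arc,
    or None if the rates are inconsistent.

    edges: list of (src, dst, produced, consumed) tuples (hypothetical encoding).
    Assumes the graph is connected.
    """
    # Build an undirected view: following an arc forward scales the rate by
    # produced/consumed, following it backward scales by consumed/produced.
    adj = {n: [] for n in nodes}
    for src, dst, produced, consumed in edges:
        adj[src].append((dst, Fraction(produced, consumed)))
        adj[dst].append((src, Fraction(consumed, produced)))

    rate = {}
    for start in nodes:
        if start in rate:
            continue
        rate[start] = Fraction(1)
        stack = [start]
        while stack:
            u = stack.pop()
            for v, ratio in adj[u]:
                if v not in rate:
                    rate[v] = rate[u] * ratio
                    stack.append(v)

    # Every arc must balance, including arcs not used to propagate the rates.
    for src, dst, produced, consumed in edges:
        if rate[src] * produced != rate[dst] * consumed:
            return None

    # Clear the denominators to get integer firing counts.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {n: int(r * scale) for n, r in rate.items()}


if __name__ == "__main__":
    # A produces 2 per firing, B consumes 3; B produces 1, C consumes 2.
    print(repetition_vector(["A", "B", "C"],
                            [("A", "B", 2, 3), ("B", "C", 1, 2)]))
    # prints {'A': 3, 'B': 2, 'C': 1}
```

Exact rational arithmetic is used so that inconsistent rates are detected reliably rather than being hidden by floating-point rounding.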
Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing
 IEEE Transactions on Computers
, 1987
Cited by 480 (35 self)
Large grain data flow (LGDF) programming is natural and convenient for describing digital signal processing (DSP) systems, but its runtime overhead is costly in real-time or cost-sensitive applications. In some situations, designers are not willing to squander computing resources for the sake of programmer convenience. This is particularly true when the target machine is a programmable DSP chip. However, the runtime overhead inherent in most LGDF implementations is not required for most signal processing systems because such systems are mostly synchronous (in the DSP sense). Synchronous data flow (SDF) differs from traditional data flow in that the amount of data produced and consumed by a data flow node is specified a priori for each input and output. This is equivalent to specifying the relative sample rates in a signal processing system. This means that the scheduling of SDF nodes need not be done at runtime, but can be done at compile time (statically), so the runtime overhead evaporates. The sample rates can all be different, which is not true of most current data-driven digital signal processing programming methodologies. Synchronous data flow is closely related to computation graphs, a special case of Petri nets. This self-contained paper develops the theory necessary to statically schedule SDF programs on single or multiple processors. A class of static (compile time) scheduling algorithms is proven valid, and specific algorithms are given for scheduling SDF systems onto single or multiple processors. Index Terms: Block diagram, computation graphs, data flow digital signal processing, hard real-time systems, multiprocessing,
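To illustrate what compile-time scheduling can look like on a single processor, the sketch below symbolically executes an SDF graph: it fires any node whose inputs hold enough samples until every node has fired its required number of times for one period. This is one simple approach under stated assumptions, not necessarily any of the specific algorithms given in the paper. The function name `sequential_schedule`, the edge encoding `(src, dst, produced, consumed)`, and the precomputed `repetitions` argument are all hypothetical.

```python
def sequential_schedule(repetitions, edges, initial_tokens=None):
    """Build one period of a single-processor firing order by symbolic execution.

    repetitions: dict node -> number of firings per period (assumed given).
    edges: list of (src, dst, produced, consumed) tuples (hypothetical encoding).
    initial_tokens: samples initially on each edge (delays), default all zero.
    Returns a list of node names, or None if the graph deadlocks.
    """
    tokens = list(initial_tokens) if initial_tokens is not None else [0] * len(edges)
    remaining = dict(repetitions)
    schedule = []

    progress = True
    while progress and any(remaining.values()):
        progress = False
        for node in remaining:
            if remaining[node] == 0:
                continue
            # A node is runnable when every input edge holds enough samples.
            runnable = all(
                tokens[i] >= consumed
                for i, (_, dst, _, consumed) in enumerate(edges)
                if dst == node
            )
            if not runnable:
                continue
            # Fire the node: consume from inputs, produce onto outputs.
            for i, (src, dst, produced, consumed) in enumerate(edges):
                if dst == node:
                    tokens[i] -= consumed
                if src == node:
                    tokens[i] += produced
            remaining[node] -= 1
            schedule.append(node)
            progress = True

    return schedule if not any(remaining.values()) else None


if __name__ == "__main__":
    edges = [("A", "B", 2, 3), ("B", "C", 1, 2)]
    print(sequential_schedule({"A": 3, "B": 2, "C": 1}, edges))
    # prints ['A', 'A', 'B', 'A', 'B', 'C']
```

Because the firing order is fixed before execution, a code generator could emit it as straight-line code or a loop, with no run-time check of whether a node's inputs are available.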
Performance Analysis and Optimization of Asynchronous Circuits
, 1991
Cited by 138 (7 self)
We present a method for analyzing the time performance of asynchronous circuits, in particular, those derived by program transformation from concurrent programs using the synthesis approach developed by the second author. The analysis method produces a performance metric (related to the time needed to perform an operation) in terms of the primitive gate delays of the circuit. Such a metric provides a quantitative means by which to compare competing designs. Because the gate delays are functions of transistor sizes, the performance metric can be optimized with respect to these sizes. For a large class of asynchronous circuits, including those produced by using our synthesis method, these techniques produce the global optimum of the performance metric. A CAD tool has been implemented to perform this optimization. 1 Introduction Performance analysis of a synchronous computer system is simplified by an external clock that partitions the events in the system into discrete segments. In a...
Minimizing Register Requirements under Resource-Constrained Rate-Optimal Software Pipelining
, 1995
Cited by 75 (12 self)
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (à la rate-optimal) while minimizing the number of buffers, a close approximation to minimizing the number of registers. The main contributions of this paper are: First, we demonstrate that such a problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. Secondly, we show that a precise mathematical formulation...
A Novel Framework of Register Allocation for Software Pipelining
, 1993
Cited by 61 (10 self)
Qi Ning, Guang R. Gao. School of Computer Science, McGill University, Montreal, Quebec, Canada H3A 2A7. Email: ning@cs.mcgill.ca, gao@cs.mcgill.ca. Abstract: Although software pipelining has been proposed as one of the most important loop scheduling methods, simultaneous scheduling and register allocation is less understood and remains an open problem [28]. The objective of this paper is to develop a unified algorithmic framework for concurrent scheduling and register allocation to support time-optimal software pipelining. A key intuition leading to this surprisingly simple formulation and its efficient solution is the association of maximum computation rate of a program graph with its critical cycles due to Reiter's pioneering work...
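The link between maximum computation rate and critical cycles mentioned above can be made concrete with a small sketch. It assumes the usual formulation for loop dependence graphs, which the abstract itself does not spell out: each node has a latency, each edge carries an iteration distance, and the period of any periodic schedule is bounded below by the largest ratio of total latency to total distance over all cycles. The function name `critical_cycle_ratio` and the data encoding are hypothetical, and the brute-force cycle enumeration is only suitable for small graphs.

```python
from fractions import Fraction


def critical_cycle_ratio(latency, edges):
    """Return the maximum over simple cycles of (sum of latencies) / (sum of distances),
    or None if the graph has no cycle.

    latency: dict node -> integer execution time (hypothetical encoding).
    edges: list of (src, dst, distance), where distance is the number of
           loop iterations the dependence spans.
    """
    succ = {n: [] for n in latency}
    for src, dst, dist in edges:
        succ[src].append((dst, dist))

    index = {n: i for i, n in enumerate(latency)}
    best = None

    def dfs(start, node, lat_sum, dist_sum, visited):
        nonlocal best
        for nxt, dist in succ[node]:
            if nxt == start:
                total = dist_sum + dist
                if total == 0:
                    raise ValueError("cycle with zero total distance")
                ratio = Fraction(lat_sum, total)
                if best is None or ratio > best:
                    best = ratio
            elif index[nxt] > index[start] and nxt not in visited:
                # Only visit nodes ordered after the start, so each simple
                # cycle is enumerated once, from its lowest-indexed node.
                dfs(start, nxt, lat_sum + latency[nxt], dist_sum + dist,
                    visited | {nxt})

    for start in latency:
        dfs(start, start, latency[start], 0, {start})
    return best


if __name__ == "__main__":
    latency = {"a": 1, "b": 2, "c": 1}
    edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 1), ("b", "b", 1)]
    print(critical_cycle_ratio(latency, edges))  # prints 4
```

In this example the cycle a, b, c has total latency 4 and distance 1, so it is the critical cycle. The self-loop on b has ratio 2 and does not constrain the rate further.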
Decentralizing execution of composite web services
 In OOPSLA ’04: Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
, 2004
Cited by 36 (0 self)
Distributed enterprise applications today are increasingly being built from services available over the web. A unit of functionality in this framework is a web service, a software application that exposes a set of “typed” connections that can be accessed over the web using standard protocols. These units can then be composed into a composite web service. BPEL (Business Process Execution Language) is a high-level distributed programming language for creating composite web services. Although a BPEL program invokes services distributed over several servers, the orchestration of these services is typically under centralized control. Because performance and throughput are major concerns in enterprise applications, it is important to remove the inefficiencies introduced by the centralized control. In a distributed, or decentralized
A Polynomial Time Method for Optimal Software Pipelining
 In Proc. of the Conf. on Vector and Parallel Processing, CONPAR92, number 634 in Lec. Notes in Comp. Sci
, 1992
Cited by 29 (5 self)
Software pipelining is one of the most important loop scheduling methods used by parallelizing compilers. It determines a static parallel schedule (a periodic pattern) to overlap instructions of a loop body from different iterations. The main contributions of this paper are the following: First, we propose to express the fine-grain loop scheduling problem (in particular, software pipelining) on the basis of the mathematical formulation of r-periodic scheduling. This formulation overcomes some of the problems encountered by existing software pipelining methods. Second, we demonstrate the feasibility of the proposed method by (1) presenting a polynomial time algorithm to find an optimal schedule in this r-periodic form that maximizes the computation rate (in fact, we show that this schedule maximizes the computation rate theoretically possible), and by (2) establishing polynomial bounds for the optimal schedule, i.e. bounds on its period, its periodicity, the pattern size, and the c...
Efficient computation of buffer capacities for cyclo-static dataflow graphs
 In Proceedings of the 44th annual Design Automation Conference
, 2007
Cited by 28 (9 self)
A key step in the design of cyclo-static real-time systems is the determination of buffer capacities. In our multiprocessor system, we apply back-pressure, which means that tasks wait for space in output buffers. Consequently buffer capacities affect the throughput. This requires the derivation of buffer capacities that both result in a satisfaction of the throughput constraint, and also satisfy the constraints on the maximum buffer capacities. Existing exact solutions suffer from the computational complexity that is associated with the required conversion from a cyclo-static dataflow graph to a single-rate dataflow graph. In this paper we present an algorithm, with linear computational complexity, that does not require this conversion and that strives to obtain close to minimal buffer capacities. The algorithm is applied to an MP3 playback application that is mapped on our multiprocessor system.
A Framework for Resource-Constrained Rate-Optimal Software Pipelining
 IEEE Transactions on Parallel and Distributed Systems
, 1996
Cited by 24 (9 self)
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (à la rate-optimal) while minimizing the number of buffers, a close approximation to minimizing the number of registers. The main contributions of this paper are: First, we demonstrate that such a problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. Secondly, we show that a precise mathematical formulation...
Throughput analysis of synchronous data flow graphs
 In ACSD’06, Proc. (2006), IEEE
, 2006
Cited by 22 (12 self)
Synchronous Data Flow Graphs (SDFGs) are a useful tool for modeling and analyzing embedded data flow applications, both in a single processor and a multiprocessing context or for application mapping on platforms. Throughput analysis of these SDFGs is an important step for verifying throughput requirements of concurrent real-time applications, for instance within design-space exploration activities. Analysis of SDFGs can be hard, since the worst-case complexity of analysis algorithms is often high. This is also true for throughput analysis. In particular, many algorithms involve a conversion to another kind of data flow graph, the size of which can be exponentially larger than the size of the original graph. In this paper, we present a method for throughput analysis of SDFGs, based on explicit state-space exploration and we show that the method, despite its worst-case complexity, works well in practice, while existing methods often fail. We demonstrate this by comparing the method with state-of-the-art cycle mean computation algorithms. Moreover, since the state-space exploration method is essentially the same as simulation of the graph, the results of this paper can be easily obtained as a byproduct in existing simulation tools.
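The idea that throughput can fall out of plain simulation can be sketched as follows: run the graph in a self-timed manner, record each state, and stop when a state recurs; the firings completed between the two visits divided by the elapsed time give the throughput. This is an illustrative sketch only, not the authors' algorithm. The function name `self_timed_throughput` and the data encoding are hypothetical, execution times are assumed to be positive integers, and each actor is assumed to run at most one firing at a time (no auto-concurrency).

```python
from fractions import Fraction


def self_timed_throughput(exec_time, edges, tokens, actor, max_steps=10_000):
    """Firings of `actor` per time unit in the periodic phase of self-timed execution.

    exec_time: dict actor -> positive integer execution time (hypothetical encoding).
    edges: list of (src, dst, produced, consumed).
    tokens: initial number of tokens on each edge.
    Returns a Fraction, or None if no state recurs within max_steps
    (for example, when tokens accumulate without bound).
    """
    actors = list(exec_time)
    tokens = list(tokens)
    busy = {a: 0 for a in actors}  # remaining time of the current firing, 0 = idle
    seen = {}                      # state -> (time, completed firings of `actor`)
    fired = 0

    for t in range(max_steps):
        # Start every idle actor whose inputs are available; tokens are
        # consumed when a firing starts.
        for a in actors:
            if busy[a] == 0 and all(
                tokens[i] >= c for i, (_, d, _, c) in enumerate(edges) if d == a
            ):
                for i, (_, d, _, c) in enumerate(edges):
                    if d == a:
                        tokens[i] -= c
                busy[a] = exec_time[a]

        state = (tuple(tokens), tuple(busy[a] for a in actors))
        if state in seen:
            t0, fired0 = seen[state]
            return Fraction(fired - fired0, t - t0)
        seen[state] = (t, fired)

        # Advance time by one unit; tokens are produced when a firing ends.
        for a in actors:
            if busy[a] > 0:
                busy[a] -= 1
                if busy[a] == 0:
                    for i, (s, _, p, _) in enumerate(edges):
                        if s == a:
                            tokens[i] += p
                    if a == actor:
                        fired += 1
    return None


if __name__ == "__main__":
    # Two actors in a ring with one initial token on the edge from B to A.
    edges = [("A", "B", 1, 1), ("B", "A", 1, 1)]
    print(self_timed_throughput({"A": 2, "B": 3}, edges, [0, 1], "A"))  # prints 1/5
```

A deadlocked graph reaches the same idle state on consecutive steps, so the sketch reports a throughput of zero for it.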