Results 1-10 of 10
A General Constraint-centric Scheduling Framework for Spatial Architectures
Abstract

Cited by 4 (1 self)
Specialized execution using spatial architectures provides energy-efficient computation, but requires effective algorithms for spatially scheduling the computation. Generally, this has been solved with architecture-specific heuristics, an approach which suffers from poor compiler/architect productivity, lacks insight on optimality, and inhibits migration of techniques between architectures. Our goal is to develop a scheduling framework usable for all spatial architectures. To this end, we express spatial scheduling as a constraint satisfaction problem using Integer Linear Programming (ILP). We observe that architecture primitives and scheduler responsibilities can be related through five abstractions: placement of computation, routing of data, managing event timing, managing resource utilization, and forming the optimization objectives. We encode these responsibilities as 20 general ILP constraints, which are used to create schedulers for the disparate TRIPS, DySER, and PLUG architectures. Our results show that a general declarative approach using ILP is implementable, practical, and typically matches or outperforms specialized schedulers.
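To make the placement and routing abstractions concrete, here is a minimal sketch, assuming a hypothetical 2x2 grid of functional units and a 3-operation dataflow chain (none of this is from the paper). Exhaustive search stands in for the ILP solver: every injective op-to-unit assignment satisfies a one-op-per-unit utilization constraint, and the objective is total Manhattan routing distance along dataflow edges.

```python
from itertools import permutations

# Hypothetical instance: place 3 operations onto a 2x2 grid of
# functional units, one op per unit (utilization constraint),
# minimizing total Manhattan routing distance along dataflow edges
# (routing + objective abstractions from the abstract above).
units = [(0, 0), (0, 1), (1, 0), (1, 1)]
ops = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]  # dataflow: a -> b -> c

def route_cost(placement):
    """Sum of Manhattan distances over all dataflow edges."""
    return sum(abs(placement[u][0] - placement[v][0]) +
               abs(placement[u][1] - placement[v][1])
               for u, v in edges)

# Exhaustive search stands in for the ILP solver; permutations
# enumerate exactly the injective op -> unit assignments.
best = min((dict(zip(ops, p)) for p in permutations(units, len(ops))),
           key=route_cost)
print(route_cost(best))  # -> 2: each chained pair placed on adjacent units
```

A real ILP formulation would express the same injectivity and distance terms as linear constraints over 0/1 placement variables rather than enumerating.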
A Theoretical and Practical Approach to Instruction Scheduling on Spatial Architectures
, 2002
Abstract

Cited by 4 (0 self)
This paper studies the problem of instruction assignment and scheduling on spatial architectures. Spatial architectures are architectures whose resources are organized in clusters, with non-zero communication delays between the clusters. On these architectures, instruction scheduling includes both space scheduling, where instructions are mapped to clusters, and the traditional time scheduling. This paper considers the problem from both the theoretical and practical perspectives. It presents two integer linear program formulations with known performance bounds. We also present an 8-approximation algorithm for constant m and constant communication delays. We then introduce three heuristic algorithms based on list scheduling and study a layer partitioning method. Our final algorithm is a combination of layer partitioning and the third heuristic. Two of the better algorithms are evaluated on the Raw machine...
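A minimal sketch of the kind of greedy list scheduler the abstract describes, assuming a toy DAG, unit execution times, two clusters, and a uniform inter-cluster delay (illustrative assumptions, not the paper's benchmarks or exact heuristics):

```python
# Greedy list scheduling for a clustered machine: each node is placed
# on the cluster where it can start earliest, paying comm_delay for
# operands produced on a different cluster.
def list_schedule(dag, num_clusters, comm_delay):
    """dag: {node: [predecessors]}; unit execution time per node.
    Returns ({node: (cluster, start_time)}, makespan)."""
    placed = {}
    free_at = [0] * num_clusters          # next free cycle per cluster
    remaining = dict(dag)
    while remaining:
        # nodes whose predecessors are all placed are ready
        ready = [n for n, ps in remaining.items()
                 if all(p in placed for p in ps)]
        for n in sorted(ready):
            best = None
            for c in range(num_clusters):
                t = free_at[c]
                for p in dag[n]:
                    pc, pt = placed[p]
                    arrive = pt + 1 + (comm_delay if pc != c else 0)
                    t = max(t, arrive)
                if best is None or t < best[1]:
                    best = (c, t)
            placed[n] = best
            free_at[best[0]] = best[1] + 1
            del remaining[n]
    makespan = max(t + 1 for _, t in placed.values())
    return placed, makespan

sched, span = list_schedule({"a": [], "b": [], "c": ["a", "b"]}, 2, 2)
print(span)  # -> 4: c must wait for one operand to cross clusters
```

The trade-off the paper formalizes is visible even here: spreading a and b over two clusters gains parallelism but forces c to pay the communication delay.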
Code Generation For General Loops Using Methods From
 Proc. of the IASTED Parallel and Distributed Computing and Systems Conference (PDCS 2004), Cambridge
, 2004
Abstract

Cited by 3 (3 self)
This paper deals with general nested loops and proposes a novel dynamic scheduling technique. General loops contain complex loop bodies (consisting of arbitrary program statements, such as assignments, conditions and repetitions) that exhibit uniform loop-carried dependencies. Therefore it is now possible to achieve efficient parallelization for a vast class of loops, mostly found in DSP, PDEs, and signal and video coding. At the core of this technique lies a simple and efficient dynamic rule (SDS, Successive Dynamic Scheduling) for determining the next ready-to-be-executed iteration at runtime. The central idea is to schedule the iterations on the fly by using SDS, along the optimal hyperplane (determined using the QuickHull algorithm). Furthermore, a tool (CRONUS/1) that implements this theory and automatically produces the SPMD parallel code for message-passing architectures is presented. As a test case study, the FSBM motion estimation algorithm (used in video coding standards, e.g., MPEG-2, H.261) was used. The tool was also tested on a suite of randomly generated loops. The experimental results validate the presented theory and corroborate the efficiency of the generated parallel code.
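The hyperplane idea underlying SDS can be sketched statically: iterations on the same hyperplane h.(i, j) = t have no mutual dependencies and can run in parallel, provided h.d > 0 for every dependence vector d. The 2-D index space, dependence vectors, and hyperplane below are toy assumptions, not CRONUS/1 output:

```python
# Wavefront (hyperplane) schedule for a 2-D uniform dependence loop.
def wavefront_schedule(n, deps, hyperplane=(1, 1)):
    """Assign each iteration (i, j) of an n x n index space the time
    step h . (i, j); iterations sharing a step run in parallel, which
    is legal because h . d > 0 for every dependence vector d."""
    assert all(hyperplane[0]*di + hyperplane[1]*dj > 0 for di, dj in deps)
    steps = {}
    for i in range(n):
        for j in range(n):
            t = hyperplane[0]*i + hyperplane[1]*j
            steps.setdefault(t, []).append((i, j))
    return steps

steps = wavefront_schedule(3, deps=[(1, 0), (0, 1)])
print(len(steps))  # -> 5 anti-diagonal wavefronts for a 3x3 space
```

SDS differs in computing the next ready iteration dynamically at runtime rather than precomputing all wavefronts, but the hyperplane along which it advances is the same object.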
Scheduling Nested Loops With the Least Number of Processors
, 2003
Abstract

Cited by 2 (2 self)
Usually the most computationally intensive part of a program is attributed to the nested loops it contains. It is therefore of interest to try to parallelize the nested loops in order to reduce the overall computation time. A special category of FOR(DO) nested loops are the uniform dependence loops, which yield efficient parallelization techniques and are the focus of this paper. The primary goals in this area of research are: (1) achieving the optimal parallel time and (2) minimizing the number of processing elements required for the execution of the parallel program. In this paper we present a new dynamic lower bound on the number of processors needed for scheduling and we use a decision algorithm to verify that all uniform dependence loops can achieve this bound.
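The paper's dynamic lower bound is not reproduced in the abstract; as a point of reference, here is the classic work-based bound, with illustrative numbers: p processors complete at most p*T unit-time iterations in T steps, so at least ceil(|J| / T) processors are needed to meet the optimal parallel time T.

```python
import math

# Classic (not the paper's exact) processor lower bound for unit-time
# iterations: num_iterations total work, optimal_time parallel steps.
def processor_lower_bound(num_iterations, optimal_time):
    return math.ceil(num_iterations / optimal_time)

# e.g. a 4x4 uniform loop with deps (1,0) and (0,1): T = 2*(4-1)+1 = 7
print(processor_lower_bound(16, 7))  # -> 3
```

A tighter dynamic bound, as the abstract suggests, must also account for how the dependence structure limits the width of each wavefront.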
Minimizing the makespan for a UET bipartite graph on a single processor with an integer precedence delay
, 2001
Abstract

Cited by 1 (0 self)
We consider a set of tasks of unit execution times and a bipartite precedence-delays graph with a positive precedence delay d: an arc (i, j) of this graph means that j can be executed at least d time units after the completion time of i. The problem is to sequence the tasks in order to minimize the makespan. Firstly, we prove that the associated decision problem is NP-complete. Then, we provide a non-trivial polynomial-time algorithm if the degree of every task from one of the two sets is 2. Lastly, we give an approximation algorithm with ratio 3/2.
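The model can be checked by brute force on a tiny instance; the 4-task bipartite graph and delay d = 2 below are illustrative assumptions, not from the paper. An arc (i, j) forces start(j) >= completion(i) + d on the single processor:

```python
from itertools import permutations

# Brute-force makespan minimization for a UET single-processor
# instance with a uniform precedence delay d.
def makespan(order, arcs, d):
    pos = {task: k for k, task in enumerate(order)}
    if any(pos[i] > pos[j] for i, j in arcs):
        return float("inf")               # sequence violates precedence
    start, t = {}, 0
    for task in order:
        earliest = t
        for i, j in arcs:
            if j == task:
                earliest = max(earliest, start[i] + 1 + d)
        start[task] = earliest
        t = earliest + 1                  # unit execution time
    return t

arcs = [(1, 3), (2, 4)]                   # bipartite: {1, 2} -> {3, 4}
best = min(makespan(p, arcs, d=2) for p in permutations([1, 2, 3, 4]))
print(best)  # -> 5: run 1, 2, then 3 and 4 after their delays elapse
```

The NP-completeness result says this exhaustive search cannot in general be replaced by a polynomial algorithm, except in the degree-2 special case the paper identifies.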
An Efficient Scheduling of Uniform Dependence Loops
 THE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS
, 2003
Abstract

Cited by 1 (1 self)
Usually the most computationally intensive part of a program is attributed to the nested loops it contains. It is therefore of interest to try to parallelize nested loops in order to reduce the overall computation time. A special category of FOR(DO) nested loops are the uniform dependence loops, which are the focus of this paper. The primary goals in this area of research are: (1) achieving the optimal parallel time and (2) minimizing the number of processing elements. In this paper we present an algorithm for the efficient assignment of computations onto the minimum number of processing elements that guarantees the optimal makespan. The proposed algorithm is polynomial in the size of the index space and performs a binary search between a lower and an upper bound of the optimal number of processors. We provide experimental results that demonstrate the feasibility of our algorithm.
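The binary-search structure described above can be sketched as follows; the feasibility test here assumes unit-time, synchronously executed wavefronts and is an illustrative stand-in for the paper's decision algorithm, and the 4x4 index space is a toy example:

```python
import math

# With p processors a wavefront of w iterations needs ceil(w / p)
# steps; the count is feasible if the total stays within the optimal
# parallel time. Feasibility is monotone in p, so binary search works.
def feasible(wavefront_sizes, p, optimal_time):
    return sum(math.ceil(w / p) for w in wavefront_sizes) <= optimal_time

def min_processors(wavefront_sizes, optimal_time):
    lo = math.ceil(sum(wavefront_sizes) / optimal_time)  # work bound
    hi = max(wavefront_sizes)                            # always enough
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(wavefront_sizes, mid, optimal_time):
            hi = mid
        else:
            lo = mid + 1
    return lo

# 4x4 index space, anti-diagonal wavefronts of sizes 1,2,3,4,3,2,1
print(min_processors([1, 2, 3, 4, 3, 2, 1], optimal_time=7))  # -> 4
```

Note the gap this example exposes: the work bound gives 3 processors, but meeting the optimal makespan under this execution model needs 4, which is exactly why a decision procedure inside the search is required.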
Simple Code Generation for special UDLs
 IN 1ST BALKAN CONFERENCE IN INFORMATICS (BCI’03)
, 2003
Abstract

Cited by 1 (1 self)
This paper focuses on transforming sequential perfectly nested loops into their equivalent parallel form. A special category of FOR nested loops is the uniform dependence loops (UDLs), which yield efficient parallelization techniques. An automatic code ...
Algorithms, Theory
Abstract
We introduce a scheduling algorithm, Intermediate-SRPT, and show that it is O(log P)-competitive with respect to average waiting time when scheduling jobs whose parallelizability is intermediate between fully parallelizable and sequential. Here the parameter P denotes the ratio between the maximum job size and the minimum. We also show a general matching lower bound on the competitive ratio. Our analysis builds on an interesting combination of potential function and local competitiveness arguments.
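Intermediate-SRPT generalizes the classic Shortest Remaining Processing Time rule; as background, here is a minimal sketch of preemptive SRPT itself on one machine, computing total waiting time (completion minus release minus size). The jobs are illustrative, and this is the baseline policy, not the paper's algorithm:

```python
import heapq

# Preemptive SRPT on a single machine: always run the job with the
# least remaining work, preempting when a shorter job is released.
def srpt_total_waiting(jobs):
    """jobs: list of (release, size). Returns total waiting time,
    i.e. sum over jobs of completion - release - size."""
    events = sorted(jobs)                 # by release time
    heap, t, i, total, done = [], 0, 0, 0, 0
    while done < len(jobs):
        if not heap and i < len(events):
            t = max(t, events[i][0])      # idle until next release
        while i < len(events) and events[i][0] <= t:
            r, s = events[i]
            heapq.heappush(heap, [s, r, s])  # [remaining, release, size]
            i += 1
        job = heapq.heappop(heap)         # least remaining work
        next_rel = events[i][0] if i < len(events) else float("inf")
        run = min(job[0], next_rel - t)   # run until release or finish
        t += run
        job[0] -= run
        if job[0] == 0:
            total += t - job[1] - job[2]
            done += 1
        else:
            heapq.heappush(heap, job)     # preempted, back in the pool
    return total

print(srpt_total_waiting([(0, 3), (1, 1)]))  # -> 1: the long job waits
```

The difficulty the paper addresses is that with partially parallelizable jobs, "remaining work" alone no longer captures how profitably a job uses extra processors, hence the O(log P) competitive ratio instead of SRPT's single-machine optimality.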