Results 1–10 of 26
End-to-End Scheduling to Meet Deadlines in Distributed Systems
, 1994
Abstract

Cited by 68 (3 self)
In a distributed system or communication network, tasks may need to be executed on more than one processor. For time-critical tasks, the timing constraints are typically given as end-to-end release times and deadlines. This paper describes algorithms to schedule a class of systems where all the tasks execute on different processors in turn in the same order. This end-to-end scheduling problem is known as the flow-shop problem. We present two cases where the problem is tractable and evaluate a heuristic for the NP-hard general case. We generalize the traditional flow-shop model in two directions. First, we present an algorithm for scheduling flow shops where tasks can be serviced more than once by some processors. Second, we describe a heuristic algorithm to schedule flow shops that consist of periodic tasks. Some considerations are made about scheduling systems with more than one flow shop.
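The abstract mentions tractable cases of the flow-shop problem without stating them. The classic tractable instance is the two-machine flow shop, solved exactly by Johnson's rule; the sketch below shows that standard rule (not necessarily the paper's own algorithms): jobs with first-machine time at most second-machine time run first in increasing first-machine time, the rest follow in decreasing second-machine time.

```python
def johnson_order(jobs):
    """Johnson's rule: makespan-optimal order for a two-machine flow shop.

    jobs: list of (p1, p2) processing times on machines 1 and 2.
    Returns job indices in scheduling order.
    """
    first = sorted((i for i, (p1, p2) in enumerate(jobs) if p1 <= p2),
                   key=lambda i: jobs[i][0])               # ascending p1
    last = sorted((i for i, (p1, p2) in enumerate(jobs) if p1 > p2),
                  key=lambda i: jobs[i][1], reverse=True)  # descending p2
    return first + last

def makespan(jobs, order):
    """Completion time of the last job on machine 2 for a given order."""
    t1 = t2 = 0
    for i in order:
        p1, p2 = jobs[i]
        t1 += p1               # machine 1 runs the jobs back to back
        t2 = max(t2, t1) + p2  # machine 2 must wait for machine 1's output
    return t2

jobs = [(3, 2), (5, 1), (1, 4)]
order = johnson_order(jobs)
print(order, makespan(jobs, order))  # → [2, 0, 1] 10
```

The rule only applies to the two-machine case; for three or more machines the flow-shop problem is NP-hard, which is why the paper resorts to heuristics for the general case.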
Balanced Scheduling: Instruction scheduling when memory latency is uncertain
, 1992
Abstract

Cited by 53 (3 self)
Traditional list schedulers order instructions based on an optimistic estimate of the load delay imposed by the implementation. Therefore they cannot respond to variations in load latencies (due to cache hits or misses, congestion in the memory interconnect, etc.) and cannot easily be applied across different implementations. We have developed an alternative algorithm, known as balanced scheduling, that schedules instructions based on an estimate of the amount of instruction-level parallelism in the program. Since scheduling decisions are program- rather than machine-based, balanced scheduling is unaffected by implementation changes. Since it is based on the amount of instruction-level parallelism that a program can support, it can respond better to variations in load latencies. Performance improvements over a traditional list scheduler on a Fortran workload and simulating several different machine types (cache-based workstations, large parallel machines with a multipath interconnect an...
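To make the criticism concrete, here is a minimal single-issue list scheduler of the traditional kind the paper argues against (not the balanced-scheduling algorithm itself): the `latency` table is exactly the fixed, machine-dependent estimate that must be re-tuned per implementation. The node names and priority scheme are illustrative assumptions.

```python
def list_schedule(nodes, deps, latency, priority):
    """Greedy single-issue list scheduling over a dependency DAG.

    nodes:    instruction names, deps: node -> list of predecessors,
    latency:  node -> cycles until its result is available (the fixed,
              machine-specific estimate a traditional scheduler bakes in),
    priority: node -> rank, higher scheduled first.
    Returns {node: issue_cycle}.
    """
    issue, cycle = {}, 0
    remaining = set(nodes)
    while remaining:
        # ready = all predecessors issued and their results available
        ready = [n for n in nodes if n in remaining and
                 all(p in issue and issue[p] + latency[p] <= cycle
                     for p in deps.get(n, []))]
        if ready:
            n = max(ready, key=lambda n: priority[n])
            issue[n] = cycle
            remaining.discard(n)
        cycle += 1  # one issue slot per cycle; stall if nothing is ready
    return issue

sched = list_schedule(["load_a", "load_b", "add"],
                      {"add": ["load_a", "load_b"]},
                      {"load_a": 2, "load_b": 2, "add": 1},
                      {"load_a": 3, "load_b": 3, "add": 1})
print(sched)  # → {'load_a': 0, 'load_b': 1, 'add': 3}
```

Changing the assumed load latency from 2 to, say, 4 stretches the schedule even if the actual hardware hits in the cache, which is the sensitivity balanced scheduling avoids by deriving its estimates from the program's own parallelism.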
Optimal Instruction Scheduling Using Integer Programming
 Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation
, 2000
Abstract

Cited by 45 (3 self)
This paper presents a new approach to local instruction scheduling based on integer programming that produces optimal instruction schedules in a reasonable time, even for very large basic blocks. The new approach first uses a set of graph transformations to simplify the data-dependency graph while preserving the optimality of the final schedule. The simplified graph results in a simplified integer program which can be solved much faster. A new integer-programming formulation is then applied to the simplified graph. Various techniques are used to simplify the formulation, resulting in fewer integer-program variables, fewer integer-program constraints and fewer terms in some of the remaining constraints, thus reducing integer-program solution time. The new formulation also uses certain adaptively added constraints (cuts) to reduce solution time. The proposed optimal instruction scheduler is built within the Gnu Compiler Collection (GCC) and is evaluated experimentally using the SPEC95 floating-point benchmarks. Although optimal scheduling for the target processor is considered intractable, all of the benchmarks' basic blocks are optimally scheduled, including blocks with up to 1000 instructions, while total compile time increases by only 14%.
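The abstract summarizes but does not state the formulation. A standard time-indexed integer program for single-issue basic-block scheduling (a generic sketch, not necessarily the paper's exact model, which adds graph transformations and cuts) uses binary variables $x_{i,t}$ meaning instruction $i$ issues in cycle $t$, over a horizon of $T$ cycles:

```latex
\begin{align*}
\min\;\; & M \\
\text{s.t.}\;\; & \textstyle\sum_{t=1}^{T} x_{i,t} = 1 && \forall i
    && \text{(each instruction issues exactly once)}\\
& \textstyle\sum_{i} x_{i,t} \le 1 && \forall t
    && \text{(single-issue resource limit)}\\
& \textstyle\sum_{t} t\,x_{j,t} \;\ge\; \sum_{t} t\,x_{i,t} + \ell(i,j)
    && \forall (i,j) \in E
    && \text{(latency along each dependence edge)}\\
& M \;\ge\; \textstyle\sum_{t} t\,x_{i,t} && \forall i,
    && x_{i,t} \in \{0,1\}
\end{align*}
```

The variable count grows with both the block size and the horizon $T$, which is why the paper's graph simplifications and cuts matter for blocks with hundreds of instructions.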
Approximation Bounds for a General Class of Precedence Constrained Parallel Machine Scheduling Problems
 Integer Programming and Combinatorial Optimization, volume 1412 of Lecture Notes in Computer Science
, 1998
Abstract

Cited by 27 (5 self)
A well-studied and difficult class of scheduling problems concerns parallel machines and precedence constraints. In order to model more realistic situations, we consider precedence delays, associating with each precedence constraint a certain amount of time which must elapse between the completion and start times of the corresponding jobs. Release dates, among others, may be modeled in this fashion. We provide the first constant-factor approximation algorithms for the makespan and the total weighted completion time objectives in this general class of problems. These algorithms are rather simple and practical forms of list scheduling. Our analysis also unifies and simplifies that of a number of special cases heretofore separately studied, while actually improving some of the former approximation results.
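A Graham-style list schedule with precedence delays can be sketched as follows; this is the general flavor of algorithm the paper analyzes, not its specific variant, and the job data is an illustrative assumption. Each edge (i, j) carries a delay that must elapse between i's completion and j's start.

```python
import heapq

def list_schedule_delays(jobs, m, succ, delay):
    """Greedy list scheduling on m identical machines with precedence delays.

    jobs:  job -> processing time
    succ:  job -> list of successors
    delay: (i, j) -> time that must elapse between i's completion and
           j's start (the precedence delay; 0 recovers ordinary precedence)
    Returns {job: (start_time, machine)}.
    """
    preds = {j: 0 for j in jobs}
    for i in succ:
        for j in succ[i]:
            preds[j] += 1
    earliest = {j: 0 for j in jobs}        # release time implied by delays
    ready = [(0, j) for j in jobs if preds[j] == 0]
    heapq.heapify(ready)
    machines = [(0, k) for k in range(m)]  # (time machine becomes free, id)
    heapq.heapify(machines)
    out = {}
    while ready:
        r, j = heapq.heappop(ready)        # job with smallest release time
        free, k = heapq.heappop(machines)  # machine that frees up first
        start = max(r, free)
        out[j] = (start, k)
        finish = start + jobs[j]
        heapq.heappush(machines, (finish, k))
        for s in succ.get(j, []):
            earliest[s] = max(earliest[s], finish + delay[(j, s)])
            preds[s] -= 1
            if preds[s] == 0:
                heapq.heappush(ready, (earliest[s], s))
    return out
```

For example, with two machines, jobs a (2), b (2), c (1) and an edge a→c with delay 3, the schedule starts a and b at time 0 on separate machines and c at time 5, honoring the delay after a's completion at time 2.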
Fast Optimal Instruction Scheduling for Single-issue Processors with Arbitrary Latencies
, 2001
Abstract

Cited by 24 (9 self)
Instruction scheduling is one of the most important steps for improving the performance of object code produced by a compiler. The local instruction scheduling problem is to find a minimum-length instruction schedule for a basic block subject to precedence, latency, and resource constraints. In this paper we consider local instruction scheduling for single-issue processors with arbitrary latencies. The problem is considered intractable, and heuristic approaches are currently used in production compilers. In contrast, we present a relatively simple approach to instruction scheduling based on constraint programming which is fast and optimal. The proposed approach uses an improved constraint model which allows it to scale up to very large, real problems. We describe powerful redundant constraints that allow a standard constraint solver to solve these scheduling problems in an almost backtrack-free manner. The redundant constraints are lower bounds on selected subproblems which take advantage of the structure inherent in the problems. Under specified conditions, these constraints are sometimes further improved by testing the consistency of a subproblem using a fast test. We experimentally evaluated our approach by integrating it into the Gnu Compiler Collection (GCC) and then applying it to the SPEC95 floating point benchmarks. All 7402 of the benchmarks' basic blocks were optimally scheduled, including basic blocks with up to 1000 instructions. Our results compare favorably to the best previous approach, which is based on integer linear programming (Wilken et al., 2000): across the same benchmarks, the total optimal scheduling time for their approach is 98 seconds while the total time for our approach is less than 5 seconds.
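The underlying search problem can be stated as a tiny exhaustive branch-and-bound; this is only a naive illustration of what the paper's constraint model solves, not the paper's method, and it is exponential in general, which is exactly why the improved model and redundant constraints matter.

```python
def optimal_schedule(nodes, deps, latency):
    """Exhaustive branch-and-bound for a minimum-length single-issue schedule.

    deps:    node -> list of predecessor nodes
    latency: (pred, node) -> cycles that must separate their issue cycles
    Returns (length, {node: issue_cycle}).
    """
    lmax = max(latency.values(), default=0)
    # Loose upper bound: issuing in topological order with maximal stalls
    # always fits within len(nodes) * (lmax + 1) cycles.
    best = [len(nodes) * (lmax + 1) + 1, None]

    def dfs(issue, cycle):
        if len(issue) == len(nodes):
            length = max(issue.values()) + 1
            if length < best[0]:
                best[0], best[1] = length, dict(issue)
            return
        if cycle >= best[0]:          # bound: cannot beat the incumbent
            return
        ready = [n for n in nodes if n not in issue and
                 all(p in issue and issue[p] + latency[(p, n)] <= cycle
                     for p in deps.get(n, []))]
        for n in ready:               # branch: issue one ready instruction
            issue[n] = cycle
            dfs(issue, cycle + 1)
            del issue[n]
        dfs(issue, cycle + 1)         # branch: stall this cycle

    dfs({}, 0)
    return best[0], best[1]
```

For instance, with c depending on a (latency 2) and b (latency 1), issuing a at cycle 0, b at 1 and c at 2 gives the optimal length 3; the reverse order of a and b would force a stall.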
Adaptive Explicitly Parallel Instruction Computing
, 2000
Abstract

Cited by 13 (2 self)
Current processors are programmed through a fixed interface called the Instruction Set Architecture (ISA). Consequently, a compiler targeting such a processor is forced to choose instructions from the provided instruction set while generating code for a given application. Often this instruction set is not a suitable match for the computational requirements of the application program. Within this context, we ask ourselves the following questions. 1. Can application performance be improved if the compiler had the freedom to pick the instruction set on a per-application basis? 2. Can we build cost-effective processors that provide the ability to efficiently emulate compiler-determined instruction sets and yet are not application-specific? 3. Given that the desired processor capabilities are feasible, can the compiler determine an optimal set of instructions for a given application and generate code that can effectively exploit the processor capabilities? In this thesis, we provide sufficient evidence to answer these questions in the affirmative. Through a combination of architectural innovations and novel compilation techniques, this dissertation demonstrates that it is possible to attain significant improvement in performance, up to an order of magnitude in some cases, on general-purpose and multimedia applications over comparable fixed-ISA processors. We propose classes of microprocessors that allow application programs to add and subtract functional units, yielding a dynamically varying instruction-set interface to the running application without compromising the current compatibility model. The first half of this dissertation describes this novel class of architectures, focusing on a specific subclass called Adaptive Explicitly Parallel Instruction Computing (AEPIC) architectures...
Allocating Registers in Multiple Instruction-Issuing Processors
 In Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT'95
, 1995
Abstract

Cited by 8 (1 self)
This work addresses the problem of scheduling a basic block of operations on a multiple instruction-issuing processor. We show that integrating register constraints into operation sequencing algorithms is a complex problem in itself. Indeed, while scheduling a forest of unit-time operations on a processor with P parallel instruction slots can be solved in polynomial time, the problem becomes NP-hard when P is unbounded but only R registers are available. As a result we have devised a concise integer linear programming formulation of this scheduling problem that accounts for both register and instruction-issuing constraints. This allows the use of off-the-shelf routines to find optimum solutions, which can then be compared with the results obtained by polynomial-time heuristics. Two such heuristics are given, and their combined results are shown to be optimal in 99.5% of the cases for trees of height at most 6. A byproduct of these experiments is to show that our integer programming f...
Efficient Instruction Scheduling for Delayed-Load Architectures
 ACM Trans. Program. Lang. Syst
, 1995
Abstract

Cited by 7 (0 self)
This article was presented at the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation. Authors' addresses: S. M. Kurlander, C. N. Fischer, Computer Sciences Department, University of Wisconsin-Madison, 1210 W. Dayton St., Madison, WI 53706; email: {smk, fischer}@cs.wisc.edu; T. A. Proebsting, Department of Computer Science, University of Arizona, Tucson, AZ 85721; email: todd@cs.arizona.edu. [Remainder of snippet: ACM copyright notice and residue of a figure contrasting an optimal and a non-optimal schedule of nop and add instructions.]
A Fast Algorithm for Scheduling Instructions with Deadline Constraints on RISC Machines
 Proc. of the 22nd IEEE Real-Time Systems Symposium (RTSS)
, 2000
Abstract

Cited by 7 (3 self)
We present a fast algorithm for scheduling UET (Unit Execution Time) instructions with deadline constraints in a basic block on RISC machines with multiple processors. Unlike Palem and Simon's algorithm, our algorithm allows a latency of l_ij = -1, which denotes that instruction v_j cannot be started before v_i. The time complexity of our algorithm is O(ne + nd), where n is the number of instructions, e is the number of edges in the precedence graph, and d is the maximum latency. Our algorithm is guaranteed to compute a feasible schedule whenever one exists in the following special cases: 1) Arbitrary precedence constraints, latencies in {0, 1}, and one processor. In this special case, our algorithm improves the existing fastest algorithm from O(ne + e' log n) to O(min{ne, n^2.376}), where e' is the number of edges in the transitively closed precedence graph. 2) Arbitrary precedence constraints, latencies in {-1, 0}, and two processors. In the special case where all latencies are 0, our algorithm degenerates to Garey and Johnson's two-processor algorithm. 3) Special precedence constraints in the form of a monotone interval graph, arbitrary latencies in {-1, 0, 1, ..., d}, and multiple processors. 4) Special precedence constraints in the form of an in-forest, equal latencies, and multiple processors. In the above special cases, if no feasible schedule exists, our algorithm will compute a schedule with minimum lateness. Moreover, by setting all deadlines to a sufficiently large integer, our algorithm will compute a schedule with minimum length in all the above special cases and the special case of an out-forest, equal latencies, and multiple processors.
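The single-processor, zero-latency special case mentioned above is classically handled by deadline modification followed by earliest-deadline-first, in the style of Garey and Johnson. The sketch below shows that classic method only; it is not the paper's more general algorithm, which handles latencies and multiple processors.

```python
def edf_uet_schedule(deadlines, succ):
    """Deadline modification + EDF for UET instructions with precedence
    constraints, zero latencies, one processor.

    deadlines: node -> deadline (each unit-time instruction must
               complete by its deadline)
    succ:      node -> list of successor nodes (a DAG)
    Returns {node: start_cycle} if a feasible schedule exists, else None.
    """
    # 1. Tighten deadlines backwards: a node must finish one cycle
    #    before any successor's (already tightened) deadline.
    order, seen = [], set()
    def topo(n):
        if n in seen:
            return
        seen.add(n)
        for s in succ.get(n, []):
            topo(s)
        order.append(n)          # successors appear before n in `order`
    for n in deadlines:
        topo(n)
    d = dict(deadlines)
    for n in order:
        for s in succ.get(n, []):
            d[n] = min(d[n], d[s] - 1)
    # 2. Greedy EDF over ready instructions, one issue per cycle.
    preds = {n: 0 for n in deadlines}
    for n in succ:
        for s in succ[n]:
            preds[s] += 1
    start, t = {}, 0
    ready = {n for n in deadlines if preds[n] == 0}
    while ready:
        n = min(ready, key=lambda n: d[n])
        if t + 1 > d[n]:         # the most urgent job misses: infeasible
            return None
        start[n] = t
        ready.discard(n)
        for s in succ.get(n, []):
            preds[s] -= 1
            if preds[s] == 0:
                ready.add(s)
        t += 1
    return start
```

For example, with deadlines a: 3, b: 1, c: 3 and an edge a→c, deadline modification tightens a's deadline to 2, so EDF correctly runs b, then a, then c; without the modification step, EDF could run a first and make b miss its deadline.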