Flow and Stretch Metrics for Scheduling Continuous Job Streams
In Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, 1998
Cited by 126 (9 self)
In this paper, we isolate and study the problem of scheduling a continuous stream of requests of varying sizes. More precisely, assume a request or job j has ...
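The two metrics in the title have standard definitions that the truncated abstract does not reach. For a job $j$ with release time $r_j$, processing time (size) $p_j$ and completion time $C_j$, the flow time is $F_j = C_j - r_j$ and the stretch is $S_j = F_j / p_j$. The Python sketch below computes both for a single machine that serves requests first-come-first-served. The FIFO policy, the `Job` type and the three-job instance are illustrative assumptions, not the paper's algorithm.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    release: float  # r_j: time the request arrives
    size: float     # p_j: processing time (size) of the request


def fifo_completion_times(jobs):
    """Serve jobs non-preemptively in arrival order on a single machine."""
    t = 0.0
    completion = {}
    for job in sorted(jobs, key=lambda j: j.release):
        t = max(t, job.release) + job.size  # wait for arrival if idle, then run
        completion[job.name] = t
    return completion


def flow_and_stretch(jobs, completion):
    """Flow F_j = C_j - r_j; stretch S_j = F_j / p_j."""
    flow = {j.name: completion[j.name] - j.release for j in jobs}
    stretch = {j.name: flow[j.name] / j.size for j in jobs}
    return flow, stretch


jobs = [Job("A", 0, 4), Job("B", 1, 1), Job("C", 2, 2)]
completion = fifo_completion_times(jobs)
flow, stretch = flow_and_stretch(jobs, completion)
print(flow)                   # {'A': 4.0, 'B': 4.0, 'C': 5.0}
print(stretch)                # {'A': 1.0, 'B': 4.0, 'C': 2.5}
print(max(stretch.values()))  # 4.0
```

Jobs A and B have the same flow time here, but the short job B has four times the stretch because it waited behind the long job A. Stretch measures delay relative to a request's size, which flow alone does not capture.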
Approximation Techniques for Average Completion Time Scheduling
1997
Cited by 82 (8 self)
We consider the problem of non-preemptive scheduling to minimize average (weighted) completion time, allowing for release dates, parallel machines, and precedence constraints. Recent work has led to constant-factor approximations for this problem, based on solving a preemptive or linear programming relaxation and then using the solution to get an ordering on the jobs. We introduce several new techniques which generalize this basic paradigm. We use these ideas to obtain improved approximation algorithms for one-machine scheduling to minimize average completion time with release dates. In the process, we obtain an optimal randomized online algorithm for the same problem that beats a lower bound for deterministic online algorithms. We consider extensions to the case of parallel machine scheduling, and for this we introduce two new ideas: first, we show that a preemptive one-machine relaxation is a powerful tool for designing parallel machine scheduling algorithms that simultaneously pro...
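The basic paradigm described in the abstract (solve a preemptive relaxation, then use its solution to order the jobs) can be sketched for one machine with release dates. This sketch assumes the relaxation is solved by SRPT (shortest remaining processing time), which is optimal for preemptive total completion time on one machine. It then runs the jobs non-preemptively in order of their preemptive completion times. The paper's own techniques generalize this ordering step, and the function names here are hypothetical.

```python
def srpt_completion_times(jobs):
    """Preemptive single-machine schedule by Shortest Remaining Processing Time.

    jobs maps a name to (release, size); returns a map name -> completion time.
    """
    arrivals = sorted(jobs, key=lambda n: jobs[n][0])
    remaining, completion = {}, {}
    t, i = 0, 0
    while i < len(arrivals) or remaining:
        if not remaining:
            t = max(t, jobs[arrivals[i]][0])  # machine idle: jump to next release
        while i < len(arrivals) and jobs[arrivals[i]][0] <= t:
            remaining[arrivals[i]] = jobs[arrivals[i]][1]  # job is released
            i += 1
        current = min(remaining, key=remaining.get)  # least remaining work
        next_release = jobs[arrivals[i]][0] if i < len(arrivals) else float("inf")
        # Run until the job finishes or the next release, where SRPT may preempt.
        run = min(remaining[current], next_release - t)
        t += run
        remaining[current] -= run
        if remaining[current] == 0:
            del remaining[current]
            completion[current] = t
    return completion


def list_schedule(jobs, order):
    """Run jobs non-preemptively in the given order, never before release."""
    t, completion = 0, {}
    for name in order:
        release, size = jobs[name]
        t = max(t, release) + size
        completion[name] = t
    return completion


def preemptive_to_nonpreemptive(jobs):
    preemptive = srpt_completion_times(jobs)
    order = sorted(jobs, key=preemptive.get)  # order by preemptive completion
    return preemptive, list_schedule(jobs, order)


jobs = {"A": (0, 4), "B": (1, 1), "C": (2, 2)}
preemptive, nonpreemptive = preemptive_to_nonpreemptive(jobs)
print(sum(preemptive.values()), sum(nonpreemptive.values()))  # 13 14
```

In the classic analysis of this conversion (Phillips, Stein and Wein), each job's non-preemptive completion time is at most twice its preemptive one. Because the preemptive optimum lower-bounds the non-preemptive optimum, the conversion is a 2-approximation. On this instance the totals are 13 and 14.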
Continuous Program Optimization: A Case Study
ACM Transactions on Programming Languages and Systems, 2003
Cited by 45 (7 self)
This paper presents a system that provides code generation at load time and continuous program optimization at run time. First, the architecture of the system is presented. Then, two optimization techniques are discussed that were developed specifically in the context of continuous optimization. The first of these optimizations continually adjusts the storage layouts of dynamic data structures to maximize data cache locality, while the second performs profile-driven instruction rescheduling to increase instruction-level parallelism. These two optimizations have very different cost/benefit ratios, which are presented in a series of benchmarks. The paper concludes with an outlook on future research directions and an enumeration of some remaining research problems. The empirical results presented in this paper make a case in favor of continuous optimization, but indicate that it needs to be applied judiciously. In many situations, the costs of dynamic optimizations outweigh their benefit, so that no break-even point is ever reached. In favorable circumstances, on the other hand, speedups of over 120% have been observed. It appears that the main beneficiaries of continuous optimization are shared libraries, which at different times can be optimized in the context of the currently dominant client application.
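The abstract's break-even point can be made concrete with simple arithmetic. If re-optimizing costs a one-time overhead and each later execution of the optimized code saves some time, the overhead is repaid after overhead ÷ saving executions. If the optimization saves nothing, it is never repaid. The Python sketch below uses hypothetical costs in milliseconds and is not the system's actual cost model.

```python
import math


def break_even_runs(optimization_cost, time_before, time_after):
    """Executions needed before a one-time optimization cost is repaid.

    Returns None when the optimized code is not faster, i.e. when no
    break-even point is ever reached.
    """
    saving = time_before - time_after
    if saving <= 0:
        return None
    return math.ceil(optimization_cost / saving)


# Hypothetical numbers: 500 ms to re-optimize, each call drops from 10 ms to 8 ms.
print(break_even_runs(500, 10, 8))   # 250
# An optimization that does not speed the code up never pays for itself.
print(break_even_runs(500, 10, 10))  # None
```

Under this simple model, code that runs very often after re-optimization, such as a shared library serving a dominant client, can cross the threshold. Rarely executed code may never cross it.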
Inducing Heuristics To Decide Whether To Schedule
In Proceedings of the ACM SIGPLAN ’04 Conference on Programming Language Design and Implementation, 2004
Cited by 41 (10 self)
Instruction scheduling is a compiler optimization that can improve program speed, sometimes by 10% or more, but it can also be expensive. Furthermore, time spent optimizing is more important in a Java just-in-time (JIT) compiler than in a traditional one because a JIT compiles code at run time, adding to the running time of the program. We found that, on any given block of code, instruction scheduling often does not produce significant benefit and sometimes degrades speed. Thus, we hoped that we could focus scheduling effort on those blocks that benefit from it. Using ...
Interval Analysis on Directed Acyclic Graphs for Global Optimization
J. Global Optimization, 2004
Cited by 40 (8 self)
A directed acyclic graph (DAG) representation of optimization problems represents each variable, each operation, and each constraint in the problem formulation by a node of the DAG, with edges representing the flow of the computation.
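One basic use of such a DAG is forward propagation of interval enclosures. Nodes are visited in topological order. Each operation node receives an interval containing every value its expression can take when the variables range over their boxes, and a constraint node can then be checked against that interval. The Python sketch below supports only addition and multiplication. It evaluates the hypothetical constraint x·y + x ≤ 10 over x ∈ [−1, 2], y ∈ [3, 4]. The node encoding is an assumption, the paper's own propagation rules are not reproduced, and the outward rounding that a rigorous implementation needs is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The product's extremes are attained at endpoint combinations.
        ends = [self.lo * other.lo, self.lo * other.hi,
                self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ends), max(ends))


def forward_propagate(variables, operations):
    """variables: name -> Interval; operations: (node, op, left, right) in
    topological order, with op in {"add", "mul"}. Returns an enclosure per node."""
    apply = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    enclosure = dict(variables)
    for node, op, left, right in operations:
        enclosure[node] = apply[op](enclosure[left], enclosure[right])
    return enclosure


def check_upper_bound(enclosure, upper):
    """Classify the constraint node <= upper over the whole box."""
    if enclosure.hi <= upper:
        return "satisfied everywhere"
    if enclosure.lo > upper:
        return "infeasible everywhere"
    return "undecided"


# Hypothetical DAG for the constraint x*y + x <= 10.
variables = {"x": Interval(-1, 2), "y": Interval(3, 4)}
operations = [("xy", "mul", "x", "y"), ("f", "add", "xy", "x")]
enclosure = forward_propagate(variables, operations)
print(enclosure["f"])                         # Interval(lo=-5, hi=10)
print(check_upper_bound(enclosure["f"], 10))  # satisfied everywhere
```

Interval enclosures can overestimate when a variable feeds several operations, as x does here. An "undecided" result therefore does not mean the constraint is violated somewhere in the box.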
Compiling for EDGE architectures
In International Symposium on Code Generation and Optimization, 2006
Cited by 36 (24 self)
Explicit Data Graph Execution (EDGE) architectures offer the possibility of high instruction-level parallelism with energy efficiency. In EDGE architectures, the compiler breaks a program into a sequence of structured blocks that the hardware executes atomically. The instructions within each block communicate directly, instead of communicating through shared registers. The TRIPS EDGE architecture imposes several restrictions on its blocks to simplify the microarchitecture: each TRIPS block has at most 128 instructions, issues at most 32 loads and/or stores, and executes at most 32 register bank reads and 32 writes. To detect block completion, each TRIPS block must produce a constant number of outputs (stores and register writes) and a branch decision. The goal of the TRIPS compiler is to produce TRIPS blocks full of useful instructions while enforcing these constraints. This paper describes a set of compiler algorithms that meet these sometimes conflicting goals, including an algorithm that assigns load and store identifiers to maximize the number of loads and stores within a block. We demonstrate the correctness of these algorithms in simulation on SPEC2000, EEMBC, and microbenchmarks extracted from SPEC2000 and others. We measure speedup in cycles over an Alpha 21264 on microbenchmarks.
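The per-block limits listed in the abstract translate directly into a feasibility check that a block-forming pass could run on a candidate block. The Python sketch below encodes the four numeric limits: instructions, loads and stores together, register reads, and register writes. The `BlockStats` record is a hypothetical summary of a block. The constant-output and branch-decision requirements are not modeled.

```python
from dataclasses import dataclass

# Per-block limits as stated in the abstract for TRIPS.
MAX_INSTRUCTIONS = 128
MAX_LOADS_AND_STORES = 32
MAX_REGISTER_READS = 32
MAX_REGISTER_WRITES = 32


@dataclass
class BlockStats:
    """Hypothetical summary of a candidate block's resource usage."""
    instructions: int
    loads_and_stores: int
    register_reads: int
    register_writes: int


def trips_violations(block):
    """Return the names of the TRIPS limits that the candidate block exceeds."""
    checks = [
        ("instructions", block.instructions, MAX_INSTRUCTIONS),
        ("loads/stores", block.loads_and_stores, MAX_LOADS_AND_STORES),
        ("register reads", block.register_reads, MAX_REGISTER_READS),
        ("register writes", block.register_writes, MAX_REGISTER_WRITES),
    ]
    return [name for name, used, limit in checks if used > limit]


candidate = BlockStats(instructions=120, loads_and_stores=34,
                       register_reads=20, register_writes=10)
print(trips_violations(candidate))  # ['loads/stores']
```

One plausible use is to grow a block greedily and stop, or split, when the check first reports a violation.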
Scheduling-LPs bear probabilities: Randomized approximations for min-sum criteria
In R. Burkard and G. J. Woeginger (eds.), ESA '97, LNCS 1284, 1997
Cited by 28 (5 self)
In this paper, we provide a new class of randomized approximation algorithms for scheduling problems by directly interpreting solutions to so-called time-indexed LPs as probabilities. The most general model we consider is scheduling unrelated parallel machines with release dates (or even network scheduling) so as to minimize the average weighted completion time. The crucial idea for these multiple-machine problems is not to use standard list scheduling but rather to assign jobs randomly to machines (with probabilities taken from an optimal LP solution) and to perform list scheduling on each of them. For the general model, we give a (2 + ε)-approximation algorithm. The best previously known approximation algorithm has a performance guarantee of 16/3 [HSW96]. Moreover, our algorithm also improves upon the best previously known approximation algorithms for the special case of identical parallel machine scheduling (performance guarantees of 2.89 + ε in general [CPS+96] and 2.85 for the average completion time [CMNS97], respectively). A perhaps surprising implication for identical parallel machines is that jobs are randomly assigned to machines with each machine equally likely. In addition, in this case the algorithm has running time O(n log n) and performance guarantee 2. The same algorithm is also a 2-approximation for the corresponding preemptive scheduling problem on identical parallel machines. Finally, the results for identical parallel machine scheduling apply to both the offline and the online settings with no difference in performance guarantees. In the online setting, jobs continually arrive to be processed and, for each time t, we must construct the schedule until time t without any knowledge of the jobs that will arrive afterwards.
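The random-assignment step can be sketched in Python. The sketch assumes that an optimal time-indexed LP solution is already available as a probability distribution over (machine, time slot) pairs for each job. Each job independently draws one pair, and each machine then list-schedules its jobs in order of the drawn slots while respecting release dates. Solving the LP is omitted. Using the drawn slot as the per-machine list order is an assumption of this sketch, not a detail stated in the abstract.

```python
import random


def random_assign_and_list_schedule(jobs, lp_probabilities, rng):
    """jobs: name -> (release, {machine: processing time on that machine}).
    lp_probabilities: name -> {(machine, slot): probability}, one distribution
    per job (assumed to come from an already-solved time-indexed LP).
    Returns name -> (machine, completion time)."""
    drawn = {}
    for name, dist in lp_probabilities.items():
        pairs = list(dist)
        # Each job independently picks one (machine, slot) pair.
        drawn[name] = rng.choices(pairs, weights=[dist[p] for p in pairs])[0]

    result = {}
    for machine in {m for m, _ in drawn.values()}:
        # List scheduling on this machine, in order of the drawn slots.
        queue = sorted((n for n in drawn if drawn[n][0] == machine),
                       key=lambda n: drawn[n][1])
        t = 0
        for name in queue:
            release, processing = jobs[name]
            t = max(t, release) + processing[machine]
            result[name] = (machine, t)
    return result


# Hypothetical two-machine instance; fractional LP values make the draw random.
jobs = {"a": (0, {0: 2, 1: 5}), "b": (1, {0: 3, 1: 3}), "c": (0, {0: 4, 1: 1})}
lp_probabilities = {
    "a": {(0, 0): 0.8, (1, 0): 0.2},
    "b": {(0, 2): 0.5, (1, 1): 0.5},
    "c": {(1, 0): 1.0},
}
print(random_assign_and_list_schedule(jobs, lp_probabilities, random.Random(7)))
```

A randomized guarantee such as (2 + ε) concerns the expected weighted completion time of the resulting schedule. Repeating the draw with different seeds and averaging would estimate that expectation for a given instance.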
Speculative hedge: Regulating compile-time speculation against profile variations
In Proceedings of the 29th International Symposium on Microarchitecture, 1996
Cited by 25 (0 self)
Path-oriented scheduling methods, such as trace scheduling and hyperblock scheduling, use speculation to extract instruction-level parallelism from control-intensive programs. These methods predict important execution paths in the current scheduling scope using execution profiling or frequency estimation. Aggressive speculation is then applied to the important execution paths, possibly at the cost of degraded performance along other paths. Therefore, the speed of the output code can be sensitive to the compiler's ability to accurately predict the important execution paths. Prior work in this area has utilized the speculative yield function by Fisher, coupled with dependence height, to distribute instruction priority among execution paths in the scheduling scope. While this technique provides more stable performance by paying attention to the needs of all paths, it does not directly address the mismatch between compile-time prediction and run-time behavior. The work presented in this paper extends the speculative yield and dependence height heuristic to explicitly minimize the penalty suffered by other paths when instructions are speculated along a path. Since the execution time of a path is determined by the number of cycles spent between a path's entrance and exit in the scheduling scope, the heuristic attempts to eliminate unnecessary speculation that delays any path's exit. Such control of speculation makes the performance much less sensitive to the actual path taken at run time. The proposed method places strong emphasis on achieving minimal delay to all exits, hence the name speculative hedge. This paper presents the speculative hedge heuristic and shows how it controls overspeculation in a superblock/hyperblock scheduler. The stability of out...
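A path's execution time is the number of cycles between entering the scheduling scope and leaving through its exit. The expected cost of a scheduled region is therefore the profile-weighted average of its exit cycles. The Python sketch below uses hypothetical exit cycles for two schedules of one superblock. In the first schedule, speculation delays the side exit without making the main exit any earlier. This is the kind of unnecessary speculation the abstract says the heuristic tries to eliminate. The numbers are illustrative, not results from the paper.

```python
def expected_cycles(exit_cycle, exit_count):
    """Profile-weighted average of the cycle at which each exit is taken."""
    total = sum(exit_count.values())
    return sum(exit_count[e] * exit_cycle[e] for e in exit_count) / total


# Hypothetical exit cycles for two schedules of one superblock.
aggressive = {"side": 6, "main": 8}  # speculation delays the side exit
hedged = {"side": 4, "main": 8}      # same main-exit cycle, side exit not delayed

training = {"side": 10, "main": 90}  # profile the compiler saw
actual = {"side": 50, "main": 50}    # behavior at run time

for name, profile in [("training", training), ("actual", actual)]:
    print(name, expected_cycles(aggressive, profile), expected_cycles(hedged, profile))
# training 7.8 7.6
# actual 7.0 6.0
```

The gap between the two schedules grows from 0.2 to 1.0 cycles when the side exit is taken more often at run time than the profile predicted. This is the sensitivity to profile variation that avoiding such speculation reduces.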
Precedence Constrained Scheduling to Minimize Sum of Weighted Completion Times on a Single Machine
Discrete Applied Mathematics, 1997
Cited by 25 (0 self)
We consider the problem of scheduling a set of jobs on a single machine with the objective of minimizing the sum of weighted completion times. The problem is NP-hard when there are precedence constraints between jobs [15]. We provide an efficient combinatorial 2-approximation algorithm for this problem. In contrast to our work, earlier approximation algorithms [12] achieving constant-factor approximations are based on solving a linear programming relaxation of the problem. We also show that the linear ordering relaxation of Potts [20] has an integrality gap of 2.
1 Introduction
We consider the following scheduling problem. We are given a set of jobs $J_1, J_2, \ldots, J_n$, where each job $J_i$ has a processing time $p_i$ and a weight $w_i$. Jobs have precedence constraints between them that are specified in the form of a directed acyclic graph. If $i \prec j$, then $J_j$ cannot be scheduled before $J_i$ completes. The objective is to find a non-preemptive schedule of the jobs on a single machine (or equiva...
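For very small instances, the objective $\sum_j w_j C_j$ can be minimized exactly by enumerating every order that respects the precedence DAG. Because the problem is NP-hard this does not scale. It does, however, give a reference value against which an approximation, such as a 2-approximation, could be checked. The Python sketch below is that brute force, not the paper's combinatorial algorithm. The three-job instance is hypothetical, and a pair `(i, j)` in `prec` means $i \prec j$.

```python
from itertools import permutations


def weighted_completion(order, p, w):
    """Sum of w_j * C_j when jobs run back to back in the given order."""
    t, total = 0, 0
    for j in order:
        t += p[j]          # C_j: completion time of job j
        total += w[j] * t
    return total


def respects(order, prec):
    """True if every constraint (i, j), meaning i must precede j, is honored."""
    pos = {j: k for k, j in enumerate(order)}
    return all(pos[i] < pos[j] for i, j in prec)


def brute_force_optimum(p, w, prec):
    """Exhaustive search over feasible orders; only for tiny instances."""
    feasible = (o for o in permutations(p) if respects(o, prec))
    return min(feasible, key=lambda o: weighted_completion(o, p, w))


p = {"a": 3, "b": 1, "c": 2}
w = {"a": 1, "b": 4, "c": 2}
prec = [("a", "b")]  # a must complete before b starts
best = brute_force_optimum(p, w, prec)
print(best, weighted_completion(best, p, w))  # ('a', 'b', 'c') 31
```

Without the constraint, ordering by nondecreasing $p_j / w_j$ (Smith's rule) runs b, c, a and reaches 16. The single precedence pair forces the long, light job a to the front and nearly doubles the objective.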
Scheduling Unrelated Machines by Randomized Rounding
SIAM Journal on Discrete Mathematics, 1999
Cited by 24 (3 self)
In this paper, we provide a new class of randomized approximation algorithms for parallel machine scheduling problems. The most general model we consider is scheduling unrelated machines with release dates (or even network scheduling) so as to minimize the average weighted completion time. We introduce an LP relaxation in time-indexed variables for this problem. The crucial idea to derive approximation results is not to use standard list scheduling, but rather to assign jobs randomly to machines (by interpreting LP solutions as probabilities) and to perform list scheduling on each of them. Our main result is a (2 + ε)-approximation algorithm for this general model, which improves upon the performance guarantee of 16/3 due to Hall, Shmoys, and Wein. In the absence of non-trivial release dates, we get a (3/2 + ε)-approximation. At the same time, we prove corresponding bounds on the quality of the LP relaxation. A perhaps surprising implication for identical parallel machines is that jobs are ra...