Results 1-10 of 27
Optimal Instruction Scheduling Using Integer Programming
Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, 2000
Cited by 48 (3 self)
This paper presents a new approach to local instruction scheduling based on integer programming that produces optimal instruction schedules in a reasonable time, even for very large basic blocks. The new approach first uses a set of graph transformations to simplify the data-dependency graph while preserving the optimality of the final schedule. The simplified graph results in a simplified integer program which can be solved much faster. A new integer-programming formulation is then applied to the simplified graph. Various techniques are used to simplify the formulation, resulting in fewer integer-program variables, fewer integer-program constraints and fewer terms in some of the remaining constraints, thus reducing integer-program solution time. The new formulation also uses certain adaptively added constraints (cuts) to reduce solution time. The proposed optimal instruction scheduler is built within the GNU Compiler Collection (GCC) and is evaluated experimentally using the SPEC95 floating-point benchmarks. Although optimal scheduling for the target processor is considered intractable, all of the benchmarks' basic blocks are optimally scheduled, including blocks with up to 1000 instructions, while total compile time increases by only 14%.
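For orientation, a textbook time-indexed integer program for local instruction scheduling can be written as below. This is only a sketch of the general idea and not necessarily the formulation used in the paper; the symbols $x_{i,j}$, $n$, $m$, $r$ and $L_{ik}$ are assumptions introduced here (a binary issue variable, the number of instructions, a candidate schedule length in cycles, the issue width, and the latency of dependency edge $(i,k)$).

```latex
% Assumed notation: x_{i,j} = 1 iff instruction i issues in cycle j,
% for i = 1..n and j = 1..m; r is the issue width; L_{ik} is the latency
% required between dependent instructions i and k.
\begin{align*}
\sum_{j=1}^{m} x_{i,j} &= 1
  && \text{each instruction } i \text{ issues exactly once} \\
\sum_{i=1}^{n} x_{i,j} &\le r
  && \text{at most } r \text{ instructions issue in cycle } j \\
\sum_{j=1}^{m} j\, x_{k,j} &\ge \sum_{j=1}^{m} j\, x_{i,j} + L_{ik}
  && \text{for each dependency edge } (i,k) \\
x_{i,j} &\in \{0, 1\}
\end{align*}
```

Under these assumptions, a schedule of length $m$ exists exactly when the system is feasible, so the optimal length is the smallest $m$ for which a solution is found.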
Approximation Bounds for a General Class of Precedence Constrained Parallel Machine Scheduling Problems
Integer Programming and Combinatorial Optimization, volume 1412 of Lecture Notes in Computer Science, 1998
Cited by 28 (5 self)
A well-studied and difficult class of scheduling problems concerns parallel machines and precedence constraints. In order to model more realistic situations, we consider precedence delays, associating with each precedence constraint a certain amount of time which must elapse between the completion and start times of the corresponding jobs. Release dates, among others, may be modeled in this fashion. We provide the first constant-factor approximation algorithms for the makespan and the total weighted completion time objectives in this general class of problems. These algorithms are rather simple and practical forms of list scheduling. Our analysis also unifies and simplifies that of a number of special cases heretofore separately studied, while actually improving some of the former approximation results.
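To make the notion of list scheduling with precedence delays concrete, here is a minimal Python sketch of one simple greedy variant. It is not the algorithm analyzed in the paper, and the function name `list_schedule`, its parameters, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import defaultdict


def list_schedule(durations, delays, machines, priority=None):
    """Greedy list scheduling on identical machines with precedence delays.

    durations: dict job -> processing time
    delays:    dict (pred, succ) -> time that must elapse between the
               completion of pred and the start of succ
    machines:  number of identical parallel machines
    priority:  optional job order used to break ties (hypothetical parameter)
    Returns (start_times, makespan).
    """
    preds = defaultdict(list)
    for (pred, succ), delay in delays.items():
        preds[succ].append((pred, delay))

    remaining = list(priority) if priority is not None else sorted(durations)
    machine_free = [0] * machines  # time at which each machine becomes idle
    start, finish = {}, {}

    while remaining:
        best = None
        for job in remaining:
            # A job is available once all of its predecessors are scheduled.
            if not all(p in finish for p, _ in preds[job]):
                continue
            ready = max((finish[p] + d for p, d in preds[job]), default=0)
            k = min(range(machines), key=machine_free.__getitem__)
            s = max(ready, machine_free[k])
            # Pick the earliest possible start; ties go to list order.
            if best is None or s < best[0]:
                best = (s, job, k)
        if best is None:
            raise ValueError("precedence constraints are cyclic or incomplete")
        s, job, k = best
        start[job] = s
        finish[job] = s + durations[job]
        machine_free[k] = finish[job]
        remaining.remove(job)

    return start, max(finish.values(), default=0)
```

For example, with jobs `a` (length 2), `b` (length 1) and `c` (length 2), a delay of 3 after `a` and 0 after `b` before `c` may start, and two machines, `c` cannot start before time 5, so the makespan is 7.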
Fast Optimal Instruction Scheduling for Single-issue Processors with Arbitrary Latencies
2001
Cited by 24 (9 self)
Instruction scheduling is one of the most important steps for improving the performance of object code produced by a compiler. The local instruction scheduling problem is to find a minimum length instruction schedule for a basic block subject to precedence, latency, and resource constraints. In this paper we consider local instruction scheduling for single-issue processors with arbitrary latencies. The problem is considered intractable, and heuristic approaches are currently used in production compilers. In contrast, we present a relatively simple approach to instruction scheduling based on constraint programming which is fast and optimal. The proposed approach uses an improved constraint model which allows it to scale up to very large, real problems. We describe powerful redundant constraints that allow a standard constraint solver to solve these scheduling problems in an almost backtrack-free manner. The redundant constraints are lower bounds on selected subproblems which take advantage of the structure inherent in the problems. Under specified conditions, these constraints are sometimes further improved by testing the consistency of a subproblem using a fast test. We experimentally evaluated our approach by integrating it into the GNU Compiler Collection (GCC) and then applying it to the SPEC95 floating-point benchmarks. All 7402 of the benchmarks' basic blocks were optimally scheduled, including basic blocks with up to 1000 instructions. Our results compare favorably to the best previous approach which is based on integer linear programming (Wilken et al., 2000): Across the same benchmarks, the total optimal scheduling time for their approach is 98 seconds while the total time for our approach is less than 5 seconds.
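The problem statement above can be illustrated with a small exhaustive-search sketch in Python. This is deliberately not the constraint-programming approach of the paper: it simply tries every issue order, so it is only usable on tiny blocks. The function names and the latency convention (an edge `(p, s)` with latency `lat` means `s` issues at least `lat` cycles after `p`) are assumptions made for illustration.

```python
from itertools import permutations


def schedule_length(order, latency):
    """Length of the schedule that issues instructions in the given order.

    One instruction issues per cycle (single-issue), cycles are numbered
    from 1, and each instruction issues as early as its latencies allow.
    Returns None if the order violates a precedence constraint.
    """
    cycle = {}
    last = 0
    for ins in order:
        earliest = last + 1
        for (pred, succ), lat in latency.items():
            if succ == ins:
                if pred not in cycle:
                    return None  # predecessor has not been issued yet
                earliest = max(earliest, cycle[pred] + lat)
        cycle[ins] = earliest
        last = earliest
    return last


def optimal_schedule(instructions, latency):
    """Brute-force minimum-length schedule; factorial time, tiny blocks only."""
    best = None
    for order in permutations(instructions):
        length = schedule_length(order, latency)
        if length is not None and (best is None or length < best[0]):
            best = (length, order)
    return best
```

With instructions `A`, `B`, `C`, `D`, a latency of 3 on edge `(A, B)` and a latency of 1 on edge `(C, D)`, issuing `C` and `D` in the shadow of `A`'s latency gives a 4-cycle schedule, whereas the program order `A, B, C, D` needs 6 cycles.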
Multithreaded Architectures: Principles, Projects and Issues
1994
Cited by 23 (12 self)
The architecture of future high-performance computer systems will respond to the possibilities offered by technology and to the increasing demand for attention to issues of programmability. Multithreaded processing element architectures are a promising alternative to RISC architecture and its multiple-instruction-issue extensions such as VLIW, superscalar, and superpipelined architectures. This paper presents an overview of multithreaded computer architectures and the technical issues affecting their prospective evolution. We introduce the basic concepts of multithreaded computer architecture and describe several architectures representative of the design space for multithreaded, parallel computers. We review design issues for multithreaded processing elements intended for use as the node processor of parallel computers for scientific computing. These include the question of choosing an appropriate program execution model, the organization of the processing element to achieve good utilization of major resources, support for fine-grain interprocessor communication and global memory access, compiling machine code for multithreaded processors, and the challenge of implementing virtual memory in large-scale multiprocessor systems.
Automatic Design of Computer Instruction Sets
1993
Cited by 21 (0 self)
This dissertation presents the thesis that good and usable instruction sets can be automatically derived for a specified data path and benchmark set. This is achieved by a multistep process: generating execution traces for the benchmark programs, sampling these traces to form a large set of small code segments, optimally recompiling these segments using exhaustive search, and finding the cover of the new instructions generated that optimizes the performance metric. The complete process is illustrated by generating an instruction set for a processor optimized for executing compiled Prolog programs. The generated instruction set is compared with the hand-designed VLSI-BAM instruction set. The automatically designed instruction set is smaller and has only a few percent less performance on th...
Parallel processor scheduling with delay constraints
In 12th Annual Symp. on Discrete Algorithms, 2001
Cited by 7 (1 self)
We consider the problem of scheduling unit-length jobs on identical parallel machines such that the makespan of the resulting schedule is minimized. Precedence constraints impose a partial order on the jobs, and both communication and precedence delays impose relative timing constraints on dependent jobs. The combination of these two types of timing constraints naturally models the instruction scheduling problem that occurs during software compilation for state-of-the-art VLIW (Very Long Instruction Word) processors and multiprocessor parallel machines. We present the first known polynomial-time algorithm for the case where the precedence constraint graph is a forest of in-trees (or a forest of out-trees), the number of machines m is fixed, and the delays (which are a function of both the job pair and the machines on which they run) are bounded by a constant D. Our algorithm relies on a new structural theorem for scheduling jobs with arbitrary precedence constraints. Given an instance with many independent DAGs, the theorem shows how to convert, in linear time, a schedule S for only the largest DAGs into a complete schedule that is either optimal or has the same makespan as S.
A Fast Algorithm for Scheduling Instructions with Deadline Constraints on RISC Machines
Proc. of the 22nd IEEE Real-Time Systems Symposium (RTSS), 2000
Cited by 7 (3 self)
We present a fast algorithm for scheduling UET (Unit Execution Time) instructions with deadline constraints in a basic block on RISC machines with multiple processors. Unlike Palem and Simon's algorithm, our algorithm allows a latency of $l_{ij} = -1$, which denotes that instruction $v_j$ cannot be started before $v_i$. The time complexity of our algorithm is $O(ne + nd)$, where $n$ is the number of instructions, $e$ is the number of edges in the precedence graph and $d$ is the maximum latency. Our algorithm is guaranteed to compute a feasible schedule whenever one exists in the following special cases: 1) Arbitrary precedence constraints, latencies in $\{0, 1\}$ and one processor. In this special case, our algorithm improves the existing fastest algorithm from $O(ne + e' \log n)$ to $O(\min\{ne, n^{2.376}\})$, where $e'$ is the number of edges in the transitively closed precedence graph. 2) Arbitrary precedence constraints, latencies in $\{-1, 0\}$ and two processors. In the special case where all latencies are 0, our algorithm degenerates to Garey and Johnson's two-processor algorithm. 3) Special precedence constraints in the form of a monotone interval graph, arbitrary latencies in $\{-1, 0, 1, \ldots, d\}$ and multiple processors. 4) Special precedence constraints in the form of an in-forest, equal latencies and multiple processors. In the above special cases, if no feasible schedule exists, our algorithm will compute a schedule with minimum lateness. Moreover, by setting all deadlines to a sufficiently large integer, our algorithm will compute a schedule with minimum length in all the above special cases and the special case of an out-forest, equal latencies and multiple processors.
A fast algorithm for scheduling time-constrained instructions on processors with ILP
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998
Cited by 6 (1 self)
Instruction scheduling is central to achieving performance in modern processors with instruction level parallelism (ILP). Classical work in this area has spanned the theoretical foundations of algorithms for instruction scheduling with provable optimality, as well as heuristic approaches with experimentally validated performance improvements. Typically, the theoretical foundations are developed in the context of basic blocks of code. In this paper, we provide the theoretical foundations for scheduling basic blocks of instructions with time constraints, which can play an important role in compile-time ILP optimizations in embedded applications. We present an algorithm for scheduling unit-execution-time instructions on machines with multiple pipelines, in the presence of precedence constraints, release times, deadlines, and latencies $l_{ij}$ between any pairs of instructions $i$ and $j$. Our algorithm runs in time $O(n^3 \alpha(n))$, where $\alpha(n)$ is the functional inverse of the Ackermann function....