Optimal Instruction Scheduling Using Integer Programming
- Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation
, 2000
Cited by 38 (3 self)
This paper presents a new approach to local instruction scheduling based on integer programming that produces optimal instruction schedules in a reasonable time, even for very large basic blocks. The new approach first uses a set of graph transformations to simplify the data-dependency graph while preserving the optimality of the final schedule. The simplified graph results in a simplified integer program which can be solved much faster. A new integer-programming formulation is then applied to the simplified graph. Various techniques are used to simplify the formulation, resulting in fewer integer-program variables, fewer integer-program constraints and fewer terms in some of the remaining constraints, thus reducing integer-program solution time. The new formulation also uses certain adaptively added constraints (cuts) to reduce solution time. The proposed optimal instruction scheduler is built within the Gnu Compiler Collection (GCC) and is evaluated experimentally using the SPEC95 floating point benchmarks. Although optimal scheduling for the target processor is considered intractable, all of the benchmarks' basic blocks are optimally scheduled, including blocks with up to 1000 instructions, while total compile time increases by only 14%.
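For orientation, a generic time-indexed integer program for scheduling one basic block can be written as below. This is a textbook-style sketch, not the formulation from the paper, and it assumes a single-issue processor. All symbols are assumptions introduced here: x_{i,t} is 1 when instruction i issues in cycle t, T is an upper bound on the schedule length, L_{ij} is the latency on dependency edge (i,j) in the edge set E, and C is the schedule length being minimized.

```latex
\begin{align*}
\text{minimize}\quad & C \\
\text{subject to}\quad
& \sum_{t=1}^{T} x_{i,t} = 1 && \text{for every instruction } i \\
& \sum_{i} x_{i,t} \le 1 && \text{for every cycle } t \in \{1,\dots,T\} \\
& \sum_{t=1}^{T} t\,x_{j,t} \;\ge\; \sum_{t=1}^{T} t\,x_{i,t} + L_{ij} && \text{for every dependency edge } (i,j) \in E \\
& C \ge \sum_{t=1}^{T} t\,x_{i,t} && \text{for every instruction } i \\
& x_{i,t} \in \{0,1\}
\end{align*}
```

In a formulation of this shape the number of variables grows with the product of the instruction count and T, which suggests why reducing variables, constraints, and terms can shorten solution time.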
Approximation Bounds for a General Class of Precedence Constrained Parallel Machine Scheduling Problems
- Integer Programming and Combinatorial Optimization, volume 1412 of Lecture Notes in Computer Science
, 1998
Cited by 24 (4 self)
A well studied and difficult class of scheduling problems concerns parallel machines and precedence constraints. In order to model more realistic situations, we consider precedence delays, associating with each precedence constraint a certain amount of time which must elapse between the completion and start times of the corresponding jobs. Release dates, among others, may be modeled in this fashion. We provide the first constant-factor approximation algorithms for the makespan and the total weighted completion time objectives in this general class of problems. These algorithms are rather simple and practical forms of list scheduling. Our analysis also unifies and simplifies that of a number of special cases heretofore separately studied, while actually improving some of the former approximation results.
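As a rough illustration of list scheduling with precedence delays, the sketch below greedily starts the highest-priority ready job whenever a machine is free. It is a minimal sketch and not the algorithm analyzed in the paper: the function name `list_schedule`, its parameters, and the integer time-stepping are assumptions made here, and the input is assumed to be an acyclic precedence graph with positive integer durations and non-negative integer delays.

```python
def list_schedule(durations, delays, m, priority):
    """Greedy list scheduling on m identical machines (illustrative only).

    durations: dict job -> positive integer processing time
    delays:    dict (pred, succ) -> minimum gap between the completion
               of pred and the start of succ
    priority:  list of all jobs, highest priority first
    Returns a dict job -> start time.
    """
    preds = {j: [] for j in durations}
    for (i, j), d in delays.items():
        preds[j].append((i, d))

    start, finish = {}, {}
    t = 0
    while len(start) < len(durations):
        # Machines occupied at time t by jobs that are still running.
        busy = sum(1 for j in start if start[j] <= t < finish[j])
        for j in priority:
            if busy >= m:
                break
            if j in start:
                continue
            # A job is ready once every predecessor has completed and
            # the associated delay has elapsed.
            ready = all(i in finish and finish[i] + d <= t
                        for i, d in preds[j])
            if ready:
                start[j] = t
                finish[j] = t + durations[j]
                busy += 1
        t += 1
    return start


example = list_schedule(
    durations={"a": 2, "b": 1, "c": 1},
    delays={("a", "c"): 1},
    m=2,
    priority=["a", "b", "c"],
)
print(example)
```

In the example, job c waits until time 3 because a completes at time 2 and the delay on (a, c) adds one more time unit.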
Multithreaded Architectures: Principles, Projects and Issues
, 1994
Cited by 23 (12 self)
The architecture of future high performance computer systems will respond to the possibilities offered by technology and to the increasing demand for attention to issues of programmability. Multithreaded processing element architectures are a promising alternative to RISC architecture and its multiple-instruction-issue extensions such as VLIW, superscalar, and superpipelined architectures. This paper presents an overview of multithreaded computer architectures and the technical issues affecting their prospective evolution. We introduce the basic concepts of multithreaded computer architecture and describe several architectures representative of the design space for multithreaded, parallel computers. We review design issues for multithreaded processing elements intended for use as the node processor of parallel computers for scientific computing. These include the question of choosing an appropriate program execution model, the organization of the processing element to achieve good utilization of major resources, support for fine-grain interprocessor communication and global memory access, compiling machine code for multithreaded processors, and the challenge of implementing virtual memory in large-scale multiprocessor systems.
Fast Optimal Instruction Scheduling for Single-issue Processors with Arbitrary Latencies
, 2001
Cited by 22 (7 self)
Instruction scheduling is one of the most important steps for improving the performance of object code produced by a compiler. The local instruction scheduling problem is to find a minimum length instruction schedule for a basic block subject to precedence, latency, and resource constraints. In this paper we consider local instruction scheduling for single-issue processors with arbitrary latencies. The problem is considered intractable, and heuristic approaches are currently used in production compilers. In contrast, we present a relatively simple approach to instruction scheduling based on constraint programming which is fast and optimal. The proposed approach uses an improved constraint model which allows it to scale up to very large, real problems. We describe powerful redundant constraints that allow a standard constraint solver to solve these scheduling problems in an almost backtrack-free manner. The redundant constraints are lower bounds on selected sub-problems which take advantage of the structure inherent in the problems. Under specified conditions, these constraints are sometimes further improved by testing the consistency of a sub-problem using a fast test. We experimentally evaluated our approach by integrating it into the Gnu Compiler Collection (GCC) and then applying it to the SPEC95 floating point benchmarks. All 7402 of the benchmarks' basic-blocks were optimally scheduled, including basic-blocks with up to 1000 instructions. Our results compare favorably to the best previous approach which is based on integer linear programming (Wilken et al., 2000): Across the same benchmarks, the total optimal scheduling time for their approach is 98 seconds while the total time for our approach is less than 5 seconds.
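To make the problem statement concrete, the sketch below finds a minimum-length schedule for a tiny basic block on a single-issue processor by exhaustive search over dependency-respecting orders, pruned with a simple lower bound. It is not the constraint model from the paper: the function `optimal_schedule`, the edge representation, and the bound used are assumptions chosen for illustration, and the search takes exponential time in the worst case.

```python
def optimal_schedule(n, edges):
    """Exhaustive search for a minimum-length single-issue schedule.

    n:     number of instructions, numbered 0..n-1
    edges: list of (i, j, latency); j may issue no earlier than
           issue[i] + latency
    Returns (length, issue), with cycles numbered from 1.
    """
    preds = [[] for _ in range(n)]
    for i, j, lat in edges:
        preds[j].append((i, lat))

    best = {"length": None, "issue": None}
    issue = {}

    def search(last_cycle):
        remaining = n - len(issue)
        if remaining == 0:
            if best["length"] is None or last_cycle < best["length"]:
                best["length"] = last_cycle
                best["issue"] = dict(issue)
            return
        # Lower bound: every remaining instruction needs its own cycle.
        if (best["length"] is not None
                and last_cycle + remaining >= best["length"]):
            return
        for j in range(n):
            if j in issue or any(i not in issue for i, _ in preds[j]):
                continue
            # Issue j as early as the issue order and latencies allow.
            earliest = max([last_cycle + 1]
                           + [issue[i] + lat for i, lat in preds[j]])
            issue[j] = earliest
            search(earliest)
            del issue[j]

    search(0)
    return best["length"], best["issue"]


length, issue_cycles = optimal_schedule(3, [(0, 2, 3)])
print(length, issue_cycles)
```

Here instruction 1 fills one of the idle cycles between instructions 0 and 2. The pruning test stands in, very loosely, for the role that lower bounds play in cutting down search.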
Automatic Design of Computer Instruction Sets
, 1993
Cited by 19 (0 self)
This dissertation presents the thesis that good and usable instruction sets can be automatically derived for a specified data path and benchmark set. This is achieved by a multistep process: generating execution traces for the benchmark programs, sampling these traces to form a large set of small code segments, optimally recompiling these segments using exhaustive search, and finding the cover of the new instructions generated that optimizes the performance metric. The complete process is illustrated by generating an instruction set for a processor optimized for executing compiled Prolog programs. The generated instruction set is compared with the hand-designed VLSI-BAM instruction set. The automatically designed instruction set is smaller and has only a few percent less performance on th...
Parallel processor scheduling with delay constraints
- In 12th Annual Symp. on Discrete algorithms
, 2001
Cited by 7 (1 self)
We consider the problem of scheduling unit-length jobs on identical parallel machines such that the makespan of the resulting schedule is minimized. Precedence constraints impose a partial order on the jobs, and both communication and precedence delays impose relative timing constraints on dependent jobs. The combination of these two types of timing constraints naturally models the instruction scheduling problem that occurs during software compilation for state-of-the-art VLIW (Very Long Instruction Word) processors and multiprocessor parallel machines. We present the first known polynomial-time algorithm for the case where the precedence constraint graph is a forest of in-trees (or a forest of out-trees), the number of machines m is fixed, and the delays (which are a function of both the job pair and the machines on which they run) are bounded by a constant D. Our algorithm relies on a new structural theorem for scheduling jobs with arbitrary precedence constraints. Given an instance with many independent dags, the theorem shows how to convert, in linear time, a schedule S for only the largest dags into a complete schedule that is either optimal or has the same makespan as S.
Scheduling Time-Constrained Instructions on Pipelined Processors
- ACM Transactions on Programming Languages and Systems
, 2001
Cited by 6 (0 self)
In this paper, we describe the first polynomial time algorithm that is guaranteed to find feasible schedules given basic-blocks of code with these time constraints, whenever such schedules exist, on various models of RISC machines. In doing so, we unify and generalize many earlier results on scheduling for pipelined machines, including the earlier work due to Palem and Simons [34] on scheduling on pipelined machines with deadlines, and that of Garey and Johnson on scheduling tasks with release-times and deadlines on two identical processors [14], plus other related pipelined scheduling problems with identical latencies proposed by Bruno, Jones and So [8].
A fast algorithm for scheduling time-constrained instructions on processors with ILP
- Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
, 1998
Cited by 6 (1 self)
Instruction scheduling is central to achieving performance in modern processors with instruction level parallelism (ILP). Classical work in this area has spanned the theoretical foundations of algorithms for instruction scheduling with provable optimality, as well as heuristic approaches with experimentally validated performance improvements. Typically, the theoretical foundations are developed in the context of basic-blocks of code. In this paper, we provide the theoretical foundations for scheduling basic-blocks of instructions with time-constraints, which can play an important role in compile-time ILP optimizations in embedded applications. We present an algorithm for scheduling unit-execution-time instructions on machines with multiple pipelines, in the presence of precedence constraints, release-times, deadlines, and latencies l_ij between any pairs of instructions i and j. Our algorithm runs in time O(n^3 α(n)), where α(n) is the functional inverse of the Ackermann function....

