Results 1 - 10
of
82
The Superblock: An effective technique for VLIW and superscalar compilation
- THE JOURNAL OF SUPERCOMPUTING
, 1993
"... A compiler for VLIW and superscalar processors must expose sufficient instruction-level parallelism (ILP) to eddectively utilize the parallel hardware. However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP acros ..."
Abstract
-
Cited by 249 (26 self)
- Add to MetaCart
A compiler for VLIW and superscalar processors must expose sufficient instruction-level parallelism (ILP) to eddectively utilize the parallel hardware. However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP across basic block boundaries. These techniques are based on a novel structure called the superblock. The superblock enables the optimizer and scheduler to extract more ILP along the important execution paths by systematically removing constraints due to the unimportant paths. Superblock optimization and scheduling have been implemented in the IMPACT-I compiler. This implementation gives us a unique opportunity to fully understand the issues involved in incorporating these techniques into a real compiler. Superblock optimizations and scheduling are shown to be useful while taking into account a variety of architectural features.
Limits of Control Flow on Parallelism
, 1992
"... This paper discusses three techniques useful in relaxing the constraints imposed by control flow on parallelism: control dependence analysis, executing multiple flows of control simultaneously, and speculative execution. We evaluate these techniques by using trace simulations to find the limits of p ..."
Abstract
-
Cited by 218 (2 self)
- Add to MetaCart
This paper discusses three techniques useful in relaxing the constraints imposed by control flow on parallelism: control dependence analysis, executing multiple flows of control simultaneously, and speculative execution. We evaluate these techniques by using trace simulations to find the limits of parallelism for machines that employ different combinations of these techniques. We have three major results. First, local regions of code have limited parallelism, and control dependence analysis is useful in extracting global parallelism from different parts of a program. Second, a superscalar processor is fundamentally limited because it cannot execute independent regions of code concurrently. Higher performance can be obtained with machines, such as multiprocessors and dataflow machines, that can simultaneously follow multiple flows of control. Finally, without speculative execution to allow instructions to execute before their control dependences are resolved, only modest amounts of parallelism can be obtained for programs with complex control flow.
Instruction-Level Parallel Processing: History, Overview and Perspective
, 1992
"... Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a muc ..."
Abstract
-
Cited by 166 (0 self)
- Add to MetaCart
Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built, and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.
Scout: A Communications-Oriented Operating System
, 1994
"... This white paper describes Scout, a new operating system being designed for systems connected to the National Information Infrastructure (NII). Scout provides a communication-oriented software architecture for building operating system code that is specialized for the different systems that we expec ..."
Abstract
-
Cited by 114 (3 self)
- Add to MetaCart
This white paper describes Scout, a new operating system being designed for systems connected to the National Information Infrastructure (NII). Scout provides a communication-oriented software architecture for building operating system code that is specialized for the different systems that we expect to be available on the NII. It includes an explicit path abstraction that both facilitates effective resource management and permits optimizations of the critical path that I/O data follows. These path-enabled optimizations, along with the application of advanced compiler techniques, result in a system that has both predictable and scalable performance. June 17, 1994 Department of Computer Science The University of Arizona Tucson, AZ 1 Introduction As the National Information Infrastructure (NII) evolves, and digital computer networks become ubiquitous, communication will play an increasingly important role in computer systems. In fact, a recent report on the NII rejects the term "compu...
Efficient Superscalar Performance through Boosting
, 1992
"... The foremost goal of superscalar processor design is to increase performance through tie exploitation of instruction-level parallelism (ILP). Previous studies have shown that speculative execution is required for high instruction per cycle (IPC) rates in non-numerical applications. The general trend ..."
Abstract
-
Cited by 78 (5 self)
- Add to MetaCart
The foremost goal of superscalar processor design is to increase performance through tie exploitation of instruction-level parallelism (ILP). Previous studies have shown that speculative execution is required for high instruction per cycle (IPC) rates in non-numerical applications. The general trend has been toward supporting speculative execution in complicated, dynamically-scheduled processors. Performance, though, is more than just a high IPC rate; it also depends upon instruction count and cycle time. Boosting is an architectural technique that supports general speculative execution in simpler, statically-scheduled processors. Boosting labels speculative instructions with their control dependence information. This Iabelling eliminates control dependence constraints on instruction scheduling while still providing full dependence information to the hardwere. We have incorporated boosting into a trace-based, global scheduling algorithm that exploits ILP without adversely affecting the instruction count of a program. We use this algorithm and estimates of the boosting hardware involved to evaluate how much speculative execution support is rerdly necessary to achieve good performance. We find that a statically-scheduled superscalar processor using a minimal implementation of boosting can easily reach the performance of a much more complex dynamically-schcduled superscalar processor.
Software Support for Speculative Loads
- Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
, 1992
"... This paper describes a simple hardware mechanism and related compiler support for software-controlled speculative loads. The compiler issues speculative load instructions based on anticipated data references and the ability of the memory system to hide memory latency in high-performance processors. ..."
Abstract
-
Cited by 62 (3 self)
- Add to MetaCart
This paper describes a simple hardware mechanism and related compiler support for software-controlled speculative loads. The compiler issues speculative load instructions based on anticipated data references and the ability of the memory system to hide memory latency in high-performance processors. The architectural support for such a mechanism is simple and minimal, yet handles faults gracefully. We have simulated the speculative load mechanism based on a MIPS processor and a detailed memory system. The results of scientific kernel loops indicate that our speculative load technique is an effective approaches to hiding memory latency. 1 Introduction The performance gap between processors and memory has widened in the last few years. In the last decade, microprocessor speeds have increased at a rate of 50% to 100% each year whereas DRAM speeds have increased at a rate of 10% or less each year [13]. As the performance gap becomes wider, high-performance processors become more sensitive...
Reverse If-Conversion
- in Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation
, 1993
"... In this paper we present a set of isomorphic control transformations that allow the compiler to apply local scheduling techniques to acyclic subgraphs of the control flow graph. Thus, the code motion complexities of global scheduling are eliminated. This approach relies on a new technique, Reverse I ..."
Abstract
-
Cited by 61 (8 self)
- Add to MetaCart
In this paper we present a set of isomorphic control transformations that allow the compiler to apply local scheduling techniques to acyclic subgraphs of the control flow graph. Thus, the code motion complexities of global scheduling are eliminated. This approach relies on a new technique, Reverse If-Conversion (RIC), that transforms scheduled If-Converted code back to the control flow graph representation. This paper presents the predicate internal representation, the algorithms for RIC, and the correctness of RIC. In addition, the scheduling issues are addressed and an application to software pipelining is presented. 1 Introduction Compilers for processors with instruction level parallelism hardware need a large pool of operations to schedule from. In processors without support for conditional execution, branches present a scheduling barrier that limits the pool of operations to the basic block. Since basic blocks tend to have only a few operations, global scheduling techniques are ...
Precise Compile-Time Performance Prediction for Superscalar-Based Computers
- in Proc. ACM SIGPLAN PLDI'94
, 1994
"... Optimizing compilers (particularly parallel compilers) are constrained by their ability to predict performance consequences of the transformations they apply. Many factors, such as unknowns in control structures, dynamic behavior of programs, and complexity of the underlying hardware, make it very d ..."
Abstract
-
Cited by 45 (0 self)
- Add to MetaCart
Optimizing compilers (particularly parallel compilers) are constrained by their ability to predict performance consequences of the transformations they apply. Many factors, such as unknowns in control structures, dynamic behavior of programs, and complexity of the underlying hardware, make it very difficult for compilers to estimate the performance of the transformations accurately and efficiently. In this paper, we present a performance prediction framework that combines several innovative approaches to solve this problem. First, the framework employs a detailed, architecture-specific, but portable, cost model that can be used to estimate the cost of straight line code efficiently. Second, aggregated costs of loops and conditional statements are computed and represented symbolically. This avoids unnecessary, premature guesses and preserves the precision of the prediction. Third, symbolic comparison allows compilers to choose the best transformation dynamically and systematically. Some...

