IMPACT: An architectural framework for multiple-instruction-issue processors
- in Proceedings of the 18th International Symposium on Computer Architecture
, 1991
Cited by 203 (41 self)
The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate efficient code for concurrent hardware. In the IMPACT project, we have developed IMPACT-I, a highly optimizing C compiler to exploit instruction level concurrency. The optimization capabilities of the IMPACT-I C compiler are summarized in this paper. Using the IMPACT-I C compiler, we ran experiments to analyze the performance of multiple-instruction-issue processors executing some important non-numerical programs. The multiple-instruction-issue processors achieve solid speedup over high-performance single-instruction-issue processors. We ran experiments to characterize the following architectural design issues: code scheduling model, instruction issue rate, memory load latency, and function unit resource limitations. Based on the experimental results, we propose the IMPACT Architectural Framework, a set of architectural features that best support the IMPACT-I C compiler to generate efficient code for multiple-instruction-issue processors. By supporting these architectural features, multiple-instruction-issue implementations of existing and new architectures receive immediate compilation support from the IMPACT-I C compiler.
Three Architectural Models for Compiler-Controlled Speculative Execution
- IEEE Transactions on Computers
, 1995
Cited by 21 (3 self)
To effectively exploit instruction level parallelism, the compiler must move instructions across branches. When an instruction is moved above a branch that it is control dependent on, it is considered to be speculatively executed since it is executed before it is known whether or not its result is needed. There are potential hazards when speculatively executing instructions. If these hazards can be eliminated, the compiler can more aggressively schedule the code. The hazards of speculative execution are outlined in this paper. Three architectural models: restricted, general and boosting, which have increasing amounts of support for removing these hazards are discussed. The performance gained by each level of additional hardware support is analyzed using the IMPACT C compiler which performs superblock scheduling for superscalar and superpipelined processors. Index terms - Conditional branches, exception handling, speculative execution, static code scheduling, superblock, superpipelinin...
DATA PRELOAD FOR SUPERSCALAR AND VLIW PROCESSORS
, 1993
Cited by 21 (1 self)
... decreased the average number of clock cycles per instruction. As a result, each execution cycle has become more significant to overall system performance. To maximize the effectiveness of each cycle, one must expose instruction-level parallelism and employ memory latency tolerant techniques. However, without special architecture support, a superscalar compiler cannot effectively accomplish these two tasks in the presence of control and memory access dependences. Preloading is a class of architectural support which allows memory reads to be performed early in spite of potential violation of control and memory access dependences. With preload support, a superscalar compiler can perform more aggressive code reordering to provide increased tolerance of cache and memory access latencies and increased instruction-level parallelism. This thesis discusses the architectural features and compiler support required to effectively utilize preload instructions to increase the overall system performance. The first hardware support is preload register update, a data preload support for load scheduling to reduce first-level cache hit latency. Preload register update keeps the load destination
Comparing Static And Dynamic Code Scheduling for Multiple-Instruction-Issue Processors
- In Proc. of the 24th International Symposium on Microarchitecture
, 1991
Cited by 18 (2 self)
This paper examines two alternative approaches to supporting code scheduling for multiple-instruction-issue processors. One is to provide a set of non-trapping instructions so that the compiler can perform aggressive static code scheduling. The application of this approach to existing commercial architectures typically requires extending the instruction set. The other approach is to support out-of-order execution in the microarchitecture so that the hardware can perform aggressive dynamic code scheduling. This approach usually does not require modifying the instruction set but requires complex hardware support. In this paper, we analyze the performance of the two alternative approaches using a set of important nonnumerical C benchmark programs. A distinguishing feature of the experiment is that the code for the dynamic approach has been optimized and scheduled as much as allowed by the architecture. The hardware is only responsible for the additional reordering that cannot be performed...
Hardware Support for Hiding Cache Latency
- University of Michigan, Dept. Of Electrical Engineering and Computer Science
, 1993
Cited by 18 (0 self)
As the decrease in processor cycle time continues to outpace the decrease in memory cycle time, even moderately sized on-chip caches may require several cycles of access time in the near future. This means that time is lost, even on a cache hit, if independent instructions cannot be scheduled after a read from memory. A novel hardware device is proposed that keeps track of the history of load instructions and predicts their targets before they are computed by the instruction pipeline. This allows the saving of several processor cycles. The storage required to implement such a device is quite large, but as the latency required to read from the first level cache grows, a moderate performance improvement is seen.
Hardware Support for Hiding Cache Latency, January 13, 1995
1.0 Introduction
As processor speeds increase to higher and higher levels, the need for a fast memory system becomes more pronounced. In the past, a small, fast first-level cache was adequate to match the memory spe...
Three Superblock Scheduling Models for Superscalar and Superpipelined Processors
, 1991
Cited by 12 (3 self)
To efficiently schedule superscalar and superpipelined processors, it is necessary to move instructions across branches. This requires increasing the scheduling scope beyond the basic block. Superblock scheduling, a static scheduling method, is a variant of trace scheduling that removes the bookkeeping complexity associated with branches into a trace by removing these entrances using a method called tail duplication. Once the scheduling scope is enlarged, there are hazards to moving an instruction above a conditional branch because the instruction is normally only executed on one path of the conditional branch. To allow the compiler to schedule code more aggressively, hardware support can be provided to prevent such hazards. In this paper we analyze the architecture support and performance of three superblock scheduling models.
Compiler support for SPARC architecture processors
- University of Illinois
, 1994
Cited by 10 (0 self)
This work shows how a single compiler front-end and optimization suite may be used to generate high quality code for specific processors. A C language front-end is used. An initial pass of the compiler is used to instrument the program in order to collect a trace of the dynamic behavior of programs into an execution profile used to guide later code optimization phases. Generic code optimization techniques are applied, then machine specific optimizations are performed. Code is then generated using a machine specific code generator, and then several more machine specific optimizations are performed. Results gathered in generating code for the SPARC architecture are presented in this work. In spite of an incomplete suite of optimizations, the output code is of comparable overall quality to that generated by the Sun compiler.
ACKNOWLEDGMENTS
My Mom and Dad's encouragement and support enabled me to achieve as much as I have. I thank Professor Janek Patel with whom I spoke at great length; the conversations I had with him helped me to choose the University of Illinois. I thank Professor Wen-Mei Hwu; the thesis work I have done was interesting and appropriately challenging. As a result, I was able to change careers and do exciting and challenging work on production quality compilers. I thank all my friends in the Center for Reliable and High-performance Computing and in the Computer Science Department at the University. The time we spent working, sharing information and relaxing together allowed me to remain sane and fairly relaxed at a time that would have otherwise been extremely stressful. In particular, I thank John Coolidge, Johnny
A Practical Methodology for the Formal Verification of RISC Processors
, 1995
Cited by 9 (0 self)
In this paper a practical methodology for formally verifying RISC cores is presented. This methodology is based on a hierarchical model of interpreters which reflects the abstraction levels used by a designer in the implementation of RISC cores, namely the architecture level, the pipeline stage level, the clock phase level and the hardware implementation. The use of this model allows us to successively prove the correctness between two neighbouring levels of abstractions, so that the verification process is simplified. The parallelism in the execution of the instructions, resulting from the pipelined architecture of RISCs is handled by splitting the proof into two independent steps. The first step shows that each architectural instruction is implemented correctly by the sequential execution of its pipeline stages. The second step shows that the instructions are correctly processed by the pipeline in that we prove that under certain constraints from the actual architecture, no conflic...
Lanalysis: A Performance Analysis Tool For The Impact Compiler
, 1996
CONTENTS
1. INTRODUCTION
1.1 Organization of the Thesis
1.2 Motivation for the Project
2. OVERVIEW OF THE IMPACT COMPILER
2.1 The IMPACT Compilation Process
2.2 The Lcode Intermediate Code Representation
3. FUNCTIONALITY OF THE LANALYSIS TOOL
3.1 Basic Organization
3.2 Getting Started
3.3 Current Func
Loop Optimization Techniques On Multi-Issue Architectures
, 1994
CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER I INTRODUCTION
1 Scheduling
2 Methodology
3 Research Contributions
4 Thesis Organization
CHAPTER II INSTRUCTION

