Results 1 - 10
of
74
Limits of instruction-level parallelism
, 1991
"... research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There two other research laboratories located in Palo Al ..."
Abstract
-
Cited by 339 (7 self)
- Add to MetaCart
research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There two other research laboratories located in Palo Alto, the Network Systems
IMPACT: An architectural framework for multiple-instruction-issue processors
- in Proceedings of the 18th International Symposium on Computer Architecture
, 1991
"... The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate e cient code for concurrent hardware. In the IM-PACT project, we havedeveloped IMPACT-I, a highly optimizing C compiler to exploit instruction level concurrency. The optimization ca ..."
Abstract
-
Cited by 203 (41 self)
- Add to MetaCart
The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate e cient code for concurrent hardware. In the IM-PACT project, we havedeveloped IMPACT-I, a highly optimizing C compiler to exploit instruction level concurrency. The optimization capabilities of the IMPACT-I C compiler are summarized in this paper. Using the IMPACT-I C compiler, we ran experiments to analyze the performance of multiple-instruction-issue processors executing some important non-numerical programs. The multiple-instruction-issue processors achieve solid speedup over high-performance single-instruction-issue processors. We ran experiments to characterize the following architectural design issues: code scheduling model, instruction issue rate, memory load latency, and function unit resource limitations. Based on the experimental results, we propose the IMPACT Architectural Framework, a set of architectural features that best support the IMPACT-I C compiler to generate e cient code for multiple-instructionissue processors. By supporting these architectural features, multiple-instruction-issue implementations of existing and new architectures receive immediate compilation support from the IMPACT-I C compiler. 1
Instruction-Level Parallel Processing: History, Overview and Perspective
, 1992
"... Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a muc ..."
Abstract
-
Cited by 166 (0 self)
- Add to MetaCart
Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built, and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.
Lifetime-Sensitive Modulo Scheduling
- In Proc. of the ACM SIGPLAN '93 Conf. on Programming Language Design and Implementation
, 1993
"... This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many scientific benchmarks. The empirical results---when me ..."
Abstract
-
Cited by 129 (0 self)
- Add to MetaCart
This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many scientific benchmarks. The empirical results---when measured against an absolute lower bound on execution time, and against a novel schedule-independent absolute lower bound on register pressure---indicate nearoptimal performance. 1 Introduction Software pipelining increases a loop's throughput by overlapping the loop's iterations; that is, by initiating successive iterations before prior iterations complete. With sufficient overlap, a functional unit can be saturated, at which point the loop initiates iterations at the maximum possible rate. To find an overlapped schedule, a compiler must represent the complex resource constraints that can arise. Efficiently representing these constraints is especially difficult when adjacent iterations do n...
Compiler optimization-space exploration
- In Proceedings of the international symposium on Code generation and optimization
, 2003
"... To meet the demands of modern architectures, optimizing compilers must incorporate an ever larger number of increasingly complex transformation algorithms. Since code transformations may often degrade performance or interfere with subsequent transformations, compilers employ predictive heuristics to ..."
Abstract
-
Cited by 87 (1 self)
- Add to MetaCart
To meet the demands of modern architectures, optimizing compilers must incorporate an ever larger number of increasingly complex transformation algorithms. Since code transformations may often degrade performance or interfere with subsequent transformations, compilers employ predictive heuristics to guide optimizations by predicting their effects a priori. Unfortunately, the unpredictability of optimization interaction and the irregularity of today’s wide-issue machines severely limit the accuracy of these heuristics. As a result, compiler writers may temper high variance optimizations with overly conservative heuristics or may exclude these optimizations entirely. While this process results in a compiler capable of generating good average code quality across the target benchmark set, it is at the cost of missed optimization opportunities in individual code segments. To replace predictive heuristics, researchers have proposed compilers which explore many optimization options, selecting the best one a posteriori. Unfortunately, these existing iterative compilation techniques are not practical for reasons of compile time and applicability. In this paper, we present the Optimization-Space Exploration (OSE) compiler organization, the first practical iterative compilation strategy applicable to optimizations in general-purpose compilers. Instead of replacing predictive heuristics, OSE uses the compiler writer’s knowledge encoded in the heuristics to select a small number of promising optimization alternatives for a given code segment. Compile time is limited by evaluating only these alternatives for hot code segments using a general compiletime performance estimator. An OSE-enhanced version of Intel’s highly-tuned, aggressively optimizing production compiler for IA-64 yields a significant performance improvement, more than 20 % in some cases, on Itanium for SPEC codes. 1.
Efficient Superscalar Performance through Boosting
, 1992
"... The foremost goal of superscalar processor design is to increase performance through tie exploitation of instruction-level parallelism (ILP). Previous studies have shown that speculative execution is required for high instruction per cycle (IPC) rates in non-numerical applications. The general trend ..."
Abstract
-
Cited by 78 (5 self)
- Add to MetaCart
The foremost goal of superscalar processor design is to increase performance through tie exploitation of instruction-level parallelism (ILP). Previous studies have shown that speculative execution is required for high instruction per cycle (IPC) rates in non-numerical applications. The general trend has been toward supporting speculative execution in complicated, dynamically-scheduled processors. Performance, though, is more than just a high IPC rate; it also depends upon instruction count and cycle time. Boosting is an architectural technique that supports general speculative execution in simpler, statically-scheduled processors. Boosting labels speculative instructions with their control dependence information. This Iabelling eliminates control dependence constraints on instruction scheduling while still providing full dependence information to the hardwere. We have incorporated boosting into a trace-based, global scheduling algorithm that exploits ILP without adversely affecting the instruction count of a program. We use this algorithm and estimates of the boosting hardware involved to evaluate how much speculative execution support is rerdly necessary to achieve good performance. We find that a statically-scheduled superscalar processor using a minimal implementation of boosting can easily reach the performance of a much more complex dynamically-schcduled superscalar processor.
Concrete Type Inference: Delivering Object-Oriented Applications
, 1995
"... Unlimited copying without fee is permitted provided that the copies are not made nor distributed for direct commercial advantage, and credit to the source is given. Otherwise, no part of this work covered by copyright hereon may be reproduced in any form or by any means graphic, electronic, or mecha ..."
Abstract
-
Cited by 49 (0 self)
- Add to MetaCart
Unlimited copying without fee is permitted provided that the copies are not made nor distributed for direct commercial advantage, and credit to the source is given. Otherwise, no part of this work covered by copyright hereon may be reproduced in any form or by any means graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an information retrieval system, without the prior written permission of the copyright owner. TRADEMARKS Sun, Sun Microsystems, and the Sun logo are trademarks or registered trademarks of Sun Microsystems, Inc. UNIX is a registered trademark in the United States and other countries, exclusively licensed through X/Open Company, Ltd. All SPARC trademarks, including the SCD Compliant Logo, are trademarks or registered trademarks of SPARC International, Inc. SPARCstation, SPARCserver, SPARCengine, SPARCworks, and SPARCompiler are licensed exclusively to Sun Microsystems, Inc. All other product names mentioned herein are the trademarks of their respective owners.
Compiling for the Multiscalar Architecture
, 1998
"... High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of real-world software that run on the computers. State-of-the-art microprocessors exploit instruction level para ..."
Abstract
-
Cited by 45 (2 self)
- Add to MetaCart
High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of real-world software that run on the computers. State-of-the-art microprocessors exploit instruction level parallelism (ILP) to achieve high performance on such applications by searching for independent instructions in a dynamic window of instructions and executing them on a wide-issue pipeline. Increasing the window size and the issue width to extract more ILP may hinder achieving high clock speeds, limiting overall performance. The Multiscalar architecture employs multiple small windows and many narrow-issue processing units to exploit ILP at high clock speeds. Sequential programs are partitioned into code fragments called tasks, which are speculatively executed in parallel. Inter-task register dependences are honored via communication and synchronization and inter-task control flow and memory depe...
Sentinel scheduling: a model for compiler-controlled speculative execution
- ACM Transactions on Computer Systems
, 1993
"... Speculative execution is an important source of parallelism for VLIW and superscalar processors. A serious challenge with compiler-controlled speculative execution is to e ciently handle exceptions for speculative instructions. In this paper, a set of architectural features and compiletime schedulin ..."
Abstract
-
Cited by 41 (13 self)
- Add to MetaCart
Speculative execution is an important source of parallelism for VLIW and superscalar processors. A serious challenge with compiler-controlled speculative execution is to e ciently handle exceptions for speculative instructions. In this paper, a set of architectural features and compiletime scheduling support collectively referred to as sentinel scheduling is introduced. Sentinel scheduling provides an e ective framework for both compiler-controlled speculative execution and exception handling. All program exceptions are accurately detected and reported in a timely manner with sentinel scheduling. Recovery from exceptions is also ensured with the model. Experimental results show the e ectiveness of sentinel scheduling for exploiting instruction-level parallelism and the overhead associated with exception handling. Categories and Subject Descriptors: B.3.2 [Memory Structures]: Design Styles{associative memories � C.0[Computer Systems Organization]: General{hardware/software interfaces� instruction set design � system architectures � C.1.2 [Processor Architectures]: Single Data Stream Architectures{pipeline processors � D.2.4 [Software Engineering]: Testing and Debugging{

