The Multicluster Architecture: Reducing Cycle Time Through Partitioning
, 1997
Cited by 155 (0 self)
The multicluster architecture that we introduce offers a decentralized, dynamically-scheduled architecture, in which the register files, dispatch queue, and functional units of the architecture are distributed across multiple clusters, and each cluster is assigned a subset of the architectural registers. The motivation for the multicluster architecture is to reduce the clock cycle time, relative to a single-cluster architecture with the same number of hardware resources, by reducing the size and complexity of components on critical timing paths. Resource partitioning, however, introduces instruction-execution overhead and may reduce the number of concurrently executing instructions. To counter these two negative by-products of partitioning, we developed a static instruction scheduling algorithm. We describe this algorithm, and using trace-driven simulations of SPEC92 benchmarks, evaluate its effectiveness. This evaluation indicates that for the configurations considered, the multicluste...
Iterated Register Coalescing
- ACM Transactions on Programming Languages and Systems
, 1996
Cited by 132 (4 self)
An important function of any register allocator is to target registers so as to eliminate copy instructions. Graph-coloring register allocation is an elegant approach to this problem. If the source and destination of a move instruction do not interfere, then their nodes can be coalesced in the interference graph. Chaitin's coalescing heuristic could make a graph uncolorable (i.e., introduce spills); Briggs et al. demonstrated a conservative coalescing heuristic that preserves colorability. But Briggs's algorithm is too conservative, and leaves too many move instructions in our programs. We show how to interleave coloring reductions with Briggs's coalescing heuristic, leading to an algorithm that is safe but much more aggressive. 1 Introduction Graph coloring is a powerful approach to register allocation and can have a significant impact on the execution of compiled code. A good register allocator does copy propagation, eliminating many move instructions by "coloring" the source tempor...
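The conservative coalescing test attributed to Briggs et al. in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the interference graph is a dictionary mapping each node to the set of its neighbors, and the names `briggs_can_coalesce`, `graph`, and `k` are hypothetical. The commonly stated form of the test allows two non-interfering, move-related nodes to be merged only if the merged node would have fewer than k neighbors of significant degree (degree at least k), where k is the number of available registers.

```python
def briggs_can_coalesce(graph, a, b, k):
    """Conservative coalescing test (sketch, hypothetical helper).

    graph: dict mapping each node to the set of nodes it interferes with.
    a, b:  source and destination of a move instruction.
    k:     number of available registers (colors).
    """
    # Nodes that interfere can never share a register.
    if b in graph[a]:
        return False

    # Neighbors the merged node would have.
    merged_neighbors = (graph[a] | graph[b]) - {a, b}

    significant = 0
    for n in merged_neighbors:
        degree = len(graph[n])
        # A neighbor of both a and b loses one edge once they are merged.
        if a in graph[n] and b in graph[n]:
            degree -= 1
        if degree >= k:
            significant += 1

    # Safe if fewer than k neighbors have significant degree.
    return significant < k


# Small example: a and b are move-related and both interfere with x.
example = {
    "a": {"x"},
    "b": {"x"},
    "x": {"a", "b", "y"},
    "y": {"x"},
}
print(briggs_can_coalesce(example, "a", "b", k=2))
```

The test is deliberately pessimistic: it may reject merges that would in fact be harmless, which is the behavior the abstract describes as too conservative.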
Linear Scan Register Allocation
- ACM Transactions on Programming Languages and Systems
, 1999
Cited by 108 (4 self)
this article we use depth-first order. The choice of instruction ordering does not affect the correctness of the algorithm, but it may affect the quality of allocation. We discuss alternative orderings in Section 6.
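For readers unfamiliar with the technique named in the title, the general linear-scan idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the article's code: live intervals are assumed to be precomputed as (start, end) positions in whatever instruction ordering is chosen, and the names `linear_scan`, `intervals`, and `num_regs` are hypothetical. Intervals are visited in order of increasing start point; when no register is free, the active interval that ends last is spilled.

```python
def linear_scan(intervals, num_regs):
    """Linear-scan allocation sketch (hypothetical helper).

    intervals: dict mapping a variable name to its (start, end) live interval.
    num_regs:  number of available registers.
    Returns (assignment, spilled): a dict name -> register number, and a set
    of names left in memory.
    """
    free = list(range(num_regs))
    active = []          # (end, name) pairs, kept sorted by end point
    assignment = {}
    spilled = set()

    # Visit intervals in order of increasing start point.
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(assignment[old])

        if free:
            assignment[name] = free.pop()
            active.append((end, name))
            active.sort()
        elif active and active[-1][0] > end:
            # Spill the active interval that ends last and reuse its register.
            _, victim = active.pop()
            assignment[name] = assignment.pop(victim)
            spilled.add(victim)
            active.append((end, name))
            active.sort()
        else:
            # The new interval itself ends last (or there are no registers).
            spilled.add(name)

    return assignment, spilled


regs, mem = linear_scan({"a": (0, 10), "b": (1, 3), "c": (2, 5), "d": (4, 6)}, 2)
print(regs, mem)
```

Because the interval positions depend on the instruction ordering, a different ordering can change which intervals overlap and therefore which variables are spilled, while the allocation remains valid.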
Optimizing for Reduced Code Space Using Genetic Algorithms
, 1999
Cited by 95 (10 self)
Code space is a critical issue facing designers of software for embedded systems. Many traditional compiler optimizations are designed to reduce the execution time of compiled code, but not necessarily the size of the compiled code. Further, different results can be achieved by running some optimizations more than once and changing the order in which optimizations are applied. Register allocation only complicates matters, as the interactions between different optimizations can cause more spill code to be generated. The compiler for embedded systems, then, must take care to use the best sequence of optimizations to minimize code space.
Memory-System Design Considerations For Dynamically-Scheduled Microprocessors
, 1997
Cited by 66 (4 self)
Memory-System Design Considerations for Dynamically-Scheduled Microprocessors. Keith Istvan Farkas, Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, University of Toronto, 1997. Dynamically-scheduled processors challenge hardware and software architects to develop designs that balance hardware complexity and compiler technology against performance targets. This dissertation presents a first thorough look at some of the issues introduced by this hardware complexity. The focus of the investigation of these issues is the register file and the other components of the data memory system. These components are: the lockup-free data cache, the stream buffers, and the interface to the lower levels of the memory system. The investigation is based on software models. These models incorporate the features of a dynamically-scheduled processor that affect the design of the data-memory components. The models represent a balance between accuracy and generality, and ar...
Marmot: An Optimizing Compiler for Java
, 1998
Cited by 63 (6 self)
The Marmot system is a research platform for studying the implementation of high-level programming languages. It currently comprises an optimizing native-code compiler, runtime system, and libraries for a large subset of Java. Marmot integrates well-known representation, optimization, code generation, and runtime techniques with a few Java-specific features to achieve competitive performance. This paper contains a description of the Marmot system design, along with highlights of our experience applying and adapting traditional implementation techniques to Java. A detailed performance evaluation assesses both Marmot's overall performance relative to other Java and C++ implementations and the relative costs of various Java language features in Marmot-compiled code. Our experience with Marmot has demonstrated that well-known compilation techniques can produce very good performance for static Java applications, comparable or superior to other Java systems, and approaching that o...
Optimal Spilling for CISC Machines with Few Registers
- In Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
, 2000
Cited by 58 (1 self)
Register allocation based on graph coloring performs poorly for machines with few registers, if each temporary is held either in machine registers or memory over its entire lifetime. With the exception of short-lived temporaries, most temporaries must spill, including long-lived temporaries that are used within inner loops. Live-range splitting before or during register allocation helps to alleviate the problem but prior techniques are sometimes complex, make no guarantees about subsequent colorability and thus require further iterations of splitting, pay no attention to addressing modes, and make no claim to optimality. We formulate the register allocation problem for CISC architectures with few registers in two parts: an integer linear program that determines the optimal location to break up the implementation of a live range between registers and memory, and a register assignment phase that we guarantee to complete without further spill code insertion. Our linear programming model ...
Practical Improvements to the Construction and Destruction of Static Single Assignment Form
, 1998
Cited by 55 (3 self)
Static single assignment (SSA) form is a program representation becoming increasingly popular for compiler-based code optimization. In this paper, we address three problems that have arisen in our use of SSA form. Two are variations to the SSA construction algorithms presented by Cytron et al. The first variation is a version of...
Quality and Speed in Linear-Scan Register Allocation
- In SIGPLAN Conference on Programming Language Design and Implementation
, 1998
Cited by 51 (2 self)
ing control flow as a linear ordering of the basic blocks makes linear-scan allocators run efficiently. If the allocation decisions in each basic block were independent from the decisions in other blocks, then the order in which we processed the blocks would be immaterial. But in fact some information about the register state and consistency is carried beyond basic block boundaries. This section enumerates the possible edges and transitions in the linear ordering and their effect on this information. The simplest edge (u,v) followed in the linear ordering occurs when u has no other successors and v has no other predecessors. The edge (2,3) in Figure 5 is an example. Since this edge is the only possible transition out of u and into v, any state existing at the bottom of u must also hold at the top of v. This kind of edge is relatively rare since the compiler usually collapses the two blocks into a single one. The next kind of edge occurs when u has multiple successors, but v has on...
Register Promotion in C Programs
- Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation (PLDI-97)
, 1997
Cited by 46 (3 self)
The combination of pointers and pointer arithmetic in C makes the task of improving C programs somewhat more difficult than improving programs written in simpler languages like Fortran. While much work has been published that focuses on the analysis of pointers, little has appeared that uses the results of such analysis to improve the code compiled for C. This paper examines the problem of register promotion in C and presents experimental results showing that it can have dramatic effects on memory traffic. 1 Introduction The presence of pointer-valued variables in C has long been recognized as an impediment to effective compile-time optimization. Pointers introduce a degree of uncertainty into the results of static analysis. Pointer assignments create multiple names for storage locations, with the result that the compiler must avoid reordering stores to memory. Pointer arithmetic introduces further ambiguity; understanding the results of *(p+8) requires a detailed knowledge, at compil...

