Results 1 - 6 of 6
Effective compilation support for variable instruction set architecture
- In IEEE International Conference on Parallel Architectures and Compilation Techniques, 2002
Abstract - Cited by 4 (0 self)
Traditional compilers perform their code generation tasks based on a fixed, pre-determined instruction set. This paper describes the implementation of a compiler that determines the best instruction set to use for a given program and generates an efficient code sequence based on it. We first give an overview of the VISC Architecture pioneered at Cognigine that exemplifies a Variable Instruction Set Architecture. We then present three compilation techniques that, when combined, enable us to provide effective compilation and optimization support for such an architecture. The first technique involves the use of an abstract operation representation that enables the code generator to optimize towards the core architecture of the processor without committing to any specific instruction format. The second technique uses an enumeration approach to scheduling that yields near-optimal instruction schedules while still adhering to the irregular constraints imposed by the architecture. We then derive the dictionary and the instruction output based on this schedule. The third technique superimposes dictionary re-use on the enumeration algorithm to provide a trade-off between program performance and dictionary budget. This enables us to make maximal use of the dictionary space without exceeding its limit. Finally, we provide measurements to show the effectiveness of these techniques.
FLASH: Foresighted latency-aware scheduling heuristic for processors with customized datapaths
- In CGO, 2004
Abstract - Cited by 4 (2 self)
Application-specific instruction set processors (ASIPs) have the potential to meet the challenging cost, performance, and power goals of future embedded processors by customizing the hardware to suit an application. A central problem is creating compilers that are capable of dealing with the heterogeneous and non-uniform hardware created by the customization process. The processor datapath provides an effective area to customize, but specialized datapaths often have non-uniform connectivity between the function units, making the effective latency of a function unit dependent on the consuming operation. Traditional instruction schedulers break down in this environment due to their locally greedy nature of binding the best choice for a single operation even though that choice may be poor due to a lack of communication paths. To effectively schedule with non-uniform connectivity, we propose a foresighted latency-aware scheduling heuristic (FLASH) that performs lookahead across future scheduling steps to estimate the effects of a potential binding. FLASH combines a set of lookahead heuristics to achieve effective foresight with low compile-time overhead.
Automatically Constructing Compiler Optimization Heuristics Using Supervised Learning
, 2004
Abstract - Cited by 3 (1 self)
This dissertation is dedicated to my mom, Maria, whose love and support made it possible. ACKNOWLEDGMENTS Eliot Moss has been a great thesis advisor. He has helped me to become a better researcher by shaping my critical thinking as well as by improving my expressive skills. I would like to thank the members of my thesis committee, Andy Barto, Emery Berger, and Wayne Burleson for their feedback and advice that helped to improve the overall quality of this dissertation. I gratefully acknowledge the friendships and interactions from all members of the Architecture and Language Implementation group (ALI). Beginning with my first lab meeting talk, I have received helpful feedback on the best way to present myself and my work. The ongoing discussions in the lab helped to stimulate my research. Thanks especially to M. Tyler Maxwell for some of the amazing diagrams in this dissertation. Robbie Moll was helpful at stimulating my research interests in the applications of machine learning and for believing in me as an instructor. I especially would like to acknowledge Emmanuel Agu, who has been a good friend and with whom I have had many rewarding discussions on research and life. Finally, I am extremely grateful for the love and support of my entire family. Overall, I am extremely lucky to be part of such a close and wonderful family. I would like to express my sincerest gratitude to my mother, Maria. As a young child I remember my mother always telling me that I could accomplish anything that I set my mind to. She was right as always. Her confidence in me gave me the strength both to overcome any difficulties and to maintain high goals. This work was supported by National Physical Science Consortium and Lawrence Livermore National Laboratory.
Exploring Energy-Performance Trade-offs for Heterogeneous Interconnect Clustered VLIW Processors
- In Proc. of Intl. Conf. on High Performance Computing, 2005
Abstract - Cited by 2 (1 self)
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires with high load capacitance, which leads to execution delays and significantly higher energy consumption. Technological advancements permit the design of a variety of clustered architectures by varying the degree of clustering and the type of interconnects. In this paper, we focus on exploring energy-performance trade-offs in going from a unified VLIW architecture to different types of clustered VLIW architectures. We propose a new instruction scheduling algorithm that exploits scheduling slacks of instructions and communication slacks of data values together to achieve better energy-performance trade-offs for clustered architectures. Our instruction scheduling algorithm for clustered architectures with heterogeneous interconnect achieves 35% and 40% reduction in communication energy, whereas the overall energy-delay product improves by 4.5% and 6.5% respectively for 2-cluster and 4-cluster machines, with a marginal 1.6% and 1.1% increase in execution time. Our test bed uses the Trimaran compiler infrastructure.
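The energy-delay figures in the abstract above can be related with a little arithmetic. The sketch below assumes the usual definition EDP = energy x execution time, reads "improves by" as a reduction in EDP, and pairs the first figure of each pair with the 2-cluster machine and the second with the 4-cluster machine; none of this is confirmed by the abstract, and the helper name `implied_energy_ratio` is hypothetical. Under those assumptions, it back-computes the total energy ratio the reported numbers would imply.

```python
def implied_energy_ratio(edp_improvement, time_increase):
    """Return E_new / E_old implied by EDP = E * T (an assumption, not from the paper)."""
    edp_ratio = 1.0 - edp_improvement   # EDP_new / EDP_old
    time_ratio = 1.0 + time_increase    # T_new / T_old
    return edp_ratio / time_ratio

# Figures quoted in the abstract: (clusters, EDP improvement, execution-time increase).
for clusters, edp, time in [(2, 0.045, 0.016), (4, 0.065, 0.011)]:
    ratio = implied_energy_ratio(edp, time)
    print(f"{clusters}-cluster: implied total energy reduction ~{(1 - ratio) * 100:.1f}%")
```

The implied values are an inference from the quoted percentages only, not results reported by the authors.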
Backtracking-based Instruction Scheduling To Fill Branch Delay Slots
Abstract - Cited by 1 (0 self)
Conventional schedulers schedule operations in dependence order and never revisit or undo a scheduling decision on any operation. In contrast, backtracking schedulers may unschedule operations and can often generate better schedules. This paper develops and evaluates the backtracking approach to fill branch delay slots. We first present the structure of a generic backtracking scheduling algorithm and prove that it terminates. We then describe two more aggressive backtracking schedulers and evaluate their effectiveness. The full-backtracking OperBT scheduler enables backtracking for all operations and unschedules already scheduled operations to make space for the current operation. For the SPECint95 benchmark, the OperBT scheduler increases the percentage of superblocks scheduled optimally over a conventional non-backtracking scheduler from a geometric mean of 66.9% to 81.4%, an increase of 21.7%. The selective backtracking ListBT scheduler enables backtracking only when scheduling certain types of operations for which backtracking is likely to be advantageous. This hybrid scheduler is almost as good as the OperBT scheduler in terms of generated schedule length but backtracks about four times less often. We conclude that aggressive backtracking-based instruction schedulers can effectively improve schedule quality by eliminating branch delay slots with a small amount of additional computation.
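The "increase of 21.7%" quoted above is a relative increase, not a difference in percentage points. A small check, using only the two geometric means from the abstract (the variable names are illustrative):

```python
baseline = 66.9  # % of superblocks scheduled optimally, non-backtracking scheduler
operbt = 81.4    # same metric with the OperBT scheduler

absolute_gain = operbt - baseline                # difference in percentage points
relative_gain = absolute_gain / baseline * 100   # increase relative to the baseline
print(f"absolute: {absolute_gain:.1f} points, relative: {relative_gain:.1f}%")
```

The absolute gain is 14.5 percentage points; dividing by the 66.9% baseline gives the 21.7% figure.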
Inter-Block Scoreboard Scheduling in a JIT Compiler for VLIW Processors
Abstract
We present a postpass instruction scheduling technique suitable for Just-In-Time (JIT) compilers targeted to VLIW processors. Its key features are: reduced compilation time and memory requirements; satisfaction of scheduling constraints along all program paths; and the ability to preserve existing prepass schedules, including software pipelines. This is achieved by combining two ideas: instruction scheduling similar to the dynamic scheduler of an out-of-order superscalar processor, and the satisfaction of inter-block scheduling constraints by propagating them across the control-flow graph until a fixed point is reached. We implemented this technique in a Common Language Infrastructure JIT compiler for the ST200 VLIW processors and the ARM processors.
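The idea of propagating inter-block constraints across the control-flow graph until a fixed point can be illustrated with a heavily simplified model. This is a sketch under stated assumptions, not the paper's algorithm: each block is reduced to its length in cycles plus the registers it writes (issue cycle, latency), and the propagated constraint is the number of cycles a register result is still pending at block entry. The function name `propagate_pending` and the data layout are hypothetical.

```python
from collections import deque

def propagate_pending(cfg, blocks):
    """Compute, per block, the worst-case pending latency of each register at entry.

    cfg:    block name -> list of successor block names
    blocks: block name -> (length_in_cycles, {reg: (issue_cycle, latency)})
    Both structures are illustrative assumptions, not the paper's representation.
    """
    entry = {b: {} for b in cfg}   # start from "nothing pending" everywhere
    work = deque(cfg)
    while work:
        b = work.popleft()
        length, defs = blocks[b]
        exit_state = {}
        # Latencies inherited from predecessors shrink by the block length.
        for reg, left in entry[b].items():
            if reg not in defs and left > length:
                exit_state[reg] = left - length
        # Results produced in this block may still be in flight at its end.
        for reg, (issue, latency) in defs.items():
            left = issue + latency - length
            if left > 0:
                exit_state[reg] = left
        # Merge into successors with max (worst case over all paths).
        for succ in cfg[b]:
            changed = False
            for reg, left in exit_state.items():
                if left > entry[succ].get(reg, 0):
                    entry[succ][reg] = left
                    changed = True
            if changed and succ not in work:
                work.append(succ)
    return entry

cfg = {"A": ["B"], "B": ["B", "C"], "C": []}
blocks = {
    "A": (2, {"r1": (1, 4)}),
    "B": (1, {"r2": (0, 3)}),
    "C": (1, {}),
}
entry = propagate_pending(cfg, blocks)
```

Because entry states only ever grow and are bounded by the largest latency, the worklist empties after finitely many steps. A scheduler could then delay the first use of a register in each block by the pending cycles recorded in `entry`.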

