
An efficient framework for performing execution-constraint-sensitive transformations that increase instruction-level parallelism (1997)

by J C Gyllenhaal

Results 1 - 8 of 8

Systematic Compilation For Predicated Execution

by David Isaac August, 2000
Cited by 14 (2 self)
... synergistically to realize the potential of predication. The Partial Reverse If-Conversion Framework provides the first compilation framework to accurately balance control and predication, while providing other compiler components with complete access to the predicated code for further optimization. Though the full potential of the Partial Reverse If-Conversion Framework remains unexplored, current compiler technology justifies its worth. To operate on predicated code, the optimizer, scheduler, and register allocator require accurate information regarding the relationships among predicates. The Predicate Analysis System is the first efficient predicate relationship database to provide an approximation-free representation. Optimization, scheduling, and register allocation also require accurate knowledge of the flow of information in the predicated code. Using the Predicate Analysis System, the Predicate Dataflow Graph is built to provide dataflow information. The Predicate Dataflow Graph

Microarchitecture Modeling for Design-Space Exploration

by Manish Vachharajani, 2004
Cited by 9 (2 self)
To identify the best processor designs, designers explore a vast design space. To assess the quality of candidate designs, designers construct and use simulators. Unfortunately, simulator construction is a bottleneck in this design-space exploration because existing simulator construction methodologies lead to long simulator development times. This bottleneck limits exploration to a small set of designs, potentially diminishing quality of the final design.

A Systematic Approach to Delivering Instruction-Level Parallelism in EPIC Systems

by John Wollenburg Sias, 2005
Cited by 3 (0 self)
Computer systems designed under the explicitly parallel instruction computing (EPIC) paradigm rely extensively on compiler technology to deliver the instruction-level parallelism (ILP) required for them to achieve high levels of performance. While manifold techniques have been proposed in the literature for delivering such parallelism, this dissertation is unique in integrating and applying a comprehensive suite of techniques, embodied in the IMPACT Research Compiler, to a concrete system, comprised of the SPEC CINT2000 benchmarks and the Intel Itanium 2 platform. These techniques include advanced pointer analysis, aggressive cross-file procedure inlining, targeted region formation, profile-guided optimizations, and, most importantly, aggressive and pervasive use of predication and control speculation. The collective effect of these techniques is evaluated with real-system measurements, showing them to achieve a 1.20 average (up to 1.59) speedup relative to classically optimized code and a 1.70 average (up to 2.51) speedup relative to code compiled with the GNU GCC compiler. Achieving these results in the real-machine environment required advances in region formation heuristics, optimization, and speculation methods. Modern

A framework for profile-driven optimization in the IMPACT binary reoptimization system

by Matthew Carl Merten, 1999
Cited by 2 (1 self)
... teaching abilities, and Dan Lavery for his mentoring efforts in the early days of my IMPACT membership. In addition, I want to thank our corporate research partners. First, thanks to Advanced Micro Devices, in particular Dr. David Christie. The AMD team has provided hardware, software tools, and invaluable insight into the K6 microarchitecture and into the entire software development process. Thanks to Microsoft for their donation of the Microsoft Developer Network tool set and to Microsoft Research for our post-link optimization discussions. And thanks to Hewlett-Packard for my internship opportunity and for general support of the IMPACT research group. Last, I want to thank my family for their constant support. My wife, Polly, is understanding about my long work hours, encourages me when I get discouraged, celebrates with me in my triumphs, and in general, is the loving support that drives me each day. My parents, parents-in-law, and sister are always confident in

Optimization and Executable Regeneration in the IMPACT Binary Reoptimization Framework

by Michael Stephen Thiems, 1998
Cited by 2 (0 self)
this memory address, eax is the base register and ebx is the index register. The index register is scaled by a factor of 4, and a displacement of 24 is also present.
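
The addressing mode in this excerpt follows the usual x86 form: base + index × scale + displacement. Below is a minimal sketch of that arithmetic. The register values are hypothetical (the excerpt gives only the scale of 4 and the displacement of 24), and the function name `effective_address` is illustrative rather than taken from the thesis.

```python
def effective_address(base, index, scale, disp, bits=32):
    """Return base + index*scale + disp, wrapped to the address width."""
    # x86 addressing only allows these scale factors.
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & ((1 << bits) - 1)


# Hypothetical register contents, chosen only for illustration.
eax = 0x1000  # base register
ebx = 3       # index register

# Scale 4 and displacement 24 are the values named in the excerpt.
addr = effective_address(eax, ebx, scale=4, disp=24)
print(hex(addr))  # 0x1024
```

With these assumed values the index contributes 3 × 4 = 12 bytes and the displacement 24, so the address lands 36 bytes past the base.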

An overview of the IMPACT x86 binary reoptimization framework

by Matthew Merten, Michael Thiems, 1998
Cited by 2 (1 self)
this report is organized as follows: Chapter 2 provides an overview of the x86 binary reoptimization system. Chapter 3 describes our use of IMPACT's low-level intermediate representation. Chapter 4 describes each of the software modules in more detail. Chapter 5 provides a summary and concluding remarks.

iii TABLE OF CONTENTS

by Michael Stephen Thiems
Many people have provided invaluable support throughout my education, in my research, and in the writing of this thesis. I wish to thank my advisor, Professor Wen-mei Hwu, for his gifted teaching and advice that has helped guide me to where I am and to where I am going in my career. John Gyllenhaal served as a mentor and provided many useful suggestions in the development of this system and in the writing of this thesis. Besides developing the IMPACT scheduler manager and machine description technology, he also had a hand in many other parts of the IMPACT infrastructure. Matthew Merten, who developed x86toM and with whom I have spent many hours working over the past year, has also been a great help in working out the many problems in this system. The PEwrite program is based on binary profiling work done by John Sias, Chris George, and Guanyao Cheng. As my cubiclemate, Qudus Olaniran provided many useful insights after listening to both my ideas and my frustrations. Justin Donoho helped build both the original versions of the NT benchmarks and the testing environment. David August provided data flow support, and Dan Connors helped with very practical thesis advice. Many thanks are also due to the past and present members of the IMPACT group, whose excellent compiler infrastructure has made this work possible.

LIST OF TABLES

by Sain-zee Ueng
One of the differences between out-of-order and in-order computer architectures is the dispersal of operations to functional units. This responsibility is part of the transfer of power from hardware (out-of-order) to the compiler (in-order). As a result of this shift, the hardware support for operation dispersal was simplified. IPF, the first hardware implementation of an EPIC in-order architecture, introduced new concepts to further simplify the hardware’s operation dispersal. However, they present additional constraints to the compiler’s scheduler. This thesis presents the template bundling algorithm to extend the IMPACT compiler to handle these new constraints. An exhaustive, systematic exploration over the newly introduced search space is employed to produce a schedule that conforms to the scheduler’s performance expectations while keeping compilation time under control via efficient implementation.

ACKNOWLEDGMENTS

I would first like to thank my adviser, Professor Wen-mei Hwu, for his guidance and support. Many thanks go out to the IMPACT research group members, both past and present, who have contributed to this framework which my work is built upon. Special thanks are due to John Sias, without whose assistance, patience, and insight this work would not have been possible, and Marie Conte, who believed in me and gave me the chance. I would like to thank my parents, my sister, and my family for their support and encouragement. Last, but not least, I would like to thank Dr. Hungwen Li for motivating me towards my graduate studies.