Results 1 - 10 of 10
Fast and efficient searches for effective optimization-phase sequences
- ACM Trans. Archit. Code Optim
Cited by 12 (5 self)
It has long been known that a fixed ordering of optimization phases will not produce the best code for every application. One approach for addressing this phase-ordering problem is to use an evolutionary algorithm to search for a specific sequence of phases for each module or function. While such searches have been shown to produce more efficient code, the approach can be extremely slow because the application is compiled and possibly executed to evaluate each sequence’s effectiveness. Consequently, evolutionary or iterative compilation schemes have been promoted for compilation systems targeting embedded applications, where meeting strict constraints on execution time, code size, and power consumption is paramount and longer compilation times may be tolerated in the final stage of development, when an application is compiled one last time and embedded in a product. Unfortunately, even for small embedded applications, the search process can take many hours or even days, making the approach less attractive to developers. In this paper, we describe two complementary general approaches for achieving faster searches for effective optimization sequences when using a genetic algorithm. The first approach reduces the search time by avoiding unnecessary executions of the application when possible. Results indicate search time reductions of 62%, on average, often reducing searches from hours to minutes. The second approach
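One way to picture "avoiding unnecessary executions" is to remember the measured cost of every phase sequence already tried, so that a repeated candidate costs no new run. The sketch below is only an illustration of that idea: `CachedEvaluator`, `toy_cost`, and the phase names are hypothetical, and the excerpt does not say which redundancies the paper actually detects.

```python
from typing import Callable, Dict, Sequence, Tuple


class CachedEvaluator:
    """Memoize the fitness of phase sequences so repeats trigger no execution."""

    def __init__(self, run_application: Callable[[Tuple[str, ...]], float]):
        self._run = run_application          # expensive compile-and-execute step
        self._cache: Dict[Tuple[str, ...], float] = {}
        self.executions = 0                  # counts real (uncached) evaluations

    def fitness(self, sequence: Sequence[str]) -> float:
        key = tuple(sequence)
        if key not in self._cache:
            self.executions += 1
            self._cache[key] = self._run(key)
        return self._cache[key]


def toy_cost(sequence: Tuple[str, ...]) -> float:
    # Hypothetical stand-in for compiling and running the application.
    return 100.0 - 5.0 * len(set(sequence))


evaluator = CachedEvaluator(toy_cost)
population = [["cse", "licm"], ["licm", "cse"], ["cse", "licm"]]
scores = [evaluator.fitness(candidate) for candidate in population]
```

Here the third candidate repeats the first, so only two real evaluations take place. A genetic algorithm tends to regenerate identical candidates through crossover, which is why even this simple lookup can pay off.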
Automatic Validation of Code-Improving Transformations
- In Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems, 2000
Cited by 8 (5 self)
Programmers of embedded systems often develop software in assembly code due to inadequate support from compilers and the need to meet critical speed and/or space constraints. Many embedded applications are being used as a component of an increasing number of critical systems. While achieving high performance for these systems is important, ensuring that these systems execute correctly is vital. One portion of this process is to ensure that code-improving transformations applied to a program will not change the program's semantic behavior, which is jeopardized when transformations are specified manually. This paper describes a general approach for validation of many low-level code-improving transformations made either by a compiler or specified by hand. Initially, we associate a region of the program representation with a code-improving transformation. Afterwards, we calculate the region's effects on the rest of the program before and after the transformation. The transformation is cons...
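The before-and-after comparison described above can be illustrated on straight-line code. The sketch below is a simplified assumption, not the paper's method: the tuple instruction format, `region_effects`, and the rule that a transformation is accepted when the effects match are all hypothetical, since the abstract is cut off before stating its criterion.

```python
def region_effects(instructions, live_out):
    """Symbolically execute a straight-line region.

    Each instruction is (dest, op, lhs, rhs); op "copy" ignores rhs.
    Returns the symbolic value of every register in live_out, expressed
    in terms of the region's inputs.
    """
    env = {}

    def value(operand):
        # Names not yet assigned inside the region denote region inputs.
        return env.get(operand, operand)

    for dest, op, lhs, rhs in instructions:
        if op == "copy":
            env[dest] = value(lhs)
        else:
            env[dest] = (op, value(lhs), value(rhs))
    return {reg: value(reg) for reg in live_out}


# Region before the transformation: t = a + b; c = t
before = [("t", "add", "a", "b"), ("c", "copy", "t", None)]
# Region after copy propagation and dead-code removal: c = a + b
after = [("c", "add", "a", "b")]
# A faulty hand-written variant: c = a + a
faulty = [("c", "add", "a", "a")]

same = region_effects(before, ["c"]) == region_effects(after, ["c"])
differs = region_effects(before, ["c"]) != region_effects(faulty, ["c"])
```

The comparison is purely structural, so it is conservative: `a + b` and `b + a` would be reported as different unless the expressions were first put into a canonical form.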
LANCET: A Nifty Code Editing Tool
- 2005
Cited by 8 (2 self)
This paper presents Lancet, a multi-platform software visualization tool that enables the inspection of programs at the binary code level. Implemented on top of the link-time rewriting framework Diablo, Lancet provides several views on the interprocedural control flow graph of a program. These views can be used to navigate through the program, to edit the program in an efficient manner, and to interact with the existing whole-program analyses and optimizations that are implemented in Diablo or existing applications of Diablo. As such, Lancet is an ideal tool to examine compiler-generated code, to assist the development of new compiler optimizations, or to optimize assembly code manually.
On the impact of data input sets on statistical compiler tuning
- In Proceedings of the Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL), 2006
Cited by 4 (0 self)
In recent years, several approaches have been proposed to use profile information in compiler optimization. This profile information can be used at the source level to guide loop transformations as well as in the backend to guide low-level optimizations. At the same time, profile-guided library generators such as ATLAS, Spiral, and FFTW have been proposed that tune their routines for the underlying hardware. These approaches have led to excellent performance improvements. However, a possible drawback is that applications are optimized using a single data input or a limited set of data inputs. It is well known that programs can exhibit vastly differing behaviors for different inputs. Therefore, it is not clear whether the reported performance numbers remain valid for inputs other than the one used to optimize the program. In this paper, we address this problem for a specific statistical compiler tuning method. We use three different platforms and several SPECint2000 benchmarks. We show that when we tune the compiler using train data, we obtain a compiler setting that still performs well for reference data. These results suggest that profile-guided optimization may be more stable than is sometimes believed and that a limited number of train data sets is sufficient to obtain a well-optimized program for all inputs.
Using de-optimization to re-optimize code
- In Proceedings of the EMSOFT Conference, 2004
Cited by 1 (0 self)
To Mom, Dad, and Frank... ACKNOWLEDGMENTS I am very grateful for the help of my advisor, Dr. David Whalley. Without you, this thesis would not have been possible. Thank you for believing in me, as well as inspiring me to work hard to achieve my goals. I would also like to thank the other members of the Compilers Group (Prasad Kulkarni, Bill Kreahling, Clint Whaley, Wankang Zhao) for their assistance. This work would have been extraordinarily difficult without your insight and your friendship. I would like to extend a big thanks to my family and friends for their unwavering love and support. You may not understand all of the complexities involved in my research, but I certainly learned that you are always willing to listen to me. I am truly blessed to have each of you in my life.
Improving WCET by Applying Worst-Case Path Optimizations
Cited by 1 (0 self)
It is advantageous to perform compiler optimizations that attempt to lower the worst-case execution time (WCET) of an embedded application since tasks with lower WCETs are easier to schedule and more likely to meet their deadlines. Compiler writers in recent years have used profile information to detect the frequently executed paths in a program, and there has been considerable effort to develop compiler optimizations to improve these paths in order to reduce the average-case execution time (ACET). In this paper, we describe an approach to reduce the WCET by adapting and applying optimizations designed for frequent paths to the worst-case (WC) paths in an application. Instead of profiling to find the frequent paths, our WCET path optimization uses feedback from a timing analyzer to detect the WC paths in a function. Since these path-based optimizations may increase code size, the subsequent effects of these optimizations on the WCET are measured to ensure that the worst-case path optimizations actually improve the WCET before committing to a code size increase. We evaluate these WC path optimizations and present results showing the decrease in WCET versus the increase in code size.
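The measure-then-commit step described above can be sketched as a guard around any path optimization. Everything in this example is an assumption made for illustration: the `Function` record, the `wcet` stand-in for a timing analyzer (here simply the longest path), and the toy `shorten_worst_path` transformation are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Function:
    path_cycles: Tuple[int, ...]  # cycle count of each path through the function
    size: int                     # code size in instructions


def wcet(fn: Function) -> int:
    # Stand-in for a timing analyzer: the worst case is the longest path.
    return max(fn.path_cycles)


def commit_if_wcet_improves(fn: Function,
                            optimize: Callable[[Function], Function]) -> Function:
    """Keep the optimized version only if its WCET is strictly lower."""
    candidate = optimize(fn)
    return candidate if wcet(candidate) < wcet(fn) else fn


def shorten_worst_path(fn: Function) -> Function:
    # Toy path optimization: saves 3 cycles on one worst path, adds 4 instructions.
    worst = max(fn.path_cycles)
    cycles = list(fn.path_cycles)
    cycles[cycles.index(worst)] = worst - 3
    return replace(fn, path_cycles=tuple(cycles), size=fn.size + 4)


improved = commit_if_wcet_improves(Function((10, 20, 15), 50), shorten_worst_path)
unchanged = commit_if_wcet_improves(Function((20, 20), 50), shorten_worst_path)
```

In the second call two paths tie for the worst case, so shortening one of them leaves the WCET at 20 cycles and the larger code is discarded. This is the situation the guard exists for: code growth is accepted only when the measured worst case actually drops.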
Effective Algorithms for Partitioned Memory Hierarchies in Embedded Systems
- 2005
Many architectures today, especially embedded systems, have multiple memory partitions, each with potentially different performance and energy characteristics. To meet the strict time-to-market requirements of systems containing these chips, compilers require retargetable algorithms for effectively assigning values to the memory partitions. Furthermore, embedded system designers need a methodology for quickly evaluating the performance of a candidate memory hierarchy on an application without relying on time-consuming simulation. This dissertation presents algorithms and techniques to effectively meet these needs. First, EMBARC is presented. EMBARC is the first algorithm to realize a comprehensive, retargetable algorithm for effective partition assignment of variables in an arbitrary memory hierarchy. It supports a wide variety of memory models including on-chip SRAMs, multiple layers of caches, and even uncached DRAM partitions. Even though it is designed to handle a wide range of memory hierarchies, EMBARC is capable of generating partition assignments of similar quality to algorithms designed for specific memory hierarchies. A
Project-Team caps
"... Compilation, architectures of superscalar and specialized processors ..."

