Results 1 - 10
of
10
Performance, Measurement
"... This paper presents a finding and a technique on program behavior prediction. The finding is that surprisingly strong statistical correlations exist among the behaviors of different program components (e.g., loops) and among different types of programlevel behaviors (e.g., loop trip-counts versus da ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
This paper presents a finding and a technique on program behavior prediction. The finding is that surprisingly strong statistical correlations exist among the behaviors of different program components (e.g., loops) and among different types of programlevel behaviors (e.g., loop trip-counts versus data values). Furthermore, the correlations can be beneficially exploited: They help resolve the proactivity-adaptivity dilemma faced by existing program behavior predictions, making it possible to gain the strengths of both approaches—the large scope and earliness of offline-profiling–based predictions, and the cross-input adaptivity of runtime sampling-based predictions. The main technique contributed by this paper centers on a new concept, seminal behaviors. Enlightened by the existence of strong correlations among program behaviors, we propose a regressionbased framework to automatically identify a small set of behaviors that can lead to accurate prediction of other behaviors in a program. We call these seminal behaviors. By applying statistical learning techniques, the framework constructs predictive models that map from seminal behaviors to other behaviors, enabling proactive and cross-input adaptive prediction of program behaviors. The prediction helps a commercial compiler, the IBM XL C compiler, generate code that runs up to 45 % faster (5%–13 % on average), demonstrating the large potential of correlation-based techniques for program optimizations.
Applying support vector machines to discover method-specific compilation strategies
, 2010
"... Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author’s prior written permission. Examining Committee
U N I V E R
"... Parallel Computing has become pervasive and the number of processors placed in computers will further increase in the future. However, software developers are struggling to efficiently exploit the computational resources provided by parallel architectures. It is thus inevitable to investigate the be ..."
Abstract
- Add to MetaCart
Parallel Computing has become pervasive and the number of processors placed in computers will further increase in the future. However, software developers are struggling to efficiently exploit the computational resources provided by parallel architectures. It is thus inevitable to investigate the behaviour of parallel programs and develop methods that help improving their performance. The software developer should not have to worry about how to map parallelism to the underlying architecture, but should instead concentrate on exposing the parallelism and leave the mapping task to the runtime system. In this project, the behaviour of parallel programs in the presence of workload is investigated. It is shown that choosing the right number of threads for an application is crucial to achieve the best performance possible when there is other workload running on the system. The default policy of creating as many threads as there are cores is rarely optimal in this situation and using the optimal number of threads reduces the runtime by 22.5 % on average w.r.t. the default policy. Determining the optimal number
Using Support Vector Machines to Learn How to Compile a Method
"... The question addressed in this paper is what subset of code transformations should be attempted for a given method in a Just-in-Time compilation environment. The solution proposed is to use a Support Vector Machine (SVM) to learn a model based on method features and on the measured compilation and e ..."
Abstract
- Add to MetaCart
The question addressed in this paper is what subset of code transformations should be attempted for a given method in a Just-in-Time compilation environment. The solution proposed is to use a Support Vector Machine (SVM) to learn a model based on method features and on the measured compilation and execution times of the methods. An extensive exploration phase collects a set of example compilations to be used by the SVM to train the model. This paper reports on a work in progress. So far, linear-SVM models, applied to benchmarks from the SPECjvm98 suite, have not outperformed the compilation plans engineered by the development team over many years. However the models almost match that performance for the javac benchmark. 1
Automatic Selection of Machine Learning Models for WCET-aware Compiler Heuristic Generation ⋆
"... Abstract. Machine learning has shown its capabilities for an automatic generation of heuristics used by optimizing compilers. The advantages of these heuristics are that they can be easily adopted to a new environment and in some cases outperform hand-crafted compiler optimizations. However, this ap ..."
Abstract
- Add to MetaCart
Abstract. Machine learning has shown its capabilities for an automatic generation of heuristics used by optimizing compilers. The advantages of these heuristics are that they can be easily adopted to a new environment and in some cases outperform hand-crafted compiler optimizations. However, this approach shifts the effort from manual heuristic tuning to the model selection problem of machine learning – i.e., selecting learning algorithms and their respective parameters – which is a tedious task in its own right. In this paper, we tackle the model selection problem in a systematic way. As our experiments show, the right choice of a learning algorithm and its parameters can significantly affect the quality of the generated heuristics. We present a generic framework integrating machine learning into a compiler to enable an automatic search for the best learning algorithm. To find good settings for the learner parameters within the large search space, optimizations based on evolutionary algorithms are applied. In contrast to the majority of other approaches aiming at a reduction of the average-case execution time (ACET), our goal is the minimization of the worst-case execution time (WCET) which is a key parameter for embedded systems acting as real-time systems. A careful case study on the heuristic generation for the well-known optimization loop invariant code motion shows the challenges and benefits of our methods. 1
A Step Towards Transparent Integration of Input-Consciousness into Dynamic Program Optimizations
"... Dynamic program optimizations are critical for the efficiency of applications in managed programming languages and scripting languages. Recent studies have shown that exploitation of program inputs may enhance the effectiveness of dynamic optimizations significantly. However, current solutions for e ..."
Abstract
- Add to MetaCart
Dynamic program optimizations are critical for the efficiency of applications in managed programming languages and scripting languages. Recent studies have shown that exploitation of program inputs may enhance the effectiveness of dynamic optimizations significantly. However, current solutions for enabling the exploitation require either programmers’ annotations or intensive offline profiling, impairing the practical adoption of the techniques. This current work examines the basic feasibility of transparent integration of input-consciousness into dynamic program optimizations, particularly in managed execution environments. It uses transparent learning across production runs as the basic vehicle, and investigates the implications of cross-run learning on each main component of inputconscious
Keshav PingaliAtomic Block Formation for Explicit Data Graph Execution Architectures
"... I would like to thank the numerous people who have helped me along the way to completing this dissertation. I thank my committee, Kathryn McKinley, Doug Burger, Steve Keckler, Keshav Pingali, and Scott Mahlke, for their valuable feedback on my research. Their insight has unquestionably improved this ..."
Abstract
- Add to MetaCart
I would like to thank the numerous people who have helped me along the way to completing this dissertation. I thank my committee, Kathryn McKinley, Doug Burger, Steve Keckler, Keshav Pingali, and Scott Mahlke, for their valuable feedback on my research. Their insight has unquestionably improved this dissertation. I am particularly indebted to my advisors, Doug Burger and Kathryn McKinley, for their supervision of this work. I count myself lucky to have had scientists of their caliber to oversee my research. Doug and Kathryn have always challenged me to improve my ideas, yet also encouraged me that those ideas are worth pursuing. I hope that my career in computer science will make them proud. I thank Steve Keckler for his leadership of the TRIPS project along with Doug and Kathryn. My career thus far has been enriched by working on such an ambitious research project as a graduate student. I thank the TRIPS team without whom this work would have been impossible.
U N I V E R S I T
"... Parallel skeletons are a structured approach to parallel programming that provide programmers with a predefined set of algorithmic templates that can be combined, nested and parametrised with sequential code to produce complex programs. The implementation of these skeletons is currently a manual pro ..."
Abstract
- Add to MetaCart
Parallel skeletons are a structured approach to parallel programming that provide programmers with a predefined set of algorithmic templates that can be combined, nested and parametrised with sequential code to produce complex programs. The implementation of these skeletons is currently a manual process, requiring human expertise to choose suitable implementation parameters that provide good performance. This project explores the effectiveness and applicability of automatic techniques to tune six implementation parameters in the FastFlow parallel skeleton framework. The results demonstrate that there is scope for improving program performance over human expert chosen parameters, with an average speedup of 1.6 × across five platforms and ten programs. Exploratory data analysis on the results identifies a linear dependence between two of the parameters, and that another two parameters have little effect on performance. The applicability of a range of stochastic search techniques to the optimisation space is also explored. These include Hill-Climbing, Simulated Annealing and a Genetic Algorithm. The results show that a speedup of at least 1.5×
ThreadMarks: A Framework for Input-Aware Prediction of Parallel Application Behavior
"... Chip-multiprocessors (CMPs) are quickly becoming entrenched as the main-stream architectural platform in computer systems. One of the critical challenges facing CMPs is designing applications to effectively leverage the computational resources they provide. Modifying applications to effectively run ..."
Abstract
- Add to MetaCart
Chip-multiprocessors (CMPs) are quickly becoming entrenched as the main-stream architectural platform in computer systems. One of the critical challenges facing CMPs is designing applications to effectively leverage the computational resources they provide. Modifying applications to effectively run on CMPs requires understanding the bottlenecks in applications, which necessitates a detailed understanding of architectural features. Unfortunately, identifying bottlenecks is complex and often requires enumerating a wide range of behaviors. To assist in identifying bottlenecks, this paper presents a framework for developing analytical models based on dynamic program behaviors. That is, given a program and set of training inputs, the framework will generate several analytical models that accurately predict online program behaviors such as memory utilization and synchronization overhead, while taking program input into consideration. These models can prove invaluable for online optimization systems and input-specific analysis of program behavior. We demonstrate that this framework is practical and accurate on a wide range of synthetic and real-world parallel applications over various workloads. 1
Approximate Graph Clustering for Program Characterization
, 2012
"... An important aspect of system optimization research is the discovery of program traits or behaviors. In this paper, we present an automated method of program characterization which is able to examine and cluster program graphs, i.e., dynamic data graphs or control flow graphs. Our novel approximate ..."
Abstract
- Add to MetaCart
An important aspect of system optimization research is the discovery of program traits or behaviors. In this paper, we present an automated method of program characterization which is able to examine and cluster program graphs, i.e., dynamic data graphs or control flow graphs. Our novel approximate graph clustering technology allows users to find groups of program fragments which contain similar code idioms or patterns in data reuse, control flow, and context. Patterns of this nature have several potential applications including development of new static or dynamic optimizations to be implemented in software or in hardware. For the SPEC CPU 2006 suite of benchmarks, our results show that approximate graph clustering is effective at grouping behaviorally similar functions. Graph based clustering also produces clusters that are more homogeneous than previously proposed non-graph based clustering methods. Further qualitative analysis of the clustered functions shows that our approach is also able to identify some frequent unexploited program behaviors. These results suggest that our approximate graph clustering methods could be very useful for program characterization.

