Software profiling for hot path prediction: less is more
- SIGPLAN Not
Cited by 61 (0 self)
Recently, there has been a growing interest in exploiting profile information in adaptive systems such as just-in-time compilers, dynamic optimizers, and binary translators. In this paper, we show that sophisticated software profiling schemes that provide highly accurate information in an offline setting are ill-suited for these dynamic code generation systems. We experimentally demonstrate that hot path predictions must be made early in order to control the rising cost of missed opportunity that results from the prediction delay. We also show that existing sophisticated path profiling schemes, if used in an online setting, offer no prediction advantages over simpler schemes that exhibit much lower runtime overheads. Based on these observations, we developed a new low-overhead software profiling scheme for hot path prediction. Using an abstract metric, we compare our scheme to path-profile-based prediction and show that our scheme achieves comparable prediction quality. In our second set of experiments, we include runtime overhead and evaluate the performance of our scheme in a realistic application: Dynamo, a dynamic optimization system. The results show that our prediction scheme clearly outperforms path-profile-based prediction and thus confirm that less profiling, as exhibited in our scheme, will actually lead to more effective hot path prediction.
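The abstract above contrasts full path profiling with simpler, cheaper schemes but does not spell out the new scheme itself. As a rough illustration only, the following sketch shows one low-overhead approach of that general kind: count executions of candidate path heads, and once a head's counter reaches a threshold, speculatively select the path executing at that moment as the hot path. The class name, the threshold value, and the representation of a path as a tuple of block ids are all assumptions made for this sketch, not details taken from the paper.

```python
from collections import defaultdict


class HeadCounterPredictor:
    """Hypothetical head-counting hot path predictor (illustrative only)."""

    def __init__(self, threshold=3):
        # threshold is an assumed tuning knob, not a value from the paper
        self.threshold = threshold
        self.counts = defaultdict(int)  # executions seen per path head
        self.hot_paths = {}             # head -> path predicted hot

    def observe(self, path):
        """Record one executed path; path[0] is treated as the path head."""
        head = path[0]
        if head in self.hot_paths:
            return  # prediction already made; no further profiling cost
        self.counts[head] += 1
        if self.counts[head] >= self.threshold:
            # Predict early: take the path executing right now as the hot one.
            self.hot_paths[head] = path


predictor = HeadCounterPredictor(threshold=3)
for executed in [("A", "B", "D"), ("A", "C", "D"), ("A", "B", "D"), ("A", "B", "D")]:
    predictor.observe(executed)
print(predictor.hot_paths)
```

Only one counter per head is kept, and profiling stops for a head as soon as a prediction is made, which is where the low overhead of such a scheme would come from. The price is that the selected path is a guess based on a single execution.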
Context-Aware Statistical Debugging: From Bug Predictors to Faulty Control Flow Paths
- ASE'07, 2007
Cited by 12 (0 self)
Effective bug localization is important for realizing automated debugging. One attractive approach is to apply statistical techniques on a collection of evaluation profiles of program properties to help localize bugs. Previous research has proposed various specialized techniques to isolate certain program predicates as bug predictors. However, because many bugs may not be directly associated with these predicates, these techniques are often ineffective in localizing bugs. Relevant control flow paths that may contain bug locations are more informative than stand-alone predicates for discovering and understanding bugs. In this paper, we propose an approach to automatically generate such faulty control flow paths that link many bug predictors together for revealing bugs. Our approach combines feature selection (to accurately select failure-related predicates as bug predictors), clustering (to group correlated predicates), and control flow graph traversal in a novel way to help generate the paths. We have evaluated our approach on code including the Siemens test suite and rhythmbox (a large music management application for GNOME). Our experiments show that the faulty control flow paths are accurate, useful for localizing many bugs, and helped to discover previously unknown errors in rhythmbox.
Boolean Formula-based Branch Prediction for Future Technologies
- In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2001
Cited by 6 (2 self)
Accurate branch prediction is essential to sustaining the performance of deeply pipelined wide-issue microarchitectures. However, as clock rates increase and feature sizes decrease, wire delay severely restricts the size of branch prediction tables. In this paper, we present a new method for branch prediction that encodes in the branch instruction a formula, chosen by profiling, that is used to perform history-based branch prediction. Our method replaces the large table of current dynamic branch predictors with a small and fast circuit that interprets the encoded Boolean formula. By using a special class of Boolean formulas, our encoding is extremely concise. Moreover, the prediction accuracy of our method outperforms dynamic schemes. In current technologies, the accuracy of our predictor matches or exceeds that of dynamic schemes. In a projected 70 nm technology and an aggressive clock rate of about 5 GHz, a modest implementation of our method that uses an 8-bit formula encoding has a...
A Novel Probabilistic Data Flow Framework
- In International Conference on Compiler Construction (CC 2001), Lecture Notes in Computer Science (LNCS
Cited by 5 (3 self)
Classical data flow analysis determines whether a data flow fact may hold or does not hold at some program point. Probabilistic data flow systems compute a range, i.e. a probability, with which a data flow fact will hold at some program point. In this paper we develop a novel, practicable framework for probabilistic data flow problems. In contrast to other approaches, we utilize execution history for calculating the probabilities of data flow facts. In this way we achieve significantly better results. Effectiveness and efficiency of our approach are shown by compiling and running the SPECint95 benchmark suite.
1 Introduction
Classical data flow analysis determines whether a data flow fact may hold or does not hold at some program point. For generating highly optimized code, however, it is often necessary to know the probability with which a data flow fact will hold during program execution (cf. [10, 11]). In probabilistic data flow systems control flow graphs annotated with...
Performance of runtime optimization on blast
- In Proceedings of the international symposium on Code generation and optimization, 2005
Overcoming the Challenges to Feedback-Directed Optimization
Feedback-directed optimization (FDO) is a general term used to describe any technique that alters a program's execution based on tendencies observed in its present or past runs. This paper reviews the current state of affairs in FDO and discusses the challenges inhibiting further acceptance of these techniques. It also argues that current trends in hardware and software technology have resulted in an execution environment where immutable executables and traditional static optimizations are no longer sufficient. It explains how we can improve the effectiveness of our optimizers by increasing our understanding of program behavior, and it provides examples of temporal behavior that we can (or could in the future) exploit during optimization.
Boolean Formula-based Branch Prediction for Future Technologies
, 2001
We present a new method for branch prediction that encodes in the branch instruction a formula, chosen by profiling, that is used to perform history-based prediction. By using a special class of Boolean formulas, our encoding is extremely concise. By replacing the large tables found in current predictors with a small, fast circuit, our scheme is ideally suited to future technologies that will have large wire delays. In a projected 70 nm technology and an aggressive clock rate of about 5 GHz, an implementation of our method that uses an 8-bit formula encoding has a misprediction rate of 6.0%, 42% lower than that of the best gshare predictor implementable in that same technology. In today's technology, a 16-bit version of our predictor can replace bias bits in an 8K-entry agree predictor to achieve a 2.86% misprediction rate, which is slightly lower than the 2.93% misprediction rate of the Alpha 21264 hybrid predictor, even though the Alpha predictor has almost twice the hardware budget. Our predictor also consumes much less power than table-based predictors. This paper describes our predictor, explains our profiling algorithm, and presents experimental results using the SPEC 2000 integer benchmarks.
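To make the idea of a profile-chosen, per-branch formula concrete, here is a minimal sketch under a strong simplifying assumption: the "formula" is an 8-bit truth table indexed by the last three outcomes of the branch. The abstract says only that a special class of Boolean formulas is used, so this encoding, the function names, and the majority-vote profiling step are all hypothetical stand-ins rather than the paper's method.

```python
def history_index(history, bits=3):
    """Pack the last `bits` outcomes (True = taken) into an integer.

    The most recent outcome ends up in the lowest bit.
    """
    idx = 0
    for outcome in history[-bits:]:
        idx = (idx << 1) | int(outcome)
    return idx


def choose_formula(outcomes, bits=3):
    """Profiling step (assumed): majority vote per history pattern.

    Returns a truth table packed into an int; bit i is the prediction
    made when the history pattern equals i.
    """
    taken = [0] * (1 << bits)
    not_taken = [0] * (1 << bits)
    history = []
    for outcome in outcomes:
        idx = history_index(history, bits)
        if outcome:
            taken[idx] += 1
        else:
            not_taken[idx] += 1
        history.append(outcome)
    formula = 0
    for idx in range(1 << bits):
        if taken[idx] > not_taken[idx]:
            formula |= 1 << idx
    return formula


def predict(formula, history, bits=3):
    """Prediction step: look up the bit selected by the recent history."""
    return bool((formula >> history_index(history, bits)) & 1)


# A branch that strictly alternates taken / not taken.
trace = [True, False] * 8
formula = choose_formula(trace)
print(format(formula, "08b"), predict(formula, [True, False, True]))
```

With three history bits the truth table fits in eight bits, so no prediction table is needed at run time, only the per-branch constant. A real encoding would have to be evaluated by a small circuit, which this sketch does not model.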
Speeding Up Control-Dominated Applications through Microarchitectural Customizations in Embedded Processors
, 2001
We present a methodology for microarchitectural customization of embedded processors by exploiting application information, thus attaining the twin benefits of processor standardization and application-specific customization. Such powerful techniques enable increased application fragments to be placed on the processor, with no sacrifice in system requirements, thus reducing the custom hardware and the concomitant area requirements in SOCs. We illustrate these ideas through the branch resolution problem, known to impose severe performance degradation on control-dominated embedded applications. A low-cost, late-customizable hardware that uses application information to fold out a set of frequently executed branches is described. Experimental results show that for a representative set of control-dominated applications a reduction in the range of 7%-22% in processor cycles can be achieved, thus extending the scope of low-cost embedded processors in complex co-designs for control-intensive systems.
The HALT Library
HALT, the Harvard Atom-Like Tool, is a library used for studying program behavior and the performance of computer hardware. HALT works by instrumentation, i.e., by mechanically changing a program's code so that it collects information about its own operation. You can use it to profile a program, e.g., to figure out which parts of the code consume most of the running time. But you can also use HALT to investigate how efficiently a memory caching scheme supports the memory behavior of a program when applied to a range of benchmarks.

