Results 1 - 8 of 8
Exploiting hardware performance counters with flow and context sensitive profiling
ACM SIGPLAN Notices
, 1997
Abstract

Cited by 203 (9 self)
A program profile attributes runtime costs to portions of a program's execution. Most profiling systems suffer from two major deficiencies: first, they only apportion simple metrics, such as execution frequency or elapsed time, to static, syntactic units, such as procedures or statements; second, they aggressively reduce the volume of information collected and reported, although aggregation can hide striking differences in program behavior. This paper addresses both concerns by exploiting the hardware counters available in most modern processors and by incorporating two concepts from data flow analysis (flow and context sensitivity) to report more context for measurements. This paper extends our previous work on efficient path profiling to flow sensitive profiling, which associates hardware performance metrics with a path through a procedure. In addition, it describes a data structure, the calling context tree, that efficiently captures calling contexts for procedure-level measurements. Our measurements show that the SPEC95 benchmarks execute a small number (3-28) of hot paths that account for 9-98% of their L1 data cache misses. Moreover, these hot paths are concentrated in a few routines, which have complex dynamic behavior.
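The calling context tree the abstract mentions can be illustrated in a few lines. The sketch below is a hypothetical minimal version (the trace format and names are assumptions, not the authors' implementation): each node is one calling context, so a procedure called from two different callers gets two distinct nodes, each accumulating its own metric.

```python
# Minimal calling context tree (CCT) sketch: metrics are charged to
# the current calling context, not merged per procedure.
# The ('call'/'return'/'count') event format is a hypothetical stand-in
# for hardware-counter samples.

class CCTNode:
    def __init__(self, proc):
        self.proc = proc
        self.metric = 0          # e.g. cache misses attributed to this context
        self.children = {}       # callee name -> CCTNode

def build_cct(events):
    """events: iterable of ('call', proc), ('return',), ('count', n)."""
    root = CCTNode('<root>')
    stack = [root]               # stack of active contexts
    for ev in events:
        if ev[0] == 'call':
            node = stack[-1].children.setdefault(ev[1], CCTNode(ev[1]))
            stack.append(node)
        elif ev[0] == 'return':
            stack.pop()
        else:                    # ('count', n): charge metric to current context
            stack[-1].metric += ev[1]
    return root

trace = [('call', 'main'), ('call', 'f'), ('count', 3), ('return',),
         ('call', 'g'), ('call', 'f'), ('count', 5), ('return',),
         ('return',), ('return',)]
cct = build_cct(trace)
# 'f' called from main and 'f' called from g are distinct contexts:
print(cct.children['main'].children['f'].metric)                # 3
print(cct.children['main'].children['g'].children['f'].metric)  # 5
```

Unlike a full dynamic call tree, the CCT merges repeated calls from the same context into one node, which keeps it compact.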
Lazy Code Motion
, 1992
Abstract

Cited by 157 (20 self)
We present a bitvector algorithm for the optimal and economical placement of computations within flow graphs, which is as efficient as standard unidirectional analyses. The point of our algorithm is the decomposition of the bidirectional structure of the known placement algorithms into a sequence of a backward and a forward analysis, which directly implies the efficiency result. Moreover, the new compositional structure opens the algorithm for modification: two further unidirectional analysis components exclude any unnecessary code motion. This laziness of our algorithm minimizes the register pressure, which has drastic effects on the runtime behaviour of the optimized programs in practice, where an economical use of registers is essential.
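The unidirectional bitvector analyses the abstract composes can be illustrated with a standard instance. The sketch below is not the paper's placement equations; it is a minimal backward bitvector analysis (live variables) of the kind lazy code motion sequences with a forward one.

```python
# Minimal unidirectional (backward) bitvector analysis: live variables.
# Illustrative instance of the analyses lazy code motion composes,
# not the paper's placement algorithm itself.

def liveness(cfg, use, defs):
    """cfg: node -> list of successors; use/defs: node -> set of variables."""
    live_in = {n: set() for n in cfg}
    live_out = {n: set() for n in cfg}
    changed = True
    while changed:               # iterate the equations to the fixed point
        changed = False
        for n in cfg:
            out = set().union(*(live_in[s] for s in cfg[n])) if cfg[n] else set()
            inn = use[n] | (out - defs[n])
            if out != live_out[n] or inn != live_in[n]:
                live_out[n], live_in[n] = out, inn
                changed = True
    return live_in, live_out

cfg  = {'A': ['B'], 'B': ['C'], 'C': []}   # straight-line flow graph
use  = {'A': set(), 'B': {'x'}, 'C': {'y'}}
defs = {'A': {'x'}, 'B': {'y'}, 'C': set()}
live_in, live_out = liveness(cfg, use, defs)
print(sorted(live_out['A']))   # ['x']: x defined in A is used in B
```

Each analysis sweeps one direction only, which is what makes the decomposed algorithm as cheap as a standard unidirectional pass.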
The Interprocedural Coincidence Theorem
 In Int. Conf. on Comp. Construct
, 1992
Abstract

Cited by 87 (11 self)
We present an interprocedural generalization of the well-known (intraprocedural) Coincidence Theorem of Kam and Ullman, which provides a sufficient condition for the equivalence of the meet over all paths (MOP) solution and the maximal fixed point (MFP) solution to a data flow analysis problem. This generalization covers arbitrary imperative programs with recursive procedures, global and local variables, and formal value parameters. In the absence of procedures, it reduces to the classical intraprocedural version. In particular, our stack-based approach generalizes the coincidence theorems of Barth and Sharir/Pnueli for the same setup, which do not properly deal with local variables of recursive procedures.

1 Motivation
Data flow analysis is a classical method for the static analysis of programs that supports the generation of efficient object code by "optimizing" compilers (cf. [He, MJ]). For imperative languages, it provides information about the program states that may occur at s...
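The intraprocedural coincidence the theorem generalizes can be checked directly on a toy distributive framework. The sketch below is an illustrative assumption-laden example (gen/kill sets, union meet, an acyclic flow graph), not the paper's interprocedural construction: it computes MOP by enumerating paths and MFP by iteration, and the two agree.

```python
# For distributive frameworks (e.g. gen/kill bitvector problems), the
# intraprocedural Coincidence Theorem gives MOP = MFP. Toy check on an
# acyclic diamond-shaped flow graph with union as the meet.

preds = {'s': [], 'a': ['s'], 'b': ['s'], 'e': ['a', 'b']}
gen   = {'s': {'d1'}, 'a': {'d2'}, 'b': set(), 'e': set()}
kill  = {'s': set(), 'a': {'d1'}, 'b': set(), 'e': set()}

def f(n, x):                        # distributive transfer function of node n
    return gen[n] | (x - kill[n])

def paths(n):                       # all paths from entry 's' to n
    if not preds[n]:
        return [[n]]
    return [p + [n] for q in preds[n] for p in paths(q)]

# MOP: apply each path's composed transfer function, then meet (union).
mop = set()
for p in paths('e'):
    x = set()
    for n in p:
        x = f(n, x)
    mop |= x

# MFP: iterate the data flow equations to the maximal fixed point.
val = {n: set() for n in preds}
for _ in range(len(preds)):         # enough rounds for an acyclic graph
    for n in preds:
        inn = set().union(*(val[q] for q in preds[n])) if preds[n] else set()
        val[n] = f(n, inn)

print(mop == val['e'])              # True: the two solutions coincide
```

For non-distributive transfer functions the MFP solution may only approximate MOP, which is why the sufficient condition matters.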
Parallelism for Free: Efficient and Optimal Bitvector Analyses for Parallel Programs
, 1994
Abstract

Cited by 46 (3 self)
In this paper we show how to construct optimal bitvector analysis algorithms for parallel programs with shared memory that are as efficient as their purely sequential counterparts, and which can easily be implemented. Whereas the complexity result is rather obvious, our optimality result is a consequence of a new Kam/Ullman-style Coincidence Theorem. Thus, the important merits of sequential bitvector analyses survive the introduction of parallel statements.

Keywords: parallelism, interleaving semantics, synchronization, program optimization, data flow analysis, bitvector problems, definition-use chains, partial redundancy elimination, partial dead code elimination.
Towards a Tool Kit for the Automatic Generation of Interprocedural Data Flow Analyses
, 1996
Abstract

Cited by 15 (5 self)
this article, the classical application of DFA. In this context, designers of a DFA are typically faced with the problem of how to construct an algorithm that determines the set of program points of an argument program which satisfy a certain property of interest. Though this problem has been studied in detail for the intraprocedural case, the construction of interprocedural analyses is still
Interprocedural Analyses of Fortran Programs
 Parallel Computing
, 1997
Abstract

Cited by 6 (0 self)
Interprocedural analyses (IPA) are becoming more and more common in commercial compilers. But research on the analysis of Fortran programs is still going on, as a number of problems are not yet satisfactorily solved and others are emerging with new language dialects. This paper presents a survey of the main interprocedural analysis techniques, with an emphasis on the suitability of the analysis framework for the characteristics of the original semantic problem. Our experience with the pips interprocedural compiler workbench is then described. pips includes a make-like mechanism, PipsMake, which takes care of the interleavings between top-down and bottom-up analyses and allows quick prototyping of new interprocedural analyses. Intensive summarization is used to reduce storage requirements and achieve reasonable analysis times when dealing with real-life applications. The speed/accuracy tradeoffs made for pips are discussed in the light of other interprocedural tools. Key ...
Optimization of Data Remapping in DataParallel Languages
, 1998
Abstract

Cited by 1 (1 self)
The user-controlled mapping of data across the local memories of processing nodes is one of the central features of data-parallel languages like High Performance Fortran (HPF). Since many scientific applications typically consist of different computational phases, each owning a best data mapping, dynamic remappings have proven useful in maintaining good data locality and workload balance. HPF supports remappings by procedure calls and by executing redistribute/realign directives. But remappings can be quite expensive, as communication is required to migrate the array elements to their new owning processors, and can significantly degrade a program's performance. Hence, elimination of unnecessary remappings is of key importance. It is essential because even well-written HPF programs may result in unnecessary remappings. In this thesis we optimize the overall time spent for dynamic data remappings in a program run by reducing the number of executed remappings. Elimination of redundant and d...