Results 1  10
of
30
Analysis of Benchmark Characteristics and Benchmark Performance Prediction
 ACM Transactions on Computer Systems
, 1992
"... Standard benchmarking provides the run times for given programs on given machines, but fails to provide insight as to why those results were obtained (either in terms of machine or program characteristics), and fails to provide run times for that program on some other machine, or some other programs ..."
Abstract

Cited by 111 (4 self)
 Add to MetaCart
Standard benchmarking provides the run times for given programs on given machines, but fails to provide insight as to why those results were obtained (either in terms of machine or program characteristics), and fails to provide run times for that program on some other machine, or some other programs on that machine. We have developed a machineindependent model of program execution to characterize both machine performance and program execution. By merging these machine and program characterizations, we can estimate execution time for arbitrary machine/program combinations. Our technique allows us to identify those operations, either on the machine or in the programs, which dominate the benchmark results. This information helps designers in improving the performance of future machines, and users in tuning their applications to better utilize the performance of existing machines. Here we apply our methodology to characterize benchmarks and predict their execution times. We present extensi...
Symbolic Analysis for Parallelizing Compilers
, 1994
"... Symbolic Domain The objects in our abstract symbolic domain are canonical symbolic expressions. A canonical symbolic expression is a lexicographically ordered sequence of symbolic terms. Each symbolic term is in turn a pair of an integer coefficient and a sequence of pairs of pointers to program va ..."
Abstract

Cited by 105 (4 self)
 Add to MetaCart
Symbolic Domain The objects in our abstract symbolic domain are canonical symbolic expressions. A canonical symbolic expression is a lexicographically ordered sequence of symbolic terms. Each symbolic term is in turn a pair of an integer coefficient and a sequence of pairs of pointers to program variables in the program symbol table and their exponents. The latter sequence is also lexicographically ordered. For example, the abstract value of the symbolic expression 2ij+3jk in an environment that i is bound to (1; (( " i ; 1))), j is bound to (1; (( " j ; 1))), and k is bound to (1; (( " k ; 1))) is ((2; (( " i ; 1); ( " j ; 1))); (3; (( " j ; 1); ( " k ; 1)))). In our framework, environment is the abstract analogous of state concept; an environment is a function from program variables to abstract symbolic values. Each environment e associates a canonical symbolic value e x for each variable x 2 V ; it is said that x is bound to e x. An environment might be represented by...
Determining Average Program Execution Times and their Variance
, 1989
"... This paper presents a general framework for determining average program execution times and their variance, based on the program's interval structure and control dependence graph. Average execution times and variance values are computed using frequency information from an optimized counterbased exe ..."
Abstract

Cited by 87 (0 self)
 Add to MetaCart
This paper presents a general framework for determining average program execution times and their variance, based on the program's interval structure and control dependence graph. Average execution times and variance values are computed using frequency information from an optimized counterbased execution profile of the program. 1 Introduction It is important for a compiler to obtain estimates of execution times for subcomputations of an input program, if it is to attempt optimizations related to overhead values in the target architecture. In earlier work [SH86a, SH86b, Sar87, Sar89], we used estimates of execution times to facilitate the automatic partitioning and scheduling of programs written in the singleassignment language, Sisal, for parallel execution on multiprocessors. In this paper, we present a general framework for estimating average execution times in a program. This approach is based on the interval structure [ASU86] and the control dependence relation [FOW87], both of w...
Static Branch Frequency and Program Profile Analysis
 In 27th International Symposium on Microarchitecture
, 1994
"... : Program profiles identify frequently executed portions of a program, which are the places at which optimizations offer programmers and compilers the greatest benefit. Compilers, however, infrequently exploit program profiles, because profiling a program requires a programmer to instrument and run ..."
Abstract

Cited by 71 (1 self)
 Add to MetaCart
: Program profiles identify frequently executed portions of a program, which are the places at which optimizations offer programmers and compilers the greatest benefit. Compilers, however, infrequently exploit program profiles, because profiling a program requires a programmer to instrument and run the program. An attractive alternative is for the compiler to statically estimate program profiles. . This paper presents several new techniques for static branch prediction and profiling. The first technique combines multiple predictions of a branch's outcome into a prediction of the probability that the branch is taken. Another technique uses these predictions to estimate the relative execution frequency (i.e., profile) of basic blocks and controlflow edges within a procedure. A third algorithm uses local frequency estimates to predict the global frequency of calls, procedure invocations, and basic block and controlflow edge executions. Experiments on the SPEC92 integer benchmarks and Uni...
Task Granularity Analysis in Logic Programs
, 1990
"... While logic programming languages o#er a great deal of scope for parallelism, there is usually some overhead associated with the execution of goals in parallel because of the work involved in task creation and scheduling. In practice, therefore, the "granularity" of a goal, i.e. an estimate of the w ..."
Abstract

Cited by 63 (28 self)
 Add to MetaCart
While logic programming languages o#er a great deal of scope for parallelism, there is usually some overhead associated with the execution of goals in parallel because of the work involved in task creation and scheduling. In practice, therefore, the "granularity" of a goal, i.e. an estimate of the work available under it, should be taken into account when deciding whether or not to execute a goal concurrently as a separate task. This paper describes a method for estimating the granularity of a goal at compile time. The runtime overhead associated with our approach is usually quite small, and the performance improvements resulting from the incorporation of grainsize control can be quite good. This is shown by means of experimental results.
Complexity Analysis for a Lazy HigherOrder Language
 In Proceedings of the 3rd European Symposium on Programming
, 1990
"... This paper is concerned with the timeanalysis of functional programs. Techniques which enable us to reason formally about a program's execution costs have had relatively little attention in the study of functional programming. We concentrate here on the construction of equations which compute t ..."
Abstract

Cited by 47 (2 self)
 Add to MetaCart
This paper is concerned with the timeanalysis of functional programs. Techniques which enable us to reason formally about a program's execution costs have had relatively little attention in the study of functional programming. We concentrate here on the construction of equations which compute the timecomplexity of expressions in a lazy higherorder language. The problem with higherorder functions is that complexity is dependent on the cost of applying functional parameters. Structures called costclosures are introduced to allow us to model both functional parameters and the cost of their application. The problem with laziness is that complexity is dependent on context. Projections are used to characterise the context in which an expression is evaluated, and costequations are parameterised by this contextdescription to give a compositional timeanalysis. Using this form of context information we introduce two types of timeequation: sufficienttime equations and nece...
A Naïve Time Analysis and its Theory of Cost Equivalence
 Journal of Logic and Computation
, 1995
"... Techniques for reasoning about extensional properties of functional programs are well understood, but methods for analysing the underlying intensional or operational properties have been much neglected. This paper begins with the development of a simple but useful calculus for time analysis of nons ..."
Abstract

Cited by 39 (7 self)
 Add to MetaCart
Techniques for reasoning about extensional properties of functional programs are well understood, but methods for analysing the underlying intensional or operational properties have been much neglected. This paper begins with the development of a simple but useful calculus for time analysis of nonstrict functional programs with lazy lists. One limitation of this basic calculus is that the ordinary equational reasoning on functional programs is not valid. In order to buy back some of these equational properties we develop a nonstandard operational equivalence relation called cost equivalence, by considering the number of computation steps as an `observable' component of the evaluation process. We define this relation by analogy with Park's definition of bisimulation in CCS. This formulation allows us to show that cost equivalence is a contextual congruence (and thus is substitutive with respect to the basic calculus) and provides useful proof techniques for establishing costequivalen...
Symbolic Analysis: A Basis for Parallelization, Optimization, and Scheduling of Programs
 In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing
, 1993
"... This paper presents an abstract interpretation framework for parallelizing compilers. Within this framework, symbolic analysis is used to solve various flow analysis problems in a unified way. Symbolic analysis also serves as a basis for code generation optimizations and a tool for derivation of com ..."
Abstract

Cited by 36 (0 self)
 Add to MetaCart
This paper presents an abstract interpretation framework for parallelizing compilers. Within this framework, symbolic analysis is used to solve various flow analysis problems in a unified way. Symbolic analysis also serves as a basis for code generation optimizations and a tool for derivation of computation cost estimates. A loop scheduling strategy that utilizes symbolic timing information is also presented. 1 Introduction Empirical results indicate that existing parallelizing compilers cause insignificant improvements on the performance of many real application programs [9, 5]. The speedups obtained by manual transformation of these applications [9] show the potential for significantly advancing parallelizing compiler technology. The poor performance of current restructuring compilers can be attributed to two causes: imprecise analysis and inappropriate performancewise transformations. The causes are not completely independent; namely, imprecise information results in inappropriate...
Automatic Inference of Upper Bounds for Recurrence Relations in Cost Analysis
 In SAS, LNCS
"... Abstract. The classical approach to automatic cost analysis consists of two phases. Given a program and some measure of cost, we first produce recurrence relations (RRs) which capture the cost of our program in terms of the size of its input data. Second, we convert such RRs into closed form (i.e., ..."
Abstract

Cited by 34 (10 self)
 Add to MetaCart
Abstract. The classical approach to automatic cost analysis consists of two phases. Given a program and some measure of cost, we first produce recurrence relations (RRs) which capture the cost of our program in terms of the size of its input data. Second, we convert such RRs into closed form (i.e., without recurrences). Whereas the first phase has received considerable attention, with a number of cost analyses available for a variety of programming languages, the second phase has received comparatively little attention. In this paper we first study the features of RRs generated by automatic cost analysis and discuss why existing computer algebra systems are not appropriate for automatically obtaining closed form solutions nor upper bounds of them. Then we present, to our knowledge, the first practical framework for the fully automatic generation of reasonably accurate upper bounds of RRs originating from cost analysis of a wide range of programs. It is based on the inference of ranking functions and loop invariants and on partial evaluation. 1
A Monadic Calculus for Parallel Costing of a Functional Language of Arrays
 EuroPar'97 Parallel Processing, volume 1300 of Lecture Notes in Computer Science
, 1997
"... . Vec is a higherorder functional language of nested arrays, which includes a general folding operation. Static computation of the shape of its programs is used to support a compositional cost calculus based on a cost monad. This, in turn, is based on a cost algebra, whose operations may be customi ..."
Abstract

Cited by 25 (9 self)
 Add to MetaCart
. Vec is a higherorder functional language of nested arrays, which includes a general folding operation. Static computation of the shape of its programs is used to support a compositional cost calculus based on a cost monad. This, in turn, is based on a cost algebra, whose operations may be customized to handle different cost regimes, especially for parallel programming. We present examples based on sequential costing and on the PRAM model of parallel computation. The latter has been implemented in Haskell, and applied to some linear algebra examples. 1 Introduction Secondorder combinators such as map, fold and zip provide programmers with a concise, abstract language for writing skeletons for implicitly parallel programs, as in [Ski94], but there is a hitch. These combinators are defined for list programs (see [BW88]), but efficient implementations (which is the point of parallelism, after all) are based on arrays. This disparity becomes acute when working with nested arrays, which...