Results 1 - 5 of 5
Optimally Profiling and Tracing Programs
- ACM Transactions on Programming Languages and Systems, 1994
Cited by 256 (17 self)
Branch Prediction For Free
- 1993
Cited by 144 (8 self)
Many compilers rely on branch prediction to improve program performance by identifying frequently executed regions and by aiding in scheduling instructions. Profile-based predictors require a time-consuming and inconvenient compile-profile-compile cycle in order to make predictions. We present a program-based branch predictor that performs well for a large and diverse set of programs written in C and Fortran. In addition to using natural loop analysis to predict branches that control the iteration of loops, we focus on heuristics for predicting non-loop branches, which dominate the dynamic branch count of many programs. The heuristics are simple and require little program analysis, yet they are effective in terms of coverage and miss rate. Although program-based prediction does not equal the accuracy of profile-based prediction, we believe it reaches a sufficiently high level to be useful. Additional type and semantic information available to a compiler would enhance our heuristics.
Normalized Performance Indices For Message Passing Parallel Programs
- Proc. Int. Conf. on Supercomputing, 1994
Cited by 6 (2 self)
Existing tools for locating performance bottlenecks of message passing parallel programs provide only visualizations or profiles of program executions; they do not highlight the cause of poor program performance. Identifying the cause of poor performance requires exposing how well the underlying algorithm has been mapped onto the parallel machine. From the perspective of the application, the location and cause of performance problems in terms of procedures, processors and data structures are all important. In this paper, we present a suite of normalized performance indices that provide a convenient mechanism for focusing on a location with poor performance. These indices are complemented by additional indices that highlight the cause of the performance failure in terms of processors, procedures and data structure interactions. With the help of examples from the NAS benchmark suite, we show that the automatically generated indices help detect potential causes of poo...
Behavioral Profiling Based High Level Power Estimation Methodologies for VLSI ASIC and FPGA Synthesis
Cited by 1 (0 self)
This work addresses the problem of estimating power consumption at higher levels of design abstraction, namely the behavioral and architectural (RTL) levels. The techniques are employed in a high-level synthesis environment known as Profile-Driven Synthesis System (PDSS) to synthesize low-power designs. The contributions of this work are: a behavioral profiler, a behavioral power estimator, an architectural power estimator, architectural power simulation, and a low-power behavioral synthesis environment, PDSS. Broadly, for any power estimation technique proposed in this work, there are two distinct steps: (a) Power Characterization of Module Library; and (b) Power estimation for a given design. The first step involves characterizing the RTL parameterized module library (for datapaths) and PLAs (for controllers) for typical power consumption on an input event, as a function of parameters such as bit size, input size, state size, etc. The second step involves using the profile data gathered at ...
Analyzing Data-Structure Movements in . . .
- 1994
In this paper we show that the analysis of interprocessor data movement in terms of source-level data structures can be effective in performance debugging. We present a method for the low-overhead run-time monitoring of interprocessor communication in terms of data structures. We show how performance indices based on postmortem analysis of the collected trace data can guide the user directly to the causes of poor performance. One of the most important decisions a programmer has to make in writing parallel programs is with regard to data structure distributions and alignments. Even so, there are very few performance tools which attempt to provide statistics or views of programs in terms of the data structure interactions resulting from those alignments. Current tools for message passing programs provide mechanisms for studying performance from the processor and function perspectives only. We demonstrate that our approach, based on postmortem analysis of trace files augmented wi...

