Low-overhead call path profiling of unmodified, optimized code
- In Proc. 19th Annual International Conference on Supercomputing, 2005
- Cited by 17 (5 self)
Call path profiling associates resource consumption with the calling context in which resources were consumed. We describe the design and implementation of a low-overhead call path profiler based on stack sampling. The profiler uses a novel sample-driven strategy for collecting frequency counts for call graph edges without instrumenting every procedure’s code to count them. The data structures and algorithms used are efficient enough to construct the complete calling context tree exposed during sampling. The profiler leverages information recorded by compilers for debugging or exception handling to record call path profiles even for highly-optimized code. We describe an implementation for the Tru64/Alpha platform. Experiments profiling the SPEC CPU2000 benchmark suite demonstrate the low (2%-7%) overhead of this profiler. A comparison with instrumentation-based profilers, such as gprof, shows that for call-intensive programs, our sampling-based strategy for call path profiling has over an order of magnitude lower overhead.
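To make the idea of a calling context tree built from stack samples concrete, here is a minimal sketch. It is not the profiler described in the abstract: the `CCTNode` class, the `add_sample` and `inclusive` helpers, and the sample stacks are all hypothetical names and data chosen for illustration, and a real profiler would obtain the stacks by unwinding at a timer or counter interrupt.

```python
from dataclasses import dataclass, field


@dataclass
class CCTNode:
    """One calling context: a procedure reached through a specific call path."""
    name: str
    samples: int = 0  # samples whose innermost frame is this context
    children: dict = field(default_factory=dict)

    def child(self, name):
        # Reuse the existing child for this callee, or create it on first sight.
        if name not in self.children:
            self.children[name] = CCTNode(name)
        return self.children[name]


def add_sample(root, stack):
    """Merge one sampled call stack (outermost frame first) into the tree."""
    node = root
    for frame in stack:
        node = node.child(frame)
    node.samples += 1


def inclusive(node):
    """Samples attributed to this context plus everything called from it."""
    return node.samples + sum(inclusive(c) for c in node.children.values())


# Hypothetical sampled stacks (assumed data, not measurements).
root = CCTNode("<root>")
for stack in (["main", "solve", "dot"], ["main", "solve", "dot"],
              ["main", "io"], ["main", "solve"]):
    add_sample(root, stack)

solve = root.children["main"].children["solve"]
```

Because identical call-path prefixes share nodes, the tree grows only with the number of distinct contexts actually observed, not with the number of samples taken.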
Cache-Aware Cross-Profiling for Java Processors
- CASES'08, 2008
- Cited by 5 (5 self)
Performance evaluation of embedded software is essential in an early development phase so as to ensure that the software will run on the embedded device’s limited computing resources. Prevailing approaches either require the deployment of the software on the embedded target, which can be tedious and may be impossible in an early development phase, or rely on simulation, which can be very slow. In this paper, we introduce a customizable cross-profiling framework for embedded Java processors, including processors featuring a method cache. The developer profiles the embedded software in the host environment, completely decoupled from the target system, on any standard Java Virtual Machine, but the generated profiles represent the execution time metric of the target system. Our cross-profiling framework is based on bytecode instrumentation. We identify several pointcuts in the execution of bytecode that need to be instrumented in order to estimate the CPU cycle consumption on the target system. An evaluation using the JOP embedded Java processor as target confirms that our approach reconciles high profile accuracy with moderate overhead. Our cross-profiling framework also enables the rapid evaluation of the performance impact of possible optimizations, such as different caching strategies.
Efficient, Context-Sensitive Detection of Semantic Attacks
- Cited by 3 (2 self)
Software developers are increasingly choosing memory-safe languages such as Java because they help deploy higher-quality software faster. As a result, semantic vulnerabilities (omitted security checks, misconfigured security policies, and other software design errors) are supplanting memory-corruption exploits as the primary cause of security violations. We present PECAN, a precise, efficient defense against semantic attacks based on dynamic anomaly detection. We show that detection of semantic exploits requires both context and history sensitivity. PECAN supports very efficient run-time tracking of calling contexts and histories, and thus enables accurate detection of unusual behaviors associated with security violations. We evaluate our approach on several real-world semantic exploits that target subtle bugs in real Java applications and libraries. Our sample attacks are representative of common types of semantic vulnerabilities. All were successfully detected by PECAN. The run-time overhead of our approach on standard benchmarks is 5% on average and at most 9%. The efficiency of PECAN is a qualitative advance in the state of the art: unlike many existing methods, PECAN can be deployed in a production system with a minimal performance penalty. Furthermore, we investigate the tradeoff between sensitivity and accuracy, and empirically demonstrate that PECAN achieves high sensitivity with few false positives.
Accurate, Efficient, and Adaptive Calling Context Profiling
- PLDI '06, 2006
- Cited by 3 (1 self)
Calling context profiles are used in many inter-procedural code optimizations and in overall program understanding. Unfortunately, the collection of profile information is highly intrusive due to the high frequency of method calls in most applications. Previously proposed calling-context profiling mechanisms consequently suffer from either low accuracy, high overhead, or both. We have developed a new approach for building the calling context tree at runtime, called adaptive bursting. By selectively inhibiting redundant profiling, this approach dramatically reduces overhead while preserving profile accuracy. We first demonstrate the drawbacks of previously proposed calling context profiling mechanisms. We show that a low-overhead solution using sampled stack-walking alone is less than 50% accurate, based on degree of overlap with a complete calling-context tree. We also show that a static bursting approach collects a highly accurate profile, but causes an unacceptable application slowdown. Our adaptive solution achieves 85% degree of overlap and provides an 88% hot-edge coverage when using a 0.1 hot-edge threshold, while dramatically reducing overhead compared to the static bursting approach.
Breadcrumbs: Efficient Context Sensitivity for Dynamic Bug Detection Analyses
- Cited by 3 (2 self)
Calling context (the set of active methods on the stack) is critical for understanding the dynamic behavior of large programs. Dynamic program analysis tools, however, are almost exclusively context insensitive because of the prohibitive cost of representing calling contexts at run time. Deployable dynamic analyses, in particular, are limited to reporting only static program locations. This paper presents Breadcrumbs, an efficient technique for recording and reporting dynamic calling contexts. It builds on an existing technique for computing a compact (one word) encoding of each calling context that client analyses can use in place of a program location. The key feature of our system is a search algorithm that can reconstruct a calling context from its encoding using only a static call graph and a small amount of dynamic information collected in cold methods. Breadcrumbs requires no offline training or program modifications, and handles all language features, including dynamic class loading. On average, it adds 10% to 20% overhead to existing dynamic analyses, depending on how much additional information it collects: more information slows down execution, but improves the decoding algorithm. We use Breadcrumbs to add context sensitivity to two dynamic analyses: a race detector and an analysis that identifies the origins of null pointer exceptions. Our system can reconstruct nearly all of the contexts for the reported bugs in a few seconds. These calling contexts are non-trivial, and they significantly improve both the precision of the analyses and the quality of the bug reports.
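The following sketch illustrates the general shape of a one-word context encoding and a search-based decoder over a static call graph. It is an illustration only and should not be read as the Breadcrumbs algorithm: the update rule `3 * value + call_site_id` (truncated to 32 bits) is an assumed hash, the exhaustive depth-bounded search stands in for the paper's more refined search, and the call graph and call-site IDs are hypothetical.

```python
MASK = 0xFFFFFFFF  # keep the running value within one 32-bit word


def step(value, call_site_id):
    """Fold one call site into the running context value (assumed hash)."""
    return (3 * value + call_site_id) & MASK


def encode(call_sites):
    """Encode a sequence of call-site IDs, outermost call first."""
    value = 0
    for cs in call_sites:
        value = step(value, cs)
    return value


def search(call_graph, root, target_fn, target_value, max_depth=16):
    """Enumerate paths in the static call graph and keep those that end in
    target_fn with a matching encoding. Exhaustive, so for illustration only."""
    matches = []

    def visit(fn, value, path, depth):
        if fn == target_fn and value == target_value:
            matches.append(list(path))
        if depth == max_depth:
            return
        for cs, callee in call_graph.get(fn, []):
            path.append(callee)
            visit(callee, step(value, cs), path, depth + 1)
            path.pop()

    visit(root, 0, [root], 0)
    return matches


# Hypothetical call graph: function -> list of (call_site_id, callee).
graph = {
    "main": [(11, "A"), (12, "B")],
    "A": [(21, "C")],
    "B": [(22, "C")],
}
value_via_b = encode([12, 22])
decoded = search(graph, "main", "C", value_via_b)
```

A hash of this kind can collide, which is one reason a decoder may need extra dynamic information to narrow the search.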
Precise Calling Context Encoding
- In ACM International Conference on Software Engineering, 2010
- Cited by 3 (0 self)
Calling contexts are very important for a wide range of applications such as profiling, debugging, and event logging. Most applications perform expensive stack walking to recover contexts. The resulting contexts are often explicitly represented as a sequence of call sites and hence bulky. We propose a technique to encode the current calling context of any point during an execution. In particular, an acyclic call path is encoded into one number through only integer additions. Recursive call paths are divided into acyclic subsequences and encoded independently. We leverage stack depth in a safe way to optimize encoding: if a calling context can be safely and uniquely identified by its stack depth, we do not perform encoding. We propose an algorithm to seamlessly fuse encoding and stack depth based identification. The algorithm is safe because different contexts are guaranteed to have different IDs. It also ensures contexts can be faithfully decoded. Our experiments show that our technique incurs negligible overhead (1.89% on average). For most medium-sized programs, it can encode all contexts with just one number. For large programs, we are able to encode most calling contexts to a few numbers.
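As a rough illustration of how an acyclic call path can be reduced to one number using only additions, the sketch below numbers call edges in the style of path numbering on a DAG. It is a simplified reading of the idea, not the paper's algorithm: recursion and the stack-depth optimization are left out, and the function names (`annotate`, `encode`, `decode`), the call graph, and the call-site labels are hypothetical.

```python
def annotate(callers, order):
    """Assign an additive increment to every call edge of an acyclic call graph.

    callers: callee -> list of (caller, site_label), in a fixed order
    order:   all functions in topological order, root first
    Returns (num, inc): num[f] is the number of distinct contexts of f, and
    inc[(caller, label, callee)] is the amount added when that call is taken.
    """
    num, inc = {}, {}
    for fn in order:
        sites = callers.get(fn, [])
        if not sites:          # the root has exactly one (empty) context
            num[fn] = 1
            continue
        total = 0
        for caller, label in sites:
            inc[(caller, label, fn)] = total  # IDs below this belong to earlier sites
            total += num[caller]
        num[fn] = total
    return num, inc


def encode(edges, inc):
    """Context ID of a path: the sum of its edge increments."""
    return sum(inc[e] for e in edges)


def decode(fn, ctx_id, callers, num, inc):
    """Recover the function sequence (root first) for ctx_id observed in fn."""
    path = [fn]
    while callers.get(fn):
        for caller, label in callers[fn]:
            low = inc[(caller, label, fn)]
            if low <= ctx_id < low + num[caller]:
                break
        else:
            raise ValueError("ID does not name a context of this function")
        ctx_id -= low
        fn = caller
        path.append(fn)
    return path[::-1]


# Hypothetical acyclic call graph.
callers = {
    "A": [("main", "a")],
    "B": [("main", "b")],
    "C": [("A", "c1"), ("B", "c2")],
    "D": [("C", "d1"), ("A", "d2")],
}
order = ["main", "A", "B", "C", "D"]
num, inc = annotate(callers, order)
id_bcd = encode([("main", "b", "B"), ("B", "c2", "C"), ("C", "d1", "D")], inc)
```

In this example `D` has three contexts, numbered 0 to 2, and `decode("D", id_bcd, callers, num, inc)` walks back through `C` and `B` to `main`.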
Inferred Call Path Profiling
- 2009
- Cited by 2 (0 self)
Prior work has found call path profiles to be useful for optimizers and programmer-productivity tools. Unfortunately, previous approaches for collecting path profiles are expensive: they need to either execute additional instructions (to track calls and returns) or they need to walk the stack. The state-of-the-art techniques for call path profiling slow down the program by 7% (for C programs) and 20% (for Java programs). This paper describes an innovative technique that collects minimal information from the running program and later (offline) infers the full call paths from this information. The key insight behind our approach is that readily available information during program execution (the height of the call stack and the identity of the currently executing function) is a good indicator of calling context. We call this pair a context identifier. Because more than one call path may have the same context identifier, we show how to disambiguate context identifiers by changing the sizes of function activation records. This disambiguation has no overhead in terms of executed instructions. We evaluate our approach on the SPEC CPU 2006 C++ and C benchmarks. We show that collecting context identifiers slows down programs by 0.17% (geometric mean). We can map these context identifiers to the correct unique call path 80% of the time for C++ programs and 95% of the time for C programs.
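The context-identifier idea can be illustrated with a small model in which stack height is simply the sum of the frame sizes on the path. This is a sketch under that simplifying assumption: the helper names, call paths, and frame sizes are hypothetical, and real stack heights also depend on details such as alignment and saved registers.

```python
from collections import defaultdict


def context_ids(paths, frame_size):
    """Group call paths by context identifier: (stack height, current function)."""
    table = defaultdict(list)
    for path in paths:
        height = sum(frame_size[f] for f in path)
        table[(height, path[-1])].append(path)
    return table


def ambiguous(table):
    """Context identifiers that map to more than one call path."""
    return {cid: ps for cid, ps in table.items() if len(ps) > 1}


# Hypothetical call paths and activation-record sizes in bytes.
paths = [("main", "A", "C"), ("main", "B", "C"), ("main", "A")]
sizes = {"main": 32, "A": 16, "B": 16, "C": 8}

# A and B have equal frame sizes, so both paths into C share one identifier.
before = ambiguous(context_ids(paths, sizes))

# Padding B's activation record separates the two paths.
after = ambiguous(context_ids(paths, dict(sizes, B=24)))
```

Here `before` contains the single identifier `(56, "C")` with two candidate paths, while `after` is empty, which mirrors the abstract's point that resizing activation records can disambiguate identifiers without adding executed instructions.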
hpsgprof: A New Profiling Tool for Large-Scale Parallel Scientific Codes
- UK Performance Engineering Workshop 2009 (UKPEW09), 2009
- Cited by 1 (0 self)
Contemporary High Performance Computing (HPC) applications can exhibit unacceptably high overheads when existing instrumentation-based performance analysis tools are applied. Our experience shows that for some sections of these codes, existing instrumentation-based tools can cause, on average, a fivefold increase in runtime. Our experience has been that, in a performance modelling context, these less representative runs can misdirect the modelling process. We present an approach to recording call paths for optimised HPC application binaries, without the need for instrumentation. As a result, a new tool has been developed which complements our work on analytical and simulation-based performance modelling. The utility of this approach, in terms of low and consistent runtime overhead, is demonstrated by a comparative evaluation against existing tools for a range of recognised HPC benchmark codes.

