Results 21 - 30
of
150
Procedure Placement Using Temporal-Ordering Information
- ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1997
"... ..."
Using Hardware Performance Monitors to Understand the Behavior of Java Applications
- IN PROC. OF THE THIRD USENIX VIRTUAL MACHINE RESEARCH AND TECHNOLOGY SYMP
, 2004
"... Modern Java programs, such as middleware and application servers, include many complex software components. Improving the performance of these Java applications requires a better understanding of the interactions between the application, virtual machine, operating system, and architecture. Hardware ..."
Abstract
-
Cited by 33 (8 self)
- Add to MetaCart
Modern Java programs, such as middleware and application servers, include many complex software components. Improving the performance of these Java applications requires a better understanding of the interactions between the application, virtual machine, operating system, and architecture. Hardware performance monitors, which are available on most modern processors, provide facilities to obtain detailed performance measurements of long-running applications in real time. However, interpreting the data collected using hardware performance monitors is difficult because of the low-level nature of the data. We have
Interprocedural Path Profiling
, 1999
"... In path profiling, a program is instrumented with code that counts the number of times particular path fragments of the program are executed. This paper extends the intraprocedural path-profiling technique of Ball and Larus to collect information about interprocedural paths (i.e., paths that may cro ..."
Abstract
-
Cited by 32 (2 self)
- Add to MetaCart
In path profiling, a program is instrumented with code that counts the number of times particular path fragments of the program are executed. This paper extends the intraprocedural path-profiling technique of Ball and Larus to collect information about interprocedural paths (i.e., paths that may cross procedure boundaries).
An Architectural Framework for Run-Time Optimization
- IEEE Transactions on Computers
, 2001
"... Wide-issue processors continue to achieve higher performance by exploiting greater instruction-level parallelism. Dynamic techniques such as out-of-order execution and hardware speculation have proven effective at increasing instruction throughput. Run-time optimization promises to provide an even ..."
Abstract
-
Cited by 30 (2 self)
- Add to MetaCart
Wide-issue processors continue to achieve higher performance by exploiting greater instruction-level parallelism. Dynamic techniques such as out-of-order execution and hardware speculation have proven effective at increasing instruction throughput. Run-time optimization promises to provide an even higher level of performance by adaptively applying aggressive code transformations on a larger scope. This paper presents a new hardware mechanism for generating and deploying run-time optimized code. The mechanism can be viewed as a filtering system, that resides in the retirement stage of the processor pipeline, accepts an instruction execution stream as input, and produces instruction profiles and sets of linked, optimized traces as output. The code deployment mechanism uses an extension to the branch prediction mechanism to migrate execution into the new code without modifying the original code. These new components do not add delay to the execution of the program except during short bursts of reoptimization. This technique provides a strong platform for run-time optimization because the hot execution regions are extracted, optimized, and written to main memory for execution and because these regions persist across context switches. The current design of the framework supports a suite of optimizations including partial function inlining (even into shared libraries), code straightening optimizations, loop unrolling, and peephole optimizations. 1
Rapid Profiling via Stratified Sampling
, 2001
"... Sophisticated binary translators and dynamic optimizers demand a program profiler with low overhead, high accuracy, and the ability to collect a variety of profile types. A profiling scheme that achieves these goals is proposed. Conceptually, the hardware compresses a stream of profile data by count ..."
Abstract
-
Cited by 28 (3 self)
- Add to MetaCart
Sophisticated binary translators and dynamic optimizers demand a program profiler with low overhead, high accuracy, and the ability to collect a variety of profile types. A profiling scheme that achieves these goals is proposed. Conceptually, the hardware compresses a stream of profile data by counting identical events; the compressed profile data is passed to software for analysis. Compressing the high-bandwidth event stream greatly reduces software overhead. Because optimizations can tolerate some profiling errors, we allow the stream compressor to be lossy, thereby enabling a low-cost sampling-based hardware design. Because the hardware compressor is insensitive to the event content, it supports various profile types and can process multiple types simultaneously. Basic components of our framework are periodic and random samplers, counters, and hash functions. These components are composed to form a variety of stream compressors. One design is both simple and very effective: the input stream is hash-split into multiple substreams, each of which is fed into a simple periodic sampler that selects every kth event. This stratified periodic sampler performs better than conventional random sampling because it biases each substream towards a small number of unique events, thereby reducing sampling error, and allowing faster convergence to an accurate profile. For example, convergence to a given level of accuracy is about twice as fast for gcc. When sampling overhead is considered, the stratified periodic profiler achieves less than 3% error while incurring an overhead of only 3.5% for gcc.
Profiling deployed software: Assessing strategies and testing opportunities
- IEEE TRANS. SOFTW. ENG
, 2005
"... An understanding of how software is employed in the field can yield many opportunities for quality improvements. Profiling released software can provide such an understanding. However, profiling released software is difficult due to the potentially large number of deployed sites that must be profil ..."
Abstract
-
Cited by 28 (8 self)
- Add to MetaCart
An understanding of how software is employed in the field can yield many opportunities for quality improvements. Profiling released software can provide such an understanding. However, profiling released software is difficult due to the potentially large number of deployed sites that must be profiled, the transparency requirements at a user’s site, and the remote data collection and deployment management process. Researchers have recently proposed various approaches to tap into the opportunities offered by profiling deployed systems and overcome those challenges. Initial studies have illustrated the application of these approaches and have shown their feasibility. Still, the proposed approaches, and the tradeoffs between overhead, accuracy, and potential benefits for the testing activity have been barely quantified. This paper aims to overcome those limitations. Our analysis of 1,200 user sessions on a 155 KLOC deployed system substantiates the ability of field data to support test suite improvements, assesses the efficiency of profiling techniques for released software, and the effectiveness of testing efforts that leverage profiled field data.
Achieving High Performance via Co-Designed Virtual Machines
- In International Workshop on Innovative Architecture
, 1999
"... Introduction Today's virtual machines use a layer of software that allows programs compiled in one instruction set to be executed on a processor executing a (different) native instruction set. Virtual machines have become popular in recent years for providing platform independence; however, virtual ..."
Abstract
-
Cited by 25 (6 self)
- Add to MetaCart
Introduction Today's virtual machines use a layer of software that allows programs compiled in one instruction set to be executed on a processor executing a (different) native instruction set. Virtual machines have become popular in recent years for providing platform independence; however, virtual machines also open many new opportunities for enhancing performance. The co-design of virtual machine software and the underlying hardware microarchitecture will enable enhanced instruction level parallelism and more adaptable performance mechanisms than are possible when hardware and application software are separated by instruction set architectures as is traditionally done. In future high performance computers, a virtual instruction set architecture (V-ISA) will be the level for maintaining architectural compatibility. The V-ISA will be implemented with a virtual machine that blends software and hardware in a symbiotic manner via co-design. The hardware will support an implementationdep
Using Paths to Measure, Explain, and Enhance Program Behavior
, 2000
"... Paths can reveal a program’s dynamic behavior and uncover patterns of path locality that can be exploited to increase program performance. The authors explore several methods for doing so. ..."
Abstract
-
Cited by 25 (0 self)
- Add to MetaCart
Paths can reveal a program’s dynamic behavior and uncover patterns of path locality that can be exploited to increase program performance. The authors explore several methods for doing so.
Adaptive Online Context-Sensitive Inlining
- In First Annual IEEE/ACM Interational Conference on Code Generation and Optimization
, 2003
"... As current trends in software development move toward more complex object-oriented programming, inlining has become a vital optimization that provides substantial performance improvements to C++ and Java programs. Yet, the aggressiveness of the inlining algorithm must be carefully monitored to effec ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
As current trends in software development move toward more complex object-oriented programming, inlining has become a vital optimization that provides substantial performance improvements to C++ and Java programs. Yet, the aggressiveness of the inlining algorithm must be carefully monitored to effectively balance performance and code size. The state-of-the-art is to use profile information (associated with call edges) to guide inlining decisions. In the presence of virtual method calls, profile information for one call edge may not be sufficient for making effectual inlining decisions. Therefore, we explore the use of profiling data with additional levels of context sensitivity. In addition to exploring fixed levels of context sensitivity, we explore several adaptive schemes that attempt to find the ideal degree of context sensitivity for each call site. Our techniques are evaluated on the basis of runtime performance, code size and dynamic compilation time. On average, we found that with minimal impact on performance (+/-1%) context sensitivity can enable 10% reductions in compiled code space and compile time. Performance on individual programs varied from -4.2% to 5.3% while reductions in compile time and code space of up to 33.0% and 56.7% respectively were obtained. 1.
Self-adapting numerical software for next generation applications
- Int. J. High Perf. Comput. Appl
, 2002
"... The challenge for the development of next generation software is the successful management of the complex grid environment while delivering to the scientist the full power of flexible compositions of the available algorithmic alternatives. Self-Adapting Numerical Software (SANS) systems are intended ..."
Abstract
-
Cited by 23 (7 self)
- Add to MetaCart
The challenge for the development of next generation software is the successful management of the complex grid environment while delivering to the scientist the full power of flexible compositions of the available algorithmic alternatives. Self-Adapting Numerical Software (SANS) systems are intended to meet this significant challenge. A SANS system comprises intelligent next generation numerical software that domain scientists – with disparate levels of knowledge of algorithmic and programmatic complexities of the underlying numerical software – can use to easily express and efficiently solve their problem. The components of a SANS system are: • A SANS agent with: – An intelligent component that automates method selection based on data, algorithm and system attributes. – A system component that provides intelligent management of and access to the computational grid. – A history database that records relevant information generated by the intelligent component and maintains past performance data of the interaction (e.g., algorithmic, hardware specific, etc.) between SANS components. • A simple scripting language that allows a structured multilayered implementation of the SANS while ensuring portability and extensibility of the user interface and underlying libraries. • An XML/CCA-based vocabulary of metadata to describe behavioural properties of both data and algorithms. • System components, including a runtime adaptive scheduler, and prototype libraries that automate the process of architecture-dependent tuning to optimize performance on different platforms. A SANS system can dramatically improve the ability of computational scientists to model complex, interdisciplinary phenomena with maximum efficiency and a minimum of extra-domain expertise. SANS innovations (and their generalizations) will provide to the scientific and engineering community a dynamic computational environment in which the most effective library components are automatically selected based on the problem characteristics, data attributes, and the state of the grid. 1

