Results 1 - 10 of 23
VPC3: A Fast and Effective Trace-Compression Algorithm
, 2004
Cited by 27 (8 self)
Trace files are widely used in research and academia to study the behavior of programs. They are simple to process and guarantee repeatability. Unfortunately, they tend to be very large. This paper describes vpc3, a fundamentally new approach to compressing program traces. Vpc3 employs value predictors to bring out and amplify patterns in the traces so that conventional compressors can compress them more effectively. In fact, our approach not only results in much higher compression rates but also provides faster compression and decompression. For example, compared to bzip2, vpc3's geometric mean compression rate on SPECcpu2000 store address traces is 18.4 times higher, compression is ten times faster, and decompression is three times faster.
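The central idea above, using a value predictor to expose patterns before a conventional compressor runs, can be sketched in a few lines. This is a minimal illustration under our own assumptions, not vpc3 itself: it uses a single stride predictor (vpc3 employs several value predictors), splits a trace into a stream of hit/miss flags and a stream of mispredicted values, and uses `zlib` as a stand-in for the second-stage compressor. All function names are hypothetical.

```python
import struct
import zlib

MASK64 = 0xFFFFFFFFFFFFFFFF  # treat trace entries as unsigned 64-bit values


def split_with_stride_predictor(trace):
    """Split a trace into a flag stream (1 = predicted, 0 = missed) and a miss stream."""
    flags = bytearray()
    misses = []
    last, stride = 0, 0
    for value in trace:
        predicted = (last + stride) & MASK64
        if value == predicted:
            flags.append(1)
        else:
            flags.append(0)
            misses.append(value)
        # Update the predictor with the actual value, as a decoder would.
        stride = (value - last) & MASK64
        last = value
    return bytes(flags), misses


def rebuild(flags, misses):
    """Invert the split by replaying the same predictor on the decoder side."""
    trace = []
    miss_iter = iter(misses)
    last, stride = 0, 0
    for flag in flags:
        value = (last + stride) & MASK64 if flag else next(miss_iter)
        trace.append(value)
        stride = (value - last) & MASK64
        last = value
    return trace


def compress_streams(flags, misses):
    """Second stage: hand each (now highly regular) stream to a general compressor."""
    packed = struct.pack(f"<{len(misses)}Q", *misses)
    return zlib.compress(flags, 9), zlib.compress(packed, 9)
```

On a perfectly strided address trace only the first two entries miss, so the flag stream is almost all ones and compresses to a handful of bytes.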
Whole Execution Traces
, 2004
Cited by 23 (3 self)
Different types of program profiles (control flow, value, address, and dependence) have been collected and extensively studied by researchers to identify program characteristics that can then be exploited to develop more effective compilers and architectures. Due to the large amounts of profile data produced by realistic program runs, most work has focused on separately collecting and compressing different types of profiles. In this paper we present a unified representation of profiles called Whole Execution Trace (WET), which includes the complete information contained in each of the above types of traces. Thus, WETs provide a basis for a next-generation software tool that will enable mining of program profiles to identify program characteristics that require understanding of relationships among various types of profiles. The key features of our WET representation are: WET is constructed by labeling a static program representation with profile information such that relevant and related profile information can be directly accessed by analysis algorithms as they traverse the representation; a highly effective two-tier strategy is used to significantly compress the WET; and compression techniques are designed such that they do not adversely affect the ability to rapidly traverse WET for extracting subsets of information corresponding to individual profile types as well as a combination of profile types (e.g., in the form of dynamic slices of WETs). Our experimentation shows that, on average, execution traces resulting from the execution of 647 million statements can be stored in 331 megabytes of storage after compression. The compression factors range from 16 to 83. Moreover, the rates at which different types of profiles can be individually or simultaneously extracted are high.
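To make "labeling a static program representation with profile information" concrete, here is a toy, uncompressed sketch. It is our own illustration, not the WET implementation: each static statement is labeled with its timestamped execution instances and values, and dependence edges are labeled with timestamp pairs, so control-flow and value profiles can both be read off the same structure. The class name, method names, and timestamp scheme are hypothetical, and WET's two-tier compression is not modeled.

```python
from collections import defaultdict


class ToyWET:
    """Hypothetical, uncompressed illustration of a profile-labeled static program."""

    def __init__(self):
        self.timestamp = 0
        # static statement id -> list of (timestamp, value produced)
        self.instances = defaultdict(list)
        # (use statement, def statement) -> list of (use timestamp, def timestamp)
        self.dep_edges = defaultdict(list)

    def record(self, stmt, value, deps=()):
        """Record one execution of `stmt`; `deps` holds (def_stmt, def_timestamp) pairs."""
        self.timestamp += 1
        ts = self.timestamp
        self.instances[stmt].append((ts, value))
        for def_stmt, def_ts in deps:
            self.dep_edges[(stmt, def_stmt)].append((ts, def_ts))
        return ts

    def control_flow_trace(self):
        """Recover the executed statement sequence by ordering all timestamps."""
        events = [(ts, stmt) for stmt, inst in self.instances.items() for ts, _ in inst]
        return [stmt for _, stmt in sorted(events)]

    def values_of(self, stmt):
        """Value profile of a single static statement."""
        return [value for _, value in self.instances[stmt]]
```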
Efficient program execution indexing
- In PLDI, 2008
Cited by 14 (7 self)
Execution indexing uniquely identifies a point in an execution. Desirable execution indices reveal correlations between points in an execution and establish correspondence between points across multiple executions. Therefore, execution indexing is essential for a wide variety of dynamic program analyses. For example, it can be used to organize program profiles, and it can precisely identify the point in a re-execution that corresponds to a given point in an original execution, thus facilitating debugging or dynamic instrumentation. In this paper, we formally define the concept of execution index and propose an indexing scheme based on execution structure and program state. We present a highly optimized online implementation of the technique. We also perform a client study, which targets producing a failure-inducing schedule for a data race by verifying the two alternative happens-before orderings of a racing pair. Indexing is used to precisely locate corresponding points across multiple executions in the presence of non-determinism, so that no heavyweight tracing/replay system is needed.
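A simplified structure-based index can be sketched as a stack of nested regions, each tagged with how many times it has been entered under its parent. This is an assumption-laden illustration, not the paper's scheme: it ignores program state and the paper's optimizations, and the class and method names are hypothetical. It does show the key property described above: an unrelated extra call in one run does not shift the index of corresponding points.

```python
class StructuralIndexer:
    """Index execution points by their dynamic nesting structure (simplified sketch).

    Each frame on the stack is (label, occurrence), where `occurrence` counts
    how many times `label` has been entered under the same parent frame.
    """

    def __init__(self):
        self.stack = [("root", 0)]
        self.counters = [{}]  # one child-occurrence counter per open frame

    def enter(self, label):
        """Enter a nested region, e.g. a call site or a loop iteration."""
        counts = self.counters[-1]
        counts[label] = counts.get(label, 0) + 1
        self.stack.append((label, counts[label]))
        self.counters.append({})

    def exit(self):
        """Leave the innermost region."""
        self.stack.pop()
        self.counters.pop()

    def execute(self, stmt):
        """Record one execution of `stmt` and return its structural index."""
        counts = self.counters[-1]
        counts[stmt] = counts.get(stmt, 0) + 1
        return tuple(self.stack) + ((stmt, counts[stmt]),)
```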
Exploiting Streams in Instruction and Data Address Trace Compression
- In Proceedings of IEEE 6th Annual Workshop on Workload Characterization
, 2003
Cited by 13 (3 self)
Novel research ideas in computer architecture are frequently evaluated using trace-driven simulation. The large size of traces has motivated various techniques for trace reduction. These techniques often combine standard compression algorithms with trace-specific solutions, taking into account the tradeoff between reduction in trace size and simulation slowdown due to decompression. This paper introduces SBC, a new algorithm for instruction and data address trace compression based on instruction streams. The proposed technique significantly reduces trace size and simulation time, and can be successfully combined with general-purpose compression algorithms. Combined with gzip, SBC reduces the size of SPEC CPU2000 traces by factors of 59 to 97,930; combined with Sequitur, by factors of 65 to 185,599.
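The stream idea can be illustrated with instruction addresses alone. In this sketch (our own simplification, not SBC as published) a stream is a run of sequential instructions ending at a taken branch, described by its start address and length; each unique stream goes into a table once, and the trace becomes a list of table indices. We assume fixed 4-byte instructions and omit data addresses entirely; the function names are hypothetical.

```python
def split_into_streams(pcs, insn_size=4):
    """Split an instruction-address trace into (start_pc, length) streams.

    Assumes fixed-size instructions: a stream ends whenever the next PC is not
    the sequential successor, i.e. after a taken branch.
    """
    if not pcs:
        return []
    streams = []
    start, length = pcs[0], 1
    for prev, pc in zip(pcs, pcs[1:]):
        if pc == prev + insn_size:
            length += 1
        else:
            streams.append((start, length))
            start, length = pc, 1
    streams.append((start, length))
    return streams


def encode_streams(streams):
    """Replace each stream with an index into a table of unique streams."""
    table, ids = {}, []
    for s in streams:
        if s not in table:
            table[s] = len(table)
        ids.append(table[s])
    return list(table), ids  # dicts keep insertion order, so position == id


def decode_streams(table, ids, insn_size=4):
    """Expand stream ids back into the full instruction-address trace."""
    pcs = []
    for i in ids:
        start, length = table[i]
        pcs.extend(start + k * insn_size for k in range(length))
    return pcs
```

A loop body executed many times collapses into one table entry plus a repeated small integer, which a general-purpose compressor such as gzip then handles very well.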
Fast Lossless Compression of Scientific Floating-Point Data
- Proc. Data Compression Conf. (DCC ’06)
, 2006
Cited by 13 (1 self)
In scientific computing environments, large amounts of floating-point data often need to be transferred between computers as well as to and from storage devices. Compression can reduce the number of bits that need to be transferred and stored. However, the runtime overhead due to compression may be undesirable in high-performance settings where short communication latencies and high bandwidths are essential. This paper describes and evaluates a new compression algorithm that is tailored to such environments. It typically compresses numeric floating-point values better and faster than other algorithms do. On our data sets, it achieves compression ratios between 1.2 and 4.2 as well as compression and decompression throughputs between 2.8 and 5.9 million 64-bit double-precision numbers per second on a 3 GHz Pentium 4 machine.
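The abstract does not spell out the algorithm, so the following is only a generic sketch of a common lossless approach for doubles: predict each value, XOR its bit pattern with the prediction, and store the residual without its leading zero bytes. We swap in the simplest possible predictor (the previous value); this is an assumption, not the paper's predictor, and the one-byte header per value is deliberately wasteful for clarity. Function names are hypothetical.

```python
import struct


def _bits(x):
    """Reinterpret a double as its 64-bit unsigned integer pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def _double(bits):
    """Reinterpret a 64-bit pattern as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def compress_doubles(values):
    """XOR each double with a prediction and drop the residual's leading zero bytes.

    Output per value: one header byte (number of leading zero bytes, 0-8)
    followed by the remaining 8 - n residual bytes, big-endian.
    """
    out = bytearray()
    prediction = 0
    for v in values:
        bits = _bits(v)
        raw = (bits ^ prediction).to_bytes(8, "big")
        lz = len(raw) - len(raw.lstrip(b"\x00"))
        out.append(lz)
        out.extend(raw[lz:])
        prediction = bits  # last-value predictor: an assumption of this sketch
    return bytes(out)


def decompress_doubles(data, count):
    """Invert compress_doubles by replaying the same predictor."""
    values, prediction, pos = [], 0, 0
    for _ in range(count):
        n = 8 - data[pos]
        residual = int.from_bytes(data[pos + 1 : pos + 1 + n], "big")
        pos += 1 + n
        bits = residual ^ prediction
        values.append(_double(bits))
        prediction = bits
    return values
```

When the prediction is exact, the residual is zero and the value costs a single header byte, which is where the savings come from; a better predictor raises the fraction of near-zero residuals.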
Automatic Generation of High-Performance Trace Compressors
Cited by 13 (3 self)
Program execution traces are frequently used in industry and academia. Yet most trace-compression algorithms have to be re-implemented every time the trace format is changed, which takes time, is error-prone, and often results in inefficient solutions. This paper describes and evaluates TCgen, a tool that automatically generates portable, customized, high-performance trace compressors. All the user has to do is provide a description of the trace format and select one or more predictors to compress the fields in the trace records. TCgen translates this specification into C source code and optimizes it for the specified trace format and predictor algorithms. On average, the generated code is faster and compresses better than the six other compression algorithms we have tested. For example, a comparison with SBC, one of the best trace-compression algorithms in the current literature, shows that TCgen’s synthesized code compresses SPECcpu2000 address traces 23% more, decompresses them 24% faster, and compresses them 1029% faster.
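The workflow above (describe the trace format, pick a predictor per field) can be mimicked at runtime. TCgen itself emits optimized C source; this Python sketch instead interprets a hypothetical `{field: predictor_name}` specification directly, and the predictor names `"last"` and `"stride"`, the registry, and the function names are our own inventions, not TCgen's specification language.

```python
PREDICTORS = {}


def predictor(name):
    """Register a predictor class under a name usable in a format spec."""
    def register(cls):
        PREDICTORS[name] = cls
        return cls
    return register


@predictor("last")
class LastValue:
    def __init__(self):
        self.last = 0

    def predict(self):
        return self.last

    def update(self, value):
        self.last = value


@predictor("stride")
class Stride:
    def __init__(self):
        self.last, self.stride = 0, 0

    def predict(self):
        return self.last + self.stride

    def update(self, value):
        self.stride = value - self.last
        self.last = value


def build_encoder(spec):
    """Return an encoder that applies the chosen predictor to each record field."""
    def encode(records):
        preds = {field: PREDICTORS[name]() for field, name in spec.items()}
        flags = {field: [] for field in spec}   # per-field hit/miss flags
        misses = {field: [] for field in spec}  # per-field mispredicted values
        for rec in records:
            for field, p in preds.items():
                value = rec[field]
                hit = p.predict() == value
                flags[field].append(hit)
                if not hit:
                    misses[field].append(value)
                p.update(value)
        return flags, misses
    return encode
```

Each call to `encode` starts from fresh predictor state, so the decoder can replay the same spec to reconstruct the records.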
The VPC trace-compression algorithms
- IEEE Transactions on Computers
, 2005
Cited by 13 (3 self)
Execution traces, such as those used to study and analyze program behavior, are often so large that they need to be stored in compressed form. This paper describes the design and implementation of four value prediction-based compression (VPC) algorithms for traces that record the PC as well as other information about executed instructions. VPC1 directly compresses traces using value predictors, VPC2 adds a second compression stage, and VPC3 uses value predictors to convert traces into streams that can be compressed better and more quickly than the original traces. VPC4 introduces further algorithmic enhancements and is automatically synthesized. Of the 55 SPECcpu2000 traces we evaluate, VPC4 compresses 36 better, decompresses 26 faster, and compresses 53 faster than BZIP2, MACHE, PDATS II, SBC, and SEQUITUR. It delivers the highest geometric-mean compression rate, decompression speed, and compression speed because of the predictors’ simplicity and their ability to exploit local value locality. Most other compression algorithms can only exploit global value locality. Index Terms—Data compaction and compression, performance analysis and design aids.
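The distinction between local and global value locality can be shown with two tiny predictors. This is our own illustration, not VPC4's predictor configuration: a single shared last-value predictor sees only the global sequence of values, while a table indexed by PC remembers the last value per instruction. When two loads with stable but different values interleave, only the PC-indexed table captures the regularity. Function names are hypothetical.

```python
def global_last_value_hits(trace):
    """Count hits for one last-value predictor shared by all instructions."""
    hits, last = 0, None
    for _pc, value in trace:
        hits += value == last
        last = value
    return hits


def local_last_value_hits(trace):
    """Count hits for a last-value table indexed by PC (local value locality)."""
    hits, table = 0, {}
    for pc, value in trace:
        hits += table.get(pc) == value
        table[pc] = value
    return hits
```

For a trace of 100 records alternating `(0x10, 7)` and `(0x20, 99)`, the global predictor never hits, while the per-PC table misses only on the first occurrence of each PC and hits on the remaining 98 records.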
Stream-Based Trace Compression
- Computer Architecture Letters
, 2003
Cited by 11 (2 self)
Trace-driven simulation has long been used in both processor and memory studies. The large size of traces motivated various techniques for trace reduction. These techniques often combine standard compression algorithms with trace-specific solutions, taking into account the tradeoff between reduction in trace size and simulation slowdown due to decompression. This paper introduces SBC, a new algorithm for instruction and data address trace compression based on instruction streams. The proposed technique significantly reduces trace size and simulation time, and it is orthogonal to general compression algorithms. When combined with gzip, SBC reduces the size of SPEC CPU2000 traces by factors of 94 to 71,968. Index Terms—simulation, instruction and address trace, trace
Whole execution traces and their applications
- ACM Transactions on Architecture and Code Optimization
, 2005
Cited by 9 (2 self)
Different types of program profiles (control flow, value, address, and dependence) have been collected and extensively studied by researchers to identify program characteristics that can then be exploited to develop more effective compilers and architectures. Because of the large amounts of profile data produced by realistic program runs, most work has focused on separately collecting and compressing different types of profiles. In this paper, we present a unified representation of profiles called Whole Execution Trace (WET), which includes the complete information contained in each of the above types of traces. Thus, WETs provide a basis for a next-generation software tool that will enable mining of program profiles to identify program characteristics that require understanding of relationships among various types of profiles. The key features of our WET representation are: WET is constructed by labeling a static program representation with profile information such that relevant and related profile information can be directly accessed by analysis algorithms as they traverse the representation; a highly effective two-tier strategy is used to significantly compress the WET; and compression techniques are designed such that they minimally affect the ability to rapidly traverse WET for extracting subsets of information corresponding to individual profile types as well as a combination of profile types. Our experimentation shows that on an
Runtime Compression of MPI Messages to Improve the Performance and Scalability of Parallel Applications
- High-Performance Computing, Networking and Storage Conference
, 2004
Cited by 8 (5 self)
Communication-intensive parallel applications spend a significant amount of their total execution time exchanging data between processes, which leads to poor performance in many cases. In this paper, we investigate message compression in the context of large-scale parallel message-passing systems to reduce the communication time of individual messages and to improve the bandwidth of the overall system. We implement and evaluate the cMPI message-passing library, which quickly compresses messages on the fly with overhead low enough that a net reduction in execution time can be obtained. Our results on six large-scale benchmark applications show that execution speed improves by up to 98% when message compression is enabled.
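The requirement that compression overhead stay low enough to yield a net gain suggests a simple policy: compress only messages that are large enough, and send the raw payload whenever compression does not shrink it. The sketch below illustrates that policy only; the abstract does not describe cMPI's actual compressor or framing, so the use of `zlib` at level 1, the one-byte header, the 1024-byte threshold, and the function names are all hypothetical, and no MPI calls are made.

```python
import zlib

RAW, COMPRESSED = 0, 1  # hypothetical one-byte header values


def pack_message(payload: bytes, min_size: int = 1024) -> bytes:
    """Compress a message payload only when it is large enough and actually shrinks."""
    if len(payload) >= min_size:
        packed = zlib.compress(payload, 1)  # level 1 favors speed over ratio
        if len(packed) < len(payload):
            return bytes([COMPRESSED]) + packed
    return bytes([RAW]) + payload


def unpack_message(message: bytes) -> bytes:
    """Restore the original payload on the receiving side."""
    header, body = message[0], message[1:]
    return zlib.decompress(body) if header == COMPRESSED else body
```

Falling back to the raw payload bounds the worst-case expansion at one byte per message, and the size threshold skips small messages, where compression time is unlikely to be repaid by the shorter transfer.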

