Results 1 - 10 of 35
Improving Software Diagnosability via Log Enhancement
Cited by 38 (4 self)
Diagnosing software failures in the field is notoriously difficult, in part due to the fundamental complexity of troubleshooting any complex software system, but further exacerbated by the paucity of information that is typically available in the production setting. Indeed, for reasons of both overhead and privacy, it is common that only the run-time log generated by a system (e.g., syslog) can be shared with the developers. Unfortunately, the ad-hoc nature of such reports frequently makes them insufficient for detailed failure diagnosis. This paper seeks to improve this situation within the rubric of existing practice. We describe a tool, LogEnhancer, that automatically “enhances” existing logging code to aid in future post-failure debugging. We evaluate LogEnhancer on eight large, real-world applications and demonstrate that it can dramatically reduce the set of potential root failure causes that must be considered during diagnosis while imposing negligible overheads.
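The enhancement idea the abstract describes can be illustrated with a toy sketch: extend an existing log call so that it also records the values of variables a failure path depends on. LogEnhancer does this by static analysis of C source; the Python names below are purely illustrative assumptions.

```python
# Toy sketch of log enhancement: the original code logged only a
# message; the "enhanced" version also records variable values that
# would narrow down root causes after a failure. All names here are
# hypothetical, not LogEnhancer's actual machinery.

def enhanced_log(message, **diagnostic_vars):
    # Append the values of causally related variables to the message.
    extras = " ".join(f"{k}={v!r}" for k, v in sorted(diagnostic_vars.items()))
    return f"{message} [{extras}]" if extras else message

def read_config(path, mode):
    ok = False  # pretend the open failed
    if not ok:
        # Before enhancement: log("config open failed")
        # After enhancement, the log pins down which inputs mattered:
        return enhanced_log("config open failed", path=path, mode=mode)

print(read_config("/etc/app.conf", "r"))
```

A single enhanced line like this can rule out entire classes of root causes (wrong path vs. wrong mode) without rerunning the failure.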
Finding low-utility data structures
In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2010
Cited by 37 (15 self)
Many opportunities for easy, big-win program optimizations are missed by compilers. This is especially true in highly layered Java applications. Often at the heart of these missed optimization opportunities lie computations that, with great expense, produce data values that have little impact on the program’s final output. Constructing a new date formatter to format every date, or populating a large set full of expensively constructed structures only to check its size: these involve costs that are out of line with the benefits gained. This disparity between the formation costs and accrued benefits of data structures is at the heart of much runtime bloat. We introduce a run-time analysis to discover these low-utility data structures. The analysis employs dynamic thin slicing, which naturally associates costs with value flows rather than raw data flows. It constructs a model of the incremental, hop-to-hop costs …
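The cost/benefit disparity the abstract describes can be sketched with a simple counter-based heuristic: tally the work invested in a container against how much of that work is ever consumed. This is only an illustration of the pattern; the paper's actual analysis uses dynamic thin slicing over value flows, and all names below are hypothetical.

```python
# Hypothetical sketch of low-utility detection: compare construction
# cost against realized benefit for a container.

class ProfiledSet:
    """A set wrapper tallying formation cost vs. accrued benefit."""
    def __init__(self):
        self._items = set()
        self.cost = 0      # work spent populating the structure
        self.benefit = 0   # work whose results are actually consumed

    def add(self, item):
        self.cost += 1     # each (expensive) insertion adds cost
        self._items.add(item)

    def __contains__(self, item):
        self.benefit += 1  # a membership query consumes a value flow
        return item in self._items

    def __len__(self):
        self.benefit += 1  # size() consumes one fact about the whole set
        return len(self._items)

def is_low_utility(s, ratio=10):
    """Flag structures whose formation cost dwarfs accrued benefit."""
    return s.cost > ratio * max(s.benefit, 1)

# The anti-pattern from the abstract: populate a large set, then only
# check its size.
s = ProfiledSet()
for i in range(1000):
    s.add(i)           # 1000 units of construction cost
n = len(s)             # a single unit of benefit
print(is_low_utility(s))
```

The large gap between `cost` and `benefit` is exactly the disparity that marks the set as runtime bloat.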
Efficient checkpointing of Java software using context-sensitive capture and replay
In ESEC-FSE ’07: Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the 14th ACM SIGSOFT Symposium on Foundations of Software Engineering, 2007
Cited by 23 (0 self)
Checkpointing and replaying is an attractive technique that has been used widely at the operating/runtime system level to provide fault tolerance. Applying such a technique at the application level can benefit a range of software engineering tasks such as testing of long-running programs, automated debugging, and dynamic slicing. We propose a checkpointing/replaying technique for Java that operates purely at the language level, without the need for JVM-level or OS-level support. At the core of our approach are static analyses that select, at certain program points, a safe subset of the program state to capture and replay. Irrelevant statements before the checkpoint are eliminated using control-dependence-based slicing; the remaining statements, together with the captured run-time values, are used to indirectly recreate the call stack of the original program at the checkpoint. At the checkpoint itself and at certain subsequent program points, the replaying version restores parts of the program state that are necessary for execution of the surrounding method. Our experimental studies indicate that the proposed static and dynamic analyses have the potential to significantly reduce the execution time for replaying, with low run-time overhead for checkpointing.
Penumbra: automatically identifying failure-relevant inputs using dynamic tainting
In Proc. of the Symposium on Software Testing and Analysis, ACM, 2009
Cited by 19 (1 self)
Most existing automated debugging techniques focus on reducing the amount of code to be inspected and tend to ignore an important component of software failures: the inputs that cause the failure to manifest. In this paper, we present a new technique based on dynamic tainting for automatically identifying subsets of a program’s inputs that are relevant to a failure. The technique (1) marks program inputs when they enter the application, (2) tracks them as they propagate during execution, and (3) identifies, for an observed failure, the subset of inputs that are potentially relevant for debugging that failure. To investigate the feasibility and usefulness of our technique, we created a prototype tool, penumbra, and used it to evaluate our technique on several failures in real programs. Our results are promising, as they show that penumbra can point developers to inputs that are actually relevant for investigating a failure and can be more practical than existing alternative approaches.
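The three steps in the abstract can be sketched with explicit taint labels attached to values. Penumbra itself performs this at the binary level; the classes and labels below are illustrative assumptions only.

```python
# Minimal dynamic-tainting sketch: values carry the set of input
# labels they were derived from.

class Tainted:
    """A value plus the set of input labels it was derived from."""
    def __init__(self, value, labels=frozenset()):
        self.value = value
        self.labels = frozenset(labels)

def mark(value, label):
    # Step 1: mark a program input as it enters the application.
    return Tainted(value, {label})

def add(a, b):
    # Step 2: propagate taint through computation (union of labels).
    return Tainted(a.value + b.value, a.labels | b.labels)

def failure_relevant_inputs(value):
    # Step 3: for the value observed at the failure point, report the
    # subset of inputs that flowed into it.
    return value.labels

x = mark(2, "input:argv[1]")
y = mark(40, "input:config.size")
z = mark(7, "input:env.PATH")   # never flows into the failing value
bad = add(x, y)                  # the value that triggers the failure
print(failure_relevant_inputs(bad))
```

Only the two inputs that actually reached the failing value are reported; the third input is correctly excluded, which is what shrinks the debugging search space.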
Deriving Input Syntactic Structure from Execution
2008
Cited by 16 (2 self)
Program input syntactic structure is essential for a wide range of applications such as test case generation, software debugging, and network security. However, such important information is often not available (e.g., most malware programs make use of secret protocols to communicate) or not directly usable by machines (e.g., many programs specify their inputs in plain text or other ad-hoc formats). Furthermore, many programs claim to accept inputs in a published format, but their implementations actually support only a subset or a variant. Based on the observations that input structure is manifested by the way input symbols are used during execution, and that most programs take input with top-down or bottom-up grammars, we devise two dynamic analyses, one for each grammar category. Our evaluation on a set of real-world programs shows that our technique is able to precisely reverse engineer input syntactic structure from execution.
Behavior-based software theft detection
In Proc. 16th ACM Conf. Comput. and Commun. Security (CCS ’09), 2009
Cited by 16 (8 self)
With the rapid growth of open-source projects, software theft (or plagiarism) has become a serious threat to the health of the software industry. A software birthmark, which represents the unique characteristics of a program, can be used for software theft detection. We propose a software birthmark based on the system call dependence graph, called the SCDG birthmark, and examine how well it reflects unique behavioral characteristics of a program. To our knowledge, our detection system based on the SCDG birthmark is the first that is capable of detecting software component theft where only partial code is stolen. We demonstrate the strength of our birthmark against various evasion techniques, including those based on different compilers and different compiler optimization levels, as well as two state-of-the-art obfuscation tools. Unlike existing work, which was evaluated on small or toy software, we also evaluate our birthmark on a set of large software. Our results show that the SCDG birthmark is practical and effective in detecting software theft even when advanced evasion techniques are adopted.
Dynamic Slicing of Multithreaded Programs for Race Detection
In ICSM ’08
Cited by 12 (1 self)
Prior work has shown that computing dynamic slices of erroneous program values can greatly assist in locating the root cause of erroneous behavior by identifying faulty statements in sequential programs. These dynamic slices represent the backward transitive closure over exercised read-after-write data dependences and control dependences. However, for a multithreaded program executing on a processor, data races represent an additional source of errors which are not captured by dynamic slices. We present an extended form of dynamic slice for multithreaded programs which can assist in locating faults, including those caused by data races. We demonstrate the effectiveness of our approach via case studies and also describe an efficient algorithm for computing dynamic slices.
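The backward transitive closure over read-after-write dependences can be sketched over a recorded execution trace. This is the classic data-dependence-only slice; the paper additionally closes over control dependences and race-induced dependences, and the trace encoding below is an assumption of this sketch.

```python
def dynamic_slice(trace, criterion):
    """Backward transitive closure over exercised data dependences.

    `trace` is a list of (stmt_id, defined_var, used_vars) tuples in
    execution order; `criterion` is the index of the erroneous
    statement instance.
    """
    in_slice = {criterion}
    wanted = set(trace[criterion][2])        # variables still to explain
    for i in range(criterion - 1, -1, -1):   # walk the trace backwards
        stmt, defined, used = trace[i]
        if defined in wanted:                # last write reaching a use
            in_slice.add(i)
            wanted.discard(defined)
            wanted |= set(used)              # now explain its operands
    return sorted(in_slice)

# Execution: x = input(); y = x + 1; z = 5; err = y * 2
trace = [
    (0, "x", []),
    (1, "y", ["x"]),
    (2, "z", []),        # irrelevant to the failure
    (3, "err", ["y"]),
]
print(dynamic_slice(trace, 3))
```

Statement 2 never flows into the erroneous value, so it is excluded; that exclusion is exactly what makes slices useful for fault localization.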
Precise Calling Context Encoding
In ACM International Conference on Software Engineering, 2010
Cited by 11 (0 self)
Calling contexts are very important for a wide range of applications such as profiling, debugging, and event logging. Most applications perform expensive stack walking to recover contexts. The resulting contexts are often explicitly represented as a sequence of call sites and hence bulky. We propose a technique to encode the current calling context at any point during an execution. In particular, an acyclic call path is encoded into one number through only integer additions. Recursive call paths are divided into acyclic subsequences and encoded independently. We leverage stack depth in a safe way to optimize encoding: if a calling context can be safely and uniquely identified by its stack depth, we do not perform encoding. We propose an algorithm to seamlessly fuse encoding and stack-depth-based identification. The algorithm is safe because different contexts are guaranteed to have different IDs. It also ensures contexts can be faithfully decoded. Our experiments show that our technique incurs negligible overhead (1.89% on average). For most medium-sized programs, it can encode all contexts with just one number. For large programs, we are able to encode most calling contexts in a few numbers.
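The "one number through only integer additions" idea can be sketched with a Ball-Larus-style numbering of an acyclic call graph: each call edge gets an integer increment, and summing the increments along any root-to-function path yields a unique context ID. This is an assumption-laden sketch of the flavor of the encoding, not the paper's exact algorithm.

```python
def encode(call_graph, root):
    """Assign per-call-edge increments so the sum along any acyclic
    root-to-f call path is a unique ID in [0, paths(f)).
    `call_graph` maps each function to the list of functions it calls;
    it must be acyclic."""
    # Topological order (reverse DFS postorder) of the call graph.
    order, seen = [], set()
    def visit(f):
        if f in seen:
            return
        seen.add(f)
        for g in call_graph.get(f, []):
            visit(g)
        order.append(f)
    visit(root)
    order.reverse()                      # callers before callees

    paths = {root: 1}                    # contexts reaching each function
    incr = {}                            # (caller, callee) -> increment
    for f in order:
        for g in call_graph.get(f, []):
            # Offset this edge past contexts contributed by edges into
            # g that were processed earlier (only additions needed).
            incr[(f, g)] = paths.get(g, 0)
            paths[g] = paths.get(g, 0) + paths[f]
    return paths, incr

# main calls a and b; both a and b call leaf -> two contexts at leaf.
cg = {"main": ["a", "b"], "a": ["leaf"], "b": ["leaf"]}
paths, incr = encode(cg, "main")
id_via_a = incr[("main", "a")] + incr[("a", "leaf")]
id_via_b = incr[("main", "b")] + incr[("b", "leaf")]
print(paths["leaf"], id_via_a, id_via_b)  # two distinct IDs in [0, 2)
```

At run time, each call site would simply add its precomputed increment to a single context counter, which is why the technique avoids stack walking entirely.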
Cachetor: Detecting Cacheable Data to Remove Bloat
Cited by 7 (1 self)
Modern object-oriented software commonly suffers from runtime bloat that significantly affects its performance and scalability. Studies have shown that one important pattern of bloat is the work repeatedly done to compute the same data values. Very often the cost of computation is high, and it is thus beneficial to memoize the invariant data values for later use. While this is a common practice in real-world development, manually finding invariant data values is a daunting task during development and tuning. To help developers quickly find such optimization opportunities for performance improvement, we propose a novel run-time profiling tool, called Cachetor, which uses a combination of dynamic dependence profiling and value profiling to identify and report operations that keep generating identical data values. The major challenge in the design of Cachetor is that both dependence and value profiling are extremely expensive techniques that cannot scale to the large, real-world applications for which optimizations are important. To overcome this challenge, we propose a series of novel abstractions that are applied to run-time instruction instances during profiling, yielding significantly improved analysis time and scalability. We have implemented Cachetor in the Jikes Research Virtual Machine and evaluated it on a set of 14 large Java applications. Our experimental results suggest that Cachetor is effective in exposing caching opportunities and that substantial performance gains can be achieved by modifying a program to cache the reported data.
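The value-profiling half of this idea can be sketched as: record the values each operation site produces, and flag sites that execute repeatedly yet only ever produce one distinct value — those are memoization candidates. Cachetor itself instruments bytecode and abstracts instruction instances to keep this scalable; the names below are illustrative only.

```python
from collections import defaultdict

# Sketch of value profiling for caching opportunities.
class ValueProfiler:
    def __init__(self):
        self.values = defaultdict(set)   # site -> distinct observed values
        self.counts = defaultdict(int)   # site -> number of executions

    def observe(self, site, value):
        self.values[site].add(value)
        self.counts[site] += 1
        return value

    def cacheable_sites(self, min_runs=2):
        # A site executed repeatedly that only ever produced one
        # distinct value is a caching opportunity.
        return {s for s in self.counts
                if self.counts[s] >= min_runs and len(self.values[s]) == 1}

prof = ValueProfiler()

def expensive_format(n):
    # Imagine this allocates a formatter and does real work each call,
    # yet always produces the same result.
    return prof.observe("format@line7", f"value-{n % 1}")  # always "value-0"

for i in range(5):
    expensive_format(i)
print(prof.cacheable_sites())
```

The scalability problem the abstract mentions is visible even here: tracking every value at every site is expensive, which is why Cachetor's contribution is the abstractions that shrink this bookkeeping.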
Language-based replay via data flow cut
In Proc. of FSE, 2010
Cited by 6 (0 self)
A replay tool aiming to reproduce a program's execution interposes itself at an appropriate replay interface between the program and the environment. During recording, it logs all non-deterministic side effects passing through the interface from the environment and feeds them back during replay. The replay interface is critical for the correctness and recording overhead of replay tools. iTarget is a novel replay tool that uses programming language techniques to automatically seek a replay interface that both ensures correctness and minimizes recording overhead. It performs static analysis to extract data flows, estimates their recording costs via dynamic profiling, computes an optimal replay interface that minimizes the recording overhead, and instruments the program accordingly for interposition. Experimental results show that iTarget can successfully replay complex C programs, including the Apache web server and Berkeley DB, and that it can reduce the log size by up to two orders of magnitude and the slowdown by up to 50%.
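The record/replay interposition described above can be sketched minimally: during recording, every non-deterministic value crossing the chosen interface is logged; during replay, the log is fed back so the program re-executes deterministically. iTarget chooses this interface automatically via static analysis and profiling; here it is fixed by hand and all names are illustrative.

```python
import random

# Minimal record/replay sketch at a hand-chosen replay interface.
class ReplayInterface:
    def __init__(self, mode, log=None):
        self.mode = mode                 # "record" or "replay"
        self.log = [] if log is None else list(log)

    def nondet(self, produce):
        """Interpose on one non-deterministic side effect."""
        if self.mode == "record":
            value = produce()            # consult the real environment
            self.log.append(value)       # ... and log what it returned
            return value
        return self.log.pop(0)           # replay: feed the log back

def run(iface):
    # The "environment" here is a random source; in a real tool it
    # would be system calls, network reads, and the like.
    a = iface.nondet(lambda: random.randint(0, 10**6))
    b = iface.nondet(lambda: random.randint(0, 10**6))
    return a * 31 + b

rec = ReplayInterface("record")
first = run(rec)
rep = ReplayInterface("replay", rec.log)
second = run(rep)
print(first == second)  # replay reproduces the recorded execution
```

Where the interface sits determines how much must be logged; placing it closer to the few values the program actually consumes is what lets iTarget shrink the log by orders of magnitude.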