Results 1 - 10
of
28
Locating causes of program failures
- 27th International Conference on Software Engineering (ICSE
, 2005
"... Which is the defect that causes a software failure? By comparing the program states of a failing and a passing run, we can identify the state differences that cause the failure. However, these state differences can occur all over the program run. Therefore, we focus in space on those variables and v ..."
Abstract
-
Cited by 106 (8 self)
- Add to MetaCart
Which is the defect that causes a software failure? By comparing the program states of a failing and a passing run, we can identify the state differences that cause the failure. However, these state differences can occur all over the program run. Therefore, we focus in space on those variables and values that are relevant for the failure, and in time on those moments where cause transitions occur—moments where new relevant variables begin being failure causes: “Initially, variable argc was 3; therefore, at shell sort(), variable a[2] was 0, and therefore, the program failed. ” In our evaluation, cause transitions locate the failureinducing defect twice as well as the best methods known so far.
Fault Localization with Nearest Neighbor Queries
, 2003
"... We present a method for performing fault localization using similar program spectra. Our method assumes the existence of a faulty run and a larger number of correct runs. It then selects according to a distance criterion the correct run that most resembles the faulty run, compares the spectra corres ..."
Abstract
-
Cited by 101 (1 self)
- Add to MetaCart
We present a method for performing fault localization using similar program spectra. Our method assumes the existence of a faulty run and a larger number of correct runs. It then selects according to a distance criterion the correct run that most resembles the faulty run, compares the spectra corresponding to these two runs, and produces a report of "suspicious" parts of the program. Our method is widely applicable because it does not require any knowledge of the program input and no more information from the user than a classification of the runs as either "correct" or "faulty". To experimentally validate the viability of the method, we implemented it in a tool, WHITHER using basic block profiling spectra. We experimented with two different similarity measures and the Siemens suite of 132 programs with injected bugs. To measure the success of the tool, we developed a generic method for establishing the quality of a report. The method is based on the way an "ideal user" would navigate the program using the report to save effort during debugging. The best results we obtained were, on average, above 50%, meaning that our ideal user would avoid looking at half of the program.
Active Learning for Automatic Classification of Software
, 2004
"... A program's behavior is ultimately the collection of all its executions. This collection is diverse, unpredictable, and generally unbounded. Thus it is especially suited to statistical analysis and machine learning techniques. The primary focus of this paper is on the automatic classification of pro ..."
Abstract
-
Cited by 55 (6 self)
- Add to MetaCart
A program's behavior is ultimately the collection of all its executions. This collection is diverse, unpredictable, and generally unbounded. Thus it is especially suited to statistical analysis and machine learning techniques. The primary focus of this paper is on the automatic classification of program behavior using execution data. Prior work on classifiers for software engineering adopts a classical batch-learning approach. In contrast, we explore an active-learning paradigm for behavior classification. In active learning, the classifier is trained incrementally on a series of labeled data elements. Secondly, we explore the thesis that certain features of program behavior are stochastic processes that exhibit the Markov property, and that the resultant Markov models of individual program executions can be automatically clustered into effective predictors of program behavior. We present a technique that models program executions as Markov models, and a clustering method for Markov models that aggregates multiple program executions into effective behavior classifiers. We evaluate an application of active learning to the efficient refinement of our classifiers by conducting three empirical studies that explore a scenario illustrating automated test plan augmentation.
Locating faulty code using failure-inducing chops
- In 20th IEEE/ACM International Conference on Automated Software Engineering (ASE 2005
, 2005
"... Software debugging is the process of locating and correcting faulty code. Prior techniques to locate faulty code either use program analysis techniques such as backward dynamic program slicing or exclusively use delta debugging to analyze the state changes during program execution. In this paper, we ..."
Abstract
-
Cited by 45 (11 self)
- Add to MetaCart
Software debugging is the process of locating and correcting faulty code. Prior techniques to locate faulty code either use program analysis techniques such as backward dynamic program slicing or exclusively use delta debugging to analyze the state changes during program execution. In this paper, we present a new approach that integrates the potential of delta debugging algorithm with the benefit of forward and backward dynamic program slicing to narrow down the search for faulty code. Our approach is to use delta debugging algorithm to identify a minimal failure-inducing input, use this input to compute a forward dynamic slice and then intersect the statements in this forward dynamic slice with the statements in the backward dynamic slice of the erroneous output to compute a failure-inducing chop. We implemented our technique and conducted experiments with faulty versions of several programs from the Siemens suite to evaluate our technique. Our experiments show that failure-inducing chops can greatly reduce the size of search space compared to the dynamic slices without significantly compromising the capability to locate the faulty code. We also applied our technique to several programs with known memory related bugs such as buffer overflow bugs. The failure-inducing chop in several of these cases contained only 2 to 4 statements which included the code causing memory corruption.
Improved error reporting for software that uses black-box components
- In Proceedings of the Conference on Programming Language Design and Implementation 2007
, 2007
"... An error occurs when software cannot complete a requested action as a result of some problem with its input, configuration, or environment. A high-quality error report allows a user to understand and correct the problem. Unfortunately, the quality of error reports has been decreasing as software bec ..."
Abstract
-
Cited by 22 (6 self)
- Add to MetaCart
An error occurs when software cannot complete a requested action as a result of some problem with its input, configuration, or environment. A high-quality error report allows a user to understand and correct the problem. Unfortunately, the quality of error reports has been decreasing as software becomes more complex and layered. End-users take the cryptic error messages given to them by programs and struggle to fix their problems using search engines and support websites. Developers cannot improve their error messages when they receive an ambiguous or otherwise insufficient error indicator from a black-box software component. We introduce Clarify, a system that improves error reporting by classifying application behavior. Clarify uses minimally invasive monitoring to generate a behavior profile, which is a summary of the program’s execution history. A machine learning classifier uses the behavior profile to classify the application’s behavior,
Augmenting automatically generated unit-test suites with regression oracle checking
- In ECOOP
, 2006
"... Abstract. A test case consists of two parts: a test input to exercise the program under test and a test oracle to check the correctness of the test execution. A test oracle is often in the form of executable assertions such as in the JUnit testing framework. Manually generated test cases are valuabl ..."
Abstract
-
Cited by 17 (6 self)
- Add to MetaCart
Abstract. A test case consists of two parts: a test input to exercise the program under test and a test oracle to check the correctness of the test execution. A test oracle is often in the form of executable assertions such as in the JUnit testing framework. Manually generated test cases are valuable in exposing program faults in the current program version or regression faults in future program versions. However, manually generated test cases are often insufficient for assuring high software quality. We can then use an existing test-generation tool to generate new test inputs to augment the existing test suite. However, without specifications these automatically generated test inputs often do not have test oracles for exposing faults. In this paper, we have developed an automatic approach and its supporting tool, called Orstra, for augmenting an automatically generated unit-test suite with regression oracle checking. The augmented test suite has an improved capability of guarding against regression faults. In our new approach, Orstra first executes the test suite and collects the class under test’s object states exercised by the test suite. On collected object states, Orstra creates assertions for asserting behavior of the object states. On executed observer methods (public methods with non-void returns), Orstra also creates assertions for asserting their return values. Then later when the class is changed, the augmented test suite is executed to check whether assertion violations are reported. We have evaluated Orstra on augmenting automatically generated tests for eleven subjects taken from a variety of sources. The experimental results show that an automatically generated test suite’s fault-detection capability can be effectively improved after being augmented by Orstra. 1
Statistical debugging: A hypothesis testing-based approach
- IEEE TRANSACTION ON SOFTWARE ENGINEERING
, 2006
"... Manual debugging is tedious, as well as costly. The high cost has motivated the development of fault localization techniques, which help developers search for fault locations. In this paper, we propose a new statistical method, called SOBER, which automatically localizes software faults without any ..."
Abstract
-
Cited by 17 (4 self)
- Add to MetaCart
Manual debugging is tedious, as well as costly. The high cost has motivated the development of fault localization techniques, which help developers search for fault locations. In this paper, we propose a new statistical method, called SOBER, which automatically localizes software faults without any prior knowledge of the program semantics. Unlike existing statistical approaches that select predicates correlated with program failures, SOBER models the predicate evaluation in both correct and incorrect executions and regards a predicate as fault-relevant if its evaluation pattern in incorrect executions significantly diverges from that in correct ones. Featuring a rationale similar to that of hypothesis testing, SOBER quantifies the fault relevance of each predicate in a principled way. We systematically evaluate SOBER under the same setting as previous studies. The result clearly demonstrates the effectiveness: SOBER could help developers locate 68 out of the 130 faults in the Siemens suite by examining no more than 10 percent of the code, whereas the Cause Transition approach proposed by Holger et al. [6] and the statistical approach by Liblit et al. [12] locate 34 and 52 faults, respectively. Moreover, the effectiveness of SOBER is also evaluated in an “imperfect world,” where the test suite is either inadequate or only partially labeled. The experiments indicate that SOBER could achieve competitive quality under these harsh circumstances. Two case studies with grep 2.2 and bc 1.06 are reported, which shed light on the applicability of SOBER on reasonably large programs.
Checking inside the black box: Regression testing based on value spectra differences
- In Proceedings of the 20th IEEE International Conference on Software Maintenance
, 2004
"... Comparing behaviors of program versions has become an important task in software maintenance and regression testing. Traditional regression testing strongly focuses on black-box comparison of program outputs. Program spectra have recently been proposed to characterize a program’s behavior inside the ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
Comparing behaviors of program versions has become an important task in software maintenance and regression testing. Traditional regression testing strongly focuses on black-box comparison of program outputs. Program spectra have recently been proposed to characterize a program’s behavior inside the black box. Comparing program spectra of program versions offers insights into the internal behavior differences between versions. In this paper, we present a new class of program spectra, value spectra, which enriches the existing program spectra family. We compare the value spectra of an old version and a new version to detect internal behavior deviations in the new version. We use a deviation-propagation call tree to present the deviation details. Based on the deviation-propagation call tree, we propose two heuristics to locate deviation roots, which are program locations that trigger the behavior deviations. We have conducted an experiment on seven C programs to evaluate our approach. The results show that our approach can effectively expose program behavior differences between versions even when their program outputs are the same, and our approach reports deviation roots with high accuracy for most programs. 1.
An empirical evaluation of test case filtering techniques based on exercising complex information flows
- In Proceedings of the 27th International Conference on Software Engineering (ICSE 2005
, 2005
"... Some software defects trigger failures only when certain complex information flows occur within the software. Profiling and analyzing such flows therefore provides a potentially important basis for filtering test cases. We report the results of an empirical evaluation of several test case filtering ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Some software defects trigger failures only when certain complex information flows occur within the software. Profiling and analyzing such flows therefore provides a potentially important basis for filtering test cases. We report the results of an empirical evaluation of several test case filtering techniques that are based on exercising complex information flows. Both coverage-based and profile-distribution-based filtering techniques are considered. They are compared to filtering techniques based on exercising basic blocks, branches, function calls, and def-use pairs, with respect to their effectiveness for revealing defects.
Checking inside the black box: Regression testing by comparing value spectra
- IEEE Transactions on Software Engineering
, 2005
"... Abstract—Comparing behaviors of program versions has become an important task in software maintenance and regression testing. Black-box program outputs have been used to characterize program behaviors and they are compared over program versions in traditional regression testing. Program spectra have ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
Abstract—Comparing behaviors of program versions has become an important task in software maintenance and regression testing. Black-box program outputs have been used to characterize program behaviors and they are compared over program versions in traditional regression testing. Program spectra have recently been proposed to characterize a program’s behavior inside the black box. Comparing program spectra of program versions offers insights into the internal behavioral differences between versions. In this paper, we present a new class of program spectra, value spectra, that enriches the existing program spectra family. We compare the value spectra of a program’s old version and new version to detect internal behavioral deviations in the new version. We use a deviationpropagation call tree to present the deviation details. Based on the deviation-propagation call tree, we propose two heuristics to locate deviation roots, which are program locations that trigger the behavioral deviations. We also use path spectra (previously proposed program spectra) to approximate the program states in value spectra. We then similarly compare path spectra to detect behavioral deviations and locate deviation roots in the new version. We have conducted an experiment on eight C programs to evaluate our spectra-comparison approach. The results show that both value-spectra-comparison and path-spectra-comparison approaches can effectively expose program behavioral differences between program versions even when their program outputs are the same, and our value-spectra-comparison approach reports deviation roots with high accuracy for most programs.

