Learning to Combine Multiple Ranking Metrics for Fault Localization
"... Abstract—Fault localization is an inevitable step in software debugging. Spectrum-based fault localization applies a ranking metric to identify faulty source code. Existing empirical studies on fault localization show that there is no optimal ranking metric for all the faults in practice. In this pa ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Abstract—Fault localization is an inevitable step in software debugging. Spectrum-based fault localization applies a ranking metric to identify faulty source code. Existing empirical studies on fault localization show that no single ranking metric is optimal for all faults in practice. In this paper, we propose MULTRIC, a learning-based approach to combining multiple ranking metrics for effective fault localization. In MULTRIC, the suspiciousness score of a program entity is a combination of existing ranking metrics. MULTRIC consists of two major phases: learning and ranking. Based on training faults, MULTRIC builds a ranking model by learning from pairs of faulty and non-faulty source code. When a new fault appears, MULTRIC computes the final ranking with the learned model. Experiments are conducted on 5386 seeded faults in ten open-source Java programs. We empirically compare MULTRIC against four widely-studied metrics and three recently-proposed metrics. Our experimental results show that MULTRIC localizes faults more effectively than state-of-the-art metrics such as Tarantula, Ochiai, and Ample.
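The combination idea can be made concrete with two of the metrics named above. The sketch below computes Tarantula and Ochiai suspiciousness from standard spectrum counts and blends them with fixed weights; note that MULTRIC learns its combination from faulty/non-faulty training pairs, so the linear form and the weights here are illustrative assumptions, not the paper's model.

    # Minimal sketch of combining spectrum-based ranking metrics in the
    # spirit of MULTRIC. The fixed weights are placeholders: MULTRIC
    # learns its combination from faulty/non-faulty training pairs.
    import math

    def tarantula(ef, ep, nf, np_):
        # ef/ep: failing/passing tests that cover the entity;
        # nf/np_: failing/passing tests that do not cover it.
        fail_ratio = ef / (ef + nf) if ef + nf else 0.0
        pass_ratio = ep / (ep + np_) if ep + np_ else 0.0
        denom = fail_ratio + pass_ratio
        return fail_ratio / denom if denom else 0.0

    def ochiai(ef, ep, nf, np_):
        denom = math.sqrt((ef + nf) * (ef + ep))
        return ef / denom if denom else 0.0

    def combined_score(counts, weights=(0.6, 0.4)):
        # Illustrative linear blend; a learned model would fit the weights.
        w_t, w_o = weights
        return w_t * tarantula(*counts) + w_o * ochiai(*counts)

    # Entity covered by 3 of 4 failing tests and 1 of 6 passing tests:
    print(combined_score((3, 1, 1, 5)))  # ~0.79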
An Empirical Study on the Scalability of Selective Mutation Testing
"... Abstract—Software testing plays an important role in ensur-ing software quality by running a program with test suites. Mutation testing is designed to evaluate whether a test suite is adequate in detecting faults. Due to the expensive cost of mutation testing, selective mutation testing was proposed ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract—Software testing plays an important role in ensuring software quality by running a program with test suites. Mutation testing is designed to evaluate whether a test suite is adequate for detecting faults. Due to the expensive cost of mutation testing, selective mutation testing was proposed to select a subset of mutants whose effectiveness is similar to that of the whole set of generated mutants. Although selective mutation testing has been widely investigated in recent years, many people still doubt whether it suits large programs well. To study the scalability of selective mutation testing, we systematically explore how program size impacts selective mutation testing through four projects (including 12 versions altogether). Based on this empirical study, selective mutation testing has surprisingly good scalability for programs smaller than 16 KLOC. In particular, for a program with E lines of executable code, the number of mutants used in selective mutation testing is proportional to E^c, where c is a constant between 0.05 and 0.25.
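The reported power law is easy to make concrete. In this sketch the proportionality constant k is a placeholder (the abstract does not give one), so only the growth rate is meaningful:

    # The abstract's scaling claim: mutants selected ~ k * E**c,
    # with 0.05 <= c <= 0.25. k = 1 is a placeholder.
    def selected_mutants(executable_loc, c, k=1.0):
        return k * executable_loc ** c

    for e in (1_000, 4_000, 16_000):
        print(f"E = {e:>6}: E^0.05 = {selected_mutants(e, 0.05):5.2f}, "
              f"E^0.25 = {selected_mutants(e, 0.25):6.2f}")
    # Quadrupling E from 4 to 16 KLOC raises E^0.25 by only 4**0.25 ~ 41%,
    # which is what makes selective mutation scale well below 16 KLOC.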
Is XML-based Test Case Prioritization for Validating WS-BPEL Evolution Effective in both Average and Adverse Scenarios?
"... AbstractIn real life, a tester can only afford to apply one test case prioritization technique to one test suite against a service-oriented workflow application once in the regression testing of the application, even if it results in an adverse scenario such that the actual performance in the test s ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract—In real life, a tester can only afford to apply one test case prioritization technique to one test suite against a service-oriented workflow application once in the regression testing of the application, even if it results in an adverse scenario such that the actual performance in the test session is far below the average. It is unclear whether the factors of test case prioritization techniques known to be significant in terms of average performance can be extrapolated to adverse scenarios. In this paper, we examine whether such a factor or technique may consistently affect the rate of fault detection in both the average and adverse scenarios. The factors studied include the prioritization strategy, the artifacts used to provide coverage data, the ordering direction of a strategy, and the use of executable and non-executable artifacts. The results show that only a minor portion of the 10 studied techniques, most of which are based on the iterative strategy, are consistently effective in both average and adverse scenarios. To the best of our knowledge, this paper presents the first piece of empirical evidence regarding the consistency of the effectiveness of test case prioritization techniques and factors for service-oriented workflow applications between average and adverse scenarios. Keywords—XML-based factor; WS-BPEL; adaptation; adverse
Unsupervised Outlier Detection in Software, 2014
"... Gothenburg the non-exclusive right to publish the Work electronically and in a ..."
Abstract
- Add to MetaCart
Gothenburg the non-exclusive right to publish the Work electronically and in a
Hypervolume-based Search for Test Case Prioritization
"... Abstract. Test case prioritization (TCP) is aimed at finding an ideal ordering for executing the available test cases to reveal faults earlier. To solve this problem greedy algorithms and meta-heuristics have been widely investigated, but in most cases there is no statistically significant differenc ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract. Test case prioritization (TCP) aims at finding an ideal ordering for executing the available test cases so as to reveal faults earlier. To solve this problem, greedy algorithms and meta-heuristics have been widely investigated, but in most cases there is no statistically significant difference between them in terms of effectiveness. The fitness function used to guide meta-heuristics condenses the cumulative coverage scores achieved by a test case ordering using the Area Under Curve (AUC) metric. In this paper we observe that the AUC metric is a simplified version of the hypervolume metric used in many-objective optimization, and we propose HGA, a Hypervolume-based Genetic Algorithm, to solve the TCP problem with multiple test criteria. The results show that HGA is more cost-effective than the additional greedy algorithm on large systems and on average requires 36% of the execution time required by the additional greedy algorithm.
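The AUC fitness the abstract refers to can be sketched as the mean cumulative coverage over a test ordering. This is a simplified single-criterion version under an assumed data layout; the paper's exact normalization may differ, and HGA generalizes the idea to multiple criteria via the hypervolume.

    # Simplified sketch of an AUC-style fitness: cumulative coverage after
    # each test in the ordering, averaged into one scalar.
    def coverage_auc(ordering, coverage):
        # ordering: list of test ids; coverage: test id -> set of elements.
        universe = set().union(*coverage.values())
        covered, points = set(), []
        for test in ordering:
            covered |= coverage[test]
            points.append(len(covered) / len(universe))
        return sum(points) / len(points)

    cov = {"t1": {1, 2}, "t2": {2, 3, 4}, "t3": {5}}
    print(coverage_auc(["t2", "t1", "t3"], cov))  # 0.8
    print(coverage_auc(["t3", "t1", "t2"], cov))  # 0.6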
To Be Optimal Or Not in Test-Case Prioritization
"... Abstract—Software testing aims to assure the quality of software under test. To improve the efficiency of software testing, especially regression testing, test-case prioritization is proposed to schedule the execution order of test cases in software testing. Among various test-case prioritization te ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract—Software testing aims to assure the quality of the software under test. To improve the efficiency of software testing, especially regression testing, test-case prioritization has been proposed to schedule the execution order of test cases. Among various test-case prioritization techniques, the simple additional coverage-based technique, which is a greedy strategy, achieves surprisingly competitive empirical results. To investigate how much difference there is between the order produced by the additional technique and the optimal order in terms of coverage, we conduct a study of various empirical properties of optimal coverage-based test-case prioritization. To obtain the optimal order in acceptable time for our object programs, we formulate optimal coverage-based test-case prioritization as an integer linear programming (ILP) problem. We then conduct an empirical study comparing the optimal technique with the simple additional coverage-based technique. This study shows that the optimal technique only slightly outperforms the additional coverage-based technique, with no statistically significant difference in terms of coverage, while the latter significantly outperforms the former in terms of fault detection as well as execution time. As the optimal technique schedules the execution order of test cases based on their structural coverage rather than detected faults, we further implement the ideal optimal test-case prioritization technique, which schedules the execution order of test cases based on their detected faults. Taking this ideal technique as the upper bound of test-case prioritization, we conduct another empirical study comparing the optimal technique and the simple additional technique with this ideal technique.
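For reference, the additional coverage-based technique the study compares against can be sketched in a few lines: repeatedly pick the test that adds the most not-yet-covered elements, resetting once full coverage is reached. The data layout (a dict of coverage sets) is an assumption for illustration, not the paper's implementation.

    # Sketch of the "additional" greedy strategy for test-case prioritization.
    def additional_greedy(coverage):
        # coverage: test id -> set of covered structural elements.
        universe = set().union(*coverage.values())
        remaining = dict(coverage)
        order, covered = [], set()
        while remaining:
            if covered >= universe:  # full coverage reached: reset
                covered = set()
            best = max(remaining, key=lambda t: len(remaining[t] - covered))
            order.append(best)
            covered |= remaining.pop(best)
        return order

    cov = {"t1": {1, 2}, "t2": {2, 3, 4}, "t3": {1}, "t4": {5}}
    print(additional_greedy(cov))  # ['t2', 't1', 't4', 't3']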
A Subsumption Hierarchy of Test Case Prioritization
"... Abstract—Many composite workflow services utilize non-imperative XML technologies such as WSDL, XPath, XML schema, and XML messages. Regression testing should assure the services against regression faults that appear in both the workflows and these artifacts. In this paper, we propose a refinement-o ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract—Many composite workflow services utilize non-imperative XML technologies such as WSDL, XPath, XML Schema, and XML messages. Regression testing should assure such services against regression faults that appear in both the workflows and these artifacts. In this paper, we propose a refinement-oriented level-exploration strategy and a multilevel coverage model that progressively captures the coverage of different types of artifacts by the test cases. We show that, using them, the test case prioritization techniques built on top of the existing greedy-based test case prioritization strategy form a subsumption hierarchy, such that a technique can produce more test suite permutations than a technique that subsumes it. Our experimental study of a model instance shows that a technique generally achieves a higher fault detection rate than a subsumed technique, which validates that the proposed hierarchy and model have the potential to improve the cost-effectiveness of test case prioritization techniques. Index Terms—Test case prioritization, service orientation, XPath, WSDL, XML messages.
PORA: Proportion-Oriented Randomized Algorithm for Test Case Prioritization
"... Abstract—Effective testing is essential for assuring software quality. While regression testing is time-consuming, the fault detection capability may be compromised if some test cases are discarded. Test case prioritization is a viable solution. To the best of our knowledge, the most effective test ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract—Effective testing is essential for assuring software quality. While regression testing is time-consuming, the fault detection capability may be compromised if some test cases are discarded. Test case prioritization is a viable solution. To the best of our knowledge, the most effective test case prioritization approach is still the additional greedy algorithm, and existing search-based algorithms have been shown to be visually less effective than it in previous empirical studies. This paper proposes a novel Proportion-Oriented Randomized Algorithm (PORA) for test case prioritization. PORA guides test case prioritization by optimizing the distance between the prioritized test suite and a hierarchy of distributions of test input data. Our experiment shows that PORA test case prioritization techniques are as effective as, if not more effective than, the total greedy, additional greedy, and ART techniques, which use code coverage information. Moreover, the experiment shows that PORA techniques are more stable in effectiveness than the others. Index Terms—Test case prioritization, randomized algorithm, proportional sampling strategy, multi-objective optimization
A Study of the Rates of Fault Detection
"... Many existing studies measure the effectiveness of test case prioritization techniques using the average performance on a set of test suites. However, in each regression test session, a real-world developer may only afford to apply one prioritization technique to one test suite to test a service onc ..."
Abstract
- Add to MetaCart
(Show Context)
Many existing studies measure the effectiveness of test case prioritization techniques using the average performance over a set of test suites. However, in each regression test session, a real-world developer may only afford to apply one prioritization technique to one test suite to test a service once, even if this application results in an adverse scenario such that the actual performance in that session is far below the average result achievable by the same technique over the same test suite for the same application. This indicates that assessing the average performance of such a technique cannot provide adequate confidence for developers to apply it. The authors ask two questions: To what extent does the effectiveness of prioritization techniques in average scenarios correlate with that in adverse scenarios? And to what extent may a design factor of this class of techniques affect the effectiveness of prioritization in different types of scenarios? To the best of their knowledge, the authors report in this paper the first controlled experiment to study these two new research questions, through more than 300 million APFD and HMFD data points produced from 19 techniques, eight WS-BPEL benchmarks, and 1000 test suites, each prioritized by each technique 1000 times. A main result reveals a strong and linear correlation between effectiveness in the average scenarios and that in the adverse scenarios. Another interesting result is that many pairs of levels of the same design factors significantly change their relative …
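APFD, one of the two effectiveness metrics named above, has a standard definition that a short sketch makes concrete: with n tests, m faults, and TF_i the position of the first test revealing fault i, APFD = 1 - (TF_1 + ... + TF_m)/(n*m) + 1/(2n). The fault matrix below is an illustrative assumption; HMFD, the second metric, is defined in the paper and omitted here.

    # Standard APFD computation. Assumes every fault is detected by at
    # least one test in the suite.
    def apfd(ordering, detects):
        # ordering: prioritized list of test ids;
        # detects: test id -> set of faults the test reveals.
        n = len(ordering)
        faults = set().union(*detects.values())
        m = len(faults)
        first = {}
        for pos, test in enumerate(ordering, start=1):
            for fault in detects.get(test, ()):
                first.setdefault(fault, pos)
        # APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2 * n)
        return 1 - sum(first[f] for f in faults) / (n * m) + 1 / (2 * n)

    detects = {"t1": {"f1"}, "t2": {"f1", "f2"}, "t3": set()}
    print(apfd(["t2", "t1", "t3"], detects))  # ~0.833
    print(apfd(["t3", "t1", "t2"], detects))  # ~0.333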