Results 1 - 10
of
68
From uncertainty to belief: Inferring the specification within
- In ”Proceedings of the Seventh Symposium on Operating Systems Design and Implemetation
, 2006
"... Automatic tools for finding software errors require a set of specifications before they can check code: if they do not know what to check, they cannot find bugs. This paper presents a novel framework based on factor graphs for automatically inferring specifications directly from programs. The key st ..."
Abstract
-
Cited by 44 (0 self)
- Add to MetaCart
Automatic tools for finding software errors require a set of specifications before they can check code: if they do not know what to check, they cannot find bugs. This paper presents a novel framework based on factor graphs for automatically inferring specifications directly from programs. The key strength of the approach is that it can incorporate many disparate sources of evidence, allowing us to squeeze significantly more information from our observations than previously published techniques. We illustrate the strengths of our approach by applying it to the problem of inferring what functions in C programs allocate and release resources. We evaluated its effectiveness on five codebases: SDL, OpenSSH, GIMP, and the OS kernels for Linux and Mac OS X (XNU). For each codebase, starting with zero initially provided annotations, we observed an inferred annotation accuracy of 80-90%, with often near perfect accuracy for functions called as little as five times. Many of the inferred allocator and deallocator functions are functions for which we both lack the implementation and are rarely called — in some cases functions with at most one or two callsites. Finally, with the inferred annotations we quickly found both missing and incorrect properties in a specification used by a commercial static bug-finding tool. 1
MAPO: Mining API Usages from Open Source Repositories
- In MSR ’06: Proceedings of the 2006 international workshop on Mining software repositories
, 2006
"... To improve software productivity, when constructing new software systems, developers often reuse existing class libraries or frameworks by invoking their APIs. Those APIs, however, are often complex and not well documented, posing barriers for developers to use them in new client code. To get famili ..."
Abstract
-
Cited by 38 (4 self)
- Add to MetaCart
To improve software productivity, when constructing new software systems, developers often reuse existing class libraries or frameworks by invoking their APIs. Those APIs, however, are often complex and not well documented, posing barriers for developers to use them in new client code. To get familiar with how those APIs are used, developers may search the Web using a general search engine to find relevant documents or code examples. Developers can also use a source code search engine to search open source repositories for source files that use the same APIs. Nevertheless, the number of returned source files is often large. It is difficult for developers to learn API usages from a large number of returned results. In order to help developers understand API usages and write API client code more effectively, we have developed an API usage mining framework and its supporting tool called MAPO (for Mining API usages from Open source repositories). Given a query that describes a method, class, or package for an API, MAPO leverages the existing source code search engines to gather relevant source files and conducts data mining. The mining leads to a short list of frequent API usages for developers to inspect. MAPO currently consists of five components: a code search engine, a source code analyzer, a sequence preprocessor, a frequent sequence miner, and a frequent sequence postprocessor. We have examined the effectiveness of MAPO using a set of various queries. The preliminary results show that the framework is practical for providing informative and succinct API usage patterns.
A Survey on Software Clone Detection Research
- SCHOOL OF COMPUTING TR 2007-541, QUEEN’S UNIVERSITY
, 2007
"... Code duplication or copying a code fragment and then reuse by pasting with or without any modifications is a well known code smell in software maintenance. Several studies show that about 5 % to 20 % of a software systems can contain duplicated code, which is basically the results of copying existin ..."
Abstract
-
Cited by 32 (7 self)
- Add to MetaCart
Code duplication or copying a code fragment and then reuse by pasting with or without any modifications is a well known code smell in software maintenance. Several studies show that about 5 % to 20 % of a software systems can contain duplicated code, which is basically the results of copying existing code fragments and using then by pasting with or without minor modifications. One of the major shortcomings of such duplicated fragments is that if a bug is detected in a code fragment, all the other fragments similar to it should be investigated to check the possible existence of the same bug in the similar fragments. Refactoring of the duplicated code is another prime issue in software maintenance although several studies claim that refactoring of certain clones are not desirable and there is a risk of removing them. However, it is also widely agreed that clones should at least be detected. In this paper, we survey the state of the art in clone detection research. First, we describe the clone terms commonly used in the literature along with their corresponding mappings to the commonly used clone types. Second, we provide a review of the existing
Mining API patterns as partial orders from source code: From usage scenarios to specifications
- In Proc. 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE
, 2007
"... A software system interacts with third-party libraries through various APIs. Using these library APIs often needs to follow certain usage patterns. Furthermore, ordering rules (specifications) exist between APIs, and these rules govern the secure and robust operation of the system using these APIs. ..."
Abstract
-
Cited by 31 (9 self)
- Add to MetaCart
A software system interacts with third-party libraries through various APIs. Using these library APIs often needs to follow certain usage patterns. Furthermore, ordering rules (specifications) exist between APIs, and these rules govern the secure and robust operation of the system using these APIs. But these patterns and rules may not be well documented by the API developers. Previous approaches mine frequent association rules, itemsets, or subsequences that capture API call patterns shared by API client code. However, these frequent API patterns cannot completely capture some useful orderings shared by APIs, especially when multiple APIs are involved across different procedures. In this paper, we present a framework to automatically extract usage scenarios among user-specified APIs as partial orders, directly from the source code (API client code). We adapt a model checker to generate interprocedural control-flow-sensitive static traces related to the APIs of interest. Different API usage scenarios are extracted from the static traces by our scenario extraction algorithm and fed to a miner. The miner summarizes different usage scenarios as compact partial orders. Specifications are extracted from the frequent partial orders using our specification extraction algorithm. Our experience of applying the framework on 72 X11 clients with 200K LOC in total has shown that the extracted API partial orders are useful in assisting effective API reuse and checking.
Detecting Object Usage Anomalies
, 2007
"... Interacting with objects often requires following a protocol—for instance, a specific sequence of method calls. These protocols are not always documented, and violations can lead to subtle problems. Our approach takes code examples to automatically infer legal sequences of method calls. The resultin ..."
Abstract
-
Cited by 30 (2 self)
- Add to MetaCart
Interacting with objects often requires following a protocol—for instance, a specific sequence of method calls. These protocols are not always documented, and violations can lead to subtle problems. Our approach takes code examples to automatically infer legal sequences of method calls. The resulting patterns can then be used to detect anomalies such as “Before calling next(), one normally calls hasNext()”. To our knowledge, this is the first fully automatic defect detection approach that learns and checks method call sequences. Our JADET prototype has detected yet undiscovered defects and code smells in five popular open-source programs, including two new defects in ASPECTJ.
Deckard: Scalable and accurate tree-based detection of code clones
- In ICSE
, 2007
"... Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of ..."
Abstract
-
Cited by 29 (3 self)
- Add to MetaCart
Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space R n and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar. 1.
Have Things Changed Now? An Empirical Study of Bug Characteristics in Modern Open Source Software
- Proc. of 1st Workshop on Architectural and System Support for Improving Software Dependability
, 2006
"... Software errors are a major cause for system failures. To effectively design tools and support for detecting and recovering from software failures requires a deep understanding of bug 1 characteristics. Recently, software and its development process have significantly changed in many ways, including ..."
Abstract
-
Cited by 29 (4 self)
- Add to MetaCart
Software errors are a major cause for system failures. To effectively design tools and support for detecting and recovering from software failures requires a deep understanding of bug 1 characteristics. Recently, software and its development process have significantly changed in many ways, including more help from bug detection tools, shift towards multi-threading architecture, the opensource development paradigm and increasing concerns about security and user-friendly interface. Therefore, results from previous studies may not be applicable to present software. Furthermore, many new aspects such as security, concurrency and open-sourcerelated characteristics have not well studied. Additionally, previous studies were based on a small number of bugs, which may lead to non-representative results. To investigate the impacts of the new factors on software errors,
Mining specifications of malicious behavior
- In Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering
, 2007
"... Malware detectors require a specification of malicious behavior. Typically, these specifications are manually constructed by investigating known malware. We present an automatic technique to overcome this laborious manual process. Our technique derives such a specification by comparing the execution ..."
Abstract
-
Cited by 26 (7 self)
- Add to MetaCart
Malware detectors require a specification of malicious behavior. Typically, these specifications are manually constructed by investigating known malware. We present an automatic technique to overcome this laborious manual process. Our technique derives such a specification by comparing the execution behavior of a known malware against the execution behaviors of a set of benign programs. In other words, we mine the malicious behavior present in a known malware that is not present in a set of benign programs. The output of our algorithm can be used by malware detectors to detect malware variants. Since our algorithm provides a succinct description of malicious behavior present in a malware, it can also be used by security analysts for understanding the malware. We have implemented a prototype based on our algorithm and tested it on several malware programs. Experimental results obtained from our prototype indicate that our algorithm is effective in extracting malicious behaviors that can be used to detect malware variants.
Muvi: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs
- In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP07
, 2007
"... Software defects significantly reduce system dependability. Among various types of software bugs, semantic and concurrency bugs are two of the most difficult to detect. This paper proposes a novel method, called MUVI, that detects an important class of semantic and concurrency bugs. MUVI automatical ..."
Abstract
-
Cited by 25 (6 self)
- Add to MetaCart
Software defects significantly reduce system dependability. Among various types of software bugs, semantic and concurrency bugs are two of the most difficult to detect. This paper proposes a novel method, called MUVI, that detects an important class of semantic and concurrency bugs. MUVI automatically infers commonly existing multi-variable access correlations through code analysis and then detects two types of related bugs: (1) inconsistent updates—correlated variables are not updated in a consistent way, and (2) multivariable concurrency bugs—correlated accesses are not protected in the same atomic sections in concurrent programs. We evaluate MUVI on four large applications: Linux, Mozilla, MySQL, and PostgreSQL. MUVI automatically infers more than 6000 variable access correlations with high accuracy (83%). Based on the inferred correlations, MUVI detects 39 new inconsistent update semantic bugs from the latest versions of these applications, with 17 of them recently confirmed by the developers based on our reports. We also implemented MUVI multi-variable extensions to two representative data race bug detection methods (lockset and happens-before). Our evaluation on five real-world multi-variable concurrency bugs from Mozilla and MySQL shows that the MUVI-extension correctly identifies the root causes of four out of the five multi-variable concurrency bugs with 14 % additional overhead on average. Interestingly, MUVI also helps detect four new multi-variable concurrency bugs in Mozilla that have never been reported before. None of the nine bugs can be identified correctly by the original race detectors without our MUVI extensions.
DySy: Dynamic symbolic execution for invariant inference
- In Proc. 30th ACM/IEEE International Conference on Software Engineering (ICSE). ACM
, 2007
"... Dynamically discovering likely program invariants from concrete test executions has emerged as a highly promising software engineering technique. Dynamic invariant inference has the advantage of succinctly summarizing both “expected” program inputs and the subset of program behaviors that is normal ..."
Abstract
-
Cited by 23 (3 self)
- Add to MetaCart
Dynamically discovering likely program invariants from concrete test executions has emerged as a highly promising software engineering technique. Dynamic invariant inference has the advantage of succinctly summarizing both “expected” program inputs and the subset of program behaviors that is normal under those inputs. In this paper, we introduce a technique that can drastically increase the relevance of inferred invariants, or reduce the size of the test suite required to obtain good invariants. Instead of falsifying invariants produced by pre-set patterns, we determine likely program invariants by combining the concrete execution of actual test cases with a simultaneous symbolic execution of conditions over program variables that the concrete tests satisfy during their execution. In this way, we obtain the benefits of dynamic inference tools like Daikon: the inferred invariants correspond to the observed program behaviors. At the same time, however, our inferred invariants are much more suited to the program at hand than Daikon’s hardcoded invariant patterns. The symbolic invariants are literally derived from the program text itself, with appropriate value substitutions as dictated by symbolic execution. We implemented our technique in the DySy tool, which utilizes a powerful symbolic execution and simplification engine. The results confirm the benefits of our approach. In Daikon’s prime example benchmark, we infer the majority of the interesting Daikon invariants, while eliminating invariants that a human user is likely to consider irrelevant.

