Results 1 - 10
of
48
Finding Application Errors and Security Flaws Using PQL: a Program Query Language
, 2005
"... A number of effective error detection tools have been built in recent years to check if a program conforms to certain design rules. An important class of design rules deals with sequences of events associated with a set of related objects. This paper presents a language called PQL (Program Query Lan ..."
Abstract
-
Cited by 99 (6 self)
- Add to MetaCart
A number of effective error detection tools have been built in recent years to check if a program conforms to certain design rules. An important class of design rules deals with sequences of events associated with a set of related objects. This paper presents a language called PQL (Program Query Language) that allows programmers to express such questions easily in an application-specific context. A query looks like a code excerpt corresponding to the shortest amount of code that would violate a design rule. Details of the target application's precise implementation are abstracted away. The programmer may also specify actions to perform when a match is found, such as recording relevant information or even correcting an erroneous execution on the fly.
Dynamine: Finding Common Error Patterns by Mining Software Revision Histories
- In ESEC/FSE
, 2005
"... A great deal of attention has lately been given to addressing software bugs such as errors in operating system drivers or security bugs. However, there are many other lesser known errors specificto individual applications or APIs and these violations of applicationspecific coding rules are responsib ..."
Abstract
-
Cited by 96 (10 self)
- Add to MetaCart
A great deal of attention has lately been given to addressing software bugs such as errors in operating system drivers or security bugs. However, there are many other lesser known errors specificto individual applications or APIs and these violations of applicationspecific coding rules are responsible for a multitude of errors. In this paper we propose DynaMine, a tool that analyzes source code check-ins to find highly correlated method calls as well as common bug fixes in order to automatically discover application-specific coding patterns. Potential patterns discovered through mining are passed to a dynamic analysis tool for validation; finally, the results of dynamic analysis are presented to the user. The combination of revision history mining and dynamic analysis techniques leveraged in DynaMine proves effective for both discovering new application-specific patterns and for finding errors when applied to very large applications with many man-years of development and debugging effort behind them. We have analyzed Eclipse and jEdit, two widely-used, mature, highly extensible applications consisting of more than 3,600,000 lines of code combined. By mining revision histories, we have discovered 56 previously unknown, highly application-specific patterns. Out of these, 21 were dynamically confirmed as very likely valid patterns and a total of 263 pattern violations were found.
Perracotta: mining temporal API rules from imperfect traces
- Ohio University
, 2006
"... Dynamic inference techniques have been demonstrated to provide useful support for various software engineering tasks including bug finding, test suite evaluation and improvement, and specification generation. To date, however, dynamic inference has only been used effectively on small programs under ..."
Abstract
-
Cited by 85 (2 self)
- Add to MetaCart
Dynamic inference techniques have been demonstrated to provide useful support for various software engineering tasks including bug finding, test suite evaluation and improvement, and specification generation. To date, however, dynamic inference has only been used effectively on small programs under controlled conditions. In this paper, we identify reasons why scaling dynamic inference techniques has proven difficult, and introduce solutions that enable a dynamic inference technique to scale to large programs and work effectively with the imperfect traces typically available in industrial scenarios. We describe our approximate inference algorithm, present and evaluate heuristics for winnowing the large number of inferred properties to a manageable set of interesting properties, and report on experiments using inferred properties. We evaluate our techniques on JBoss and the Windows kernel. Our tool is able to infer many of the properties checked by the Static Driver Verifier and leads us to discover a previously unknown bug in Windows.
From uncertainty to belief: Inferring the specification within
- In ”Proceedings of the Seventh Symposium on Operating Systems Design and Implemetation
, 2006
"... Automatic tools for finding software errors require a set of specifications before they can check code: if they do not know what to check, they cannot find bugs. This paper presents a novel framework based on factor graphs for automatically inferring specifications directly from programs. The key st ..."
Abstract
-
Cited by 44 (0 self)
- Add to MetaCart
Automatic tools for finding software errors require a set of specifications before they can check code: if they do not know what to check, they cannot find bugs. This paper presents a novel framework based on factor graphs for automatically inferring specifications directly from programs. The key strength of the approach is that it can incorporate many disparate sources of evidence, allowing us to squeeze significantly more information from our observations than previously published techniques. We illustrate the strengths of our approach by applying it to the problem of inferring what functions in C programs allocate and release resources. We evaluated its effectiveness on five codebases: SDL, OpenSSH, GIMP, and the OS kernels for Linux and Mac OS X (XNU). For each codebase, starting with zero initially provided annotations, we observed an inferred annotation accuracy of 80-90%, with often near perfect accuracy for functions called as little as five times. Many of the inferred allocator and deallocator functions are functions for which we both lack the implementation and are rarely called — in some cases functions with at most one or two callsites. Finally, with the inferred annotations we quickly found both missing and incorrect properties in a specification used by a commercial static bug-finding tool. 1
Mining API patterns as partial orders from source code: From usage scenarios to specifications
- In Proc. 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE
, 2007
"... A software system interacts with third-party libraries through various APIs. Using these library APIs often needs to follow certain usage patterns. Furthermore, ordering rules (specifications) exist between APIs, and these rules govern the secure and robust operation of the system using these APIs. ..."
Abstract
-
Cited by 31 (9 self)
- Add to MetaCart
A software system interacts with third-party libraries through various APIs. Using these library APIs often needs to follow certain usage patterns. Furthermore, ordering rules (specifications) exist between APIs, and these rules govern the secure and robust operation of the system using these APIs. But these patterns and rules may not be well documented by the API developers. Previous approaches mine frequent association rules, itemsets, or subsequences that capture API call patterns shared by API client code. However, these frequent API patterns cannot completely capture some useful orderings shared by APIs, especially when multiple APIs are involved across different procedures. In this paper, we present a framework to automatically extract usage scenarios among user-specified APIs as partial orders, directly from the source code (API client code). We adapt a model checker to generate interprocedural control-flow-sensitive static traces related to the APIs of interest. Different API usage scenarios are extracted from the static traces by our scenario extraction algorithm and fed to a miner. The miner summarizes different usage scenarios as compact partial orders. Specifications are extracted from the frequent partial orders using our specification extraction algorithm. Our experience of applying the framework on 72 X11 clients with 200K LOC in total has shown that the extracted API partial orders are useful in assisting effective API reuse and checking.
Detecting Object Usage Anomalies
, 2007
"... Interacting with objects often requires following a protocol—for instance, a specific sequence of method calls. These protocols are not always documented, and violations can lead to subtle problems. Our approach takes code examples to automatically infer legal sequences of method calls. The resultin ..."
Abstract
-
Cited by 30 (2 self)
- Add to MetaCart
Interacting with objects often requires following a protocol—for instance, a specific sequence of method calls. These protocols are not always documented, and violations can lead to subtle problems. Our approach takes code examples to automatically infer legal sequences of method calls. The resulting patterns can then be used to detect anomalies such as “Before calling next(), one normally calls hasNext()”. To our knowledge, this is the first fully automatic defect detection approach that learns and checks method call sequences. Our JADET prototype has detected yet undiscovered defects and code smells in five popular open-source programs, including two new defects in ASPECTJ.
SMArTIC: Toward building an accurate, robust and scalable specification miner
- In SIGSOFT FSE
, 2006
"... Improper management of software evolution, compounded by imprecise, and changing requirements, along with the “short time to market ” requirement, commonly leads to a lack of up-to-date specifications. This can result in software that is characterized by bugs, anomalies and even security threats. So ..."
Abstract
-
Cited by 29 (12 self)
- Add to MetaCart
Improper management of software evolution, compounded by imprecise, and changing requirements, along with the “short time to market ” requirement, commonly leads to a lack of up-to-date specifications. This can result in software that is characterized by bugs, anomalies and even security threats. Software specification mining is a new technique to address this concern by inferring specifications automatically. In this paper, we propose a novel API specification mining architecture called SMArTIC (Specification Mining Architecture with Trace fIltering and Clustering) to improve the accuracy, robustness and scalability of specification miners. This architecture is constructed based on two hypotheses: (1) Erroneous traces should be pruned from the input traces to a miner, and (2) Clustering related traces will localize inaccuracies and reduce over-generalizationin learning. Correspondingly, SMArTIC comprises four components: an erroneous-trace filtering block, a related-trace clustering block, a learner, and a merger. We show through experiments that the quality of specification mining can be significantly improved using SMArTIC.
Mining specifications of malicious behavior
- In Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering
, 2007
"... Malware detectors require a specification of malicious behavior. Typically, these specifications are manually constructed by investigating known malware. We present an automatic technique to overcome this laborious manual process. Our technique derives such a specification by comparing the execution ..."
Abstract
-
Cited by 26 (7 self)
- Add to MetaCart
Malware detectors require a specification of malicious behavior. Typically, these specifications are manually constructed by investigating known malware. We present an automatic technique to overcome this laborious manual process. Our technique derives such a specification by comparing the execution behavior of a known malware against the execution behaviors of a set of benign programs. In other words, we mine the malicious behavior present in a known malware that is not present in a set of benign programs. The output of our algorithm can be used by malware detectors to detect malware variants. Since our algorithm provides a succinct description of malicious behavior present in a malware, it can also be used by security analysts for understanding the malware. We have implemented a prototype based on our algorithm and tested it on several malware programs. Experimental results obtained from our prototype indicate that our algorithm is effective in extracting malicious behaviors that can be used to detect malware variants.
Static Specification Mining Using Automata-based Abstractions
- Proc. of ISSTA’07, ACM
, 2007
"... We present a novel approach to client-side mining of temporal API specifications based on static analysis. Specifically, we present an interprocedural analysis over a combined domain that abstracts both aliasing and event sequences for individual objects. The analysis uses a new family of automata-b ..."
Abstract
-
Cited by 26 (1 self)
- Add to MetaCart
We present a novel approach to client-side mining of temporal API specifications based on static analysis. Specifically, we present an interprocedural analysis over a combined domain that abstracts both aliasing and event sequences for individual objects. The analysis uses a new family of automata-based abstractions to represent unbounded event sequences, designed to disambiguate distinct usage patterns and merge similar usage patterns. Additionally, our approach includes an algorithm that summarizes abstract traces based on automata clusters, and effectively rules out spurious behaviors. We show experimental results mining specifications from a number of Java clients and APIs. The results indicate that effective static analysis for client-side mining requires fairly precise treatment of aliasing and abstract event sequences. Based on the results, we conclude that static client-side specification mining shows promise as a complement or alternative to dynamic approaches.
Muvi: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs
- In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP07
, 2007
"... Software defects significantly reduce system dependability. Among various types of software bugs, semantic and concurrency bugs are two of the most difficult to detect. This paper proposes a novel method, called MUVI, that detects an important class of semantic and concurrency bugs. MUVI automatical ..."
Abstract
-
Cited by 25 (6 self)
- Add to MetaCart
Software defects significantly reduce system dependability. Among various types of software bugs, semantic and concurrency bugs are two of the most difficult to detect. This paper proposes a novel method, called MUVI, that detects an important class of semantic and concurrency bugs. MUVI automatically infers commonly existing multi-variable access correlations through code analysis and then detects two types of related bugs: (1) inconsistent updates—correlated variables are not updated in a consistent way, and (2) multivariable concurrency bugs—correlated accesses are not protected in the same atomic sections in concurrent programs. We evaluate MUVI on four large applications: Linux, Mozilla, MySQL, and PostgreSQL. MUVI automatically infers more than 6000 variable access correlations with high accuracy (83%). Based on the inferred correlations, MUVI detects 39 new inconsistent update semantic bugs from the latest versions of these applications, with 17 of them recently confirmed by the developers based on our reports. We also implemented MUVI multi-variable extensions to two representative data race bug detection methods (lockset and happens-before). Our evaluation on five real-world multi-variable concurrency bugs from Mozilla and MySQL shows that the MUVI-extension correctly identifies the root causes of four out of the five multi-variable concurrency bugs with 14 % additional overhead on average. Interestingly, MUVI also helps detect four new multi-variable concurrency bugs in Mozilla that have never been reported before. None of the nine bugs can be identified correctly by the original race detectors without our MUVI extensions.

