Automated Whitebox Fuzz Testing
Cited by 102 (12 self)
Fuzz testing is an effective technique for finding security vulnerabilities in software. Traditionally, fuzz testing tools apply random mutations to well-formed inputs of a program and test the resulting values. We present an alternative whitebox fuzz testing approach inspired by recent advances in symbolic execution and dynamic test generation. Our approach records an actual run of the program under test on a well-formed input, symbolically evaluates the recorded trace, and gathers constraints on inputs capturing how the program uses these. The collected constraints are then negated one by one and solved with a constraint solver, producing new inputs that exercise different control paths in the program. This process is repeated with the help of a code-coverage maximizing heuristic designed to find defects as fast as possible. We have implemented this algorithm in SAGE (Scalable, Automated, Guided Execution), a new tool employing x86 instruction-level tracing and emulation for whitebox fuzzing of arbitrary file-reading Windows applications. We describe key optimizations needed to make dynamic test generation scale to large input files and long execution traces with hundreds of millions of instructions. We then present detailed experiments with several Windows applications. Notably, without any format-specific knowledge, SAGE detects the MS07-017 ANI vulnerability, which was missed by extensive blackbox fuzzing and static analysis tools. Furthermore, while still in an early stage of development, SAGE has already discovered 30+ new bugs in large shipped Windows applications including image processors, media players, and file decoders. Several of these bugs are potentially exploitable memory access violations.
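The search loop described in this abstract (record a run, collect branch constraints, negate them one by one, solve, repeat) can be sketched on a toy scale. The following is a minimal illustration, not SAGE itself: `program`, `solve`, and `whitebox_fuzz` are hypothetical names, the "constraint solver" is a brute-force search over single bytes, constraints are recorded by hand rather than by x86 tracing, and the coverage-maximizing heuristic is replaced by a plain FIFO worklist.

```python
from collections import deque


def program(data, trace):
    """Hypothetical program under test: fails only on the input b'bad!'."""
    def branch(i, c):
        taken = data[i] == c
        trace.append((i, c, taken))  # constraint: (data[i] == c) evaluated to `taken`
        return taken

    count = 0
    for i, c in enumerate(b"bad!"):
        if branch(i, c):
            count += 1
    if count == 4:
        raise RuntimeError("defect reached")


def solve(constraints):
    """Toy solver: map each byte index to its set of allowed values, or return None."""
    allowed = {}
    for i, c, want_equal in constraints:
        domain = allowed.get(i, set(range(256)))
        allowed[i] = domain & {c} if want_equal else domain - {c}
        if not allowed[i]:
            return None  # constraints on this byte are unsatisfiable
    return allowed


def whitebox_fuzz(seed, max_runs=1000):
    """Expand inputs by negating each recorded constraint in turn."""
    worklist = deque([seed])
    seen = {seed}
    crashes = []
    runs = 0
    while worklist and runs < max_runs:
        data = worklist.popleft()
        trace = []
        runs += 1
        try:
            program(data, trace)
        except RuntimeError:
            crashes.append(data)
            continue
        for j, (i, c, taken) in enumerate(trace):
            # Keep the path prefix, flip the outcome of the j-th branch.
            solution = solve(trace[:j] + [(i, c, not taken)])
            if solution is None:
                continue
            # Reuse the parent's bytes wherever they already satisfy the solution.
            child = bytes(
                b if b in solution.get(k, range(256)) else min(solution[k])
                for k, b in enumerate(data)
            )
            if child not in seen:
                seen.add(child)
                worklist.append(child)
    return crashes
```

Starting from the well-formed seed `b"good"`, `whitebox_fuzz(b"good")` reaches the failing input `b"bad!"` within a few generations, whereas random mutation would need to guess four specific bytes at once.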
DSD-Crasher: A hybrid analysis tool for bug finding
- In ISSTA
, 2006
Cited by 36 (3 self)
DSD-Crasher is a bug-finding tool that follows a three-step approach to program analysis: D. Capture the program’s intended execution behavior with dynamic invariant detection. The derived invariants exclude many unwanted values from the program’s input domain. S. Statically analyze the program within the restricted input domain to explore many paths. D. Automatically generate test cases that focus on reproducing the predictions of the static analysis. Results confirmed in this way are therefore feasible. This three-step approach yields benefits compared to past two-step combinations in the literature. In our evaluation with third-party applications, we demonstrate higher precision than tools that lack a dynamic step and higher efficiency than tools that lack a static step.
Feature Identification: A Novel Approach and a Case Study
- ICSM 2005
, 2005
Cited by 32 (7 self)
Feature identification is a well-known technique to identify subsets of a program source code activated when exercising a functionality. Several approaches have been proposed to identify features. We present an approach to feature identification and comparison for large object-oriented multi-threaded programs using both static and dynamic data. We use processor emulation, knowledge filtering, and probabilistic ranking to overcome the difficulties of collecting dynamic data, i.e., imprecision and noise. We use model transformations to compare and to visualise identified features. We compare our approach with a naive approach and a concept analysis-based approach using a case study on a real-life large object-oriented multi-threaded program, Mozilla, to show the advantages of our approach. We also use the case study to compare processor emulation with statistical profiling.
A systematic survey of program comprehension through dynamic analysis
, 2008
Cited by 22 (9 self)
Program comprehension is an important activity in software maintenance, as software must be sufficiently understood before it can be properly modified. The study of a program’s execution, known as dynamic analysis, has become a common technique in this respect and has received substantial attention from the research community, particularly over the last decade. These efforts have resulted in
Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval
- IEEE Trans. Software Eng
, 2007
Cited by 15 (7 self)
This paper recasts the problem of feature location in source code as a decision-making problem in the presence of uncertainty. The solution to the problem is formulated as a combination of the opinions of different experts. The experts in this work are two existing techniques for feature location: a scenario-based probabilistic ranking of events and an information retrieval-based technique that uses latent semantic indexing. The combination of these two experts is empirically evaluated through several case studies, which use the source code of the Mozilla Web browser and the Eclipse integrated development environment. The results show that the combination of experts significantly improves the effectiveness of feature location when compared to each of the experts used independently. Index Terms: program understanding, feature identification, concept location, dynamic and static analyses, information retrieval, Latent Semantic Indexing, scenario-based probabilistic ranking, open source software.
Mutually enhancing test generation and specification inference
- In Proc. 3rd International Workshop on Formal Approaches to Testing of Software, volume 2931 of LNCS
, 2003
Cited by 13 (3 self)
Generating effective tests and inferring likely program specifications are both difficult and costly problems. We propose an approach in which we can mutually enhance the tests and specifications that are generated by iteratively applying each in a feedback loop. In particular, we infer likely specifications from the executions of existing tests and use these specifications to guide automatic test generation. Then the existing tests, as well as the new tests, are used to infer new specifications in the subsequent iteration. The iterative process continues until there is no new test that violates specifications inferred in the previous iteration. Inferred specifications can guide test generation to focus on particular program behavior, reducing the scope of analysis; and newly generated tests can improve the inferred specifications. During each iteration, the generated tests that violate inferred specifications are collected to be inspected. These violating tests are likely to have a high probability of exposing faults or exercising new program behavior. Our hypothesis is that such a feedback loop can mutually enhance test generation and specification inference.
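The feedback loop in this abstract can be illustrated with a very small sketch. Everything here is an assumption made for illustration: the function under test (`saturating_abs`), the form of the inferred specification (an observed output range, in the style of a simple range invariant), and the test generator (integer neighbours of existing tests). The specification is used only to select which generated tests to keep, which simplifies the guidance the abstract describes.

```python
def saturating_abs(x):
    """Hypothetical function under test: absolute value capped at 5."""
    return min(abs(x), 5)


def infer_spec(tests):
    """Infer a likely specification: the range of outputs observed so far."""
    outputs = [saturating_abs(x) for x in tests]
    return min(outputs), max(outputs)


def generate_tests(tests):
    """Propose new inputs next to the existing ones (a stand-in for a real generator)."""
    seen = set(tests)
    new = []
    for t in tests:
        for candidate in (t - 1, t + 1):
            if candidate not in seen:
                seen.add(candidate)
                new.append(candidate)
    return new


def feedback_loop(seed_tests):
    """Alternate inference and generation until no new test violates the spec."""
    tests = list(seed_tests)
    to_inspect = []  # violating tests, collected for manual inspection
    while True:
        lo, hi = infer_spec(tests)
        violating = [x for x in generate_tests(tests)
                     if not lo <= saturating_abs(x) <= hi]
        if not violating:
            return (lo, hi), tests, to_inspect
        to_inspect.extend(violating)
        tests.extend(violating)  # the next iteration infers from old and new tests
```

Calling `feedback_loop([0])` widens the inferred output range one step per iteration and stops once the neighbours of the test suite no longer produce outputs outside it.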
Hybrid Static-Dynamic Attacks against Software Protection Mechanisms
, 2005
Cited by 7 (3 self)
Advances in reverse engineering and program analyses have made software extremely vulnerable to malicious host attacks. These attacks typically take the form of intellectual property violations, against which the software needs to be protected. The intellectual property that needs to be protected can take on different forms. The software might, e.g., itself consist of proprietary algorithms and data structures, or it could provide controlled access to copyrighted material. Therefore, in recent years, a number of techniques have been explored to protect software. Many of these techniques provide a reasonable level of security against static-only attacks. Many of them, however, fail to address the problem of dynamic or hybrid static-dynamic attacks. While this type of attack is already commonly used by black-hats, this is one of the first scientific papers to discuss the potential of these attacks, through which an attacker can analyze, control and modify a program extensively. The concepts are illustrated through a case study of a recently proposed algorithm for software watermarking [6].
A Language for Specifying and Comparing Table Recognition Strategies
, 2004
Cited by 7 (3 self)
Table recognition algorithms may be described by models of table location and structure, and decisions made relative to these models. These algorithms are usually defined informally as a sequence of decisions with supporting data observations and transformations. In this investigation, we formalize these algorithms as strategies in an imitation game, where the goal of the game is to match table interpretations from a chosen procedure as closely as possible. The chosen procedure may be a person or persons producing ‘ground truth,’ or an algorithm. To describe table recognition strategies we have defined the Recognition Strategy Language (RSL). RSL is a simple functional language for describing strategies as sequences of abstract decision types whose results are determined by any suitable decision method. RSL defines and maintains interpretation trees, a simple data structure for describing recognition results. For each interpretation in an interpretation tree, we annotate hypothesis histories which capture the creation, revision, and rejection of individual hypotheses, such as the logical type and structure of regions. We present a proof-of-concept using two strategies from the literature. We demonstrate how RSL allows strategies to be specified at the level of decisions rather than algorithms, and we compare results of our strategy implementations using new techniques. In particular, we introduce historical recall and precision metrics. Conventional recall and precision characterize hypotheses accepted after a strategy has finished. Historical recall and precision provide additional information by describing all generated hypotheses, including any rejected in the final result.
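The contrast between conventional and historical metrics can be made concrete with a short sketch. The abstract does not give the exact definitions, so this assumes one plausible reading: conventional metrics are computed over the finally accepted hypotheses, and historical metrics over every hypothesis ever generated (accepted or rejected). The hypothesis labels and the `recall_precision` helper are hypothetical.

```python
def recall_precision(hypotheses, ground_truth):
    """Return (recall, precision) of a hypothesis set against the ground truth."""
    hypotheses, ground_truth = set(hypotheses), set(ground_truth)
    correct = len(hypotheses & ground_truth)
    recall = correct / len(ground_truth) if ground_truth else 0.0
    precision = correct / len(hypotheses) if hypotheses else 0.0
    return recall, precision


# Hypothetical data: labels stand for individual table-structure hypotheses.
ground_truth = {"cell:A", "cell:B", "cell:C", "header:H"}
accepted = {"cell:A", "header:H", "cell:X"}   # kept in the final interpretation
rejected = {"cell:B"}                         # generated, later rejected

# Conventional: only hypotheses accepted once the strategy has finished.
conventional = recall_precision(accepted, ground_truth)

# Historical (assumed reading): all generated hypotheses, including rejected ones.
historical = recall_precision(accepted | rejected, ground_truth)
```

In this made-up example the historical recall (0.75) exceeds the conventional recall (0.5), which shows that the strategy generated a correct hypothesis and then wrongly discarded it.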
Library Miniaturization Using Static and Dynamic Information
- In Proceedings of IEEE International Conference on Software Maintenance
, 2003
Cited by 7 (5 self)
Moving to smaller libraries can be a relevant task when porting software systems to limited-resource devices (e.g., hand-helds). Library miniaturization will be particularly effective if based on both dynamic (taking into account dependencies exploited during application execution in a given user profile) and static (taking into account all possible dependencies) information. This
Combined Static and Dynamic Automated Test Generation
Cited by 5 (2 self)
In an object-oriented program, a unit test often consists of a sequence of method calls that create and mutate objects, then use them as arguments to a method under test. It is challenging to automatically generate sequences that are legal and behaviorally diverse, that is, reaching as many different program states as possible. This paper proposes a combined static and dynamic automated test generation approach to address these problems, for code without a formal specification. Our approach first uses dynamic analysis to infer a call sequence model from a sample execution, then uses static analysis to identify method dependence relations based on the fields they may read or write. Finally, both the dynamically-inferred model (which tends to be accurate but incomplete) and the statically-identified dependence information (which tends to be conservative) guide a random test generator to create legal and behaviorally diverse tests. Our Palus tool implements this testing approach. We compared its effectiveness with a pure random approach, a dynamic-random approach (without a static phase), and a static-random approach (without a dynamic phase) on several popular open-source Java programs. Tests generated by Palus achieved higher structural coverage and found more bugs. Palus is also used internally at Google. It has found 22 previously unknown bugs in four well-tested Google products.

