## An empirical evaluation of automated theorem provers in software certification (2004)

Venue: International Journal of AI tools

Citations: 14 (7 self-citations)

### BibTeX

    @ARTICLE{Denney04anempirical,
      author = {Ewen Denney and Bernd Fischer and Johann Schumann},
      title = {An empirical evaluation of automated theorem provers in software certification},
      journal = {International Journal of AI tools},
      year = {2004},
      pages = {2006}
    }

### Abstract

We describe a system for the automated certification of safety properties of NASA software. The system uses Hoare-style program verification technology to generate proof obligations, which are then processed by an automated first-order theorem prover (ATP). We discuss the unique requirements this application places on the ATPs, focusing on automation, proof checking, and usability. For full automation, however, the obligations must be aggressively preprocessed and simplified, and we demonstrate how the individual simplification stages, which are implemented by rewriting, influence the ability of the ATPs to solve the proof tasks. Our results are based on 13 certification experiments that led to more than 25,000 proof tasks, each of which has been attempted by Vampire, Spass, e-setheo, and Otter. The proofs found by Otter have been proof-checked by IVY.
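
The Hoare-style generation of proof obligations mentioned in the abstract can be illustrated with a minimal sketch of the assignment rule. It assumes a hypothetical toy language of straight-line integer assignments (no loops, arrays, or annotations), so it is far simpler than the system described here; `Assign`, `wp`, `vc`, and the other names are illustrative only. The obligation it builds has the form pre => wp(program, post), the kind of formula that would then be handed to an ATP.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical toy expression language (not the paper's internal representation).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str  # e.g. "+", "-", "<", "<=", "and", "=>"
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Const, BinOp]


@dataclass(frozen=True)
class Assign:
    target: str
    value: Expr


def subst(e: Expr, name: str, repl: Expr) -> Expr:
    """Replace every occurrence of variable `name` in `e` by `repl`."""
    if isinstance(e, Var):
        return repl if e.name == name else e
    if isinstance(e, Const):
        return e
    return BinOp(e.op, subst(e.left, name, repl), subst(e.right, name, repl))


def wp(stmts: List[Assign], post: Expr) -> Expr:
    """Weakest precondition via the Hoare assignment rule {Q[e/x]} x := e {Q},
    applied backwards over a straight-line sequence."""
    for s in reversed(stmts):
        post = subst(post, s.target, s.value)
    return post


def vc(pre: Expr, stmts: List[Assign], post: Expr) -> Expr:
    """Proof obligation pre => wp(stmts, post)."""
    return BinOp("=>", pre, wp(stmts, post))


def show(e: Expr) -> str:
    """Fully parenthesized infix rendering."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Const):
        return str(e.value)
    return f"({show(e.left)} {e.op} {show(e.right)})"


# Array-bounds style check: after j := i + 1, does 0 <= j < n hold?
pre = BinOp("and", BinOp("<=", Const(0), Var("i")),
            BinOp("<", Var("i"), BinOp("-", Var("n"), Const(1))))
post = BinOp("and", BinOp("<=", Const(0), Var("j")), BinOp("<", Var("j"), Var("n")))
print(show(vc(pre, [Assign("j", BinOp("+", Var("i"), Const(1)))], post)))
# (((0 <= i) and (i < (n - 1))) => ((0 <= (i + 1)) and ((i + 1) < n)))
```

The substitution runs backwards over the statements because each assignment transforms the postcondition that holds after it into the condition that must hold before it.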

### Citations

1091 | Proof-Carrying Code
- Necula
- 1997
Citation Context ...g framework result in a substantially larger experimental basis than reported before. A shorter version of this paper appears as [5]. [Related Work] Our approach is related to proof-carrying code (PCC) [21]. PCC works on the machine-code level instead of the source-code level (as we do) and concentrates on very simple safety policies (mainly array-bounds safety) which leads to comparatively simple proof...

518 | Extended Static Checking for Java
- Flanagan, Leino, et al.
- 2002
Citation Context ... can use different ATPs but relies heavily on term rewriting and user guidance. Sunrise [15] is a fully automatic system but uses custom-designed tactics in HOL to discharge the obligations. ESC/Java [10] is an automatic verification system but relies on the user to provide additional information on the program, e.g., loop invariants. Houdini [9] is an automatic annotation assistant which guesses inva...

268 | The design and implementation of a certifying compiler
- Necula, Lee
- 1998
Citation Context ... concentrates on very simple safety policies (mainly array-bounds safety) which leads to comparatively simple proof obligations. Hence, PCC is complementary to our approach, and a certifying compiler [23] could be used to ensure that the compilation step does not compromise the demonstrated safety policies. PCC also spawned an entire cottage industry of proof checkers, e.g., [1]; however, these use va...

223 | Automated Theorem Proving: A Logical Basis
- Loveland
- 1978
Citation Context ... [Figure caption: ...failing proof tasks (black) for the different simplification stages (prover: e-setheo-csp03F); N denotes the total number of proof tasks at each stage.] ...ward algorithm similar to the one described in [16]. Flotter uses a highly elaborate conversion algorithm which performs many simplifications and avoids exponential increase in the number of generated clauses. This effect is most visible on the unsimp...
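
The remark that Flotter avoids an exponential increase in the number of generated clauses can be made concrete with a small sketch. The formula shape (a1 & b1) | ... | (an & bn) and all function names are assumptions chosen for illustration; the renaming shown is a standard definitional (Plaisted–Greenbaum style) transformation, not a description of Flotter's actual algorithm or of the one in [16].

```python
from itertools import product
from typing import FrozenSet, List, Sequence, Tuple

Clause = FrozenSet[str]  # literals are strings; "-x" is the negation of "x"


def pairs(n: int) -> List[Tuple[str, str]]:
    """The disjuncts of (a1 & b1) | ... | (an & bn)."""
    return [(f"a{k}", f"b{k}") for k in range(1, n + 1)]


def distribute(disjuncts: Sequence[Sequence[str]]) -> List[Clause]:
    """Naive CNF by distributing OR over AND: one clause per way of
    picking a literal from every conjunction, i.e. 2**n clauses here."""
    return [frozenset(choice) for choice in product(*disjuncts)]


def rename(disjuncts: Sequence[Sequence[str]]) -> List[Clause]:
    """Definitional CNF: a fresh atom d_k stands for the k-th conjunction.
    Because the formula occurs positively, the one-way implications
    d_k -> literal suffice, giving 1 + 2n clauses (equisatisfiable, not equivalent)."""
    names = [f"d{k}" for k in range(1, len(disjuncts) + 1)]
    clauses = [frozenset(names)]  # d1 | ... | dn
    for d, conj in zip(names, disjuncts):
        clauses.extend(frozenset({f"-{d}", lit}) for lit in conj)
    return clauses


for n in (2, 5, 10):
    print(n, len(distribute(pairs(n))), len(rename(pairs(n))))
# 2 4 5
# 5 32 11
# 10 1024 21
```

The trade-off is that renaming introduces new predicate symbols, which a prover must then reason about; naive distribution keeps the original vocabulary but can explode in size.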

172 | The design and implementation of VAMPIRE
- Riazanov, Voronkov
- 2002
Citation Context ... instead of the clausifier provided by the TPTP toolset [30]. e-setheo-new is a recent development version with several improvements over the original e-setheo-csp03 version. Both versions of Vampire [27] have been taken directly “out of the box”—they are the versions which were used at CASC-19. Spass 2.1 was obtained from the developer’s website [31]. For comparison purposes, we also used Otter V3.2 ...

158 | Towards a Mathematical Science of Computation
- McCarthy
- 1962
Citation Context ...n, in particular array-expressions and conditional expressions, encoding the necessary parts of the language semantics. The rewrite system $T_{\mathit{array}}$ adds rewrite formulations of McCarthy’s array axioms [17], i.e., $\mathit{sel}(\mathit{upd}(a, i, v), j) \rightsquigarrow (i = j \;?\; v : \mathit{sel}(a, j))$ for one-dimensional arrays and similar forms for higher-dimensional arrays. Some safety policies are formulated using arrays of a given dimensional...
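
The rewrite formulation of McCarthy's array axiom quoted above can be sketched as a tiny innermost rewriter. Terms are encoded as nested Python tuples, an assumption made only for this illustration and not the certification system's term representation. The extra rules that decide `eq` for syntactically identical terms and for distinct integer literals, and that then collapse `ite`, are likewise assumptions added so that the examples reduce to a value.

```python
from typing import Any

Term = Any  # a variable name (str), an int literal, "true"/"false", or a tuple (op, *args)


def rewrite(t: Term) -> Term:
    """Innermost rewriting to normal form with
    sel(upd(a, i, v), j)  ~>  ite(i = j, v, sel(a, j))
    plus simple rules for deciding eq and collapsing ite."""
    if not isinstance(t, tuple):
        return t
    t = (t[0],) + tuple(rewrite(arg) for arg in t[1:])  # normalize arguments first
    op = t[0]
    if op == "sel" and isinstance(t[1], tuple) and t[1][0] == "upd":
        _, (_, a, i, v), j = t
        return rewrite(("ite", ("eq", i, j), v, ("sel", a, j)))
    if op == "eq":
        if t[1] == t[2]:
            return "true"  # syntactically identical indices
        if isinstance(t[1], int) and isinstance(t[2], int):
            return "false"  # distinct integer literals
    if op == "ite" and t[1] in ("true", "false"):
        return t[2] if t[1] == "true" else t[3]
    return t


print(rewrite(("sel", ("upd", "a", "i", "v"), "i")))  # v
print(rewrite(("sel", ("upd", ("upd", "a", 1, "x"), 2, "y"), 1)))  # x
print(rewrite(("sel", ("upd", "a", "i", "v"), "j")))
# ('ite', ('eq', 'i', 'j'), 'v', ('sel', 'a', 'j'))
```

When the indices cannot be decided syntactically, the conditional remains in the term and the case split is left to later simplification or to the prover.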

119 | Houdini, an annotation assistant for ESC/Java
- Flanagan, Leino
- 2001
Citation Context ...[Figure 1: Certification system architecture] relies on term rewriting and user guidance. Sunrise [14] is a fully automatic system but uses custom-designed tactics in HOL to discharge the obligations. Houdini [9] is a similar system. There the generated proof obligations are discharged by ESC/Java but again, this relies on a significant amount of user interaction. [2 System Architecture] Our certification tool ...

107 | An Industrial Strength Theorem Prover for a Logic Based on Common Lisp
- Kaufmann, Moore
- 1997
Citation Context ...notable exception is the IVY system [18] that we used in our experiments. IVY combines a clausifier and the Otter theorem prover with a proof checker. Because IVY is implemented within the ACL2 logic [15], both the clausifier and the proof checker have been verified. IVY thus provides the same functionality as a verified prover for first-order logic, but achieves relatively good performance by using O...
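
The idea that a separate, trusted checker can validate a prover's output is easiest to see in the propositional case. The sketch below is a hypothetical checker for resolution refutations (clauses as sets of nonzero integers, with -k the negation of k, and a made-up step format). IVY itself checks Otter's first-order proofs inside ACL2 together with a verified clausifier, so this only illustrates why replaying a proof step by step is much simpler than finding it.

```python
from typing import FrozenSet, List, Sequence, Tuple

Clause = FrozenSet[int]  # DIMACS-style literals: -k is the negation of k
Step = Tuple[int, int, Clause]  # (index of premise 1, index of premise 2, claimed resolvent)


def is_resolvent(c1: Clause, c2: Clause, r: Clause) -> bool:
    """True if resolving c1 and c2 on some complementary pair yields exactly r."""
    return any(-lit in c2 and r == (c1 - {lit}) | (c2 - {-lit}) for lit in c1)


def check_refutation(axioms: Sequence[Clause], steps: Sequence[Step]) -> bool:
    """Replay every step; accept only if each one is a valid resolution inference
    from earlier clauses and the empty clause is among the known clauses at the end."""
    known: List[Clause] = list(axioms)
    for i, j, clause in steps:
        if not (0 <= i < len(known) and 0 <= j < len(known)):
            return False  # premise refers to a clause that does not exist (yet)
        if not is_resolvent(known[i], known[j], clause):
            return False  # claimed conclusion does not follow
        known.append(clause)
    return frozenset() in known


axioms = [frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2})]
proof = [(0, 1, frozenset({2})), (3, 2, frozenset())]
print(check_refutation(axioms, proof))                     # True
print(check_refutation(axioms, [(0, 1, frozenset({1}))]))  # False
```

The checker never searches: each step names its premises, so its own correctness argument is short, which is the property that makes verifying a checker (as done for IVY) practical.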

59 | Autobayes: a system for generating data analysis programs from statistical models
- Fischer, Schumann
Citation Context ...ded the range of both algorithms and safety properties which can be certified; in particular, our approach is now fully integrated with the AUTOFILTER system [35] as well as with the AUTOBAYES system [11] and the certification process is now completely automated. We have also implemented a new generic VCG which can be customized for a given safety policy and which directly processes the internal code ...

54 | SPASS & FLOTTER, version 0.42
- Weidenbach, Gaede, et al.
Citation Context ...provers participated in the CASC-19 [29] prover competition in the FOL category. We used two versions of e-setheo [20] which were both derived from the CASC version. For e-setheo-csp03F, Flotter V2.1 [31, 32] was used to convert the formulas into a set of clauses instead of the clausifier provided by the TPTP toolset [30]. e-setheo-new is a recent development version with several improvements over the ori...

34 | The KIV Approach to Software Verification
- Reif
- 1995
Citation Context ...vely simple proof obligations. PCC also spawned an entire cottage industry of proof checkers, e.g., [1]; however, these use various higher-order logics and so are not applicable for our purposes. KIV [24, 25] is an interactive verification environment which can use ATPs but heavily ... [Figure 1 labels: Synthesis, Specification, Synthesizer, Annotated Code, Propagator, Propagated Code, Analysis, Safety policy, VCG, VCs, Simplifier, SVC...]

32 | Correctness of Source-level Safety Policies
- Denney, Fischer
- 2003
Citation Context ...of rules and auxiliary definitions which are specifically designed to show that programs satisfy the safety property of interest. The distinction between safety properties and policies is explored in [3]. We further distinguish between language-specific and domain-specific properties and policies. Language-specific properties can be expressed in the constructs of the underlying programming language i...

29 | Synthesizing certified code
- Whalen, Schumann, et al.
- 2002
Citation Context ...ection 5 then discusses the proof checking experiments, and Section 6 looks at traceability issues. Finally, Section 7 draws some conclusions. Conceptually, this paper continues the work described in [33, 34] but the actual implementation of the certification system has been completely revised and substantially extended. We have expanded the range of both algorithms and safety properties which can be cert...

29 | Automating the Implementation of Kalman Filter Algorithms
- Whittle, Schumann
Citation Context ... and substantially extended. We have expanded the range of both algorithms and safety properties which can be certified; in particular, our approach is now fully integrated with the AUTOFILTER system [35] as well as with the AUTOBAYES system [11] and the certification process is now completely automated. We have also implemented a new generic VCG which can be customized for a given safety policy and w...

27 | Theorem proving in large theories
- Reif, Schellhorn
- 1998
Citation Context ...vely simple proof obligations. PCC also spawned an entire cottage industry of proof checkers, e.g., [1]; however, these use various higher-order logics and so are not applicable for our purposes. KIV [24, 25] is an interactive verification environment which can use ATPs but heavily ... [Figure 1 labels: Synthesis, Specification, Synthesizer, Annotated Code, Propagator, Propagated Code, Analysis, Safety policy, VCG, VCs, Simplifier, SVC...]

23 | Automated Theorem Proving in Software Engineering
- Schumann
- 2001
Citation Context ...ations can be found at http://ase.arc.nasa.gov/autobayes/ijcar. [3.2 Simplification] Proof task simplification is an important and integral part of our overall architecture. However, as observed before [12, 8, 28], simplifications—even on the purely propositional level—can have a significant impact on the performance of a theorem prover. In order to evaluate this impact, we used six different rewrite-based sim...
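
To see how even purely propositional simplification changes what the prover receives, here is a minimal bottom-up simplifier. The formula encoding (tuples with `and`/`or`/`implies`/`not`, Python booleans for the constants, strings for atoms) and the rule set are assumptions for illustration; the six rewrite-based simplifiers used in the paper are not specified in this excerpt. An obligation that simplifies to `True` would not need to reach an ATP at all, and one that is only partially simplified reaches it in a smaller form.

```python
from typing import Union

Formula = Union[bool, str, tuple]  # True/False, atom name, or (op, *args)


def simplify(f: Formula) -> Formula:
    """Bottom-up propositional simplification: children are normalized first,
    then one pass of local rules is applied at the root."""
    if not isinstance(f, tuple):
        return f
    op, *args = f
    args = [simplify(a) for a in args]
    if op == "not":
        (x,) = args
        if isinstance(x, bool):
            return not x
        if isinstance(x, tuple) and x[0] == "not":
            return x[1]  # double negation
        return ("not", x)
    x, y = args
    if op == "and":
        if x is False or y is False:
            return False
        if x is True:
            return y
        if y is True or x == y:
            return x
    elif op == "or":
        if x is True or y is True:
            return True
        if x is False:
            return y
        if y is False or x == y:
            return x
    elif op == "implies":
        if x is False or y is True or x == y:
            return True
        if x is True:
            return y
    return (op, x, y)


print(simplify(("implies", ("and", "p", True), ("or", "q", True))))  # True
print(simplify(("implies", ("and", True, "p"), ("and", "p", ("not", ("not", "q"))))))
# ('implies', 'p', ('and', 'p', 'q'))
```

A single bottom-up pass suffices here because every rule returns either a constant, an already-normalized child, or a node whose children are normalized and to which no rule applies.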

22 | Using Automated Theorem Provers to Certify Auto-generated Aerospace Software
- Denney, Fischer, et al.
- 2004
Citation Context ...revious version. All these improvements and extensions to the underlying framework result in a substantially larger experimental basis than reported before. A shorter version of this paper appears as [5]. [Related Work] Our approach is related to proof-carrying code (PCC) [22]. PCC works on the machine-code level instead of the source-code level (as we do) and concentrates on very simple safety policie...

21 | Deduction-based software component retrieval
- Fischer, Schumann, et al.
- 1998
Citation Context ...ations can be found at http://ase.arc.nasa.gov/autobayes/ijcar. [3.2 Simplification] Proof task simplification is an important and integral part of our overall architecture. However, as observed before [12, 8, 28], simplifications—even on the purely propositional level—can have a significant impact on the performance of a theorem prover. In order to evaluate this impact, we used six different rewrite-based sim...

21 | Trustworthy tools for trustworthy programs: A verified verification condition generator
- Homeier, Martin
- 1994
Citation Context ...[Figure 1: Certification system architecture (labels: Safety policy, VCG, VCs, Simplifier, SVCs, Certification, Theorem Prover, Proof, Proof Checker, Certificate)] relies on term rewriting and user guidance. Sunrise [14] is a fully automatic system but uses custom-designed tactics in HOL to discharge the obligations. Houdini [9] is a similar system. There the generated proof obligations are discharged by ESC/Java but...

21 | Houdini, an annotation assistant for ESC/Java
- Flanagan, Leino, et al.
- 2001

15 | Automated Deduction: A Basis for Applications
- Schmidt
- 1998

12 | Machine-checking the Java Specification: Proving Type-Safety
- Oheimb, Nipkow
- 1999

10 | What makes a Code Review Trustworthy
- Nelson, Schumann
- 2004
Citation Context ...s important that domain experts can assess the evidence for successful checks of the safety properties and any places where it is violated. Safety checks are typically carried out during code reviews [24], where reviewers look in detail at each line of the code and check the individual safety properties statement by statement. The successful outcome of a code review, therefore, ...

7 | Adding Assurance to Automatically Generated Code
- Denney, Fischer, et al.
- 2004
Citation Context ... structural complexity. By combining rewriting with state-of-the-art automated theorem proving, we obtain a safety certification tool which compares favorably with tools based on static analysis (see [4] for a comparison). Our current efforts focus on extending the certification system in a number of areas. One aim is to develop a certificate management system, along the lines of the Programatica pro...

7 | Automatic derivation of statistical data analysis algorithms: Planetary nebulae and beyond
- Fischer, Hajian, et al.
Citation Context ...nsive use of matrix operations. The other two examples are AUTOBAYES specifications which are part of a more comprehensive analysis of planetary nebula images taken by the Hubble Space Telescope (see [7, 10] for more details). segm describes an image segmentation problem for which an iterative (numerical) statistical clustering algorithm is synthesized. Finally, gauss fits an image against a two-dimensio...

6 | A generic software safety document generator
- Denney, Venkatesan
- 2004
Citation Context ...work on clausal normal form, which usually loses the important location information. In general, useful information extracted from an ATP can be used for purposes of autogenerating documentation. In [6], we describe a safety documentation tool, which generates a natural language description explaining the safety of a program, by converting the VCs into text. This could be extended by carrying out so...

6 | Tracing the origins of verification conditions
- Fraer
- 1996
Citation Context ...y which relates this detailed information back to the specification and the safety policy, while drawing attention to specific areas of concern. Existing techniques for addressing the tracing problem [13], however, need to be extended for our purposes. The required information about code locations needs to be threaded through all stages of our certification architecture (cf. Figure 1). Only then can t...

6 | AutoBayes/CC — combining program synthesis with automatic code certification (system description)
- Whalen, Schumann, et al.
- 2002
Citation Context ...ection 5 then discusses the proof checking experiments, and Section 6 looks at traceability issues. Finally, Section 7 draws some conclusions. Conceptually, this paper continues the work described in [33, 34] but the actual implementation of the certification system has been completely revised and substantially extended. We have expanded the range of both algorithms and safety properties which can be cert...

5 | An Empirical Evaluation of Automated Theorem
- Denney, Fischer, et al.
- 2004

4 | Applying autobayes to the analysis of planetary nebulae images
- Fischer, Schumann
Citation Context ...nsive use of matrix operations. The other two examples are AUTOBAYES specifications which are part of a more comprehensive analysis of planetary nebula images taken by the Hubble Space Telescope (see [7, 10] for more details). segm describes an image segmentation problem for which an iterative (numerical) statistical clustering algorithm is synthesized. Finally, gauss fits an image against a two-dimensio...

4 | System description: Ivy
- McCune, Shumsky
- 2000
Citation Context ...parately; the more manageable simplified verification conditions (SVCs) which result are then processed by a first-order theorem prover. The resulting proofs can be sent to a proof checker, e.g., Ivy [18]. The structure of a typical safety obligation (after substitution reduction and simplification) is given in Figure 2. It corresponds to the initialization safety of an assignment within a nested loop...

1 | Otter—The CADE-13
- McCune, Wos
- 1997
Citation Context ... have been taken directly “out of the box”—they are the versions which were used at CASC-19. Spass 2.1 was obtained from the developer’s website [31]. For comparison purposes, we also used Otter V3.2 [19], which has been essentially unchanged since 1996. In the experiments, we used the default parameter settings and none of the special features of the provers. The only exception is Otter, where the de...