## Z-Ranking: Using Statistical Analysis to Counter the Impact of Static Analysis Approximations (2003)

### Download Links

- [metacomp.stanford.edu]
- [www.stanford.edu]
- [www.cs.stanford.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the 10th Annual International Static Analysis Symposium

Citations: 38 (2 self)

### BibTeX

@INPROCEEDINGS{Kremenek03z-ranking:using,
  author    = {Ted Kremenek and Dawson Engler},
  title     = {Z-Ranking: Using Statistical Analysis to Counter the Impact of Static Analysis Approximations},
  booktitle = {Proceedings of the 10th Annual International Static Analysis Symposium},
  year      = {2003},
  pages     = {295--315},
  publisher = {Springer}
}

### Abstract

This paper explores z-ranking, a technique to rank error reports emitted by static program checking tools. Such tools often use approximate analysis schemes, which lead to false error reports. These reports can easily render a checker useless by hiding real errors amidst the false ones and by causing the tool to be discarded as irrelevant. Empirically, all tools that effectively find errors have false positive rates that can easily reach 30--100%. Z-ranking employs a simple statistical model to rank the error messages most likely to be true errors above those least likely. This paper demonstrates that z-ranking applies to a range of program checking problems and that it performs up to an order of magnitude better than randomized ranking. Further, it has transformed previously unusable analysis tools into effective program error finders.
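The ranking scheme the abstract describes (count successful versus failed checks per analysis population, score each population with the z-test statistic, then sort reports by that score) can be sketched roughly as follows. The expected success proportion `p0` and the example counts are illustrative assumptions, not values taken from the paper:

```python
import math

def z_score(successes, failures, p0=0.9):
    """One-sided z-test statistic comparing a population's observed
    success proportion against an assumed expected proportion p0."""
    n = successes + failures
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def z_rank(populations, p0=0.9):
    """Rank (successes, failures) populations so the ones most likely
    to contain true errors come first: a high z-score means the checked
    property held far more often than expected, so the population's few
    failures are more likely to be real bugs."""
    return sorted(populations, key=lambda sf: z_score(*sf, p0), reverse=True)

# Hypothetical counts: (successful checks, failed checks) per checked property.
reports = [(100, 2), (3, 40), (10, 1)]
print(z_rank(reports))  # [(100, 2), (10, 1), (3, 40)]
```

A population with 3 successes and 40 failures sinks to the bottom: a property violated that often is more plausibly an analysis approximation gone wrong than 40 genuine bugs.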

### Citations

386 | Automatically validating temporal safety properties of interfaces
- Ball, Rajamani
- 2001
Citation Context: ...act of these (inevitable) analysis mistakes. Program checking takes on different forms, but generally analysis results can be conceived as reports emitted by the analysis tool that take on two forms: (1) locations in the program that satisfied a checked property and (2) locations that violated the checked property. In this paper the former will be referred to as successful checks and the latter as fa...

357 | A Theory of Type Qualifiers
- Foster, Fähndrich, et al.
- 1999
Citation Context: ...errors amidst the false, and by potentially causing the tool to be discarded as irrelevant. Empirically, all tools that effectively find errors have false positive rates that can easily reach 30–100% [8, 4, 6, 7]. This paper examines how to use statistical techniques to manage the impact of these (inevitable) analysis mistakes. Program checking takes on different forms, but generally analysis results can be c...

338 | Checking system rules using system-specific, programmer-written compiler extensions
- Engler, Chelf, et al.
- 2000
Citation Context: ...errors amidst the false, and by potentially causing the tool to be discarded as irrelevant. Empirically, all tools that effectively find errors have false positive rates that can easily reach 30–100% [8, 4, 6, 7]. This paper examines how to use statistical techniques to manage the impact of these (inevitable) analysis mistakes. Program checking takes on different forms, but generally analysis results can be c...

336 | A first step towards automated detection of buffer overrun vulnerabilities
- Wagner, Foster, et al.
- 2000
Citation Context: ...errors amidst the false, and by potentially causing the tool to be discarded as irrelevant. Empirically, all tools that effectively find errors have false positive rates that can easily reach 30–100% [8, 4, 6, 7]. This paper examines how to use statistical techniques to manage the impact of these (inevitable) analysis mistakes. Program checking takes on different forms, but generally analysis results can be c...

301 | Bugs as deviant behavior: A general approach to inferring errors in systems code
- Engler, Chen, et al.
- 2001
Citation Context: ...king in previous work [11], but did not explore it thoroughly and provided no experimental validation; this paper does both. Furthermore, z-ranking shares many of the same insights as belief analysis [12]; the key difference here is z-ranking is designed explicitly for generic static analysis error ranking. More explicitly, this paper explores the following three hypotheses: weak hypothesis: error rep...

250 | An empirical study of operating systems errors
- Chou, Yang, et al.
Citation Context: ...ed few errors (or failed checks) in total and (2) led to many successful checks. There are two reasons for this. First, code has relatively few errors; typical aggregate error rates are less than 5% [9]. We expect valid analysis facts to generate few error reports. Second, in our experience, analysis approximations that interact badly with code will often lead to explosions of (invalid) error report...

244 | Type-based race detection for java
- Flanagan, Freund
- 2000

195 | A system and language for building system-specific, static analyses
- Hallem, Chelf, et al.
- 2002
Citation Context: ...such steps can be taken with some assurance that when the gamble goes bad, the resulting invalid errors can be relegated below true errors. We provided a cursory sketch of z-ranking in previous work [11], but did not explore it thoroughly and provided no experimental validation; this paper does both. Furthermore, z-ranking shares many of the same insights as belief analysis [12]; the key difference h...

112 | LCLint: A Tool for Using Specifications to Check Code
- Evans, Guttag, et al.
- 1994
Citation Context: ...rvations above. It works by (1) counting the number of successful checks versus unsuccessful checks; (2) computing a numeric value based on these frequency counts using the z-test statistic [10]; and (3) sorting error reports based on this number. Z-ranking works well in practice: on our measurements it performed better than randomized ranking 98.5% of the time. Moreover, within the first 10% of repo...

80 | The Statistical Analysis of Discrete Data
- Santner, Duffy
- 1989
Citation Context: ...tical or machine learning techniques. In both cases, one immediate approach would be to encode the prior using a Beta distribution [12], which is conjugate to the Binomial and Bernoulli distributions [13]. In this case, the prior would be represented by "imaginary" success/failure counts. These would then be combined directly with the observed success/failure counts and z-ranking could then be applied...
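The Beta-prior idea in the snippet above (representing the prior as "imaginary" success/failure counts that are simply added to the observed counts before ranking) can be sketched as below. The pseudo-count values `alpha` and `beta` are illustrative assumptions, not values from the paper:

```python
def with_beta_prior(successes, failures, alpha=9.0, beta=1.0):
    """Fold a Beta(alpha, beta) prior into observed counts as
    "imaginary" pseudo-counts; because the Beta is conjugate to the
    Binomial, z-ranking can be applied to the combined counts exactly
    as it is to raw ones."""
    return successes + alpha, failures + beta

# A population with one observed success and no failures, smoothed by
# a prior that expects checks to succeed about 90% of the time.
print(with_beta_prior(1, 0))  # (10.0, 1.0)
```

The smoothing keeps populations with very few observations from receiving extreme scores on the strength of one or two checks.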

40 | Path-sensitive program verification in polynomial time
- Das, Lerner, et al.
- 2002
Citation Context: ...on different forms, but generally analysis results can be conceived as reports emitted by the analysis tool that take on two forms: (1) locations in the program that satisfied a checked property and (2) locations that violated the checked property. In this paper the former will be referred to as successful checks and the latter as failed checks (i.e., error reports). The underlying observation of th...

16 | Detecting races in relay ladder logic programs
- Aiken, Faehndrich, et al.
- 1998
Citation Context: ...checks, let s be the number of successes, f the number of failures, and b the number of real bugs. We define the error rate of a population of successes and failures as [9]: error rate = b/(s + f). (5) The error rate corresponds to the ratio of the number of bugs found to the number of times a property was checked. Empirically we know that aggregate...
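Equation (5) from the snippet above is a one-liner; the counts below are made-up values for illustration only:

```python
def error_rate(bugs, successes, failures):
    """Eq. (5): real bugs per checked instance of a property,
    b / (s + f)."""
    return bugs / (successes + failures)

# 5 real bugs across 100 checked sites gives a 5% error rate,
# consistent with the aggregate rates cited from [9].
print(error_rate(5, 95, 5))  # 0.05
```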

1 | Probability Models, 6th edn
- Ross
- 1997
Citation Context: ...score. 1 More formally, each check is modeled as a Bernoulli random variable with probability pi of taking the value 1 (for a success). A sequence of checks is modeled using the Binomial distribution [13]. 2 In hypothesis testing, p0 is known as the null hypothesis. Notationally we will denote the z-score for population PGi as PGi.z. With the above definitions, population ranking using z-scores become...