Results 1 - 10
of
45
Cloning-based context-sensitive pointer alias analysis using binary decision diagrams
- In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation
, 2004
"... This paper presents the first scalable context-sensitive, inclusionbased pointer alias analysis for Java programs. Our approach to context sensitivity is to create a clone of a method for every context of interest, and run a context-insensitive algorithm over the expanded call graph to get context-s ..."
Abstract
-
Cited by 194 (11 self)
- Add to MetaCart
This paper presents the first scalable context-sensitive, inclusionbased pointer alias analysis for Java programs. Our approach to context sensitivity is to create a clone of a method for every context of interest, and run a context-insensitive algorithm over the expanded call graph to get context-sensitive results. For precision, we generate a clone for every acyclic path through a program’s call graph, treating methods in a strongly connected component as a single node. Normally, this formulation is hopelessly intractable as a call graph often has 10 14 acyclic paths or more. We show that these exponential relations can be computed efficiently using binary decision diagrams (BDDs). Key to the scalability of the technique is a context numbering scheme that exposes the commonalities across contexts. We applied our algorithm to the most popular applications available on Sourceforge, and found that the largest programs, with hundreds of thousands of Java bytecodes, can be analyzed in under 20 minutes. This paper shows that pointer analysis, and many other queries and algorithms, can be described succinctly and declaratively using Datalog, a logic programming language. We have developed a system called bddbddb that automatically translates Datalog programs into highly efficient BDD implementations. We used this approach to develop a variety of context-sensitive algorithms including side effect analysis, type analysis, and escape analysis.
A Schema for Interprocedural Modification Side-Effect Analysis With Pointer Aliasing
- In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation
, 2001
"... The first interprocedural modification side-effects analysis for C (MOD_C) that obtains better than worst-case precision on programs with general-purpose pointer usage is presented with empirical results. The analysis consists of an algorithm schema corresponding to a family of MODC algorithms with ..."
Abstract
-
Cited by 126 (13 self)
- Add to MetaCart
The first interprocedural modification side-effects analysis for C (MOD_C) that obtains better than worst-case precision on programs with general-purpose pointer usage is presented with empirical results. The analysis consists of an algorithm schema corresponding to a family of MODC algorithms with two independent phases: one for determining pointer-induced aliases and a subsequent one for propagating interprocedural side effects. These MOD_C algorithms are parameterized by the aliasing method used. The empirical results compare the performance of two dissimilar MOD_C algorithms: MOD_C(FSAlias) uses a flow-sensitive, calling-context-sensitive interprocedural alias analysis [LR92]; MOD_C(FIAlias) uses a flow-insensitive, calling-context-insensitive alias analysis which is much faster, but less accurate. These two algorithms were profiled on 45 programs ranging in size from 250 to 30,000 lines of C code, and the results demonstrate dramatically the possible cost-precision tradeoffs. This first comparative implementation of MODC analyses offers insight into the differences between flow-/context-sensitive and flow-/context-insensitive analyses. The analysis cost versus precision tradeoffs in side-effect information obtained is reported. The results show surprisingly that the precision of flow-sensitive side-effect analysis is not always prohibitive in cost, and that the precision of flow-insensitive analysis is substantially better than worst-case estimates and seems sufficient for certain applications. On average MODC (FSAlias) for procedures and calls is in the range of 20% more precise than MODC (F IAlias); however, the performance was found to be at least an order of magnitude slower than MODC (F IAlias).
Scalable propagation-based call graph construction algorithms
- In Conference on Object-Oriented Programming Systems, Languages, and Applications
, 2000
"... ..."
Ultra-fast aliasing analysis using CLA: a million lines of C code in a second
, 2001
"... We describe the design and implementation of a system for very fast points-to analysis. On code bases of about a million lines of unpreprocessed C code, our system performs eldbased Andersen-style points-to analysis in less than a second and uses less than 10MB of memory. Our tw o main contributions ..."
Abstract
-
Cited by 104 (0 self)
- Add to MetaCart
We describe the design and implementation of a system for very fast points-to analysis. On code bases of about a million lines of unpreprocessed C code, our system performs eldbased Andersen-style points-to analysis in less than a second and uses less than 10MB of memory. Our tw o main contributions are a database-centric analysis architecture called compile-link-analyze (CLA), and a new algorithm for implementing dynamic transitive closure. Our points-to analysis system is built into a forward data-dependence analysis tool that is deployed within Lucent to help with consistent type modi cations to large legacy C code bases. 1.
Estimating the impact of scalable pointer analysis on optimization
- In Proceedings of the 8th International Static Analysis Symposium
, 2001
"... Abstract. This paper addresses the following question: Do scalable control-flow-insensitive pointer analyses provide the level of precision required to make them useful in compiler optimizations? We first describe alias frequency, a metric that measures the ability of a pointer analysis to determine ..."
Abstract
-
Cited by 52 (6 self)
- Add to MetaCart
Abstract. This paper addresses the following question: Do scalable control-flow-insensitive pointer analyses provide the level of precision required to make them useful in compiler optimizations? We first describe alias frequency, a metric that measures the ability of a pointer analysis to determine that pairs of memory accesses in C programs cannot be aliases. We believe that this kind of information is useful for a variety of optimizations, while remaining independent of a particular optimization. We show that control-flow and context insensitive analyses provide the same answer as the best possible pointer analysis on at least 95 % of all statically generated alias queries. In order to understand the potential run-time impact of the remaining 5 % queries, we weight the alias queries by dynamic execution counts obtained from profile data. Flow-insensitive pointer analyses are accurate on at least 95 % of the weighted alias queries as well. We then examine whether scalable pointer analyses are inaccurate on the remaining 5 % alias queries because they are context-insensitive. To this end, we have developed a new context-sensitive pointer analysis that also serves as a general engine for tracing the flow of values in C programs. To our knowledge, it is the first technique for performing context-sensitive analysis with subtyping that scales to millions of lines of code. We find that the new algorithm does not identify fewer aliases than the contextinsensitive analysis. 1
Demand-Driven Pointer Analysis
, 2001
"... Known algorithms for pointer analysis are "global" in the sense that they perform an exhaustive analysis of a program or program component. In this paper we introduce a demand-driven approach for pointer analysis. Specifically, we describe a demand-driven flow-insensitive, subset-based, context-inse ..."
Abstract
-
Cited by 48 (0 self)
- Add to MetaCart
Known algorithms for pointer analysis are "global" in the sense that they perform an exhaustive analysis of a program or program component. In this paper we introduce a demand-driven approach for pointer analysis. Specifically, we describe a demand-driven flow-insensitive, subset-based, context-insensitive points-to analysis. Given a list of pointer variables (a query), our analysis performs just enough computation to determine the points-to sets for these query variables. Using deductive reachability formulations of both the exhaustive and the demand-driven analyses, we prove that our algorithm is correct. We also show that our analysis is optimal in the sense that it does not do more work than necessary. We illustrate the feasibility and efficiency of our analysis with an implementation of demand-driven points-to analysis for computing the call-graphs of C programs with function pointers. The performance of our system varies substantially across benchmarks - the main factor is how much of the points-to graph must be computed to determine the call-graph. For some benchmarks, only a small part of the points-to graph is needed (e.g povray, emacs and gcc), and here we see more than a 10x speedup. For other benchmarks (e.g. burlap and gimp), we need to compute most (? 95%) of the points-to graph, and here the demanddriven algorithm is considerably slower, because using the demand-driven algorithm is a slow method of computing the full points-to graph.
Points-to and side-effect analyses for programs built with precompiled libraries
- INT. CONF. ON COMPILER CONSTRUCTION
, 2001
"... Large programs are typically built from separate modules. Traditional whole-program analysis cannot be used in the context of such modular development. In this paper we consider analysis for programs that combine client modules with precompiled library modules. We define separate analyses that allow ..."
Abstract
-
Cited by 29 (2 self)
- Add to MetaCart
Large programs are typically built from separate modules. Traditional whole-program analysis cannot be used in the context of such modular development. In this paper we consider analysis for programs that combine client modules with precompiled library modules. We define separate analyses that allow library modules and client modules to be analyzed separately from each other. Our target analyses are Andersen's points-to analysis for C [1] and a side-effect analysis based on it. We perform separate points-to and side-effect analyses of a library module by using worst-case assumptions about the rest of the program. We also show how to construct summary information about a library module and how to use it for separate analysis of client modules. Our empirical results show that the separate points-to analyses are practical even for large modules, and that the cost of constructing and storing library summaries is low. This work is a step toward incorporating practical points-to and side-effect analyses in realistic compilers and software productivity tools.
Staged information flow for JavaScript
- In ACM SIGPLAN Conference on Programming Language Design and Implementation
, 2009
"... Modern websites are powered by JavaScript, a flexible dynamic scripting language that executes in client browsers. A common paradigm in such websites is to include third-party JavaScript code in the form of libraries or advertisements. If this code were malicious, it could read sensitive information ..."
Abstract
-
Cited by 29 (2 self)
- Add to MetaCart
Modern websites are powered by JavaScript, a flexible dynamic scripting language that executes in client browsers. A common paradigm in such websites is to include third-party JavaScript code in the form of libraries or advertisements. If this code were malicious, it could read sensitive information from the page or write to the location bar, thus redirecting the user to a malicious page, from which the entire machine could be compromised. We present an information-flow based approach for inferring the effects that a piece of JavaScript has on the website in order to ensure that key security properties are not violated. To handle dynamically loaded and generated JavaScript, we propose a framework for staging information flow properties. Our framework propagates information flow through the currently known code in order to compute a minimal set of syntactic residual checks that are performed on the remaining code when it is dynamically loaded. We have implemented a prototype framework for staging information flow. We describe our techniques for handling some difficult features of JavaScript and evaluate our system’s performance on a variety of large realworld websites. Our experiments show that static information flow is feasible and efficient for JavaScript, and that our technique allows the enforcement of information-flow policies with almost no run-time overhead.
Making context-sensitive points-to analysis with heap cloning practical for the real world
- In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation
, 2007
"... Context-sensitive pointer analysis algorithms with full “heap cloning ” are powerful but are widely considered to be too expensive to include in production compilers. This paper shows, for the first time, that a context-sensitive, field-sensitive algorithm with full heap cloning (by acyclic call pat ..."
Abstract
-
Cited by 24 (3 self)
- Add to MetaCart
Context-sensitive pointer analysis algorithms with full “heap cloning ” are powerful but are widely considered to be too expensive to include in production compilers. This paper shows, for the first time, that a context-sensitive, field-sensitive algorithm with full heap cloning (by acyclic call paths) can indeed be both scalable and extremely fast in practice. Overall, the algorithm is able to analyze programs in the range of 100K-200K lines of C code in 1-3 seconds, takes less than 5 % of the time it takes for GCC to compile the code (which includes no whole-program analysis), and scales well across five orders of magnitude of code size. It is also able to analyze the Linux kernel (about 355K lines of code) in 3.1 seconds. The paper describes the major algorithmic and engineering design choices that are required to achieve these results, including (a) using flow-insensitive and unification-based analysis, which

