Compiler Transformations for High-Performance Computing
 ACM Computing Surveys
, 1994
Cited by 365 (4 self)
In the last three decades, a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program, using transformations based on the analysis of scalar quantities and dataflow techniques. In contrast, optimization for ...
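The abstract mentions transformations driven by the analysis of scalar quantities. The sketch below illustrates one classic transformation of that kind: local constant propagation and folding over straight-line code. It assumes a hypothetical three-address form of `(dest, op, left, right)` tuples and is not a method taken from the survey.

```python
from typing import Union

Operand = Union[int, str]
# Hypothetical three-address instruction: (dest, op, left, right).
Instr = tuple[str, str, Operand, Operand]

# "const" copies its left operand; the others are ordinary integer arithmetic.
OPS = {
    "const": lambda a, b: a,
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def fold_constants(code: list[Instr]) -> list[Instr]:
    """Local constant propagation and folding over straight-line code."""
    known: dict[str, int] = {}  # variables currently holding a known constant
    out: list[Instr] = []
    for dest, op, left, right in code:
        # Propagate: replace variable operands whose value is already known.
        l = known.get(left, left) if isinstance(left, str) else left
        r = known.get(right, right) if isinstance(right, str) else right
        if isinstance(l, int) and isinstance(r, int):
            # Fold: both operands are constants, so evaluate at compile time.
            value = OPS[op](l, r)
            known[dest] = value
            out.append((dest, "const", value, 0))
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            out.append((dest, op, l, r))
    return out
```

For example, `x = 4; y = x * 2; z = y + n; w = y - 1` becomes `x = 4; y = 8; z = 8 + n; w = 7`. Only `z` still requires run-time work.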
Lazy Code Motion
, 1992
Cited by 157 (20 self)
We present a bit-vector algorithm for the optimal and economical placement of computations within flowgraphs, which is as efficient as standard unidirectional analyses. The key point of our algorithm is the decomposition of the bidirectional structure of the known placement algorithms into a sequence of a backward and a forward analysis, which directly implies the efficiency result. Moreover, the new compositional structure opens the algorithm for modification: two further unidirectional analysis components exclude any unnecessary code motion. This laziness of our algorithm minimizes register pressure, which has drastic effects on the runtime behaviour of the optimized programs in practice, where an economical use of registers is essential.
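The decomposition into unidirectional analyses can be illustrated with a generic bit-vector solver. The code below is a minimal sketch. It assumes each node carries hypothetical `gen` and `transp` bit masks, with one bit per candidate expression. It computes anticipability (down-safety, a backward analysis) and availability (up-safety, a forward analysis), two standard predicates that code-motion algorithms combine. It does not reproduce the paper's full equations for earliest or latest placement.

```python
def must_analysis(nodes, edges, gen, transp, n_bits, backward):
    """Round-robin solver for an all-paths ("must") bit-vector problem.

    Each bit stands for one candidate expression. For every node n:
        out(n)  = gen(n) | (transp(n) & meet(n))
        meet(n) = AND of out(x) over the successors x of n (backward) or the
                  predecessors x of n (forward); 0 if n has none.
    Returns the (meet, out) dictionaries at the greatest fixed point.
    """
    full = (1 << n_bits) - 1
    flow = {n: [] for n in nodes}  # where each node's information comes from
    for a, b in edges:
        if backward:
            flow[a].append(b)  # backward: from successors
        else:
            flow[b].append(a)  # forward: from predecessors
    out = {n: full for n in nodes}  # optimistic start for a must-analysis
    meet = {}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            m = full if flow[n] else 0
            for x in flow[n]:
                m &= out[x]
            meet[n] = m
            new = gen[n] | (transp[n] & m)
            if new != out[n]:
                out[n] = new
                changed = True
    return meet, out


# Diamond flowgraph 1 -> {2, 3} -> 4 with a single expression a+b (bit 0).
nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
transp = {n: 1 for n in nodes}   # no node modifies the operands of a+b
comp = {1: 0, 2: 1, 3: 0, 4: 1}  # a+b is computed in nodes 2 and 4
# Here the "computed" and "locally anticipated" masks coincide.
_, antin = must_analysis(nodes, edges, comp, transp, 1, backward=True)
avin, _ = must_analysis(nodes, edges, comp, transp, 1, backward=False)
# antin[1] == 1: a+b is anticipated (down-safe) at the entry of node 1.
# avin[4] == 0: a+b is not available on entry to node 4, because it is missing
# along 1 -> 3 -> 4, so the computation in node 4 is only partially redundant.
```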
A fast algorithm for finding dominators in a flowgraph
 ACM Transactions on Programming Languages and Systems
, 1979
Cited by 144 (3 self)
A fast algorithm for finding dominators in a flowgraph is presented. The algorithm uses depth-first search and an efficient method of computing functions defined on paths in trees. A simple implementation of the algorithm runs in $O(m \log n)$ time, where $m$ is the number of edges and $n$ is the number of vertices in the problem graph. A more sophisticated implementation runs in $O(m\,\alpha(m, n))$ time, where $\alpha(m, n)$ is a functional inverse of Ackermann's function. Both versions of the algorithm were implemented in Algol W, a Stanford University version of Algol, and tested on an IBM 370/168. The programs were compared with an implementation by Purdom and Moore of a straightforward $O(mn)$-time algorithm, and with a bit vector algorithm described by Aho and Ullman. The fast algorithm beat the straightforward algorithm and the bit vector algorithm on all but the smallest graphs tested.
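The simple $O(m \log n)$ variant combines depth-first numbering, semidominators, and path compression without balanced linking. The sketch below follows that outline. It assumes the flowgraph is given as a hypothetical dict mapping each vertex to its list of successors, and it ignores unreachable vertices. It is written in Python for readability and does not follow the original Algol W code.

```python
def dominators(succ, root):
    """Immediate dominators via the simple Lengauer-Tarjan variant.

    succ maps each vertex to its list of successors; only vertices reachable
    from root are considered. Returns {vertex: immediate dominator}.
    """
    # Step 1: depth-first numbering (iterative, to avoid deep recursion).
    order, num, parent = [], {}, {}
    stack = [(root, None)]
    while stack:
        v, p = stack.pop()
        if v in num:
            continue
        num[v] = len(order)
        order.append(v)
        parent[v] = p
        for w in reversed(succ.get(v, [])):
            if w not in num:
                stack.append((w, v))

    n = len(order)
    par = [num[parent[v]] if parent[v] is not None else -1 for v in order]
    pred = [[] for _ in range(n)]
    for v in order:
        for w in succ.get(v, []):
            pred[num[w]].append(num[v])

    semi = list(range(n))      # semidominator, as a DFS number
    label = list(range(n))     # vertex of minimal semi on the compressed path
    ancestor = [-1] * n        # forest built by link(); -1 means a tree root
    idom = [0] * n
    bucket = [[] for _ in range(n)]

    def compress(v):
        # Iterative form of the recursive path compression.
        path = []
        while ancestor[ancestor[v]] != -1:
            path.append(v)
            v = ancestor[v]
        for u in reversed(path):
            a = ancestor[u]
            if semi[label[a]] < semi[label[u]]:
                label[u] = label[a]
            ancestor[u] = ancestor[a]

    def evaluate(v):
        if ancestor[v] == -1:
            return v
        compress(v)
        return label[v]

    # Steps 2 and 3: semidominators and implicit immediate dominators.
    for w in range(n - 1, 0, -1):
        for v in pred[w]:
            u = evaluate(v)
            if semi[u] < semi[w]:
                semi[w] = semi[u]
        bucket[semi[w]].append(w)
        ancestor[w] = par[w]  # link(parent(w), w) without balancing
        for v in bucket[par[w]]:
            u = evaluate(v)
            idom[v] = u if semi[u] < semi[v] else par[w]
        bucket[par[w]].clear()

    # Step 4: fill in the immediate dominators that were only implicit.
    for w in range(1, n):
        if idom[w] != semi[w]:
            idom[w] = idom[idom[w]]
    return {order[w]: order[idom[w]] for w in range(1, n)}
```

For the flowgraph `r -> a, r -> b, a -> c, b -> c, c -> d, d -> a`, this returns `{"a": "r", "b": "r", "c": "r", "d": "c"}`. The back edge `d -> a` does not make `c` dominate `a`, because `a` is also reached directly from `r`.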
A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees
, 1994
Cited by 115 (7 self)
We present a randomized linear-time algorithm to find a minimum spanning tree in a connected graph with edge weights. The algorithm uses random sampling in combination with a recently discovered linear-time algorithm for verifying a minimum spanning tree. Our computational model is a unit-cost random-access machine with the restriction that the only operations allowed on edge weights are binary comparisons.
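The sampling-and-filtering idea works as follows. Sample edges at random, build a minimum spanning forest F of the sample, and discard every F-heavy edge of the original graph. An F-heavy edge is one that is heavier than all edges on the F-path between its endpoints, so by the cycle property it can never be in a minimum spanning forest. The sketch below is **not** linear time. It substitutes Kruskal's algorithm for the paper's recursive calls and a naive path-maximum computation for the linear-time verification algorithm. Vertices are assumed to be the integers `0..n-1`.

```python
import random
from collections import defaultdict


def kruskal(n, edges):
    """Minimum spanning forest of vertices 0..n-1 (stand-in for the recursive calls)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v, w))
    return forest


def forest_path_max(n, forest):
    """Max edge weight on the forest path between every connected pair (naive, O(n^2))."""
    adj = defaultdict(list)
    for u, v, w in forest:
        adj[u].append((v, w))
        adj[v].append((u, w))
    best = {}
    for s in range(n):
        stack, seen = [(s, float("-inf"))], {s}
        while stack:
            x, m = stack.pop()
            best[(s, x)] = m
            for y, w in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append((y, max(m, w)))
    return best


def sampled_msf(n, edges, seed=0):
    """Random sampling plus F-heavy filtering, then an MSF of the surviving edges."""
    rng = random.Random(seed)
    sample = [e for e in edges if rng.random() < 0.5]  # keep each edge with prob. 1/2
    f = kruskal(n, sample)
    pm = forest_path_max(n, f)
    # An edge is F-heavy if its endpoints are connected in F and it is heavier than
    # every edge on the F-path between them; such an edge is in no minimum spanning forest.
    light = [(u, v, w) for u, v, w in edges if (u, v) not in pm or w <= pm[(u, v)]]
    return kruskal(n, light)


edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8), (3, 4, 3), (2, 4, 9)]
print(sum(w for _, _, w in sampled_msf(5, edges)))  # prints 11
```

Whatever the random sample, the result is a minimum spanning forest, because only edges that cannot belong to one are filtered out. The randomness affects only how many edges survive the filter.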
Optimal Code Motion: Theory and Practice
, 1993
Cited by 112 (18 self)
An implementation-oriented algorithm for lazy code motion is presented that minimizes the number of computations in programs while suppressing any unnecessary code motion in order to avoid superfluous register pressure. In particular, this variant of the original algorithm for lazy code motion works on flowgraphs whose nodes are basic blocks rather than single statements, as this format is standard in optimizing compilers. The theoretical foundations of the modified algorithm are given in the first part, where t-refined flowgraphs are introduced to simplify the treatment of flowgraphs whose nodes are basic blocks. The second part presents the 'basic block' algorithm in standard notation and gives directions for its implementation in standard compiler environments. Keywords: elimination of partial redundancies, code motion, data flow analysis (bit-vector, unidirectional, bidirectional), nondeterministic flowgraphs, t-refined flow graphs, critical edges, lifetimes of registers, com...
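The keywords mention critical edges. A critical edge runs from a node with several successors to a node with several predecessors. Code needed only on such an edge cannot be placed in either endpoint without also affecting other paths, so code-motion implementations commonly split these edges by inserting an empty synthetic node. The sketch below assumes a hypothetical successor-dict representation. The abstract does not say exactly how the paper treats these edges.

```python
def split_critical_edges(succ):
    """Return a copy of the flowgraph with every critical edge split.

    An edge v -> w is critical when v has several successors and w has several
    predecessors. Each such edge is replaced by v -> s -> w, where s is a new
    empty node named by the hypothetical tuple ("split", v, w, i).
    """
    n_preds = {}
    for v, ws in succ.items():
        for w in ws:
            n_preds[w] = n_preds.get(w, 0) + 1
    result = {v: list(ws) for v, ws in succ.items()}
    for v, ws in succ.items():
        if len(ws) < 2:
            continue
        for i, w in enumerate(ws):
            if n_preds[w] >= 2:
                s = ("split", v, w, i)
                result[v][i] = s  # v now jumps to the synthetic node...
                result[s] = [w]   # ...which falls through to w
    return result


g = split_critical_edges({"1": ["2", "3"], "2": ["3"], "3": []})
# 1 -> 3 is critical (1 has two successors, 3 has two predecessors):
# g == {"1": ["2", ("split", "1", "3", 1)], "2": ["3"], "3": [],
#       ("split", "1", "3", 1): ["3"]}
```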
Faster scaling algorithms for general graph-matching problems
 Journal of the ACM
, 1991
Cited by 84 (2 self)
An algorithm for minimum-cost matching on a general graph with integral edge costs is presented. The algorithm runs in time close to the fastest known bound for maximum-cardinality matching. Specifically, let $n$, $m$, and $N$ denote the number of vertices, number of edges, and largest magnitude of a cost, respectively. The best known time bound for maximum-cardinality matching is $O(\sqrt{n}\,m)$. The new algorithm for minimum-cost matching has time bound $O(\sqrt{n\,\alpha(m, n) \log n}\; m \log(nN))$. A slight modification of the new algorithm finds a maximum-cardinality matching in $O(\sqrt{n}\,m)$ time. Other applications of the new algorithm are given, including an efficient implementation of Christofides' traveling salesman approximation algorithm and efficient solutions to update problems that require the linear programming duals for matching.
Nearest Common Ancestors: A survey and a new distributed algorithm
, 2002
Cited by 76 (12 self)
Several papers describe linear-time algorithms to preprocess a tree such that subsequent nearest common ancestor queries can be answered in constant time. Here, we survey these algorithms and related results. A common idea used by all the algorithms for the problem is that a solution for complete balanced binary trees is straightforward. Furthermore, for complete balanced binary trees we can easily solve the problem in a distributed way by labeling the nodes of the tree such that from the labels of two nodes alone one can compute the label of their nearest common ancestor. Whether it is possible to distribute the data structure into short labels associated with the nodes is important for several applications such as routing. Therefore, related labeling problems have received a lot of attention recently.
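For complete balanced binary trees, the labeling idea is easy to see. The sketch below assumes heap-order labels: the root is 1, and the children of node i are 2i and 2i+1. This numbering is an illustrative choice and not necessarily the one used in the surveyed papers. Written in binary, a label spells out the root-to-node path, so the nearest common ancestor is the longest common binary prefix of the two labels. That prefix can be computed from the labels alone:

```python
def nca_label(u: int, v: int) -> int:
    """NCA of two nodes in a complete binary tree with heap-order labels (root = 1)."""
    # Lift the deeper node until both labels have the same number of bits (same depth).
    du, dv = u.bit_length(), v.bit_length()
    if du > dv:
        u >>= du - dv
    else:
        v >>= dv - du
    # Drop the bits from the first position where the two paths diverge.
    return u >> (u ^ v).bit_length()
```

For instance, `nca_label(4, 5)` is 2 (siblings under node 2), `nca_label(5, 6)` is 1 (the root), and `nca_label(8, 2)` is 2, since node 2 is an ancestor of node 8. Each query uses a constant number of word operations.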
Internet Packet Filter Management and Rectangle Geometry
, 2001
Cited by 69 (1 self)
We consider rule sets for internet packet routing and filtering, where each rule consists of a range of source addresses, a range of destination addresses, a priority, and an action. A given packet should be handled by the action from the maximum-priority rule that matches its source and destination. We describe new data structures for quickly finding the rule matching an incoming packet, in near-linear space, and a new algorithm for determining whether a rule set contains any conflicts, in time $O(n^{3/2})$. Introduction: The working of the current Internet and its posited evolution depend on efficient packet filtering mechanisms: databases of rules, maintained at various parts of the network, which use patterns to filter out sets of IP packets and specify actions to be performed on those sets. Typical filter patterns are based on packet header information such as the source or destination IP addresses. The actions to be performed depend on where the packet filtering is performed i...
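Geometrically, each rule is an axis-parallel rectangle in the (source, destination) plane, and a packet is a point. Classification asks for the highest-priority rectangle that contains the point. The linear scan below is only the naive baseline that the paper's near-linear-space data structures are designed to beat. It assumes addresses are plain integers, and the `Rule` type and the action strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src: tuple[int, int]  # inclusive range of source addresses
    dst: tuple[int, int]  # inclusive range of destination addresses
    priority: int
    action: str


def classify(rules, src, dst):
    """Return the action of the highest-priority rule matching (src, dst), or None."""
    best = None
    for r in rules:
        # The packet is a point; the rule matches if its rectangle contains it.
        if r.src[0] <= src <= r.src[1] and r.dst[0] <= dst <= r.dst[1]:
            if best is None or r.priority > best.priority:
                best = r  # ties keep the earlier rule
    return None if best is None else best.action
```

Each lookup costs $O(n)$ time for $n$ rules. That cost is what motivates faster search structures when rule sets are large and packets arrive at line rate.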
Escape Analysis: Correctness Proof, Implementation and Experimental Results
 In Conference Record of the 25th Annual ACM Symposium on Principles of Programming Languages
, 1998
Cited by 61 (2 self)
We describe an escape analysis [32, 14], used to determine whether the lifetime of data exceeds its static scope. We give a new correctness proof starting directly from a semantics. Unlike previous proofs, it takes into account all the features of functional languages, including imperative features and polymorphism. The analysis has been designed so that it can be implemented under the small complexity bound of $O(n \log^2 n)$, where $n$ is the size of the analyzed program. We have included it in the Caml Special Light compiler (an implementation of ML) and applied it to very large programs. We plan to apply these techniques to the Java programming language. Escape analysis has been applied to stack allocation. We improve the optimization technique by determining minimal lifetimes for stack-allocated data and by using inlining. We manage to stack-allocate 25% of data in the theorem prover Coq. We analyzed the effect of this optimization and noticed that its main effect is to improve ...
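The question the analysis answers can be illustrated on a deliberately tiny, hypothetical first-order IR with allocations, copies, returns, and stores into global variables. The sketch below is a flow-insensitive reachability check. It is far simpler than the paper's analysis, which handles higher-order functions, imperative features, and polymorphism. It also assumes each variable is allocated at most once. Allocations it does not report are the ones that could be placed on the stack.

```python
def escaping_allocations(stmts):
    """Flow-insensitive toy escape analysis over a hypothetical mini-IR.

    Statements: ("alloc", x), ("copy", dst, src), ("return", x), ("store_global", x).
    Returns the allocation sites (named by their variable) whose data may escape.
    """
    flows_into = {}  # src -> set of variables that src is copied into
    allocs, escaping = [], set()
    for s in stmts:
        kind = s[0]
        if kind == "alloc":
            allocs.append(s[1])
        elif kind == "copy":
            _, dst, src = s
            flows_into.setdefault(src, set()).add(dst)
        elif kind in ("return", "store_global"):
            escaping.add(s[1])  # data reachable from here outlives the scope

    def escapes(x):
        # x's data escapes if it can flow, through copies, into an escaping variable.
        seen, work = {x}, [x]
        while work:
            v = work.pop()
            if v in escaping:
                return True
            for w in flows_into.get(v, ()):
                if w not in seen:
                    seen.add(w)
                    work.append(w)
        return False

    return {a for a in allocs if escapes(a)}


stmts = [("alloc", "a"), ("alloc", "b"), ("alloc", "c"),
         ("copy", "t", "b"), ("return", "t"), ("store_global", "c")]
# "b" escapes via t and the return, "c" via the global store; "a" could be
# stack-allocated.
```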