Compiler Transformations for High-Performance Computing
 ACM COMPUTING SURVEYS
, 1994
Abstract

Cited by 416 (3 self)
In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and dataflow techniques. In contrast, optimization for ...
A fast algorithm for finding dominators in a flowgraph
 ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1979
Abstract

Cited by 184 (5 self)
A fast algorithm for finding dominators in a flowgraph is presented. The algorithm uses depth-first search and an efficient method of computing functions defined on paths in trees. A simple implementation of the algorithm runs in O(m log n) time, where m is the number of edges and n is the number of vertices in the problem graph. A more sophisticated implementation runs in O(m α(m, n)) time, where α(m, n) is a functional inverse of Ackermann's function. Both versions of the algorithm were implemented in Algol W, a Stanford University version of Algol, and tested on an IBM 370/168. The programs were compared with an implementation by Purdom and Moore of a straightforward O(mn)-time algorithm, and with a bit-vector algorithm described by Aho and Ullman. The fast algorithm beat the straightforward algorithm and the bit-vector algorithm on all but the smallest graphs tested.
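For a concrete point of reference, the straightforward iterative approach the paper benchmarks against (attributed above to Purdom and Moore) can be sketched roughly as follows. This is the simple O(mn)-style data-flow computation, not the paper's fast algorithm, and the dict-of-successor-lists graph encoding is a hypothetical choice for illustration:

```python
# Straightforward iterative dominator computation: a sketch of the simple
# O(mn)-style approach, NOT the paper's O(m log n) algorithm. The
# dict-of-successor-lists graph encoding is an illustrative assumption.

def dominators(succ, entry):
    """Return a dict mapping each node to its set of dominators."""
    nodes = set(succ)
    for vs in succ.values():
        nodes.update(vs)
    # Build predecessor lists from the successor lists.
    pred = {n: [] for n in nodes}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    # Entry is dominated only by itself; start every other node at
    # "all nodes" and shrink by intersection until a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n}
            if pred[n]:
                new |= set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom
```

On a diamond-shaped flowgraph a → {b, c} → d, the fixed point correctly reports that d is dominated by a and by itself, but by neither b nor c.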
Lazy Code Motion
, 1992
Abstract

Cited by 170 (20 self)
We present a bit-vector algorithm for the optimal and economical placement of computations within flow graphs, which is as efficient as standard unidirectional analyses. The point of our algorithm is the decomposition of the bidirectional structure of the known placement algorithms into a sequence of a backward and a forward analysis, which directly implies the efficiency result. Moreover, the new compositional structure opens the algorithm for modification: two further unidirectional analysis components exclude any unnecessary code motion. This laziness of our algorithm minimizes the register pressure, which has drastic effects on the runtime behaviour of the optimized programs in practice, where an economical use of registers is essential.
Optimal Code Motion: Theory and Practice
, 1993
Abstract

Cited by 115 (18 self)
An implementation-oriented algorithm for lazy code motion is presented that minimizes the number of computations in programs while suppressing any unnecessary code motion in order to avoid superfluous register pressure. In particular, this variant of the original algorithm for lazy code motion works on flowgraphs whose nodes are basic blocks rather than single statements, as this format is standard in optimizing compilers. The theoretical foundations of the modified algorithm are given in the first part, where t-refined flowgraphs are introduced for simplifying the treatment of flowgraphs whose nodes are basic blocks. The second part presents the `basic block' algorithm in standard notation, and gives directions for its implementation in standard compiler environments. Keywords: elimination of partial redundancies, code motion, data flow analysis (bit-vector, unidirectional, bidirectional), nondeterministic flowgraphs, t-refined flowgraphs, critical edges, lifetimes of registers, com...
Faster scaling algorithms for general graph-matching problems
 JOURNAL OF THE ACM
, 1991
Abstract

Cited by 111 (3 self)
An algorithm for minimum-cost matching on a general graph with integral edge costs is presented. The algorithm runs in time close to the fastest known bound for maximum-cardinality matching. Specifically, let n, m, and N denote the number of vertices, number of edges, and largest magnitude of a cost, respectively. The best known time bound for maximum-cardinality matching is O(√n m). The new algorithm for minimum-cost matching has time bound O(√(n α(m, n) log n) m log(nN)). A slight modification of the new algorithm finds a maximum-cardinality matching in O(√n m) time. Other applications of the new algorithm are given, including an efficient implementation of Christofides' traveling salesman approximation algorithm and efficient solutions to update problems that require the linear programming duals for matching.
Nearest Common Ancestors: A survey and a new distributed algorithm
, 2002
Abstract

Cited by 89 (11 self)
Several papers describe linear time algorithms to preprocess a tree, such that one can answer subsequent nearest common ancestor queries in constant time. Here, we survey these algorithms and related results. A common idea used by all the algorithms for the problem is that a solution for complete balanced binary trees is straightforward. Furthermore, for complete balanced binary trees we can easily solve the problem in a distributed way by labeling the nodes of the tree such that from the labels of two nodes alone one can compute the label of their nearest common ancestor. Whether it is possible to distribute the data structure into short labels associated with the nodes is important for several applications such as routing. Therefore, related labeling problems have received a lot of attention recently.
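The complete-balanced-binary-tree special case that the survey builds on can be made concrete with heap-style labels (root labeled 1, children of node v labeled 2v and 2v + 1): the nearest common ancestor is then computable from the two labels alone, exactly the distributed flavor described above. This is a sketch of that textbook labeling idea, not of any particular algorithm from the survey:

```python
# Heap labels for a complete balanced binary tree: root = 1, children of
# v are 2v and 2v + 1. A label encodes the root-to-node path in binary,
# so the NCA's label is the longest common prefix of the two labels --
# computable from the labels alone, with no tree traversal.

def nca_label(u: int, v: int) -> int:
    """Nearest common ancestor of the nodes labeled u and v (both >= 1)."""
    # A node's depth is the bit length of its label; lift the deeper
    # node until both sit at the same depth.
    while u.bit_length() > v.bit_length():
        u >>= 1
    while v.bit_length() > u.bit_length():
        v >>= 1
    # Walk both up in lock-step until the labels coincide.
    while u != v:
        u >>= 1
        v >>= 1
    return u
```

For example, the nodes labeled 4 and 5 are the two children of node 2, so their nearest common ancestor has label 2, while nodes 4 and 6 meet only at the root, label 1.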
Internet Packet Filter Management and Rectangle Geometry
, 2001
Abstract

Cited by 85 (1 self)
We consider rule sets for internet packet routing and filtering, where each rule consists of a range of source addresses, a range of destination addresses, a priority, and an action. A given packet should be handled by the action from the maximum-priority rule that matches its source and destination. We describe new data structures for quickly finding the rule matching an incoming packet, in near-linear space, and a new algorithm for determining whether a rule set contains any conflicts, in time O(n^{3/2}).

1 Introduction

The working of the current Internet and its posited evolution depend on efficient packet filtering mechanisms: databases of rules, maintained at various parts of the network, which use patterns to filter out sets of IP packets and specify actions to be performed on those sets. Typical filter patterns are based on packet header information such as the source or destination IP addresses. The actions to be performed depend on where the packet filtering is performed ...
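The rule semantics in the abstract can be pinned down with a brute-force matcher. This only specifies the problem (the maximum-priority matching rule wins); it is not the paper's near-linear-space data structure, and the tuple layout is an assumption for illustration:

```python
# Brute-force packet classification: each rule is a tuple
# (src_lo, src_hi, dst_lo, dst_hi, priority, action), and a packet is
# handled by the action of the maximum-priority rule whose source and
# destination ranges both contain it. Problem specification only -- the
# paper's contribution is doing this lookup fast in near-linear space.

def classify(rules, src, dst):
    """Return the action of the best matching rule, or None if none match."""
    best = None
    for src_lo, src_hi, dst_lo, dst_hi, prio, action in rules:
        if src_lo <= src <= src_hi and dst_lo <= dst <= dst_hi:
            if best is None or prio > best[0]:
                best = (prio, action)
    return best[1] if best is not None else None
```

With a low-priority default-allow rule and a higher-priority deny rule for sources 10–20, a packet from source 15 is denied while one from source 30 falls through to the default.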
Escape Analysis: Correctness Proof, Implementation and Experimental Results
 In Conference Record of the 25th Annual ACM Symposium on Principles of Programming Languages
, 1998
Abstract

Cited by 67 (3 self)
We describe an escape analysis [32, 14], used to determine whether the lifetime of data exceeds its static scope. We give a new correctness proof starting directly from a semantics. Contrary to previous proofs, it takes into account all the features of functional languages, including imperative features and polymorphism. The analysis has been designed so that it can be implemented under the small complexity bound of O(n log^2 n) where n is the size of the analyzed program. We have included it in the Caml Special Light compiler (an implementation of ML), and applied it to very large programs. We plan to apply these techniques to the Java programming language. Escape analysis has been applied to stack allocation. We improve the optimization technique by determining minimal lifetime for stack allocated data, and using inlining. We manage to stack allocate 25% of data in the theorem prover Coq. We analyzed the effect of this optimization, and noticed that its main effect is to improve ...
Optimal preprocessing for answering online product queries
, 1987
Abstract

Cited by 61 (3 self)
We examine the amount of preprocessing needed for answering certain online queries as fast as possible. We start with the following basic problem. Suppose we are given a semigroup (S, ∘). Let s_1, ..., s_n be elements of S. We want to answer online queries of the form, "What is the product s_i ∘ s_{i+1} ∘ ... ∘ s_{j-1} ∘ s_j?" for any given 1 ≤ i ≤ j ≤ n. We show that a preprocessing of Θ(n λ(k, n)) time and space is both necessary and sufficient to answer each such query in at most k steps, for any fixed k. The function λ(k, ·) is the inverse of a certain function at the ⌈k/2⌉-th level of the primitive recursive hierarchy. In case linear preprocessing is desired, we show that one can answer each such query in at most O(α(n)) steps and that this is best possible. The function α(n) is the inverse Ackermann function. We also consider the following extended problem. Let T be a tree with an element of S associated with each of its vertices. We want to answer online queries of the form, "What is the product of the elements associated with the vertices along the path from u to v?" for any pair of vertices u and v in T. We derive results, which are similar to the above, for the preprocessing needed for answering such queries. All our sequential preprocessing algorithms can be parallelized efficiently to give optimal parallel algorithms which run in O(log n) time on a CREW PRAM. These parallel algorithms are optimal in both running time and total number of operations.
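One familiar point on this preprocessing/query trade-off is the sparse-table scheme: Θ(n log n) preprocessing and constant-time queries when the semigroup operation is idempotent (e.g. min), by covering any query range with two overlapping precomputed intervals. The paper's Θ(n λ(k, n)) bounds are far finer-grained and apply to arbitrary semigroups; this sketch only illustrates the flavor of the problem:

```python
# Sparse table for online range queries over an idempotent operation
# such as min: O(n log n) preprocessing, O(1) per query. A standard
# textbook construction given for flavor, not the paper's scheme.

def build_sparse_table(a, op=min):
    """table[j][i] holds op over the block a[i .. i + 2**j - 1]."""
    n = len(a)
    table = [list(a)]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        table.append([op(prev[i], prev[i + half])
                      for i in range(n - (1 << j) + 1)])
        j += 1
    return table

def range_query(table, i, j, op=min):
    """op-product of a[i..j] (inclusive); relies on op being idempotent."""
    k = (j - i + 1).bit_length() - 1
    return op(table[k][i], table[k][j - (1 << k) + 1])
```

The two intervals [i, i + 2^k - 1] and [j - 2^k + 1, j] together cover [i, j] and may overlap, which is why the operation must be idempotent; a general semigroup product needs the more delicate structures the paper analyzes.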