Escape analysis for Java
- OOPSLA, 1999
Cited by 241 (11 self)
This paper presents a simple and efficient data flow algorithm for escape analysis of objects in Java programs to determine (i) if an object can be allocated on the stack; (ii) if an object is accessed only by a single thread during its lifetime, so that synchronization operations on that object can be removed. We introduce a new program abstraction for escape analysis, the connection graph, that is used to establish reachability relationships between objects and object references. We show that the connection graph can be summarized for each method such that the same summary information may be used effectively in different calling contexts. We present an interprocedural algorithm that uses the above property to efficiently compute the connection graph and identify the non-escaping objects for methods and threads. The experimental results, from a prototype implementation of our framework in the IBM High Performance Compiler for Java, are very promising. The percentage of objects that may be allocated on the stack exceeds 70% of all dynamically created objects in three out of the ten benchmarks (with a median of 19%), 11% to 92% of all lock operations are eliminated in those ten programs (with a median of 51%), and the overall execution time reduction ranges from 2% to 23% (with a median of 7%) on a 333 MHz PowerPC workstation with 128 MB memory.
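The reachability idea behind the connection graph can be illustrated with a small sketch. This is not the paper's algorithm: the node names, the `edges` dictionary and the `escaping_nodes` helper are all hypothetical, and the sketch only marks objects reachable from an escaping root (here a static field) as escaping, leaving the rest as stack-allocation candidates.

```python
from collections import deque

def escaping_nodes(edges, roots):
    """Return every node reachable from the escaping roots (BFS)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for succ in edges.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen

# Hypothetical method body: the local `tmp` points to O1, the static
# field `cache` points to O2, and a field of O2 points to O3.
edges = {
    "tmp": ["O1"],
    "static:cache": ["O2"],
    "O2": ["O3"],
}
objects = {"O1", "O2", "O3"}

escaped = escaping_nodes(edges, roots=["static:cache"])
stack_candidates = sorted(objects - escaped)
print(stack_candidates)  # ['O1']
```

Only O1 stays unreachable from the static field, so in this toy model it is the only object that could live on the stack.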
Stack allocation and synchronization optimizations for Java using escape analysis
- ACM Transactions on Programming Languages and Systems, 2003
Cited by 22 (0 self)
This article presents an escape analysis framework for Java to determine (1) if an object is not reachable after its method of creation returns, allowing the object to be allocated on the stack, and (2) if an object is reachable only from a single thread during its lifetime, allowing unnecessary synchronization operations on that object to be removed. We introduce a new program abstraction for escape analysis, the connection graph, that is used to establish reachability relationships between objects and object references. We show that the connection graph can be succinctly summarized for each method such that the same summary information may be used in different calling contexts without introducing imprecision into the analysis. We present an interprocedural algorithm that uses the above property to efficiently compute the connection graph and identify the non-escaping objects for methods and threads. The experimental results, from a prototype implementation of our framework in the IBM High Performance Compiler for Java, are very promising. The percentage of objects that may be allocated on the stack exceeds 70% of all dynamically created objects in the user code in three out of the ten benchmarks (with a median of 19%), 11% to 92% of all mutex lock operations are eliminated in those ten programs (with a median of 51%), and the overall execution time reduction ranges from 2% to 23% (with a median of 7%) on a 333 MHz PowerPC workstation with 512 MB memory.
Advanced Compiler Optimizations for Sparse Computations
- Journal of Parallel and Distributed Computing, 1995
Cited by 17 (3 self)
Regular data dependence checking on sparse codes usually results in very conservative estimates of actual dependences that will occur at run-time. Clearly, this is caused by the usage of compact data structures that are necessary to exploit sparsity in order to reduce storage requirements and computational time. However, if the compiler is presented with dense code and automatically converts it into code that operates on sparse data structures, then the dependence information obtained by analysis on the original code can be used to exploit potential concurrency in the generated code. In this paper we present synchronization generating and manipulating techniques that are based on this concept.
1 Introduction
Nowadays compiler support usually fails to optimize sparse codes because compact storage formats are used for sparse matrices in order to exploit sparsity with respect to storage requirements and computational time. This exploitation results in complicated code in which, for exam...
Redundant Synchronization Elimination for DOACROSS Loops
- IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 12 (1 self)
Synchronizations are necessary when there are dependences between concurrent processes. However, many synchronizations are redundant because the composite effect of the other synchronizations may have already covered them. In this paper, we investigate the problem of redundant synchronization elimination in DOACROSS loops and present an efficient algorithm that identifies redundant synchronizations in multiply-nested DOACROSS loops with multiple statements and control branches. Nonuniformity in redundancy at the boundaries of the loop iteration space, which is caused primarily by the backward dependence directions in some inner loops, is also addressed. The necessary and sufficient condition under which the synchronization is uniformly redundant in a multiply-nested loop is also described. These results allow a compiler to generate efficient data synchronization instructions for DOACROSS loops.
Index Terms: DOACROSS, data dependence, redundant synchronization elimination.
This work w...
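The notion of a synchronization being "covered" by others can be sketched for the simplest case. The sketch below is an illustration, not the paper's algorithm. It assumes a single (not multiply-nested) loop, statements numbered in program order, and synchronizations written as hypothetical `(src, dst, dist)` triples meaning "statement `src` of iteration i must finish before statement `dst` of iteration i + dist", with `dist >= 1`. A synchronization is treated as redundant when its target is reachable from its source through program order and the remaining synchronizations.

```python
def is_redundant(sync, syncs, num_stmts):
    """True if the ordering enforced by `sync` is already implied by
    program order plus the other synchronizations."""
    src, dst, dist = sync
    others = [s for s in syncs if s != sync]
    start, goal = (0, src), (dist, dst)   # (iteration, statement) nodes
    seen = {start}
    stack = [start]
    while stack:
        it, stmt = stack.pop()
        if (it, stmt) == goal:
            return True
        succs = []
        if stmt + 1 < num_stmts:
            succs.append((it, stmt + 1))  # program order inside one iteration
        for a, b, d in others:
            # follow another synchronization if it stays within the window
            if a == stmt and it + d <= dist:
                succs.append((it + d, b))
        for node in succs:
            if node not in seen:
                seen.add(node)
                stack.append(node)
    return False

# Three statements per iteration; C is implied by A followed by B.
A, B, C = (0, 1, 1), (1, 2, 1), (0, 2, 2)
syncs = [A, B, C]
print([is_redundant(s, syncs, num_stmts=3) for s in syncs])  # [False, False, True]
```

Boundary effects and backward dependence directions, which the abstract singles out as the source of nonuniform redundancy, are outside the scope of this sketch.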
Compiler Optimizations For Parallel Loops With Fine-Grained Synchronization
- 1994
Cited by 11 (0 self)
this paper, we presented and evaluated a new runtime algorithm to parallelize these loops. Our scheme handles any type of data dependence pattern without requiring any special architectural support. Furthermore, compared to an older scheme with the same generality, it speeds up execution by allowing the reuse of the inspector phase across loop invocations, allowing partial overlap of dependent iterations, and optimizing the inspector for high locality and low communication. We have evaluated our algorithm with an extensive set of loops running on the 32-processor Cedar shared-memory multiprocessor. We used loops with varying parameters, such as number of iterations and references. The results show that our algorithm gives good speedups that reach 13 if the inspector is not reused and 26 if it is. Furthermore, our algorithm outperforms Zhu-Yew's scheme [ZY84] in nearly all cases, reaching a 37-fold speedup when the loop has many dependences. There are a few issues in runtime parallelization that require further study. The first one is to use the dependence information gathered by the inspector to rearrange the order of execution of iterations in the executor to minimize busy-waiting time. This strategy would imply spreading the iterations that participate in a dependence chain as much as possible so that the processors that execute them do not have to wait for each other. This is not currently done, and therefore processors spend time busy-waiting for the previous access in the dependence chain to finish. A second optimization is to eliminate the serialization of reads belonging to input dependences. Currently, our scheme serializes all accesses to the same location, including concurrent reads. Finally, we are developing compiler algorithms to determine if the inspecto...
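The inspector/executor split mentioned above can be sketched generically. This is a textbook-style wavefront sketch, not the scheme evaluated in the paper: the loop body, the `index` array and both function names are assumptions made for illustration. The inspector orders all accesses to the same location (matching the abstract's remark that every access to one location is serialized), and its result can be reused as long as `index` does not change.

```python
def inspector(index):
    """Assign each iteration a wavefront so that iterations touching the
    same location keep their original relative order."""
    last_level = {}   # location -> wavefront of its most recent access
    levels = []
    for loc in index:
        level = last_level.get(loc, -1) + 1
        last_level[loc] = level
        levels.append(level)
    return levels

def executor(data, index, levels):
    """Run the loop body wavefront by wavefront."""
    for wave in range(max(levels, default=-1) + 1):
        # Iterations of one wavefront touch distinct locations, so a real
        # runtime could execute them in parallel.
        for i, level in enumerate(levels):
            if level == wave:
                data[index[i]] = data[index[i]] * 2 + i
    return data

index = [0, 1, 0, 2, 1, 0]
levels = inspector(index)          # reusable while `index` is unchanged
print(levels)                      # [0, 0, 1, 0, 1, 2]
result = executor([1, 1, 1], index, levels)
print(result)                      # [17, 10, 5]
```

The result matches a plain sequential run of the same loop, which is the property the inspector is meant to preserve.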
Compiler Algorithms for Event Variable Synchronization
- In Proceedings of the 1991 ACM International Conference on Supercomputing, 1991
Cited by 10 (2 self)
Event variable synchronization is a well-known mechanism for enforcing data dependences in a program that runs in parallel on a shared memory multiprocessor. This paper presents compiler algorithms to automatically generate event variable synchronization code. Previously published algorithms dealt with single parallel loops in which dependence distances are constant and known by the compiler. However, loops in real application programs are often arbitrarily nested. Moreover, compilers are often unable to determine dependence distances. In contrast, our algorithms generate synchronization code based directly on array subscripts and do not require constant distances in data dependences. The algorithms are designed for arbitrarily nested loops, including triangular or trapezoidal loops.
1 Introduction
On shared memory multiprocessors, the performance of scientific and engineering programs can often be improved by running DO loop iterations in parallel. Some recent simulation studies rep...
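The post/wait mechanism itself can be shown with a minimal sketch. It uses Python's `threading.Event` as a stand-in for an event variable and a loop with a constant dependence distance of 1, which is the simple case the abstract attributes to earlier work, not the subscript-based code generation the paper proposes. The array `a`, the `done` list and the `iteration` function are illustrative names.

```python
import threading

N = 8
a = [0] * N
done = [threading.Event() for _ in range(N)]  # one event variable per iteration

def iteration(i):
    """Body of the loop `a[i] = a[i-1] + 1`, run as its own thread."""
    if i > 0:
        done[i - 1].wait()      # wait: a[i-1] must be written first
        a[i] = a[i - 1] + 1
    else:
        a[i] = 1
    done[i].set()               # post: a[i] is now available

# Start the threads in reverse order to show that the events, not the
# start order, enforce the dependence.
threads = [threading.Thread(target=iteration, args=(i,)) for i in reversed(range(N))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a)  # [1, 2, 3, 4, 5, 6, 7, 8]
```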
Statement Re-Ordering for DOACROSS Loops
- 1994
Cited by 8 (1 self)
In this paper, we propose a new statement re-ordering algorithm for DOACROSS loops that overcomes some of the problems in the previous schemes. The new algorithm uses a hierarchical approach to locate strongly dependent statement groups and to order these groups considering critical dependences. A new optimization problem, dependence covering maximization, which has not been discussed before, is also introduced. It is shown that this optimization problem is NP-complete, and a heuristic algorithm is incorporated in our algorithm. Run-time complexity analysis is given for both algorithms. This new statement re-ordering scheme, combined with the dependence covering maximization, can be an important compiler optimization to parallelize loop structures for large scale coarse and fine grain parallelism.
Keywords: Compiler Optimization, Data Dependence, Doacross Execution, Redundant Synchronization Elimination, Statement Re-ordering.
This work was supported in part by the National Science Foundat...
A Graph Based Approach to Barrier Synchronisation Minimisation
- In Proceedings of the 1997 ACM International Conference on Supercomputing, 1997
Cited by 6 (1 self)
This paper presents a new graph theoretic approach to minimising the number of barriers in parallelised programs. A simple procedure to reduce the complexity of barrier placement, without affecting optimality, is developed. A new algorithm is then presented which places provably the minimal number of barriers in perfect loop nests. This technique is extended so as to place the minimal number of barriers in certain imperfect loop nest structures. This scheme is generalised to accept entire programs and implemented in a prototype parallelising compiler where it has been applied to several well-known benchmarks and shown to place significantly fewer synchronisation points than an existing commercial compiler.
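For intuition only, barrier minimisation has a well-known one-dimensional special case. The sketch assumes a straight-line sequence of parallel statements and dependences given as hypothetical `(producer, consumer)` index pairs; it does not reproduce the paper's graph-based algorithm for loop nests. Each dependence needs at least one barrier somewhere between its two statements, and a greedy sweep by consumer position places the fewest barriers that satisfy all of them.

```python
def place_barriers(deps):
    """deps: (producer, consumer) pairs with producer < consumer.
    A barrier at position p sits between statements p and p + 1.
    Returns a minimum-size list of barrier positions."""
    barriers = []
    for producer, consumer in sorted(deps, key=lambda d: d[1]):
        # The latest barrier already covers this dependence if it lies
        # at or after the producer (it is always before the consumer).
        if not barriers or barriers[-1] < producer:
            barriers.append(consumer - 1)   # place it as late as possible
    return barriers

deps = [(0, 2), (1, 3), (2, 5), (4, 5)]
print(place_barriers(deps))  # [1, 4]
```

Placing each barrier as late as possible lets it satisfy as many later dependences as it can, which is why the greedy choice is optimal in this restricted setting.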

