ABCD: Eliminating Array Bounds Checks on Demand
- In ACM Conference on Programming Language Design and Implementation
, 2000
Cited by 113 (6 self)
To guarantee type-safe execution, Java and other strongly typed languages require bounds checking of array accesses. Because array-bounds checks may raise exceptions, they block code motion of instructions with side effects, thus preventing many useful code optimizations, such as partial redundancy elimination or instruction scheduling of memory operations. Furthermore, because it is not expressible at the bytecode level, the elimination of bounds checks can only be performed at run time, after the bytecode program is loaded. Using existing powerful bounds-check optimizers at run time is not feasible, however, because they are too heavyweight for the dynamic compilation setting. ABCD is a lightweight algorithm for elimination of Array Bounds Checks on Demand. Its design emphasizes simplicity and efficiency. In essence, ABCD works by adding a few edges to the SSA value graph and performing a simple traversal of the graph. Despite its simplicity, ABCD is surprisingly powerful. On our benchma...
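The idea of proving a check redundant by walking a graph of value relations can be illustrated with a small sketch. It is not the algorithm from the paper: it assumes, for illustration only, that known facts are encoded as difference constraints of the form v - u <= c, and it uses plain Bellman-Ford relaxation instead of a demand-driven traversal. It ignores phi nodes and lower-bound checks. The function name and the variable names `a.length`, `i`, and `j` are hypothetical.

```python
def upper_bound_check_redundant(constraints, length_var, index_var):
    """Return True if the constraints imply index_var <= length_var - 1.

    Each constraint (u, v, c) encodes the known fact v - u <= c.
    The shortest-path distance from length_var to index_var is then the
    tightest derivable bound on index_var - length_var.
    """
    nodes = {length_var, index_var}
    for u, v, _ in constraints:
        nodes.update((u, v))

    dist = {n: float("inf") for n in nodes}
    dist[length_var] = 0

    # Bellman-Ford relaxation; every finite distance is a sound upper bound.
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, c in constraints:
            if dist[u] + c < dist[v]:
                dist[v] = dist[u] + c
                changed = True
        if not changed:
            break

    # index - length <= -1 means index < length, so the check cannot fail.
    return dist[index_var] <= -1


# Hypothetical facts inside a branch guarded by `i < a.length`:
constraints = [
    ("a.length", "i", -1),  # guard: i - a.length <= -1
    ("i", "j", 1),          # assignment j = i + 1: j - i <= 1
]
```

With these facts, the upper-bound check for `a[i]` is provably redundant, while the one for `a[j]` is not, because only j - a.length <= 0 can be derived.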
Diskless Checkpointing
, 1997
Cited by 91 (3 self)
Diskless Checkpointing is a technique for checkpointing the state of a long-running computation on a distributed system without relying on stable storage. As such, it eliminates the performance bottleneck of traditional checkpointing on distributed systems. In this paper, we motivate diskless checkpointing and present the basic diskless checkpointing scheme along with several variants for improved performance. The performance of the basic scheme and its variants is evaluated on a high-performance network of workstations and compared to traditional disk-based checkpointing. We conclude that diskless checkpointing is a desirable alternative to disk-based checkpointing that can improve the performance of distributed applications in the face of failures.
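One way to checkpoint without stable storage is to keep each process's checkpoint in memory and store redundancy on a spare node. The sketch below assumes a simple XOR-parity encoding, which the abstract above does not spell out; the function name `xor_blocks` and the sample data are hypothetical, and all checkpoints are assumed to have equal length.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# In-memory checkpoints of three application processes (made-up data).
checkpoints = [b"abcd", b"efgh", b"ijkl"]

# A spare process holds the parity of all checkpoints instead of a disk copy.
parity = xor_blocks(checkpoints)

# If process 1 fails, its checkpoint is rebuilt from the survivors and the parity.
recovered = xor_blocks([checkpoints[0], checkpoints[2], parity])
```

A single parity block tolerates the loss of any one checkpoint; tolerating more simultaneous failures would need a stronger code.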
Bitwidth Analysis with Application to Silicon Compilation
, 2000
Cited by 80 (0 self)
This paper introduces Bitwise, a compiler that minimizes the bitwidth (the number of bits used to represent each operand) for both integers and pointers in a program. By propagating static information both forward and backward in the program dataflow graph, Bitwise frees the programmer from declaring bitwidth invariants in cases where the compiler can determine bitwidths automatically. We find a rich opportunity for bitwidth reduction in modern multimedia and streaming application workloads. For new architectures that support sub-word quantities, we expect that our bitwidth reductions will save power and increase processor performance. This paper
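Forward and backward propagation can be pictured with value ranges. The sketch below is an illustration under stated assumptions, not the Bitwise implementation: it tracks non-negative integer ranges only, propagates them forward through an addition, and refines them backward from a use as an array index (assuming an out-of-range index is an error, so the value may be taken to be in range). All function names are hypothetical.

```python
def unsigned_bits(hi):
    """Bits needed to represent every value in 0..hi."""
    return max(1, hi.bit_length())


def add_range(a, b):
    """Forward rule: range of x + y given the ranges of x and y."""
    return (a[0] + b[0], a[1] + b[1])


def refine_as_index(r, table_size):
    """Backward rule: a value used to index a table lies in 0..table_size-1."""
    return (max(r[0], 0), min(r[1], table_size - 1))


x = (0, 255)
y = (0, 255)

s = add_range(x, y)                 # forward: s lies in 0..510
bits_forward = unsigned_bits(s[1])  # 9 bits

s_refined = refine_as_index(s, 256)         # backward: s indexes a 256-entry table
bits_refined = unsigned_bits(s_refined[1])  # 8 bits
```

The backward step is what lets the operand shrink from 9 bits to 8 without any declaration from the programmer.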
Demand-driven Computation of Interprocedural Data Flow
, 1995
Cited by 76 (9 self)
This paper presents a general framework for deriving demand-driven algorithms for interprocedural data flow analysis of imperative programs. The goal of demand-driven analysis is to reduce the time and/or space overhead of conventional exhaustive analysis by avoiding the collection of information that is not needed. In our framework, a demand for data flow information is modeled as a set of data flow queries. The derived demand-driven algorithms find responses to these queries through a partial reversal of the respective data flow analysis. Depending on whether minimizing time or space is of primary concern, result caching may be incorporated in the derived algorithm. Our framework is applicable to interprocedural data flow problems with a finite domain set. If the problem's flow functions are distributive, the derived demand algorithms provide as precise information as the corresponding exhaustive analysis. For problems with monotone but non-distributive flow functions the provided dat...
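A query answered by reversing the analysis can be sketched for a single, simple problem. The example below is intraprocedural and covers only one reaching-definition query, so it is far narrower than the interprocedural framework described above; the function name, the graph encoding, and the node numbers are hypothetical.

```python
def definition_reaches(preds, kills, def_node, query_node):
    """Answer the query: may the definition at def_node reach query_node?

    preds maps each CFG node to its predecessors; kills is the set of
    other nodes that redefine the same variable. Only nodes that can
    affect the answer are visited, walking backward from the query.
    """
    seen = set()
    work = [query_node]
    while work:
        node = work.pop()
        for pred in preds.get(node, ()):
            if pred == def_node:
                return True   # found a kill-free path back to the definition
            if pred in seen or pred in kills:
                continue      # already explored, or the definition is overwritten
            seen.add(pred)
            work.append(pred)
    return False


# Hypothetical CFG: 1 -> 2, 2 -> 3, 2 -> 4, 3 -> 5, 4 -> 5
# Node 1 defines x, node 3 redefines x, node 5 uses x.
preds = {2: [1], 3: [2], 4: [2], 5: [3, 4]}
```

Here the definition at node 1 may reach node 5 through node 4, but it does not reach node 5 if node 4 also redefines x.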
The Alignment-Distribution Graph
- In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing
, 1993
Cited by 61 (3 self)
Implementing a data-parallel language such as Fortran 90 on a distributed-memory parallel computer requires distributing aggregate data objects (such as arrays) among the memory modules attached to the processors. The mapping of objects to the machine determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. We present a program representation called the alignment-distribution graph that makes these communication requirements explicit. We describe the details of the representation, show how to model communication cost in this framework, and outline several algorithms for determining object mappings that approximately minimize residual communication.
The Program Structure Tree: Computing Control Regions in Linear Time
, 1994
Cited by 53 (2 self)
In this paper, we describe the program structure tree (PST), a hierarchical representation of program structure based on single-entry single-exit (SESE) regions of the control flow graph. We give a linear-time algorithm for finding SESE regions and for building the PST of arbitrary control flow graphs (including irreducible ones). Next, we establish a connection between SESE regions and control dependence equivalence classes, and show how to use the algorithm to find control regions in linear time. Finally, we discuss some applications of the PST. Many control-flow algorithms, such as construction of Static Single Assignment form, can be sped up by applying the algorithms in a divide-and-conquer style to each SESE region on its own. The PST is also used to speed up data flow analysis by exploiting "sparsity". Experimental results from the Perfect Club and SPEC89 benchmarks confirm that the PST approach finds and exploits program structure.
Spatial Computation
- In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
, 2004
Cited by 37 (10 self)
This paper describes a computer architecture, Spatial Computation (SC), which is based on the translation of high-level language programs directly into hardware structures. SC program implementations are completely distributed, with no centralized control. SC circuits are optimized for wires at the expense of computation units. In this paper we investigate a particular implementation of SC: ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system which uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a consequence, ASH hardware is fast and extremely power efficient.
Composing Dataflow Analyses and Transformations
, 2001
Cited by 35 (6 self)
Dataflow analyses can have mutually beneficial interactions. Previous efforts to exploit these interactions have either (1) iteratively performed each individual analysis until no further improvements are discovered or (2) developed "super-analyses" that manually combine conceptually separate analyses. We have devised a new approach that allows analyses to be defined independently while still enabling them to be combined automatically and profitably. Our approach avoids the loss of precision associated with iterating individual analyses and the implementation difficulties of manually writing a super-analysis. The key to our approach is a novel method of implicit communication between the individual components of a super-analysis based on graph transformations. In this paper, we precisely define our approach; we demonstrate that it is sound and that it terminates; finally, we give experimental results showing that in practice (1) our framework produces results at least as precise as iterating the individual analyses while compiling at least 5 times faster, and (2) our framework achieves the same precision as a manually written super-analysis while incurring a compile-time overhead of less than 20%.
Value Dependence Graphs: Representation without Taxation
- In Conference Record of the 21st Annual ACM Symposium on Principles of Programming Languages. ACM
, 1994
Cited by 31 (0 self)
The value dependence graph (VDG) is a sparse dataflow-like representation that simplifies program analysis and transformation. It is a functional representation that represents control flow as data flow and makes explicit all machine quantities, such as stores and I/O channels. We are developing a compiler that builds a VDG representing a program, analyzes and transforms the VDG, then produces a control flow graph (CFG) [ASU86] from the optimized VDG. This framework simplifies transformations and improves upon several published results. For example, it enables more powerful code motion than [CLZ86, FOW87], eliminates as many redundancies as [AWZ88, RWZ88] (except for redundant loops), and provides important information to the code scheduler [BR91]. We exhibit a one-pass method for elimination of partial redundancies that never performs redundant code motion [KRS92, DS93] and is simpler than the classical [MR79, Dha91] or SSA [RWZ88] methods. These results accrue from eliminating the CFG from the analysis/transformation phases and using demand dependences in preference to control dependences.

