Automatic Program Parallelization
, 1993
Cited by 97 (8 self)
This paper presents an overview of automatic program parallelization techniques. It covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do loop transformations, and parallelization of recursive routines. The last section of the paper surveys several experimental studies on the effectiveness of parallelizing compilers.
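As one concrete illustration of the kind of dependence analysis this overview refers to, the classic GCD test checks whether two affine array subscripts, a*i + b and c*j + d, can ever name the same element. The sketch below is not taken from the paper: the function name and signature are assumptions made for illustration, and the test ignores loop bounds, so a True result means only that a dependence cannot be ruled out.

```python
from math import gcd


def gcd_test_may_depend(a: int, b: int, c: int, d: int) -> bool:
    """Conservative GCD test for subscripts a*i + b and c*j + d.

    The equation a*i + b == c*j + d has an integer solution (i, j)
    exactly when gcd(a, c) divides (d - b). Loop bounds are ignored,
    so True means "may depend", not "does depend".
    """
    g = gcd(a, c)
    if g == 0:
        # Both subscripts are constants; they collide only if equal.
        return b == d
    return (d - b) % g == 0


# A[2*i] written, A[2*i + 1] read: even and odd elements never overlap,
# so the loop iterations are independent with respect to this pair.
independent = not gcd_test_may_depend(2, 0, 2, 1)

# A[2*i] written, A[4*j + 2] read: the test cannot rule out a dependence.
possible = gcd_test_may_depend(2, 0, 4, 2)
```

A False result is a proof of independence, which is what lets a parallelizing compiler run the iterations concurrently; a True result typically hands the pair to a more precise (and more expensive) test.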
Foundations of Timed Concurrent Constraint Programming
- Proceedings of the Ninth Annual IEEE Symposium on Logic in Computer Science
, 1994
Cited by 76 (10 self)
We develop a model for timed, reactive computation by extending the asynchronous, untimed concurrent constraint programming model in a simple and uniform way. In the spirit of process algebras, we develop some combinators expressible in this model, and reconcile their operational, logical and denotational character. We show how programs may be compiled into finite-state machines with loop-free computations at each state, thus guaranteeing bounded response time.
1 Introduction and Motivation
Reactive systems [12,3,9] are those that react continuously with their environment at a rate controlled by the environment. Execution in a reactive system proceeds in bursts of activity. In each phase, the environment stimulates the system with an input, obtains a response in bounded time, and may then be inactive (with respect to the system) for an arbitrary period of time before initiating the next burst. Examples of reactive systems are controllers and signal-processing systems. The primary issu...
Combining Analyses, Combining Optimizations
, 1995
Cited by 67 (4 self)
This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iterative algorithm for solving these frameworks. A framework is shown that combines Constant Propagation, Unreachable Code Elimination, Global Congruence Finding and Global Value Numbering. For these optimizations, the iterative algorithm runs in O(n^2) time.
This thesis then presents an O(n log n) algorithm for combining the same optimizations. This technique also finds many of the common subexpressions found by Partial Redundancy Elimination. However, it requires a global code motion pass, also presented here, to make the optimized code correct. The global code motion algorithm removes some Partially Dead Code as a side effect. An implementation demonstrates that the algorithm has shorter compile times than repeated passes of the separate optimizations while producing run-time speedups of 4%–7%.
While global analyses are stronger, peephole analyses can be unexpectedly powerful. This thesis demonstrates parse-time peephole optimizations that find more than 95% of the constants and common subexpressions found by the best combined analysis. Finding constants and common subexpressions while parsing reduces peak intermediate representation size. This speeds up the later global analyses, reducing total compilation time by 10%. In conjunction with global code motion, these peephole optimizations generate excellent code very quickly, a useful feature for compilers that stress compilation speed over code quality.
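To make the idea of finding constants and common subexpressions while parsing more concrete, here is a minimal sketch of hash-based value numbering combined with constant folding, applied at the moment each expression node is created. It is an illustration under stated assumptions, not the thesis's algorithm: the `ExprTable` class, its method names, and the tuple encoding of nodes are all hypothetical, and only integer addition is modeled.

```python
class ExprTable:
    """Interns expression nodes so equal expressions share one object.

    Hypothetical illustration: nodes are plain tuples such as
    ("const", 5), ("var", "x"), or ("add", lhs, rhs).
    """

    def __init__(self):
        self._table = {}  # structural key -> canonical node

    def _intern(self, key):
        # Return the previously stored node if an equal one exists.
        return self._table.setdefault(key, key)

    def const(self, value):
        return self._intern(("const", value))

    def var(self, name):
        return self._intern(("var", name))

    def add(self, lhs, rhs):
        # Constant folding: both operands are known at parse time.
        if lhs[0] == "const" and rhs[0] == "const":
            return self.const(lhs[1] + rhs[1])
        # Algebraic identity: x + 0 is just x.
        if rhs == ("const", 0):
            return lhs
        if lhs == ("const", 0):
            return rhs
        # Canonical operand order, so a + b and b + a share one node.
        lhs, rhs = sorted((lhs, rhs), key=repr)
        return self._intern(("add", lhs, rhs))

    def size(self):
        return len(self._table)


table = ExprTable()
a, b = table.var("a"), table.var("b")
first = table.add(a, b)
second = table.add(b, a)            # same node as `first`
folded = table.add(table.const(2), table.const(3))
```

Because folded and duplicate expressions never create new nodes, the table stays smaller than a naive tree would, which is the effect on intermediate representation size that the abstract describes.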
Timed Default Concurrent Constraint Programming
- Journal of Symbolic Computation
, 1996
Cited by 61 (11 self)
Synchronous programming (Berry (1989)) is a powerful approach to programming reactive systems. Following the idea that "processes are relations extended over time" (Abramsky (1993)), we propose a simple but powerful model for timed, determinate computation, extending the closure-operator model for untimed concurrent constraint programming (CCP). In (Saraswat et al. 1994a) we had proposed a model for this called tcc; here we extend the model of tcc to express strong time-outs: if an event A does not happen through time t, cause event B to happen at time t. Such constructs arise naturally in practice (e.g. in modeling transistors) and are supported in synchronous programming languages. The fundamental conceptual difficulty posed by these operations is that they are nonmonotonic. We provide a compositional semantics to the non-monotonic version of concurrent constraint programming (Default cc) obtained by changing the underlying logic from intuitionistic logic to Reiter's default logic...
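The strong time-out described above can be illustrated with a small discrete-time simulation. This is only a sketch of the observable behavior, assuming instants numbered from 0 and events modeled as strings in a set; it does not model constraint stores, default logic, or the paper's semantics, and the function name and parameters are hypothetical.

```python
def strong_timeout(trace, watched, fallback, t):
    """Apply "if `watched` has not occurred through instant t, emit
    `fallback` at instant t" to a trace of per-instant event sets.

    The decision at instant t depends on the absence of `watched` in
    that same instant, which is what makes the construct nonmonotonic:
    adding `watched` to the input can remove `fallback` from the output.
    """
    output = []
    seen = False
    for now, events in enumerate(trace):
        events = set(events)  # copy so the caller's trace is untouched
        seen = seen or watched in events
        if now == t and not seen:
            events.add(fallback)
        output.append(events)
    return output


# A never happens through instant 2, so B is emitted at instant 2.
quiet = strong_timeout([set(), set(), set()], "A", "B", 2)

# A happens at instant 1, so no B is emitted.
active = strong_timeout([set(), {"A"}, set()], "A", "B", 2)
```

Comparing the two traces shows the nonmonotonicity directly: the second input contains strictly more events than the first, yet its output lacks B.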
CARS: A new code generation framework for clustered ILP processors
- In HPCA
, 2001
Cited by 40 (1 self)
Clustered ILP processors are characterized by a large number of non-centralized on-chip resources grouped into clusters. Traditional code generation schemes for these processors consist of multiple phases for cluster assignment, register allocation and instruction scheduling. Most of these approaches need additional re-scheduling phases because they often do not impose finite resource constraints in all phases of code generation. These phase-ordered solutions have several drawbacks, resulting in the generation of poor performance code. Moreover, the iterative/back-tracking algorithms used in some of these schemes have large running times. In this paper we present CARS, a code generation framework for Clustered ILP processors, which combines the cluster assignment, register allocation, and instruction scheduling phases into a single code generation phase, thereby eliminating the problems associated with phase-ordered solutions. The CARS algorithm explicitly takes into account all the resource constraints at each cluster scheduling step to reduce spilling and to avoid iterative re-scheduling steps. We also present a new on-the-fly register allocation scheme developed for CARS. We describe an implementation of the proposed code generation framework and the results of a performance evaluation study using the SPEC95/2000 and MediaBench benchmarks.
Spatial Computation
- In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
, 2004
Cited by 37 (10 self)
This paper describes a computer architecture, Spatial Computation (SC), which is based on the translation of high-level language programs directly into hardware structures. SC program implementations are completely distributed, with no centralized control. SC circuits are optimized for wires at the expense of computation units. In this paper we investigate a particular implementation of SC: ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system which uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a consequence, ASH hardware is fast and extremely power efficient.
Pegasus: An efficient intermediate representation
, 2002
Cited by 28 (9 self)
We present Pegasus, a compact and expressive intermediate representation for imperative languages. The representation is suitable for target architectures supporting predicated execution and aggressive speculation. In Pegasus, information about the global dataflow of the program is encoded in local structures, enabling compact and efficient algorithms for program optimizations. As a proof of the versatility of Pegasus, we have used it in a compiler translating C programs to hardware implementations.
From Control Flow to Dataflow
, 1989
Cited by 21 (4 self)
Are imperative languages tied inseparably to the von Neumann model or can they be implemented in some natural way on dataflow architectures? In this paper, we show how imperative language programs can be translated into dataflow graphs and executed on a dataflow machine like Monsoon. This translation can exploit both fine-grain and coarse-grain parallelism in imperative language programs. More importantly, we establish a close connection between our work and current research in the imperative languages community on data dependences, control dependences, program dependence graphs, and static single assignment form. These results suggest that dataflow graphs can serve as an executable intermediate representation in parallelizing compilers.
Static Single Information Form
- Master's thesis, Massachusetts Institute of Technology
, 1999
Cited by 17 (0 self)
This paper presents a new intermediate format called Static Single Information (SSI) form. SSI form generalizes the traditional concept of a variable definition to include all information definition points, or points where the analysis may obtain information about the value in a variable. Information definition points include conditional branches as well as assignments. Because SSI form provides a new name for each variable at each information definition point, it provides excellent support for both predicated analyses, which exploit information gained from conditionals, and backwards dataflow analyses.
Prototyping Fortran-90 Compilers for Massively Parallel Machines
- In Proceedings of the SIGPLAN '92 Conference on Programming Language Design and Implementation
, 1992
Cited by 16 (4 self)
Massively parallel architectures, and the languages used to program them, are among both the most difficult and the most rapidly-changing subjects for compilation. This has created a demand for new compiler prototyping technologies that allow novel styles of compilation and optimization to be tested in a reasonable amount of time. Using formal specification techniques, we have produced a data-parallel Fortran-90 subset compiler for Thinking Machines' Connection Machine/2 and Connection Machine/5. The prototype produces code from initial Fortran-90 benchmarks demonstrating sustained performance superior to hand-coded *Lisp and competitive with Thinking Machines' CM Fortran compiler. This paper presents some new specification techniques necessary to construct competitive, easily retargetable prototype compilers.
1 Introduction
Existing compilers for massively parallel machines have generally been constructed using traditional methods, combining generation from specification for a few su...

