Results 1 - 10 of 82
The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems, 1987
Cited by 749 (3 self)
In this paper we present an intermediate program representation, called the program dependence graph (PDG), that makes explicit both the data and control dependences for each operation in a program. Data dependences have been used to represent only the relevant data flow relationships of a program. Control dependences are introduced to analogously represent only the essential control flow relationships of a program. Control dependences are derived from the usual control flow graph. Many traditional optimizations operate more efficiently on the PDG. Since dependences in the PDG connect computationally related parts of the program, a single walk of these dependences is sufficient to perform many optimizations. The PDG allows transformations such as vectorization, that previously required special treatment of control dependence, to be performed in a manner that is uniform for both control and data dependences. Program transformations that require interaction of the two dependence types can also be easily handled with our representation. As an example, an incremental approach to modifying data dependences resulting from branch deletion or loop unrolling is introduced. The PDG supports incremental optimization, permitting transformations to be triggered by one another and applied only to affected dependences.
The Multiscalar Architecture
1993
Cited by 113 (8 self)
The centerpiece of this thesis is a new processing paradigm for exploiting instruction level parallelism. This paradigm, called the multiscalar paradigm, splits the program into many smaller tasks, and exploits fine-grain parallelism by executing multiple, possibly (control and/or data) dependent tasks in parallel using multiple processing elements. Splitting the instruction stream at statically determined boundaries allows the compiler to pass substantial information about the tasks to the hardware. The processing paradigm can be viewed as extensions of the superscalar and multiprocessing paradigms, and shares a number of properties of the sequential processing model and the dataflow processing model. The multiscalar paradigm is easily realizable, and we describe an implementation of the multiscalar paradigm, called the multiscalar processor. The central idea here is to connect multiple sequential processors, in a decoupled and decentralized manner, to achieve overall multiple issue. The multiscalar processor supports speculative execution, allows arbitrary dynamic code motion (facilitated by an efficient hardware memory disambiguation mechanism), exploits communication localities, and does all of these with hardware that is fairly straightforward to build. Other desirable aspects of the
Software Synthesis for DSP Using Ptolemy
Journal of VLSI Signal Processing, 1993
Cited by 62 (25 self)
Ptolemy is an environment for simulation, prototyping, and software synthesis for heterogeneous systems. It uses modern object-oriented software technology (in C++) to model each subsystem in a natural and efficient manner, and to integrate these subsystems into a whole. The objectives of Ptolemy encompass practically all aspects of designing signal processing and communications systems, ranging from algorithms and communication strategies, through simulation, hardware and software design, parallel computing, and generation of real-time prototypes. In this paper we will introduce the software synthesis aspects of the Ptolemy system. The environment presented here is both modular and extensible. Ptolemy allows the user to choose among various single- or multiple-processor schedulers. 1.0 Introduction Practical signal processing systems today are rarely implemented without software or firmware, even at the ASIC level. Programmable DSPs, in particular, form the heart of many implementati...
A Register Allocation Framework Based on Hierarchical Cyclic Interval Graphs
In International Workshop on Compiler Construction, Paderborn, 1993
Cited by 52 (13 self)
In this paper, we propose the use of cyclic interval graphs as an alternative representation for register allocation. The "thickness" of the cyclic interval graph captures the notion of overlap between live ranges of variables relative to each particular point of time in the program execution. We demonstrate that cyclic interval graphs provide a feasible and effective representation that accurately captures the periodic nature of live ranges found in loops. A new heuristic algorithm for minimum register allocation, the fat cover algorithm, has been developed and implemented to exploit such program structure. In addition, a new spilling algorithm is proposed that makes use of the extra information available in the interval graph representation. These two algorithms work together to provide a two-phase register allocation process that does not require iteration of the spilling or coloring phases. We extend the notion of cyclic interval graphs to hierarchical cyclic interval graphs and we...
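The "thickness" idea in this abstract can be illustrated with a minimal sketch, assuming live ranges are given as half-open intervals on a loop body of fixed length. The function name `cyclic_thickness`, the interval encoding, and the sample ranges are hypothetical and are not taken from the paper, whose formal definitions may differ.

```python
from typing import List, Tuple


def cyclic_thickness(live_ranges: List[Tuple[int, int]], period: int) -> List[int]:
    """Count, for each time step 0..period-1, how many live ranges cover it.

    Assumption: each live range is a half-open interval [start, end) on a
    circle of `period` steps. A range with start > end wraps around the loop
    back-edge. A range with start == end is treated as empty in this sketch.
    """
    width = [0] * period
    for start, end in live_ranges:
        length = (end - start) % period  # number of steps the range covers
        for k in range(length):
            width[(start + k) % period] += 1
    return width


# Three hypothetical live ranges in a loop body of 6 steps.
# The last one wraps past the end of the loop body into the next iteration.
ranges = [(0, 3), (2, 5), (4, 1)]
widths = cyclic_thickness(ranges, period=6)
max_width = max(widths)  # most values simultaneously live at any one step
```

In this sketch the largest per-step count is the number of values that are live at the same time, which is a lower bound on the registers needed if nothing is spilled.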
Cheops: A Reconfigurable Data-Flow System for Video Processing
IEEE Transactions on Circuits and Systems for Video Technology, 1995
Cited by 48 (5 self)
The Cheops Imaging System is a compact, modular platform for acquisition, processing, and display of digital video sequences and model-based representations of moving scenes, and is intended as both a laboratory tool and a prototype architecture for future programmable video decoders. Rather than using a large number of general-purpose processors and dividing up image processing tasks spatially, Cheops abstracts out a set of basic, computationally intensive stream operations that may be performed in parallel and embodies them in specialized hardware. We review the Cheops architecture, describe the software system that has been developed to perform resource management, and present the results of some performance tests.
Advances in dataflow programming languages
ACM Comput. Surv., 2004
Cited by 41 (0 self)
Many developments have taken place within dataflow programming languages in the past decade. In particular, there has been a great deal of activity and advancement in the field of dataflow visual programming languages. The motivation for this article is to review the content of these recent developments and how they came
Memory Dependence Prediction
1998
Cited by 36 (5 self)
As the existing techniques that empower the modern high-performance processors are being refined and as the underlying technology trade-offs change, new bottlenecks are exposed and new challenges are raised. This thesis introduces a new tool, Memory Dependence Prediction, that can be useful in combating these bottlenecks and meeting the new challenges. Memory dependence prediction is a technique to guess whether a load or a store will experience a dependence. Memory dependence prediction exploits regularity in the memory dependence stream of ordinary programs, a phenomenon which is also identified in this thesis. To demonstrate the utility of memory dependence prediction, this thesis also presents the following three novel microarchitectural techniques: 1. Dynamic Speculation/Synchronization of Memory Dependences: this thesis demonstrates that, to exploit parallelism over larger regions of code, waiting to determine the dependences a load has is not the best performing option. Higher performance is possible if memory dependence speculation is used, especially if memory dependence prediction is used to guide this speculation.
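As a rough illustration of what "guessing whether a load will experience a dependence" could look like, here is a minimal sketch of a history-based predictor. It assumes a table of 2-bit saturating counters indexed by the load's program counter; the class name, table size, threshold, and indexing scheme are all hypothetical and are not the mechanism described in the thesis.

```python
class DependencePredictor:
    """Toy predictor: one 2-bit saturating counter per table entry.

    Assumption: loads are identified by their program counter (pc), and the
    table is indexed by pc modulo the table size.
    """

    def __init__(self, entries: int = 1024, threshold: int = 2) -> None:
        self.entries = entries
        self.threshold = threshold
        self.counters = [0] * entries  # each counter stays in the range 0..3

    def _index(self, pc: int) -> int:
        return pc % self.entries

    def predict(self, pc: int) -> bool:
        """Return True to predict a dependence (the load should wait)."""
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc: int, had_dependence: bool) -> None:
        """Train the counter with the outcome observed for this load."""
        i = self._index(pc)
        if had_dependence:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


predictor = DependencePredictor()
load_pc = 0x400                        # hypothetical load address
first_guess = predictor.predict(load_pc)   # no history yet: speculate freely
predictor.update(load_pc, True)
predictor.update(load_pc, True)
after_two_conflicts = predictor.predict(load_pc)
predictor.update(load_pc, False)
after_one_clean_run = predictor.predict(load_pc)
```

The sketch relies on the regularity the abstract mentions: a load that conflicted with an earlier store recently is assumed likely to conflict again, so only such loads are held back while the rest execute speculatively.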
Phased Logic: Supporting the Synchronous Design Paradigm with Delay-Insensitive Circuitry
1996
Cited by 31 (0 self)
Phased logic is proposed as a solution to the increasing problem of timing complexity in digital design. It is a delay-insensitive design methodology that seeks to restore the separation between logical and physical design by eliminating the need to distribute low-skew clock signals and carefully balance propagation delays. However, unlike other methodologies that avoid clocks, phased logic supports the cyclic, deterministic behavior of the synchronous design paradigm. This permits the designer to rely chiefly on current experience and CAD tools to create phased logic systems. Marked graph theory is used as a framework for governing the interaction of phased logic gates that operate directly on Level-Encoded two-phase Dual-Rail (LEDR) signals. A synthesis algorithm is developed for converting clocked systems to phased logic systems and is applied to benchmark examples. Performance results indicate that phased logic tends to be tolerant of logic delay imbalances and has predictable...
Stream Computations Organized for Reconfigurable Execution (SCORE): Introduction and Tutorial
In Proceedings of the International Conference on Field-Programmable Logic and Applications, 2000
Cited by 30 (8 self)
A primary impediment to wide-spread exploitation of reconfigurable computing is the lack of a unifying computational model which allows application portability and longevity without sacrificing a substantial fraction of the raw capabilities. We introduce SCORE (Stream Computation Organized for Reconfigurable Execution), a stream-based compute model which virtualizes reconfigurable computing resources (compute, storage, and communication) by dividing a computation up into fixed-size "pages" and time-multiplexing the virtual pages on available physical hardware. Consequently, SCORE applications can scale up or down automatically to exploit a wide range of hardware sizes. We hypothesize that the SCORE model will ease development and deployment of reconfigurable applications and expand the range of applications which can benefit from reconfigurable execution. Further, we believe that a well engineered SCORE implementation can be efficient, wasting little of the capabilities of the raw hardw...
Relaxation-Based Electrical Simulation
IEEE Tr. on Electronic Devices, 1983
Cited by 27 (0 self)
Circuit simulation programs have proven to be most important computer-aided design tools for the analysis of the electrical performance of integrated circuits. One of the most common analyses performed by circuit simulators and the most expensive in terms of computer time is nonlinear time-domain transient analysis. Conventional circuit simulators were designed initially for the cost-effective analysis of circuits containing a few hundred transistors or less. Because of the need to verify the performance of larger circuits, many users have successfully simulated circuits containing thousands of transistors despite the cost. Recently, a new class of algorithms has been applied to the electrical IC simulation problem. New simulators using these methods provide accurate waveform information with up to two orders of magnitude speed improvement for large circuits. These programs use relaxation methods for the solution of the set of ordinary differential equations, which describe the circuit under analysis, rather than the direct sparse-matrix methods on which standard circuit simulators are based. In this paper, the techniques used in relaxation-based electrical simulation are presented in a rigorous and unified framework, and the numerical properties of the various methods are explored. Both the advantages and the limitations of these techniques for the analysis of large ICs are described.

