On Optimizing A Class Of Multi-Dimensional Loops With Reductions For Parallel Execution
 Parallel Processing Letters
, 1997
Cited by 28 (22 self)
This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application. The computations involve multidimensional surface and volume integrals where the integrand is a product of a number of array terms. Besides the issue of optimal distribution of the arrays among the processors, there is also scope for reordering the operations using the commutativity and associativity of addition and multiplication, and for applying the distributive law to significantly reduce the number of operations executed. A formalization of the operation minimization problem and a proof of its NP-completeness are provided. A pruning search strategy for determining an optimal form is developed. An analysis of the communication requirements and a polynomial-time algorithm for determining an optimal distribution of the arrays are also provided. Keywords: loop parallelization, operation minimization, communication op...
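To make the operation-count saving concrete, here is a minimal sketch of the distributive-law identity that the abstract refers to. It is only an illustration of the identity, not the paper's search algorithm; the function names `naive_sum` and `factored_sum` and the sample arrays are hypothetical.

```python
# Illustrative sketch only: shows how the distributive law cuts multiplications
# in a two-dimensional sum of products. Names and data are hypothetical.

def naive_sum(a, b):
    """Evaluate sum_i sum_j a[i] * b[j] directly.

    Returns (value, number_of_multiplications)."""
    total = 0
    mults = 0
    for x in a:
        for y in b:
            total += x * y
            mults += 1
    return total, mults


def factored_sum(a, b):
    """Evaluate (sum_i a[i]) * (sum_j b[j]), which is equal by distributivity.

    Returns (value, number_of_multiplications)."""
    return sum(a) * sum(b), 1


a = [1, 2, 3]
b = [4, 5]
print(naive_sum(a, b))     # same value, len(a) * len(b) multiplications
print(factored_sum(a, b))  # same value, a single multiplication
```

With more array terms and more loop indices, the number of possible factorizations grows quickly, which is where a search over equivalent forms becomes relevant.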
Embedded Software in Real-Time Signal Processing Systems: Design Technologies
 Proc. IEEE
, 1997
Cited by 19 (0 self)
This paper discusses design technology issues for embedded systems using processor cores, with a focus on software compilation tools. Architectural characteristics of contemporary processor cores are reviewed and tool requirements are formulated. This is followed by a comprehensive survey of both existing and new software compilation techniques that are considered important in the context of embedded processors.
Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures
 IEEE Transactions on Computers
, 2003
Cited by 10 (0 self)
Abstract — In this paper we address the problem of generating an optimal
Allocating Registers in Multiple Instruction-Issuing Processors
 IN PROCEEDINGS OF THE IFIP WG 10.3 WORKING CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT'95
, 1995
Cited by 8 (1 self)
This work addresses the problem of scheduling a basic block of operations on a multiple instruction-issuing processor. We show that integrating register constraints into operation sequencing algorithms is a complex problem in itself. Indeed, while scheduling a forest of unit-time operations on a processor with P parallel instruction slots can be solved in polynomial time, the problem becomes NP-hard when P is unbounded but only R registers are available. As a result we have devised a concise integer linear programming formulation of this scheduling problem that accounts for both register and instruction-issuing constraints. This allows the use of off-the-shelf routines to find optimum solutions, which can then be compared with the results obtained by polynomial-time heuristics. Two such heuristics are given, and their combined results are shown to be optimal in 99.5% of the cases for trees of height at most 6. A byproduct of these experiments is to show that our integer programming f...
Concise Specifications of Locally Optimal Code Generators
, 1987
Cited by 8 (0 self)
Dynamic programming allows locally optimal instruction selection for expression trees. More importantly, the algorithm allows concise and elegant specification of code generators. Aho, Ganapathi, and Tjiang have built the Twig code-generator generator, which produces dynamic-programming code generators from grammar-like specifications. Encoding a complex architecture as a grammar for a dynamic-programming code-generator generator shows the expressive power of the technique. Each instruction, addressing mode, register and class can be expressed individually in the grammar. The grammar can be factored much more readily than with the Graham-Glanville LR(1) algorithm, so it can be much more concise. Twig specifications for the VAX and MC68020 are described, and the corresponding code generators select very good (and under the right assumptions, optimal) instruction sequences. Limitations and possible improvements to the specification language are discussed.
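As a rough illustration of dynamic-programming instruction selection on an expression tree, the sketch below picks the cheapest cover of a tree by a small set of tree patterns. It is a simplified toy, not Twig or its specification language: the instruction names (`li`, `add`, `mul`, `madd`), their costs, and the tuple encoding of trees are all assumptions made for this example.

```python
from functools import lru_cache

REG = "reg"  # wildcard leaf: "any subtree whose result is already in a register"

# Hypothetical instruction set: (name, tree pattern, cost).
PATTERNS = [
    ("li",   ("const",), 1),                         # load a constant
    ("add",  ("add", REG, REG), 1),                  # register + register
    ("mul",  ("mul", REG, REG), 2),                  # register * register
    ("madd", ("add", ("mul", REG, REG), REG), 2),    # fused multiply-add
]


def match(pattern, node):
    """Return the subtrees bound to REG leaves, or None if the pattern does not fit."""
    if pattern == REG:
        return [node]
    if pattern[0] != node[0] or len(pattern) != len(node):
        return None
    bound = []
    for sub_pattern, child in zip(pattern[1:], node[1:]):
        sub = match(sub_pattern, child)
        if sub is None:
            return None
        bound += sub
    return bound


@lru_cache(maxsize=None)
def best(node):
    """Cheapest (cost, instruction sequence) computing `node` into a register."""
    candidates = []
    for name, pattern, cost in PATTERNS:
        bound = match(pattern, node)
        if bound is None:
            continue
        total, code = cost, []
        for sub in bound:  # optimal solutions for the uncovered subtrees
            sub_cost, sub_code = best(sub)
            total += sub_cost
            code += sub_code
        candidates.append((total, tuple(code) + (name,)))
    return min(candidates)


C = ("const",)
tree = ("add", ("mul", C, C), C)   # (c * c) + c
print(best(tree))
```

For this tree the fused `madd` pattern covers both the `add` and the `mul` node, so it beats emitting a separate `mul` followed by `add` under the assumed costs. The result is optimal only with respect to the given patterns and cost model.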
TEMPLATE: A generic TEchnology Mapping PLATform
 IN PREPARATION, PREPRINTREIHE, INSTITUT FÜR INFORMATIK, UNIVERSITÄT WÜRZBURG
, 1997
Cited by 8 (2 self)
Technology mapping problems arise in logic synthesis systems when the gap between a synthesized Boolean network and the implementation of that network within a given target technology has to be bridged. This paper presents a modular, versatile technology mapping system that supports many different target technologies. Guided by a complexity analysis of the problem, we develop a variety of efficient, exact or heuristic methods for technology-driven network clustering. Depending on the target technology and optimization methods and goals, different subnetworks must be provided as candidates for clustering. Methods to achieve this are also included. We conclude with experimental results we obtained with several configurations of the system for different target technologies.
Area and Search Space Control for Technology Mapping
, 2000
Cited by 6 (2 self)
We present a technology mapping procedure in which an area-delay tradeoff curve is constructed at each node using matches found for different decompositions of the node. This information is used effectively to find implementations that meet delay constraints while reducing area. The procedure combines state-of-the-art mapping procedures, in which a graph covering is applied to a special graph structure which succinctly encodes many representations. Major challenges were avoiding memory explosion and finding good cost estimations. The combined procedure outperforms the best result among any of the procedures used separately.
Code Generation for Fixed-Point DSPs
 ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS
, 1998
Cited by 6 (0 self)
This paper examines the problem of code generation for Digital Signal Processors (DSPs). There are two major contributions of this work. First, we propose an optimal O(n) algorithm for the tasks of register allocation and instruction scheduling for expression trees, for an important class of DSP architectures. Optimality is guaranteed by sufficient conditions derived from a structural representation of the processor Instruction Set Architecture (ISA). Second, we develop heuristics for the case when basic blocks are Directed Acyclic Graphs (DAGs).
Near-Optimal Instruction Selection on DAGs
, 2008
Cited by 5 (2 self)
Instruction selection is a key component of code generation. High-quality instruction selection is of particular importance in the embedded space, where complex instruction sets are common and code size is a prime concern. Although instruction selection on tree expressions is a well understood and easily solved problem, instruction selection on directed acyclic graphs is NP-complete. In this paper we present NOLTIS, a near-optimal, linear-time instruction selection algorithm for DAG expressions. NOLTIS is easy to implement, fast, and effective, with a demonstrated average code size improvement of 5.1% compared to the traditional tree decomposition and tiling approach.