Results 1-10 of 22
Effective Partial Redundancy Elimination
 Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation
, 1994
"... Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness ..."
Abstract

Cited by 85 (13 self)
Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness of partial redundancy elimination. By imposing a discipline on the choice of names and the shape of expressions, we are able to expose more redundancies. As part of the work, we introduce a new algorithm for global reassociation of expressions. It uses global information to reorder expressions, creating opportunities for other optimizations. The new algorithm generalizes earlier work that ordered FORTRAN array address expressions to improve optimization [25].
1 Introduction
Partial redundancy elimination is a powerful optimization that has been discussed in the literature for many years (e.g., [21, 8, 14, 12, 18]). Unfortunately, partial redundancy elimination has two serious limitations...
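The naming discipline the abstract describes can be illustrated with a toy sketch. This is plain hash-based value numbering with canonical operand ordering standing in for reassociation, not the paper's global algorithms, and all names below are invented for illustration:

```python
# Toy sketch, not the paper's algorithms: hash-based value numbering
# where operands of commutative operations are sorted into a canonical
# order, so reassociated forms of the same expression get one name.

def value_number(exprs):
    """exprs: list of (name, op, operands). Returns {name: earlier name}
    for every expression found redundant with a prior one."""
    table = {}    # canonical key -> (value number, defining name)
    numbers = {}  # name -> value number
    counter = [0]

    def vn(operand):
        if operand not in numbers:
            numbers[operand] = counter[0]
            counter[0] += 1
        return numbers[operand]

    redundant = {}
    for name, op, operands in exprs:
        key = (op, tuple(sorted(vn(o) for o in operands)))
        if key in table:
            numbers[name], prior = table[key]
            redundant[name] = prior
        else:
            numbers[name] = counter[0]
            counter[0] += 1
            table[key] = (numbers[name], name)
    return redundant

# t2 computes the same value as t1 once operands are canonically ordered.
print(value_number([("t1", "+", ("a", "b")),
                    ("t2", "+", ("b", "a"))]))  # {'t2': 't1'}
```

Without the sorting step, `a + b` and `b + a` hash to different keys and the redundancy goes unnoticed; that is the (much simplified) sense in which the shape of expressions determines what the optimizer can see.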
Experience with a Software-Defined Machine Architecture
 WRL Research Report 91/10
, 1991
"... We built a system in which the compiler back end and the linker work together to present an abstract machine at a considerably higher level than the actual machine. The intermediate language translated by the back end is the target language of all highlevel compilers and is also the only assembl ..."
Abstract

Cited by 56 (7 self)
We built a system in which the compiler back end and the linker work together to present an abstract machine at a considerably higher level than the actual machine. The intermediate language translated by the back end is the target language of all high-level compilers and is also the only assembly language generally available. This lets us do intermodule register allocation, which would be harder if some of the code in the program had come from a traditional assembler, out of sight of the optimizer. We do intermodule register allocation and pipeline instruction scheduling at link time, using information gathered by the compiler back end. The mechanism for analyzing and modifying the program at link time was also useful in a wide array of instrumentation tools.
1. Introduction
When our lab built its experimental RISC workstation, the Titan, we defined a high-level assembly language as the official interface to the machine. This high-level assembly language, called Mahler,...
Automatic, Template-Based Run-Time Specialization: Implementation and Experimental Study
 In International Conference on Computer Languages
, 1998
"... Specializing programs with respect to runtime values has been shown to drastically improve code performance on realistic programs ranging from operating systems to graphics. Recently, various approaches to specializing code at runtime have been proposed. However, these approaches still suffer from ..."
Abstract

Cited by 50 (12 self)
Specializing programs with respect to run-time values has been shown to drastically improve code performance on realistic programs ranging from operating systems to graphics. Recently, various approaches to specializing code at run time have been proposed. However, these approaches still suffer from shortcomings that limit their applicability: they are manual, too expensive, or require programs to be written in a dedicated language. We solve these problems by introducing new techniques to implement run-time specialization. The key to our approach is the use of code templates. Templates are automatically generated from ordinary programs and are optimized before run time, allowing high-quality code to be quickly generated at run time. Experimental results obtained on scientific and graphics code indicate that our approach is highly effective. Little run-time overhead is introduced, since code generation primarily consists of copying instructions. Run-time specialized programs run up to 1...
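A minimal sketch of the specialization idea, using Python source generation as a stand-in for the paper's pre-optimized machine-code templates (the function names are invented, and `exec` here plays the role of instantiating a template at run time):

```python
# Illustrative sketch only: specialize a power function for an exponent
# known only at run time, by generating straight-line code once so the
# loop and exponent tests disappear from the fast path.

def specialize_power(n):
    """Build a function computing x**n as a fixed chain of multiplies."""
    body = "    r = 1\n"
    for _ in range(n):
        body += "    r = r * x\n"   # one "template copy" per factor
    src = "def power_n(x):\n" + body + "    return r\n"
    ns = {}
    exec(src, ns)                   # stands in for template instantiation
    return ns["power_n"]

cube = specialize_power(3)
print(cube(5))  # 125
```

The generation cost is paid once; every later call runs the specialized straight-line body, which loosely mirrors why the paper's code generation, "primarily copying instructions," is cheap.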
Operator Strength Reduction
, 1995
"... This paper presents a new al gS ithm for operator strengM reduction, called OSR. OSR improves upon an earlier alg orithm due to Allen, Cocke, and Kennedy [Allen et al. 1981]. OSR operates on the static sing e assig4 ent (SSA) form of a procedure [Cytron et al. 1991]. By taking advantag of the pr ..."
Abstract

Cited by 29 (9 self)
This paper presents a new algorithm for operator strength reduction, called OSR. OSR improves upon an earlier algorithm due to Allen, Cocke, and Kennedy [Allen et al. 1981]. OSR operates on the static single assignment (SSA) form of a procedure [Cytron et al. 1991]. By taking advantage of the properties of SSA form, we have derived an algorithm that is simple to understand, quick to implement, and, in practice, fast to run. Its asymptotic complexity is, in the worst case, the same as the Allen, Cocke, and Kennedy algorithm (ACK). OSR achieves optimization results that are equivalent to those obtained with the ACK algorithm. OSR has been implemented in several research and production compilers.
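As a reminder of what strength reduction does, here is the classic textbook transformation on an induction-variable address expression. This is only the effect of the optimization, not the OSR or ACK algorithms themselves:

```python
# Hedged sketch: strength reduction replaces the per-iteration multiply
# in base + i * width with a running addition.

def addresses_naive(base, n, width=4):
    # one multiplication on every iteration
    return [base + i * width for i in range(n)]

def addresses_reduced(base, n, width=4):
    # after strength reduction: one addition per iteration
    out, addr = [], base
    for _ in range(n):
        out.append(addr)
        addr += width
    return out

# both forms compute the same address sequence
assert addresses_naive(1000, 5) == addresses_reduced(1000, 5)
```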
Some optimizations of hardware multiplication by constant matrices
, 2003
"... This paper presents some improvements on the optimization of hardware multiplication by constant matrices. We focus on the automatic generation of circuits that involve constant matrix multiplication (CMM), i.e. multiplication of a vector by a constant matrix. The proposed method, based on number re ..."
Abstract

Cited by 16 (0 self)
This paper presents some improvements on the optimization of hardware multiplication by constant matrices. We focus on the automatic generation of circuits that involve constant matrix multiplication (CMM), i.e. multiplication of a vector by a constant matrix. The proposed method, based on number recoding and dedicated common-subexpression factorization algorithms, was implemented in a VHDL generator. The resulting circuits for several applications have been implemented on FPGAs and compared to previous solutions. Up to 40% area and speed savings are achieved.
Extended results for minimum-adder constant integer multipliers
 In IEEE International Symposium on Circuits and Systems, May 2002
, 2002
"... By introducing simplifications to multiplier graphs we extend the previous work on minimum adder multipliers to five adders and show that this is enough to express all coefficients up to 19 bits. The average savings are more than 25 % for 19 bits compared with CSD multipliers. The simplifications in ..."
Abstract

Cited by 16 (5 self)
By introducing simplifications to multiplier graphs, we extend the previous work on minimum-adder multipliers to five adders and show that this is enough to express all coefficients up to 19 bits. The average savings are more than 25% for 19 bits compared with CSD multipliers. The simplifications include addition reordering and vertex reduction, which reveal that different graphs can generate the same coefficient sets; thus, fewer graphs need to be evaluated. A classification of the graphs further reduces the effort to search the coefficient space.
Multiplierless multiple constant multiplication
 ACM Trans. Algorithms
"... A variable can be multiplied by a given set of fixedpoint constants using a multiplier block that consists exclusively of additions, subtractions, and shifts. The generation of a multiplier block from the set of constants is known as the multiple constant multiplication (MCM) problem. Finding the o ..."
Abstract

Cited by 14 (1 self)
A variable can be multiplied by a given set of fixed-point constants using a multiplier block that consists exclusively of additions, subtractions, and shifts. The generation of a multiplier block from the set of constants is known as the multiple constant multiplication (MCM) problem. Finding the optimal solution, i.e., the one with the fewest additions and subtractions, is known to be NP-complete. We propose a new algorithm for the MCM problem, which produces solutions that require up to 20% fewer additions and subtractions than the best previously known algorithm. At the same time our algorithm, in contrast to the closest competing algorithm, is not limited by the bit-widths of the constants. We present our algorithm using a unifying formal framework for the best graph-based MCM algorithms and provide a detailed runtime analysis and experimental evaluation. We show that our algorithm can handle problem sizes as large as 100 32-bit constants in a time acceptable for most applications. The implementation of the new algorithm is available at www.spiral.net.
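A tiny hand-worked sketch of the multiplier-block idea described above. The constants 7 and 14 are chosen purely for illustration, and this shows only the kind of structure MCM algorithms search for, not any particular algorithm:

```python
# Hypothetical sketch of a multiplier block: realize several constant
# multiples of x using only shifts, additions and subtractions, and
# reuse an intermediate value across constants.

def multiples_7_and_14(x):
    seven = (x << 3) - x   # 7x = 8x - x: one shift, one subtraction
    fourteen = seven << 1  # 14x reuses 7x: an extra shift, no new adder
    return seven, fourteen

print(multiples_7_and_14(3))  # (21, 42)
```

Sharing the intermediate `7x` is what makes the joint (multiple-constant) problem cheaper than solving each constant in isolation; the cost metric counts adders and subtractors, since shifts are free in hardware wiring.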
Constant Multipliers for FPGAs
, 2000
"... This paper presents a survey of techniques to implement multiplications by constants on FPGAs. It shows in particular that a simple and wellknown technique, canonical signed recoding, can help design smaller constant multiplier cores than those present in current libraries. An implementation of thi ..."
Abstract

Cited by 10 (4 self)
This paper presents a survey of techniques to implement multiplications by constants on FPGAs. It shows in particular that a simple and well-known technique, canonical signed recoding, can help design smaller constant-multiplier cores than those present in current libraries. An implementation of this idea in Xilinx JBits is detailed and discussed. The use of the latest algorithms for discovering efficient chains of adders, subtractors and shifters for a given constant multiplication is also discussed. Exploring such solutions is made possible by the new FPGA programming frameworks based on generic programming languages, such as JBits, which allow an arbitrary amount of irregularity to be implemented even within an arithmetic core.
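A short sketch of canonical signed recoding itself, assuming the standard digit-by-digit recoding rule (the function name is ours, not from the paper): each digit is drawn from {-1, 0, 1}, and no two adjacent digits are non-zero, which minimizes the number of add/subtract operations in a shift-and-add multiplier.

```python
# Sketch of canonical signed-digit (CSD) recoding.

def csd(n):
    """Return CSD digits of a positive integer n, least significant first."""
    digits = []
    while n:
        if n & 1:
            d = 2 - (n % 4)  # +1 or -1, chosen to clear a run of ones
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

# 7 = 111 in binary (3 ones) but 8 - 1 in CSD (2 non-zero digits),
# so a multiplier by 7 needs one subtractor instead of two adders.
print(csd(7))  # [-1, 0, 0, 1]
```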
Multiplication by an Integer Constant
, 2001
"... We present and compare various algorithms, including a new one, allowing to perform multiplications by integer constants using elementary operations. Such algorithms are useful, as they occur in several problems, such as the ToomCooklike algorithms to multiply large multipleprecision integers, th ..."
Abstract

Cited by 8 (0 self)
We present and compare various algorithms, including a new one, for performing multiplications by integer constants using elementary operations. Such algorithms are useful, as the problem occurs in several contexts, such as Toom-Cook-like algorithms for multiplying large multiple-precision integers, the approximate computation of consecutive values of a polynomial, and the generation of integer multiplications by compilers.