Results 1 - 10
of
17
Effective Partial Redundancy Elimination
- Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation
, 1994
"... Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness ..."
Abstract
-
Cited by 84 (13 self)
- Add to MetaCart
Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness of partial redundancy elimination. By imposing a discipline on the choice of names and the shape of expressions, we are able to expose more redundancies. As part of the work, we introduce a new algorithm for global reassociation of expressions. It uses global information to reorder expressions, creating opportunities for other optimizations. The new algorithm generalizes earlier work that ordered FORTRAN array address expressions to improve optimization [25]. 1 Introduction Partial redundancy elimination is a powerful optimization that has been discussed in the literature for many years (e.g., [21, 8, 14, 12, 18]). Unfortunately, partial redundancy elimination has two serious limitations...
Experience with a Software-Defined Machine Architecture
- Unreachable Procedures in Object-oriented WRL Research Report 91/10
, 1991
"... We built a system in which the compiler back end and the linker work together to present an abstract machine at a considerably higher level than the actual machine. The intermediate language translated by the back end is the target language of all high-level compilers and is also the only assembl ..."
Abstract
-
Cited by 53 (7 self)
- Add to MetaCart
We built a system in which the compiler back end and the linker work together to present an abstract machine at a considerably higher level than the actual machine. The intermediate language translated by the back end is the target language of all high-level compilers and is also the only assembly language generally available. This lets us do intermodule register allocation, which would be harder if some of the code in the program had come from a traditional assembler, out of sight of the optimizer. We do intermodule register allocation and pipeline instruction scheduling at link time, using information gathered by the compiler back end. The mechanism for analyzing and modifying the program at link time was also useful in a wide array of instrumentation tools. i 1. Introduction When our lab built its experimental RISC workstation, the Titan, we defined a high-level assembly language as the official interface to the machine. This high-level assembly language, called Mahler,...
Automatic, Template-Based Run-Time Specialization: Implementation and Experimental Study
- In International Conference on Computer Languages
, 1998
"... Specializing programs with respect to run-time values has been shown to drastically improve code performance on realistic programs ranging from operating systems to graphics. Recently, various approaches to specializing code at run-time have been proposed. However, these approaches still suffer from ..."
Abstract
-
Cited by 47 (12 self)
- Add to MetaCart
Specializing programs with respect to run-time values has been shown to drastically improve code performance on realistic programs ranging from operating systems to graphics. Recently, various approaches to specializing code at run-time have been proposed. However, these approaches still suffer from shortcomings that limit their applicability: they are manual, too expensive, or require programs to be written in a dedicated language. We solve these problems by introducing new techniques to implement run-time specialization. The key to our approach is the use of code templates. Templates are automatically generated from ordinary programs and are optimized before run time, allowing high-quality code to be quickly generated at run time. Experimental results obtained on scientific and graphics code indicate that our approach is highly effective. Little run-time overhead is introduced, since code generation primarily consists of copying instructions. Run-time specialized programs run up to 1...
Operator Strength Reduction
, 1995
"... This paper presents a new al gS ithm for operator strengM reduction, called OSR. OSR improves upon an earlier alg orithm due to Allen, Cocke, and Kennedy [Allen et al. 1981]. OSR operates on the static sing e assig4 ent (SSA) form of a procedure [Cytron et al. 1991]. By taking advantag of the pr ..."
Abstract
-
Cited by 26 (9 self)
- Add to MetaCart
This paper presents a new al gS ithm for operator strengM reduction, called OSR. OSR improves upon an earlier alg orithm due to Allen, Cocke, and Kennedy [Allen et al. 1981]. OSR operates on the static sing e assig4 ent (SSA) form of a procedure [Cytron et al. 1991]. By taking advantag of the properties of SSA form, we have derived an alg--- ithm that is simple to understand, quick to implement, and, in practice, fast to run. Its asymptotic complexity is, in the worst case, the same as the Allen, Cocke, and Kennedy al gS ithm (ACK). OSR achieves optimization results that are equivalent to those obtained with the ACK alg orithm. OSR has been implemented in several research and production compilers
Some optimizations of hardware multiplication by constant matrices
- Federal Communications Commission
, 2003
"... This paper presents some improvements on the optimization of hardware multiplication by constant matrices. We focus on the automatic generation of circuits that involve constant matrix multiplication (CMM), i.e. multiplication of a vector by a constant matrix. The proposed method, based on number re ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
This paper presents some improvements on the optimization of hardware multiplication by constant matrices. We focus on the automatic generation of circuits that involve constant matrix multiplication (CMM), i.e. multiplication of a vector by a constant matrix. The proposed method, based on number recoding and dedicated common sub-expression factorization algorithms was implemented in a VHDL generator. The obtained results on several applications have been implemented on FPGAs and compared to previous solutions. Up to 40 % area and speed savings are achieved. 1
Constant Multipliers for FPGAs
, 2000
"... This paper presents a survey of techniques to implement multiplications by constants on FPGAs. It shows in particular that a simple and well-known technique, canonical signed recoding, can help design smaller constant multiplier cores than those present in current libraries. An implementation of thi ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
This paper presents a survey of techniques to implement multiplications by constants on FPGAs. It shows in particular that a simple and well-known technique, canonical signed recoding, can help design smaller constant multiplier cores than those present in current libraries. An implementation of this idea in Xilinx JBits is detailed and discussed. The use of the latest algorithms for discovering efficient chains of adders, subtractors and shifters for a given constant multiplication is also discussed. Exploring such solutions is made possible by the new FPGA programming frameworks based on generic programming languages, such as JBits, which allow an arbitrary amount of irregularity to be implemented even within an arithmetic core.
Extended results for minimum-adder constant integer multipliers
- in Circuits and Systems, IEEE International Symposium on. IEEE, May 2002
, 2002
"... By introducing simplifications to multiplier graphs we extend the previous work on minimum adder multipliers to five adders and show that this is enough to express all coefficients up to 19 bits. The average savings are more than 25 % for 19 bits compared with CSD multipliers. The simplifications in ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
By introducing simplifications to multiplier graphs we extend the previous work on minimum adder multipliers to five adders and show that this is enough to express all coefficients up to 19 bits. The average savings are more than 25 % for 19 bits compared with CSD multipliers. The simplifications include addition reordering and vertex reduction to see that different graphs can generate the same coefficient sets. Thus, fewer graphs need to be evaluated. A classification of the graphs reduces the effort to search the coefficient space further. 1.
Multiplierless Multiple Constant Multiplication
- ACM Transactions on Algorithms
"... A variable can be multiplied by a given set of fixed-point constants using a multiplier block that consists exclusively of additions, subtractions, and shifts. The generation of a multiplier block from the set of constants is known as the multiple constant multiplication (MCM) problem. Finding the o ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
A variable can be multiplied by a given set of fixed-point constants using a multiplier block that consists exclusively of additions, subtractions, and shifts. The generation of a multiplier block from the set of constants is known as the multiple constant multiplication (MCM) problem. Finding the optimal solution, i.e., the one with the fewest number of additions and subtractions is known to be NP-complete. We propose a new heuristic algorithm for the MCM problem, which finds solutions that require up to 20 % less additions and subtractions than the solutions found by the best previously known algorithm. At the same time, our algorithms is not limited by the constant bitwidths, in contrast to the closest competing algorithm. We present our algorithm using a unifying formal framework for the best, graph-based MCM algorithms and provide a detailed runtime analysis and experimental evaluation. We show that our algorithm can handle problem sizes as large as 100 32-bit constants in a time acceptable for most applications. The implementation of the new algorithm is available at www.spiral.net.
Multiplication by an Integer Constant
, 2001
"... We present and compare various algorithms, including a new one, allowing to perform multiplications by integer constants using elementary operations. Such algorithms are useful, as they occur in several problems, such as the Toom-Cook-like algorithms to multiply large multiple-precision integers, th ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
We present and compare various algorithms, including a new one, allowing to perform multiplications by integer constants using elementary operations. Such algorithms are useful, as they occur in several problems, such as the Toom-Cook-like algorithms to multiply large multiple-precision integers, the approximate computation of consecutive values of a polynomial, and the generation of integer multiplications by compilers.
Multiplication by a constant is sublinear
- In 18th Symposium on Computer Arithmetic. IEEE
, 2007
"... Abstract — This paper explores the use of the double-base number system (DBNS) for constant integer multiplication. The DBNS recoding scheme represents integers – in this case constants – in a multiple-radix way in the hope of minimizing the number of additions to be performed during constant multip ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Abstract — This paper explores the use of the double-base number system (DBNS) for constant integer multiplication. The DBNS recoding scheme represents integers – in this case constants – in a multiple-radix way in the hope of minimizing the number of additions to be performed during constant multiplication. On the theoretical side, we propose a formal proof which shows that our recoding technique diminishes the number of additions in a sublinear way. Therefore, we prove Lefèvre’s conjecture that the multiplication by an integer constant is achievable in sublinear time. In a second part, we investigate various strategies and we provide numerical data showcasing the potential interest of our approach. I.

