Reduced Power Dissipation Through Truncated Multiplication
in IEEE Alessandro Volta Memorial Workshop on Low Power Design, 1999
Cited by 19 (5 self)
Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added with the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations indicate that truncated parallel multipliers dissipate between 29 and 40 percent less power than standard parallel multipliers for operand sizes of 16 and 32 bits. 1: Introduction High-speed parallel multipliers are fundamental building blocks in digital signal processing systems [1]. In...
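The column-dropping idea in this abstract can be illustrated with a small bit-level simulation. The sketch below is not the paper's design: the function names, the unsigned-operand assumption, and the choice of correction constant (the expected value of the dropped columns when every operand bit is equally likely to be 0 or 1) are all assumptions made for illustration.

```python
def expected_value_correction(k: int) -> int:
    """Hypothetical constant: expected sum of the dropped columns 0..k-1,
    in units of column k, assuming each partial-product bit is 1 with
    probability 1/4 (independent, uniformly random operand bits)."""
    dropped = sum((c + 1) * 2**c for c in range(k))  # column c holds c+1 bits
    return round(dropped / 4 / 2**k)


def truncated_multiply(a: int, b: int, n: int, k: int, correction: int) -> int:
    """Approximate the rounded upper n bits of an n x n unsigned product.

    Partial-product bits in columns 0..k-1 are never summed; `correction`
    (in units of column k) stands in for the carries they would generate.
    Requires 0 <= k < n.
    """
    total = 0
    for i in range(n):            # bit i of a
        for j in range(n):        # bit j of b
            col = i + j
            if col >= k and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (col - k)   # weight relative to column k
    total += correction
    shift = n - k                 # bits still below the n-bit result
    return (total + (1 << (shift - 1))) >> shift   # round to nearest
```

With k = 0 and a zero correction the function reduces to the exactly rounded product, (a * b + 2**(n-1)) >> n, which gives a baseline for measuring the error introduced by larger values of k.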
Variable-Correction Truncated Floating Point Multipliers
in Proceedings of the Thirty Fourth Asilomar Conference on Signals, Circuits and Systems, 2000
Cited by 5 (1 self)
About half the hardware for floating point multipliers is needed only to guarantee correctly rounded results. For multimedia, graphics, and DSP systems, a significant reduction in area, delay, and power can be achieved by producing results that are not correctly rounded. This paper presents an efficient method for designing variable-correction truncated floating point multipliers that produce results with a maximum error of less than one unit in the last place. With this method, several of the less significant columns of the significand multiplier and the rounding logic for floating point multiplication are eliminated. Technical areas: (13) DSP hardware, software, and coreware; (14) ASIC and FPGA algorithm/processor design. POC: Michael Schulte, 19 Memorial Dr. West, EECS Dept., Lehigh University, Bethlehem, PA 18015. Email: mschulte@eecs.lehigh.edu, Phone: (610) 758-5036, FAX: (610) 758-6279. Extended Abstract Most modern processors perform floating point operations accord...
High-Speed Inverse Square Roots
Proceedings of the 14th IEEE Symposium on Computer Arithmetic, 1999
Cited by 4 (0 self)
Inverse square roots are used in several digital signal processing, multimedia, and scientific computing applications. This paper presents a high-speed method for computing inverse square roots. This method uses a table lookup, operand modification, and multiplication to obtain an initial approximation to the inverse square root. This is followed by a modified Newton-Raphson iteration, consisting of one square, one multiply-complement, and one multiply-add operation. The initial approximation and Newton-Raphson iteration employ specialized hardware to reduce the delay, area, and power dissipation. Application of this method is illustrated through the design of an inverse square root unit for operands in the IEEE single precision format. An implementation of this unit with a 4-layer metal, 2.5 Volt, 0.25 micron CMOS standard cell library has a cycle time of 6.7 ns, an area of 0.41 mm², a latency of five cycles, and a throughput of one result per cycle. 1. Introduction Square roots a...
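The general shape of a table lookup followed by Newton-Raphson refinement can be sketched in software. The code below uses the textbook iteration y <- y * (3 - m * y^2) / 2, not the paper's modified iteration or its operand-modification step; the table size, the reduced range [0.5, 2), and all names are assumptions chosen for illustration.

```python
import math

TABLE_BITS = 6                      # hypothetical table size: 64 entries
LO, HI = 0.5, 2.0                   # range of the reduced operand m
STEP = (HI - LO) / (1 << TABLE_BITS)
# Each entry approximates 1/sqrt(m) at the midpoint of its interval.
TABLE = [1.0 / math.sqrt(LO + (i + 0.5) * STEP) for i in range(1 << TABLE_BITS)]


def inv_sqrt(x: float, iterations: int = 2) -> float:
    """Approximate 1/sqrt(x) for x > 0 by table lookup plus Newton-Raphson."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= m < 1
    if e % 2:                       # force an even exponent so 2**(e/2) is exact
        m *= 2.0
        e -= 1
    idx = min(int((m - LO) / STEP), len(TABLE) - 1)
    y = TABLE[idx]                  # initial approximation
    for _ in range(iterations):
        sq = y * y                  # square
        c = (3.0 - m * sq) / 2.0    # form the correction factor
        y = y * c                   # multiply
    return math.ldexp(y, -(e // 2))  # undo the range reduction
```

Each iteration roughly squares the relative error, so a more accurate initial table allows fewer iterations; that trade-off is the reason hardware designs invest in the initial approximation.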
A Custom Computing Framework for Orientation and Photogrammetry
MIT EECS, 2000
Cited by 2 (0 self)
There is great demand today for real-time computer vision systems, with applications including image enhancement, target detection and surveillance, autonomous navigation, and scene reconstruction. These operations generally require extensive computing power; when multiple conventional processors and custom gate arrays are inappropriate, due to either excessive cost or risk, a class of devices known as Field-Programmable Gate Arrays (FPGAs) can be employed. FPGAs offer the flexibility of a programmable solution and nearly the performance of a custom gate array. When implementing a custom algorithm in an FPGA, one must be more efficient than with a gate array technology. By tailoring the algorithms, architectures, and precisions, the gate count of an algorithm may be sufficiently reduced to fit into an FPGA. The challenge is to perform this customization of the algorithm, while still maintaining the required performance. The techniques required to perform algorithmic optimization for FPGAs are scattered across many fields; what is currently lacking is a framework for utilizing all these well-known and developing techniques. The purpose of this thesis is to develop
Using truncated multipliers in DCT and IDCT hardware accelerators
2003
Cited by 1 (0 self)
Truncated multipliers offer significant improvements in area, delay, and power. However, little research has been done on their use in actual applications, probably due to concern about the computational errors they introduce. This paper describes a software tool used for simulating the use of truncated multipliers in DCT and IDCT hardware accelerators. Images that have been compressed and decompressed by DCT and IDCT accelerators using truncated multipliers are presented. In accelerators based on Chen's algorithm (256 multiplies per 8×8 block for DCT, 192 multiplies per block for IDCT), there is no visible difference between images reconstructed using truncated multipliers with 50% of the multiplication matrix eliminated and images reconstructed using standard multipliers with the same operand lengths and intermediate precision.
A Generalized Methodology for Lower-Error Area-Efficient Fixed-Width Multipliers
in Proceedings of the IEEE International Symposium on Circuits and Systems, 2002
Cited by 1 (0 self)
In this paper, we extend our generalized methodology for designing lower-error area-efficient fixed-width two’s-complement multipliers that receive two s-bit numbers and produce an s-bit product. The generalized methodology involving four steps results in several better error-compensation biases. These better error-compensation biases can be easily mapped to lower-error fixed-width multipliers suitable for VLSI realization.
Traditional and Truncation schemes for Different Multiplier
in International Journal of Electronics and Computer Science Engineering 627, ISSN 2277-1956, available online at www.ijecse.org
A fast, power-efficient multiplier is vital in electronics applications such as DSP, image processing, and the ALUs of microprocessors. The multiplier is an important block with respect to power consumption and area occupied in the system. To meet the demand for high speed, various parallel array multiplication algorithms have been proposed by a number of authors. Array multipliers use a large amount of hardware and consequently consume a large amount of power. One method of multiplication is based on Indian Vedic mathematics, which is built on sixteen sutras (word formulae) and presents a unified structure of mathematics. Parallel multipliers, for example radix-2 and radix-4 Booth multipliers, perform their computations using fewer adders and fewer iterative steps, and as a result occupy less space than a serial multiplier. Truncated multipliers offer noteworthy improvements in area, delay, and power. Truncated multiplication provides a different method for reducing the power dissipation and area of rounded parallel multipliers in DSP systems. In a truncated multiplier, the x less significant bits of the full-width product are discarded; the corresponding partial products are removed and replaced by suitable compensation equations that trade accuracy against hardware cost. A pseudo-carry compensation truncation (PCT) scheme for the multiplexer-based array multiplier yields a lower average error than other existing truncation methods. After studying many research papers, it was found that some of the multiplier schemes are suitable because their own
unknown title
Multiplication is frequently required in digital signal processing. Parallel multipliers provide a high-speed method for multiplication, but require large area for VLSI implementations. In most signal processing applications, a rounded product is desired to avoid growth in word size. Thus an important design goal is to reduce the area requirement of the rounded output multiplier. This paper presents a method for parallel multiplication which computes the products of two n-bit numbers by summing only the most significant columns with a variable correction method. This paper also presents a comparative study of Field Programmable Gate Array (FPGA) implementation of
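A data-dependent correction can be sketched by letting the correction vary with the operands instead of being a fixed constant. The version below assumes one form commonly described in the truncated-multiplier literature, in which the partial-product bits of the most significant dropped column are promoted into the least significant kept column; whether this matches the paper's method is not stated in the abstract, and the function name and unsigned operands are illustrative assumptions.

```python
def variable_correction_multiply(a: int, b: int, n: int, k: int) -> int:
    """Approximate the rounded upper n bits of an n x n unsigned product.

    Columns 0..k-1 are dropped. Assumed correction: every 1-bit in column
    k-1 is counted as if it sat in column k, which estimates the carries
    out of the dropped columns. Requires 0 <= k < n.
    """
    kept = 0
    estimate = 0
    for i in range(n):            # bit i of a
        for j in range(n):        # bit j of b
            if (a >> i) & 1 and (b >> j) & 1:
                col = i + j
                if col >= k:
                    kept += 1 << (col - k)   # weight relative to column k
                elif col == k - 1:
                    estimate += 1            # promoted into column k
    shift = n - k                 # bits still below the n-bit result
    return (kept + estimate + (1 << (shift - 1))) >> shift   # round to nearest
```

Because the estimate grows with the number of ones near the truncation boundary, it tracks the dropped sum more closely than a fixed constant, at the cost of a few extra adder inputs in column k.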
Truncated Squarers with Constant and Variable Correction
Newton-Raphson division module via truncated multipliers
The area and power dissipation of a Newton-Raphson (NR) division module can be reduced by replacing the standard parallel multiplier with a truncated multiplier. This is possible because, when standard multipliers are employed, their products are rounded to avoid unnecessary growth in word size. Truncated array multipliers, on the other hand, generate a higher maximum absolute error, and even when that error is limited to one unit in the last place, it is not clear how it affects the performance of the unit. In this paper, an error analysis of an 8-bit NR module using Variable Correction Truncated Array and Constant Correction Truncated Dadda multipliers is performed. The maximum and absolute errors are analyzed and contrasted with those of implementations using standard multipliers. Typically, division in Digital Signal Processing (DSP) processors is implemented either with multiplicative algorithms such as the Newton-Raphson (NR) and Goldschmidt algorithms, or with additive algorithms such as digit-recurrence
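The effect of an inexact multiplier inside a Newton-Raphson reciprocal can be explored with a fixed-point model. The sketch below uses the standard iteration x <- x * (2 - d * x) with every product simply floored to a fixed number of fraction bits; that flooring is a stand-in, not a model of the variable-correction or constant-correction hardware named in the abstract. The 16-bit fraction width, the linear initial guess, and all names are assumptions for illustration (the paper analyzes an 8-bit module).

```python
FRAC_BITS = 16                      # hypothetical fixed-point fraction width
ONE = 1 << FRAC_BITS


def to_fx(value: float) -> int:
    """Convert a real number to fixed point (round to nearest)."""
    return int(round(value * ONE))


def fx_mul(a: int, b: int) -> int:
    """Fixed-point multiply whose low product bits are discarded (floored)."""
    return (a * b) >> FRAC_BITS


def nr_reciprocal(d_fx: int, iterations: int = 3) -> int:
    """Approximate 1/d in fixed point for d in [0.5, 1)."""
    if not ONE // 2 <= d_fx < ONE:
        raise ValueError("d must be normalized to [0.5, 1)")
    # Well-known linear initial guess for d in [0.5, 1]: 48/17 - (32/17) * d
    x = to_fx(48 / 17) - fx_mul(to_fx(32 / 17), d_fx)
    two = 2 * ONE
    for _ in range(iterations):
        x = fx_mul(x, two - fx_mul(d_fx, x))   # x <- x * (2 - d * x)
    return x


def nr_divide(a_fx: int, d_fx: int) -> int:
    """Approximate a / d as a * (1 / d), all in fixed point."""
    return fx_mul(a_fx, nr_reciprocal(d_fx))
```

Comparing nr_reciprocal(to_fx(d)) / ONE against 1 / d over many values of d gives a quick software view of how truncation error accumulates across iterations.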