Results 1-10 of 10
Reduced Power Dissipation Through Truncated Multiplication
in IEEE Alessandro Volta Memorial Workshop on Low Power Design, 1999
Cited by 19 (5 self)
Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added with the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations indicate that truncated parallel multipliers dissipate between 29 and 40 percent less power than standard parallel multipliers for operand sizes of 16 and 32 bits. 1: Introduction High-speed parallel multipliers are fundamental building blocks in digital signal processing systems [1]. In...
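The column-dropping idea in this abstract can be illustrated with a small software model. The sketch below is not the paper's design: it assumes unsigned n-bit operands, keeps only the n + k most significant columns of the partial-product matrix, and adds a caller-supplied constant in place of the carries from the dropped columns. The names `rounded_product`, `truncated_multiply`, `k`, and `correction` are hypothetical, chosen only for illustration.

```python
def rounded_product(a, b, n):
    """Exact product of two unsigned n-bit operands, rounded to its upper n bits."""
    return (a * b + (1 << (n - 1))) >> n


def truncated_multiply(a, b, n, k, correction=0):
    """Approximate rounded_product using only the n + k most significant columns.

    Columns 0 .. n-k-1 of the partial-product matrix are never formed.
    `correction` is an assumed constant estimate of the carries those
    columns would have produced (illustrative, not taken from the paper).
    """
    first_kept_column = n - k
    total = 0
    for i in range(n):
        if not (a >> i) & 1:
            continue
        for j in range(n):
            # Partial-product bit a_i * b_j sits in column i + j.
            if (b >> j) & 1 and i + j >= first_kept_column:
                total += 1 << (i + j)
    total += correction       # estimate of the discarded columns
    total += 1 << (n - 1)     # rounding constant
    return total >> n
```

With k = n no column is dropped and the function reproduces the exact rounded product. With a smaller k the result can differ from the exact rounded product by a unit in the last retained bit, which is the accuracy cost traded for the hardware that is no longer built.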
Variable-Correction Truncated Floating Point Multipliers
in Proceedings of the Thirty Fourth Asilomar Conference on Signals, Circuits and Systems, 2000
Cited by 5 (1 self)
About half the hardware for floating point multipliers is needed only to guarantee correctly rounded results. For multimedia, graphics, and DSP systems, a significant reduction in area, delay, and power can be achieved by producing results that are not correctly rounded. This paper presents an efficient method for designing variable-correction truncated floating point multipliers that produce results with a maximum error of less than one unit in the last place. With this method, several of the less significant columns of the significand multiplier and the rounding logic for floating point multiplication are eliminated. Technical areas: (13) DSP hardware, software, and coreware; (14) ASIC and FPGA algorithm/processor design. POC: Michael Schulte, 19 Memorial Dr. West, EECS Dept., Lehigh University, Bethlehem, PA 18015. Email: mschulte@eecs.lehigh.edu, Phone: (610) 758-5036, FAX: (610) 758-6279. Extended Abstract Most modern processors perform floating point operations accord...
High-Speed Inverse Square Roots
Proceedings of the 14th IEEE Symposium on Computer Arithmetic, 1999
Cited by 4 (0 self)
Inverse square roots are used in several digital signal processing, multimedia, and scientific computing applications. This paper presents a high-speed method for computing inverse square roots. This method uses a table lookup, operand modification, and multiplication to obtain an initial approximation to the inverse square root. This is followed by a modified Newton-Raphson iteration, consisting of one square, one multiply-complement, and one multiply-add operation. The initial approximation and Newton-Raphson iteration employ specialized hardware to reduce the delay, area, and power dissipation. Application of this method is illustrated through the design of an inverse square root unit for operands in the IEEE single precision format. An implementation of this unit with a 4-layer metal, 2.5 Volt, 0.25 micron CMOS standard cell library has a cycle time of 6.7 ns, an area of 0.41 mm², a latency of five cycles, and a throughput of one result per cycle. 1. Introduction Square roots a...
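The general pattern of a table-based initial approximation refined by Newton-Raphson can be sketched in software. The example below uses the textbook iteration y ← y(3 − x·y²)/2, not the paper's modified iteration or its operand-modification step. It assumes the operand has already been range-reduced to [1, 4), and the 8-entry table and the names `initial_guess` and `inverse_sqrt` are hypothetical.

```python
import math

# Hypothetical 8-entry lookup table covering [1, 4): each entry is the
# reciprocal square root of the midpoint of its sub-interval. A hardware
# unit would store such values as constants; here they are computed once.
_TABLE_SIZE = 8
_TABLE = [
    1.0 / math.sqrt(1.0 + 3.0 * (i + 0.5) / _TABLE_SIZE)
    for i in range(_TABLE_SIZE)
]


def initial_guess(x):
    """Table lookup giving a coarse approximation of 1/sqrt(x) for x in [1, 4)."""
    if not 1.0 <= x < 4.0:
        raise ValueError("x must lie in [1, 4)")
    index = min(int((x - 1.0) * _TABLE_SIZE / 3.0), _TABLE_SIZE - 1)
    return _TABLE[index]


def inverse_sqrt(x, iterations=3):
    """Refine the table value with the standard Newton-Raphson iteration."""
    y = initial_guess(x)
    for _ in range(iterations):
        # y <- y * (3 - x * y^2) / 2, written with the halving folded in.
        y = y * (1.5 - 0.5 * x * y * y)
    return y
```

Because the iteration converges quadratically, a more accurate table entry reduces the number of refinement steps needed, which is the trade-off between table size and iteration count that designs of this kind balance.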
A Custom Computing Framework for Orientation and Photogrammetry
MIT EECS, 2000
Cited by 2 (0 self)
There is great demand today for real-time computer vision systems, with applications including image enhancement, target detection and surveillance, autonomous navigation, and scene reconstruction. These operations generally require extensive computing power; when multiple conventional processors and custom gate arrays are inappropriate, due to either excessive cost or risk, a class of devices known as Field-Programmable Gate Arrays (FPGAs) can be employed. FPGAs offer the flexibility of a programmable solution and nearly the performance of a custom gate array. When implementing a custom algorithm in an FPGA, one must be more efficient than with a gate array technology. By tailoring the algorithms, architectures, and precisions, the gate count of an algorithm may be sufficiently reduced to fit into an FPGA. The challenge is to perform this customization of the algorithm, while still maintaining the required performance. The techniques required to perform algorithmic optimization for FPGAs are scattered across many fields; what is currently lacking is a framework for utilizing all these well-known and developing techniques. The purpose of this thesis is to develop
A Generalized Methodology for Lower-Error Area Efficient Fixed-Width Multipliers
in Proceedings of the IEEE International Symposium on Circuits and Systems, 2002
Cited by 1 (0 self)
In this paper, we extend our generalized methodology for designing lower-error area-efficient fixed-width two’s-complement multipliers that receive two s-bit numbers and produce an s-bit product. The generalized methodology involving four steps results in several better error-compensation biases. These better error-compensation biases can be easily mapped to lower-error fixed-width multipliers suitable for VLSI realization.
Using truncated multipliers in DCT and IDCT hardware accelerators
2003
Cited by 1 (0 self)
Truncated multipliers offer significant improvements in area, delay, and power. However, little research has been done on their use in actual applications, probably due to concern about the computational errors they introduce. This paper describes a software tool used for simulating the use of truncated multipliers in DCT and IDCT hardware accelerators. Images that have been compressed and decompressed by DCT and IDCT accelerators using truncated multipliers are presented. In accelerators based on Chen's algorithm (256 multiplies per 8×8 block for DCT, 192 multiplies per block for IDCT), there is no visible difference between images reconstructed using truncated multipliers with 50% of the multiplication matrix eliminated and images reconstructed using standard multipliers with the same operand lengths and intermediate precision.
Carry Prediction and Selection for Truncated Multiplication
This paper presents an error compensation method for truncated multiplication. From two n-bit operands, the operator produces an n-bit product with small error compared to the 2n-bit exact product. The method is based on a logical computation followed by a simplification process. The filtering parameter used in the simplification process helps to control the trade-off between hardware cost and accuracy. The proposed truncated multiplication scheme has been synthesized on an FPGA platform. It gives a better accuracy over area ratio than previous well-known schemes such as the constant correcting and variable correcting truncation schemes (CCT and VCT).
unknown title
Multiplication is frequently required in digital signal processing. Parallel multipliers provide a high-speed method for multiplication, but require large area for VLSI implementations. In most signal processing applications, a rounded product is desired to avoid growth in word size. Thus an important design goal is to reduce the area requirement of the rounded output multiplier. This paper presents a method for parallel multiplication which computes the products of two n-bit numbers by summing only the most significant columns with a variable correction method. This paper also presents a comparative study of Field Programmable Gate Array (FPGA) implementation of
Traditional and Truncation schemes for Different Multiplier
in International Journal of Electronics and Computer Science Engineering, ISSN 2277-1956, available online at www.ijecse.org
A fast, power-efficient multiplier is vital in electronics applications such as DSP, image processing, and the ALUs of microprocessors. The multiplier is a critical block with respect to power consumption and area occupied in the system. To meet the demand for high speed, various parallel array multiplication algorithms have been proposed by a number of authors. Array multipliers use a large amount of hardware and consequently consume a large amount of power. One multiplication method is based on Indian Vedic mathematics, which rests on sixteen sutras (word formulae) and presents a unified structure of mathematics. Parallel multipliers, for example radix-2 and radix-4 Booth multipliers, perform the computation using fewer adders and fewer iterative steps, and as a result occupy less space than a serial multiplier. Truncated multipliers offer noteworthy improvements in area, delay, and power. Truncated multiplication provides a different method for reducing the power dissipation and area of rounded parallel multipliers in DSP systems. In a truncated multiplier, the x less significant bits of the full-width product are discarded; the corresponding partial products are removed and replaced by suitable compensation equations that balance accuracy against hardware cost. A pseudo-carry compensation truncation (PCT) scheme, intended for the multiplexer-based array multiplier, yields lower average error than existing truncation methods. After studying many research papers, it is found that some of the multiplier schemes are suitable because their own
Design and Implementation of Truncated Multipliers for Precision Improvement and Its Application to a Filter Structure
Truncated multipliers offer significant improvements in area, delay, and power. The proposed method reduces the number of full adders and half adders used during the tree reduction, and experiments with it show that area can be saved. The output is split into an LSB part and an MSB part. The LSB part is compressed using operations such as deletion, reduction, truncation, rounding, and final addition. Previous related papers reduce the truncation error by adding error compensation circuits. In this project the truncation error is not more than 1 ulp (unit in the last place), so there is no need for error compensation circuits, and the final output will be precise. To further extend the work, the design is realized in a FIR filter.