Reduced Power Dissipation Through Truncated Multiplication
In IEEE Alessandro Volta Memorial Workshop on Low Power Design, 1999
Cited by 19 (5 self)
Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added with the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations indicate that truncated parallel multipliers dissipate between 29 and 40 percent less power than standard parallel multipliers for operand sizes of 16 and 32 bits. 1: Introduction High-speed parallel multipliers are fundamental building blocks in digital signal processing systems [1]. In...
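The idea described in this abstract can be illustrated with a small behavioral sketch. It is not the paper's design: the function name, the parameter `k` (number of unformed columns), and the constant correction term (the expected value of the omitted partial-product bits for uniformly random operands) are assumptions chosen for illustration only.

```python
def truncated_multiply(a: int, b: int, n: int, k: int) -> int:
    """Approximate the rounded upper n bits of the 2n-bit product of two
    unsigned n-bit operands, without forming the k least significant
    columns of the multiplication matrix (hypothetical behavioral model)."""
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n) and 0 <= k <= n
    total = 0
    for i in range(n):
        for j in range(n):
            # The partial-product bit a_i * b_j lives in column i + j.
            # Columns below k are never formed.
            if i + j >= k and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    # Assumed estimate of the omitted columns: column c holds c + 1 bits,
    # and each bit is 1 with probability 1/4 for uniformly random operands.
    correction = sum((c + 1) << c for c in range(k)) // 4
    rounding = 1 << (n - 1)  # round to nearest when dropping n bits
    return (total + correction + rounding) >> n


def rounded_multiply(a: int, b: int, n: int) -> int:
    """Reference: exact product rounded to its upper n bits."""
    return (a * b + (1 << (n - 1))) >> n
```

With `k = 0` the sketch reproduces the exactly rounded product. For larger `k` the result can differ from the reference by a small amount, which is the accuracy cost traded for the area and power savings the abstract reports.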
Variable-Precision, Interval Arithmetic Processors
Cited by 12 (1 self)
This chapter presents the design and analysis of variable-precision, interval arithmetic processors. The processors give the user the ability to specify the precision of the computation, determine the accuracy of the results, and recompute inaccurate results with higher precision. The processors support a wide variety of arithmetic operations on variable-precision floating point numbers and intervals. Efficient hardware algorithms and specially designed functional units increase the speed, accuracy, and reliability of numerical computations. Area and delay estimates indicate that the processors can be implemented with areas and cycle times that are comparable to conventional IEEE double-precision floating point coprocessors. Execution time estimates indicate that the processors are two to three orders of magnitude faster than a conventional software package for variable-precision, interval arithmetic. 1.1 INTRODUCTION Floating point arithmetic provides a high-speed method for perform...
Parallel Saturating Fractional Arithmetic Units
In 9th Great Lakes Symposium on VLSI, 1999
Cited by 12 (6 self)
This paper describes the designs of a saturating adder, multiplier, single MAC unit, and dual MAC unit with one cycle latencies. The dual MAC unit can perform two saturating MAC operations in parallel and accumulate the results with saturation. Specialized saturation logic ensures that the output of the dual MAC unit is identical to the result of the operations performed serially with saturation after each multiplication and each addition.
Design Tradeoffs Using Truncated Multipliers in FIR Filter Implementations
2002
Cited by 5 (0 self)
This paper presents a general FIR filter architecture utilizing truncated tree multipliers for computation. The average error, maximum error, and variance of error due to truncation are derived for the proposed architecture. A novel technique that reduces the average error of the filter and is independent of the number of unformed columns is presented, as well as equations describing the signal-to-noise ratio of the truncation error. A software tool written in Java is described that automatically generates structural VHDL models for specific filters based on this architecture, given parameters such as the number of taps, operand lengths, number of multipliers, and the number of truncated columns. We show that a 22.5% reduction in area can be achieved for a 24-tap filter with 16-bit coefficients. The ratio of the average error to the full scale value is only 1.4 × 10^-9, with only an 8.4 dB reduction in SNR for this implementation.
A Combined Interval and Floating Point Multiplier
In Proceedings of the 8th Great Lakes Symposium on VLSI (Los Alamitos, CA), 1998
Cited by 4 (0 self)
Interval arithmetic provides an efficient method for monitoring and controlling errors in numerical calculations. However, existing software packages for interval arithmetic are often too slow for numerically intensive computations. This paper presents the design of a multiplier that performs either interval or floating point multiplication. This multiplier requires only slightly more area and delay than a conventional floating point multiplier, and is one to two orders of magnitude faster than software implementations of interval multiplication. 1 Introduction The performance of conventional microprocessors currently increases at a rate of approximately 55 percent per year and is expected to increase by a factor of 50 over the next ten years [1]. This rapid increase in computing power has led to a greater reliance on results produced by computer simulation and modeling. Although many areas depend on computer generated results for reliable information, roundoff error and catastrophic...
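For readers unfamiliar with the operation, the following is a minimal software sketch of textbook interval multiplication, the kind of routine the abstract contrasts with a hardware unit. It is not the paper's multiplier. The function name is hypothetical, and the sketch assumes Python 3.9 or later (for `math.nextafter`), finite operands, and round-to-nearest floating point products, which it widens by one step in each direction to approximate outward rounding.

```python
import math


def interval_multiply(x, y):
    """Multiply closed intervals x = (x_lo, x_hi) and y = (y_lo, y_hi).

    Returns an interval that encloses every product p * q with
    p in x and q in y (finite inputs, no overflow assumed)."""
    x_lo, x_hi = x
    y_lo, y_hi = y
    # The extreme values of the product occur at endpoint combinations.
    products = [x_lo * y_lo, x_lo * y_hi, x_hi * y_lo, x_hi * y_hi]
    # Each product was rounded to nearest, so step one floating point
    # number outward on each side to keep the true result enclosed.
    lo = math.nextafter(min(products), -math.inf)
    hi = math.nextafter(max(products), math.inf)
    return lo, hi
```

Evaluating four products and two comparisons per multiplication, plus the rounding adjustments, hints at why software interval packages run much slower than a single hardware multiply.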
Combined Unsigned and Two's Complement Saturating Multipliers
2000
Cited by 2 (1 self)
In many digital signal processing and multimedia applications, results that overflow are saturated to the most positive or most negative representable number. This paper presents efficient techniques for performing saturating n-bit integer multiplication on unsigned and two's complement numbers. Unlike conventional techniques for saturating multiplication, which compute a 2n-bit product and then examine the n most significant product bits to determine if overflow has occurred, the techniques presented in this paper compute only the (n + 1) least significant bits of the product. Specialized overflow detection units, which operate in parallel with the multiplier, determine if overflow has occurred and the product should be saturated. These techniques are applied to designs for saturating array multipliers that perform either unsigned or two's complement saturating integer multiplication, based on an input control signal. Compared to array multipliers that use conventional methods for sa...
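As a point of reference, the behavior of a saturating multiplier can be modeled in a few lines. This sketch follows the conventional approach the abstract mentions (form the full product, then clamp), not the paper's (n + 1)-bit technique. The function name and the `signed` flag, which stands in for the input control signal, are assumptions.

```python
def saturating_multiply(a: int, b: int, n: int, signed: bool) -> int:
    """Behavioral model of n-bit saturating integer multiplication.

    signed=True selects two's complement operands, signed=False unsigned."""
    if signed:
        lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    else:
        lo, hi = 0, (1 << n) - 1
    assert lo <= a <= hi and lo <= b <= hi
    product = a * b  # full-width product (up to 2n bits)
    # Clamp to the most negative or most positive representable value.
    return max(lo, min(hi, product))
```

For example, with `n = 8` the unsigned product 200 × 2 saturates to 255, and the two's complement product -100 × 2 saturates to -128.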
A Combined 16-Bit Binary And Dual Galois Field Multiplier
Cited by 2 (0 self)
Galois field arithmetic is commonly used in Reed-Solomon encoding and decoding. This paper presents the design of a combined 16-bit binary and dual Galois field (GF) multiplier. This multiplier is capable of performing either a 16-bit two's complement or unsigned multiplication, or two independent 8-bit GF(2^8) multiplications in SIMD fashion. The combined multiplier is designed by modifying a conventional binary tree multiplier. It uses a novel wiring methodology to provide two simultaneous GF(2^8) multiplications with a minor impact on area and delay. Three alternatives for the multiplier design are presented. Area and delay estimates indicate that compared to a conventional binary tree multiplier, the combined multiplier has roughly 6% more delay and 23% more area.
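The GF multiplication mode can be sketched in software with the usual shift-and-XOR method. This is a behavioral illustration only, not the paper's wiring scheme. The abstract does not name a field polynomial, so the reduction polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1), which is common in Reed-Solomon codes, is an assumption, as are both function names.

```python
def gf256_multiply(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two elements of GF(2^8), reducing modulo `poly`.

    `poly` is an assumed irreducible polynomial; substitute the one
    required by the code in use."""
    assert 0 <= a < 256 and 0 <= b < 256
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a      # addition in GF(2^8) is XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:
            a ^= poly        # reduce when the degree reaches 8
    return result


def dual_gf256_multiply(x: int, y: int, poly: int = 0x11D) -> int:
    """Two independent GF(2^8) multiplications on the high and low
    bytes of 16-bit words, mimicking the SIMD mode described above."""
    hi = gf256_multiply(x >> 8, y >> 8, poly)
    lo = gf256_multiply(x & 0xFF, y & 0xFF, poly)
    return (hi << 8) | lo
```

Because partial products are combined with XOR rather than carry-propagating addition, the structure resembles a binary multiplier with the carries suppressed, which is consistent with the abstract's approach of modifying a conventional tree multiplier.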
Combined Multiplication and Sum-of-Squares Units
In Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, 2003
Cited by 1 (1 self)
Multiplication and squaring are important operations in digital signal processing and multimedia applications. This paper presents designs for units that implement either multiplication, A × B, or sum-of-squares computations, A^2 + B^2, based on an input control signal. Compared to conventional parallel multipliers, these units have a modest increase in area and delay, but allow either multiplication or sum-of-squares computations to be performed. Combined multiplication and sum-of-squares units for unsigned and two's complement operands are presented, along with integrated designs that can operate on either unsigned or two's complement operands. The designs can also be extended to work with a third accumulator operand to compute either Z + A × B or Z + A^2 + B^2. Synthesis results indicate that a combined multiplication and sum-of-squares unit for 32-bit two's complement operands can be implemented with roughly 15% more area and nearly the same worst-case delay as a conventional 32-bit two's complement multiplier.
Efficient Integer Multiplication Overflow Detection Circuits
Multiplication of two n-bit integers produces a 2n-bit product. To allow the result to be stored in the same format as the inputs, many processors return the n least significant bits of the product and an overflow flag. This paper describes methods for integer multiplication with overflow detection for unsigned and two's complement numbers. A method for combining unsigned and two's complement integer multiplication with overflow detection is also presented. The overflow detection circuits presented in this paper have O(n) gates and O(log(n)) delay, which makes them more efficient than previous overflow detection circuits.
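The processor behavior described in this abstract (return the n least significant bits plus an overflow flag) can be modeled as follows. This is a reference model that computes the full product, so it says nothing about the O(n)-gate detection circuits themselves. The function name and the `signed` flag are hypothetical.

```python
def multiply_with_overflow(a: int, b: int, n: int, signed: bool):
    """Return (low, overflow): the n least significant product bits,
    interpreted in the operand format, and a flag that is True when
    the full product does not fit in n bits."""
    mask = (1 << n) - 1
    product = a * b
    # Python integers behave like infinite-width two's complement under
    # bitwise AND, so this extracts the low n bits for negative products too.
    low = product & mask
    if signed and low >> (n - 1):
        low -= 1 << n  # reinterpret the low n bits as two's complement
    # Overflow occurred exactly when the stored result differs from
    # the true product.
    overflow = low != product
    return low, overflow
```

For instance, with `n = 8` the unsigned product 16 × 16 returns `(0, True)`, while the two's complement product -8 × 16 returns `(-128, False)`.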