Reduced Power Dissipation Through Truncated Multiplication
in IEEE Alessandro Volta Memorial Workshop on Low Power Design, 1999
Cited by 15 (5 self)
Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added with the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations indicate that truncated parallel multipliers dissipate between 29 and 40 percent less power than standard parallel multipliers for operand sizes of 16 and 32 bits.
1: Introduction
High-speed parallel multipliers are fundamental building blocks in digital signal processing systems [1]. In...
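A minimal Python sketch of the idea, assuming unsigned n-bit operands, k retained columns below the rounding point, and a constant estimate of the discarded carries equal to their rounded expected value when every input bit is equally likely to be 0 or 1. The paper's own estimation scheme may differ; `rounded_product` and `truncated_rounded_product` are hypothetical names.

```python
def rounded_product(a: int, b: int, n: int) -> int:
    """Reference: full 2n-bit product rounded to its n most significant bits."""
    return (a * b + (1 << (n - 1))) >> n


def truncated_rounded_product(a: int, b: int, n: int, k: int) -> int:
    """Sum only partial-product bits in columns >= n - k, add a constant
    estimate of the discarded columns, then round to n bits."""
    cut = n - k  # columns 0 .. cut-1 are never formed
    kept = 0
    for i in range(n):
        if (a >> i) & 1:
            for j in range(n):
                if (b >> j) & 1 and i + j >= cut:
                    kept += 1 << (i + j)
    # Column c (< n) holds c + 1 bits a_i*b_j, each equal to 1 with probability
    # 1/4 for uniform inputs, so the discarded columns contribute
    # sum((c + 1) * 2**c) / 4 on average (assumed constant estimate).
    four_times_expected = sum((c + 1) << c for c in range(cut))
    correction = (four_times_expected + 2) // 4  # round to nearest integer
    return (kept + correction + (1 << (n - 1))) >> n
```

With k = n nothing is discarded and the result matches the reference exactly; retaining fewer columns shrinks the matrix at the cost of a larger error, which is the area and power versus accuracy trade-off the abstract describes.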
Parallel Saturating Fractional Arithmetic Units
in 9th Great Lakes Symposium on VLSI, 1999
Cited by 11 (6 self)
This paper describes the designs of a saturating adder, multiplier, single MAC unit, and dual MAC unit with one-cycle latencies. The dual MAC unit can perform two saturating MAC operations in parallel and accumulate the results with saturation. Specialized saturation logic ensures that the output of the dual MAC unit is identical to the result of the operations performed serially with saturation after each multiplication and each addition.
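The serial behavior the dual MAC must reproduce can be written as a short reference model. This sketch assumes 16-bit Q15 operands and a 32-bit Q31 accumulator; the formats and function names are illustrative assumptions, not taken from the paper.

```python
Q31_MAX = (1 << 31) - 1
Q31_MIN = -(1 << 31)


def sat32(x: int) -> int:
    """Clamp to the 32-bit two's complement (Q31) range."""
    return max(Q31_MIN, min(Q31_MAX, x))


def sat_frac_mul(a: int, b: int) -> int:
    """Q15 x Q15 -> Q31. The left shift aligns the binary point; only
    (-1.0) x (-1.0) overflows, and it saturates to Q31_MAX."""
    return sat32((a * b) << 1)


def sat_add(x: int, y: int) -> int:
    """Saturating Q31 addition."""
    return sat32(x + y)


def dual_mac_reference(acc: int, a1: int, b1: int, a2: int, b2: int) -> int:
    """Two MAC operations performed serially, saturating after each
    multiplication and each addition."""
    acc = sat_add(acc, sat_frac_mul(a1, b1))
    return sat_add(acc, sat_frac_mul(a2, b2))
```

Saturating only once at the end is not equivalent: with `acc = Q31_MAX`, adding 0.25 and then -0.25 gives `Q31_MAX - 2**29` serially, but `Q31_MAX` if both products are summed before a single saturation. This order dependence suggests why matching the serial result in a parallel unit needs dedicated saturation logic.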
Variable-Precision, Interval Arithmetic Processors
Cited by 10 (1 self)
This chapter presents the design and analysis of variable-precision, interval arithmetic processors. The processors give the user the ability to specify the precision of the computation, determine the accuracy of the results, and recompute inaccurate results with higher precision. The processors support a wide variety of arithmetic operations on variable-precision floating point numbers and intervals. Efficient hardware algorithms and specially designed functional units increase the speed, accuracy, and reliability of numerical computations. Area and delay estimates indicate that the processors can be implemented with areas and cycle times that are comparable to conventional IEEE double-precision floating point coprocessors. Execution time estimates indicate that the processors are two to three orders of magnitude faster than a conventional software package for variable-precision, interval arithmetic.
1.1 Introduction
Floating point arithmetic provides a high-speed method for perform...
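A minimal sketch of variable-precision interval multiplication using Python's `decimal` module, assuming intervals are pairs of `Decimal` endpoints and that the precision `p` counts significant decimal digits (the processors in the chapter operate on binary floating point, so this is only an analogy). The lower bound is rounded toward minus infinity and the upper bound toward plus infinity, so the computed interval encloses the exact result, and raising `p` narrows it. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, localcontext

Interval = tuple[Decimal, Decimal]


def _endpoint_products(x: Interval, y: Interval, p: int, rounding: str) -> list[Decimal]:
    """All four endpoint products, each rounded to p digits in one direction."""
    with localcontext() as ctx:
        ctx.prec = p
        ctx.rounding = rounding
        return [xe * ye for xe in x for ye in y]


def interval_mul(x: Interval, y: Interval, p: int) -> Interval:
    """Enclosure of {a * b : a in x, b in y} at p significant digits."""
    lo = min(_endpoint_products(x, y, p, ROUND_FLOOR))
    hi = max(_endpoint_products(x, y, p, ROUND_CEILING))
    return lo, hi
```

For example, `interval_mul((Decimal('1.23456789'),) * 2, (Decimal('3'),) * 2, 4)` yields `(Decimal('3.703'), Decimal('3.704'))`, while `p = 8` narrows it to `(Decimal('3.7037036'), Decimal('3.7037037'))`; both enclose the exact product 3.70370367. Recomputing at a higher precision when an interval is too wide mirrors the workflow the abstract describes.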
Design Tradeoffs Using Truncated Multipliers in FIR Filter Implementations
2002
Cited by 4 (0 self)
This paper presents a general FIR filter architecture utilizing truncated tree multipliers for computation. The average error, maximum error, and variance of error due to truncation are derived for the proposed architecture. A novel technique that reduces the average error of the filter and is independent of the number of unformed columns is presented, as well as equations describing the signal-to-noise ratio of the truncation error. A software tool written in Java is described that automatically generates structural VHDL models for specific filters based on this architecture, given parameters such as the number of taps, operand lengths, number of multipliers, and the number of truncated columns. We show that a 22.5% reduction in area can be achieved for a 24-tap filter with 16-bit coefficients. The ratio of the average error to the full-scale value is only 1.4 × 10^-9, with only an 8.4 dB reduction in SNR for this implementation.
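The error statistics the paper derives analytically can be cross-checked by brute force on small operands. This sketch assumes unsigned n-bit operands, truncated multipliers that simply omit the r least significant columns with no correction, and uniformly distributed inputs; `truncated_product`, `truncation_error_stats`, and `fir_truncated` are hypothetical names, and the paper's error-reduction technique is not modeled.

```python
def truncated_product(a: int, b: int, n: int, r: int) -> int:
    """Unsigned n-bit product that never forms partial-product bits in columns < r."""
    total = 0
    for i in range(n):
        if (a >> i) & 1:
            for j in range(n):
                if (b >> j) & 1 and i + j >= r:
                    total += 1 << (i + j)
    return total


def truncation_error_stats(n: int, r: int) -> tuple[float, int, float]:
    """Mean, maximum, and variance of (exact - truncated) over all operand pairs."""
    errs = [a * b - truncated_product(a, b, n, r)
            for a in range(1 << n) for b in range(1 << n)]
    mean = sum(errs) / len(errs)
    var = sum((e - mean) ** 2 for e in errs) / len(errs)
    return mean, max(errs), var


def fir_truncated(x: list[int], h: list[int], n: int, r: int) -> list[int]:
    """y[m] = sum_k h[k] * x[m - k] (terms with m - k < 0 omitted),
    with every tap product computed by the truncated multiplier."""
    return [sum(truncated_product(h[k], x[m - k], n, r)
                for k in range(len(h)) if m - k >= 0)
            for m in range(len(x))]
```

For n = 4 and r = 3 the enumeration gives a mean error of 4.25 and a maximum of 17, matching (1 + 2·2 + 3·4)/4 and 1 + 2·2 + 3·4 for columns 0 to 2, which hold 1, 2, and 3 partial-product bits.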
A Combined Interval and Floating Point Multiplier
in Proceedings of the 8th Great Lakes Symposium on VLSI (Los Alamitos, CA), 1998
Cited by 4 (0 self)
Interval arithmetic provides an efficient method for monitoring and controlling errors in numerical calculations. However, existing software packages for interval arithmetic are often too slow for numerically intensive computations. This paper presents the design of a multiplier that performs either interval or floating point multiplication. This multiplier requires only slightly more area and delay than a conventional floating point multiplier, and is one to two orders of magnitude faster than software implementations of interval multiplication.
1 Introduction
The performance of conventional microprocessors currently increases at a rate of approximately 55 percent per year and is expected to increase by a factor of 50 over the next ten years [1]. This rapid increase in computing power has led to a greater reliance on results produced by computer simulation and modeling. Although many areas depend on computer generated results for reliable information, roundoff error and catastrophic...
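A common way to make interval multiplication cheap is to inspect the endpoint signs: in eight of the nine sign cases one endpoint product determines each bound, and only when both intervals contain zero are four products needed. The Python sketch below combines that case selection with a mode flag that instead returns an ordinary floating point product. It illustrates the general technique under stated assumptions, not the paper's datapath: the function names are hypothetical, and outward rounding is approximated by stepping one ulp outward with `math.nextafter` (Python 3.9+) from a round-to-nearest product, whereas hardware would use directed rounding.

```python
import math

Interval = tuple[float, float]


def _bound_products(x: Interval, y: Interval) -> tuple[list[float], list[float]]:
    """Endpoint products that determine (lower, upper) for each sign case."""
    a, b = x
    c, d = y
    if a >= 0:                      # x is non-negative
        if c >= 0:
            return [a * c], [b * d]
        if d <= 0:
            return [b * c], [a * d]
        return [b * c], [b * d]     # y contains zero
    if b <= 0:                      # x is non-positive
        if c >= 0:
            return [a * d], [b * c]
        if d <= 0:
            return [b * d], [a * c]
        return [a * d], [a * c]     # y contains zero
    # x contains zero
    if c >= 0:
        return [a * d], [b * d]
    if d <= 0:
        return [b * c], [a * c]
    return [a * d, b * c], [a * c, b * d]   # both contain zero: four products


def combined_multiply(x, y, interval_mode: bool):
    """Floating point product when interval_mode is False, otherwise an
    enclosing interval for the product of intervals x and y."""
    if not interval_mode:
        return x * y
    lows, highs = _bound_products(x, y)
    # Step one ulp outward so the bounds enclose the exact result.
    return (math.nextafter(min(lows), -math.inf),
            math.nextafter(max(highs), math.inf))
```

Because rounding is monotonic, the selected products equal the minimum and maximum of all four rounded endpoint products, so case selection saves multiplications without changing the result.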
Combined Multiplication and Sum-of-Squares Units
in Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, 2003
Cited by 1 (1 self)
Multiplication and squaring are important operations in digital signal processing and multimedia applications. This paper presents designs for units that implement either multiplication, A × B, or sum-of-squares computations, A² + B², based on an input control signal. Compared to conventional parallel multipliers, these units have a modest increase in area and delay, but allow either multiplication or sum-of-squares computations to be performed. Combined multiplication and sum-of-squares units for unsigned and two’s complement operands are presented, along with integrated designs that can operate on either unsigned or two’s complement operands. The designs can also be extended to work with a third accumulator operand to compute either Z + A × B or Z + A² + B². Synthesis results indicate that a combined multiplication and sum-of-squares unit for 32-bit two’s complement operands can be implemented with roughly 15% more area and nearly the same worst-case delay as a conventional 32-bit two’s complement multiplier.
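Squaring needs far fewer partial-product bits than general multiplication, which is one reason a sum of two squares can plausibly share hardware with A × B. In the standard folded squaring matrix, a_i·a_i = a_i stays in column 2i and each symmetric pair a_i·a_j + a_j·a_i = 2·a_i·a_j (i < j) becomes a single bit in column i + j + 1, leaving n(n + 1)/2 bits instead of n². The Python sketch below checks this for unsigned operands and adds a mode flag and optional accumulator Z; the function names are hypothetical, and how the paper maps both matrices onto one array is not modeled.

```python
def folded_square(a: int, n: int) -> int:
    """A*A for an unsigned n-bit A, summed from the folded squaring matrix."""
    total = 0
    for i in range(n):
        ai = (a >> i) & 1
        total += ai << (2 * i)                 # a_i * a_i = a_i in column 2i
        for j in range(i + 1, n):
            aj = (a >> j) & 1
            total += (ai & aj) << (i + j + 1)  # 2 * a_i * a_j in column i + j + 1
    return total


def folded_square_bit_count(n: int) -> int:
    """n diagonal bits plus n(n - 1)/2 folded cross-product bits."""
    return n + n * (n - 1) // 2


def mult_or_sum_of_squares(a: int, b: int, n: int, sum_of_squares: bool, z: int = 0) -> int:
    """Z + A*B, or Z + A^2 + B^2 when sum_of_squares is set (unsigned operands)."""
    if sum_of_squares:
        return z + folded_square(a, n) + folded_square(b, n)
    return z + a * b
```

For n = 32 the two folded matrices hold 2 · 528 = 1056 bits against 1024 for the 32 × 32 multiplication matrix, so the two computations need partial-product arrays of similar size.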
Combined Unsigned and Two's Complement Saturating Multipliers
2000
Cited by 1 (1 self)
In many digital signal processing and multimedia applications, results that overflow are saturated to the most positive or most negative representable number. This paper presents efficient techniques for performing saturating n-bit integer multiplication on unsigned and two's complement numbers. Unlike conventional techniques for saturating multiplication, which compute a 2n-bit product and then examine the n most significant product bits to determine if overflow has occurred, the techniques presented in this paper compute only the (n + 1) least significant bits of the product. Specialized overflow detection units, which operate in parallel with the multiplier, determine if overflow has occurred and the product should be saturated. These techniques are applied to designs for saturating array multipliers that perform either unsigned or two's complement saturating integer multiplication, based on an input control signal. Compared to array multipliers that use conventional methods for sa...
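For reference, the conventional approach the abstract contrasts against (form the full 2n-bit product, then inspect its most significant bits) can be sketched as follows, with a control flag selecting unsigned or two's complement operands. Operands and results are n-bit patterns, and the function name is hypothetical. The paper's (n + 1)-bit technique with parallel overflow detection is not reproduced; a model like this could serve as a golden reference when checking such a design.

```python
def saturating_mul_reference(a_bits: int, b_bits: int, n: int, signed: bool) -> int:
    """Saturating n-bit multiply on n-bit patterns via the full 2n-bit product."""
    mask = (1 << n) - 1

    def value(bits: int) -> int:
        # Interpret an n-bit pattern as unsigned or two's complement.
        return bits - (1 << n) if signed and (bits >> (n - 1)) & 1 else bits

    product = value(a_bits) * value(b_bits)
    p_bits = product & ((1 << (2 * n)) - 1)   # 2n-bit product pattern

    if not signed:
        # Overflow iff any of the n most significant product bits is set.
        return p_bits if p_bits >> n == 0 else mask

    # Two's complement: no overflow iff the upper n bits all equal bit n - 1,
    # i.e. bits 2n-1 .. n-1 are all zeros or all ones.
    top = p_bits >> (n - 1)
    if top == 0 or top == (1 << (n + 1)) - 1:
        return p_bits & mask
    # Saturate to the most positive or most negative n-bit value.
    return (1 << (n - 1)) - 1 if product > 0 else 1 << (n - 1)
```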
A Combined 16-Bit Binary And Dual Galois Field Multiplier
Cited by 1 (0 self)
Galois field arithmetic is commonly used in Reed-Solomon encoding and decoding. This paper presents the design of a combined 16-bit binary and dual Galois field (GF) multiplier. This multiplier is capable of performing either a 16-bit two's complement or unsigned multiplication, or two independent 8-bit GF(2^8) multiplications in SIMD fashion. The combined multiplier is designed by modifying a conventional binary tree multiplier. It uses a novel wiring methodology to provide two simultaneous GF(2^8) multiplications with a minor impact on area and delay. Three alternatives for the multiplier design are presented. Area and delay estimates indicate that, compared to a conventional binary tree multiplier, the combined multiplier has roughly 6% more delay and 23% more area.
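GF(2^8) multiplication differs from binary multiplication mainly in how partial products are combined: the bits are XORed (a carry-free product), and the result is then reduced modulo an irreducible polynomial of degree 8. The sketch below assumes the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), a common choice for Reed-Solomon codes; the polynomial used in the paper is not stated in this abstract. Function names are hypothetical, and only the unsigned binary mode is shown.

```python
GF_POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1


def clmul(a: int, b: int) -> int:
    """Carry-less product: partial products combined with XOR instead of addition."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf256_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements and reduce modulo GF_POLY."""
    p = clmul(a, b)                  # degree at most 14
    for bit in range(14, 7, -1):     # clear bits 14 down to 8
        if (p >> bit) & 1:
            p ^= GF_POLY << (bit - 8)
    return p


def combined_mul16(x: int, y: int, gf_mode: bool) -> int:
    """Unsigned 16 x 16 product, or two independent GF(2^8) products on the
    high and low bytes (SIMD style) when gf_mode is set."""
    if gf_mode:
        hi = gf256_mul(x >> 8, y >> 8)
        lo = gf256_mul(x & 0xFF, y & 0xFF)
        return (hi << 8) | lo
    return x * y
```

In GF mode no carries cross the byte boundary, so the two halves are fully independent; the paper's wiring methodology for routing both products through a single tree is not modeled here.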
Efficient Integer Multiplication Overflow Detection Circuits
Multiplication of two n-bit integers produces a 2n-bit product. To allow the result to be stored in the same format as the inputs, many processors return the n least significant bits of the product and an overflow flag. This paper describes methods for integer multiplication with overflow detection for unsigned and two's complement numbers. A method for combining unsigned and two's complement integer multiplication with overflow detection is also presented. The overflow detection circuits presented in this paper have O(n) gates and O(log(n)) delay, which makes them more efficient than previous overflow detection circuits.
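One well-known way to detect unsigned overflow without examining the full 2n-bit product uses operand bit lengths (equivalently, leading-zero counts): if len(A) + len(B) ≤ n the product always fits, if the sum is at least n + 2 the product always overflows, and only when it equals n + 1 must a single product bit, bit n, be checked. This Python sketch assumes unsigned operands and is not necessarily the circuit described in the paper; the function name is hypothetical.

```python
def umul_with_overflow(a: int, b: int, n: int) -> tuple[int, bool]:
    """Return the n least significant product bits and an overflow flag
    for unsigned n-bit operands a and b."""
    low = (a * b) & ((1 << n) - 1)
    la, lb = a.bit_length(), b.bit_length()
    # For nonzero operands, a*b lies in [2**(la+lb-2), 2**(la+lb)).
    if la + lb <= n:
        return low, False          # product < 2**n
    if la + lb >= n + 2:
        return low, True           # product >= 2**n
    # la + lb == n + 1: product < 2**(n+1), so overflow iff bit n is set.
    # Hardware needs only n + 1 product bits here; Python forms the whole product.
    return low, bool(((a * b) >> n) & 1)
```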

