Results 1-10 of 12
Reduced Power Dissipation Through Truncated Multiplication
in IEEE Alessandro Volta Memorial Workshop on Low Power Design, 1999
Abstract

Cited by 19 (5 self)
Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added with the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations indicate that truncated parallel multipliers dissipate between 29 and 40 percent less power than standard parallel multipliers for operand sizes of 16 and 32 bits.

1: Introduction

High-speed parallel multipliers are fundamental building blocks in digital signal processing systems [1]. In...
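The column-dropping idea can be sketched at the bit level. The following is an illustrative constant-correction model, not the paper's exact estimation scheme: the `t` least significant columns of the partial-product matrix are discarded and replaced by a fixed correction equal to their average contribution for uniformly random operands.

```python
def truncated_multiply(a, b, n, t):
    """Multiply two n-bit unsigned ints, discarding partial-product bits in
    the t least significant columns (assumes t <= n) and adding a constant
    correction approximating their average carry contribution."""
    total = 0
    for i in range(n):
        for j in range(n):
            col = i + j
            if col >= t and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << col
    # For uniformly random operands each partial-product bit is 1 with
    # probability 1/4, and column c (c < n) holds c + 1 such bits; the
    # expected discarded value is rounded to a multiple of 2**t.
    expected = sum(0.25 * (c + 1) * 2**c for c in range(t))
    correction = int(round(expected / 2**t)) << t
    return (total + correction) & ((1 << (2 * n)) - 1)
```

For 8-bit operands with `t = 4`, the result differs from the exact product by at most a few units in column `t`, which is the trade the paper quantifies in power and area.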
Variable-Precision, Interval Arithmetic Processors
Abstract

Cited by 12 (1 self)
This chapter presents the design and analysis of variable-precision, interval arithmetic processors. The processors give the user the ability to specify the precision of the computation, determine the accuracy of the results, and recompute inaccurate results with higher precision. The processors support a wide variety of arithmetic operations on variable-precision floating point numbers and intervals. Efficient hardware algorithms and specially designed functional units increase the speed, accuracy, and reliability of numerical computations. Area and delay estimates indicate that the processors can be implemented with areas and cycle times that are comparable to conventional IEEE double-precision floating point coprocessors. Execution time estimates indicate that the processors are two to three orders of magnitude faster than a conventional software package for variable-precision, interval arithmetic.

1.1 INTRODUCTION

Floating point arithmetic provides a high-speed method for perform...
Integer Multiplication with Overflow Detection or Saturation
IEEE Transactions on Computers, 2000
Abstract

Cited by 7 (2 self)
High-speed multiplication is frequently used in general-purpose and application-specific computer systems. These systems often support integer multiplication, where two n-bit integers are multiplied to produce a 2n-bit product. To prevent growth in word length, processors typically return the n least significant bits of the product and a flag that indicates whether or not overflow has occurred. Alternatively, some processors saturate results that overflow to the most positive or most negative representable number. This paper presents efficient methods for performing unsigned or two's complement integer multiplication with overflow detection or saturation. These methods have significantly less area and delay than conventional methods for integer multiplication with overflow detection or saturation.
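The flag and saturation semantics described above can be captured in a short behavioral sketch; a hardware unit derives the overflow flag from the upper product bits without forming the full 2n-bit product, but the function is the same:

```python
def mul_unsigned(a, b, n):
    """Return (n-bit wrapped result, overflow flag) for unsigned a * b."""
    full = a * b
    overflow = (full >> n) != 0          # any bit set in the upper half
    return full & ((1 << n) - 1), overflow

def mul_twos_complement(a, b, n):
    """Return (saturated n-bit result, overflow flag) for signed a * b."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    full = a * b
    overflow = not (lo <= full <= hi)
    return max(lo, min(hi, full)), overflow
```

Note the asymmetry of the two's complement range: (-128) * (-128) overflows an 8-bit result and saturates to +127.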
A Combined Interval and Floating Point Multiplier
In Proceedings of the 8th Great Lakes Symposium on VLSI (Los Alamitos, CA), 1998
Abstract

Cited by 4 (0 self)
Interval arithmetic provides an efficient method for monitoring and controlling errors in numerical calculations. However, existing software packages for interval arithmetic are often too slow for numerically intensive computations. This paper presents the design of a multiplier that performs either interval or floating point multiplication. This multiplier requires only slightly more area and delay than a conventional floating point multiplier, and is one to two orders of magnitude faster than software implementations of interval multiplication.

1 Introduction

The performance of conventional microprocessors currently increases at a rate of approximately 55 percent per year and is expected to increase by a factor of 50 over the next ten years [1]. This rapid increase in computing power has led to a greater reliance on results produced by computer simulation and modeling. Although many areas depend on computer-generated results for reliable information, round-off error and catastrophic...
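The core operation can be modeled in a few lines. This sketch evaluates all four corner products for clarity and ignores the directed (outward) rounding that a real interval implementation must apply to keep the enclosure valid:

```python
def interval_mul(x, y):
    """Multiply intervals x = (xlo, xhi) and y = (ylo, yhi).

    The true product of any pair of points from x and y lies between the
    minimum and maximum of the four corner products.
    """
    corners = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(corners), max(corners))
```

A hardware design can avoid computing all four corners by examining the operand signs to select the two relevant products, which is one source of the speedup over software packages.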
High-Speed Inverse Square Roots
Proceedings of the 14th IEEE Symposium on Computer Arithmetic, 1999
Abstract

Cited by 4 (0 self)
Inverse square roots are used in several digital signal processing, multimedia, and scientific computing applications. This paper presents a high-speed method for computing inverse square roots. This method uses a table lookup, operand modification, and multiplication to obtain an initial approximation to the inverse square root. This is followed by a modified Newton-Raphson iteration, consisting of one square, one multiply-complement, and one multiply-add operation. The initial approximation and Newton-Raphson iteration employ specialized hardware to reduce the delay, area, and power dissipation. Application of this method is illustrated through the design of an inverse square root unit for operands in the IEEE single precision format. An implementation of this unit with a 4-layer metal, 2.5 Volt, 0.25 micron CMOS standard cell library has a cycle time of 6.7 ns, an area of 0.41 mm², a latency of five cycles, and a throughput of one result per cycle.

1. Introduction

Square roots a...
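The standard Newton-Raphson recurrence for 1/√x underlying such units is y ← y(3 − xy²)/2, which roughly doubles the number of correct bits per step. This sketch shows the iteration only: the paper's table-lookup-with-operand-modification seed is replaced by a crude exponent-based guess, and the subtraction that the hardware folds into a multiply-complement is written out arithmetically.

```python
import math

def inv_sqrt(x, iterations=5):
    """Approximate 1/sqrt(x) for x > 0 by Newton-Raphson refinement."""
    # Crude seed from the exponent alone: if x ~ 2**e, then 1/sqrt(x) ~ 2**(-e/2).
    y = 2.0 ** (-math.floor(math.log2(x)) / 2.0)
    for _ in range(iterations):
        s = y * y                       # square
        y = 0.5 * y * (3.0 - x * s)     # complement-style step, then scale
    return y
```

With a table-lookup seed accurate to several bits, one or two iterations suffice, which is how the cited unit reaches a five-cycle latency.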
Combined Unsigned and Two's Complement Squarers
in Conference Record of the Thirty-Third Asilomar Conference on Signals, Systems, and Computers, 1999
Abstract

Cited by 3 (1 self)
Squaring is an important operation in digital signal processing applications. For several applications, a significant reduction in area, delay, and power consumption is achieved by performing squaring using specialized squarers, instead of multipliers. Although most previous research on parallel squarers focuses on the design of unsigned squarers, squaring of two's complement numbers is also often required. This paper presents the design of parallel squarers that perform either unsigned or two's complement squaring, based on an input control signal. Compared to unsigned parallel squarers, these squarers require only a small amount of additional delay and area.

1. Introduction

Squaring is often required in digital signal processing applications, such as adaptive filtering [1], image compression [2], Euclidean branch calculation [3], pattern recognition [4], equalization [5], and decoding and demodulation [6], [7]. Squaring is also used to implement multipliers using the quarter square ...
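The truncated final sentence refers to quarter-square multiplication, a standard technique that builds a multiplier from two squarers. A one-line model of the identity (illustrative, not code from the paper):

```python
def quarter_square_mul(a, b):
    """Multiply via two squarings: a*b = ((a+b)^2 - (a-b)^2) / 4.

    The difference (a+b)^2 - (a-b)^2 equals exactly 4*a*b, so the integer
    division is exact for any integers a and b, including negative ones.
    """
    return ((a + b) ** 2 - (a - b) ** 2) // 4
```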
Integer Multiplication With Overflow Detection Or Saturation
Master's thesis, Lehigh University, 2000
Abstract

Cited by 2 (2 self)
1 Introduction 2
1.1 Multiplication 2
1.2 Overflow 3
1.3 Saturation 4
1.4 Thesis Overview 4
2 Previous Research 6
2.1 Unsigned Parallel Multipliers 6
2.1.1 Unsigned Array Multipliers 7
2.1.2 Unsigned Tree Multipliers 11
2.2 Two's Complement Multipliers 14
2.2.1 Two's Complement Array Multipliers 16
2.2.2 Two's Complement Tree Multipliers 19
3 Overflow Detection and Saturation for Unsigned Integer Multiplication 21
3.1 General Design Approach 21
3.2 Unsigned Arr...
Transition-activity aware design of reduction stages for parallel multipliers
in Proc. of Great Lakes Symposium on VLSI, 2007
Abstract

Cited by 1 (1 self)
We propose an interconnect reorganization algorithm for reduction stages in parallel multipliers. It aims at minimizing power consumption for given static probabilities at the primary inputs. In typical signal processing applications, the transition probability varies between the most and least significant bits; the same holds for individual signals within the multiplier. Our interconnect reorganization exploits this to reduce the overall switching activity, and thus the multiplier's power consumption. We have developed a CAD tool that reorganizes the connections within the multiplier architecture in an optimized way. Since the applied heuristic requires power estimation, we have also developed a very fast estimator fine-tuned for parallel multipliers. The CAD tool automatically generates gate-level VHDL code for the optimized multipliers. This code and code for unoptimized multipliers have been compared using state-of-the-art power estimation tools. The reduction in power consumption ranges from 7% up to 23% and is achieved without any noticeable overhead in performance or area.
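The probabilistic reasoning behind such fast estimators can be illustrated with the usual temporal-independence model (an assumption of this sketch, not necessarily the authors' estimator): a signal that is 1 with static probability p toggles between consecutive cycles with probability 2p(1 − p), and an AND-formed partial-product bit of independent inputs inherits the product of their one-probabilities.

```python
def transition_activity(p_one):
    """Toggle probability of a signal with static one-probability p_one,
    assuming successive samples are independent."""
    return 2.0 * p_one * (1.0 - p_one)

def partial_product_activity(p_a, p_b):
    """Activity of the AND of two independent bits with one-probabilities
    p_a and p_b, as in a partial-product matrix."""
    return transition_activity(p_a * p_b)
```

Because activity peaks at p = 0.5 and falls off toward 0 and 1, bits with skewed probabilities (typical of sign-extension regions) are cheaper to switch, which is the freedom the interconnect reordering exploits.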
Combined Multiplication and Sum-of-Squares Units
in Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, 2003
Abstract

Cited by 1 (1 self)
Multiplication and squaring are important operations in digital signal processing and multimedia applications. This paper presents designs for units that implement either multiplication, A × B, or sum-of-squares computations, A² + B², based on an input control signal. Compared to conventional parallel multipliers, these units have a modest increase in area and delay, but allow either multiplication or sum-of-squares computations to be performed. Combined multiplication and sum-of-squares units for unsigned and two's complement operands are presented, along with integrated designs that can operate on either unsigned or two's complement operands. The designs can also be extended to work with a third accumulator operand to compute either Z + A × B or Z + A² + B². Synthesis results indicate that a combined multiplication and sum-of-squares unit for 32-bit two's complement operands can be implemented with roughly 15% more area and nearly the same worst-case delay as a conventional 32-bit two's complement multiplier.
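A behavioral model of the mode selection is trivial to state (function only; the point of the hardware designs is that both modes share one partial-product array):

```python
def combined_unit(a, b, z=0, sum_of_squares=False):
    """Compute z + a*b or z + a*a + b*b, selected by a mode bit.

    The optional accumulator operand z models the extended designs that
    compute Z + A*B or Z + A^2 + B^2.
    """
    return z + (a * a + b * b if sum_of_squares else a * b)
```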
Power-Optimized Partial Product Reduction Interconnect Ordering in Parallel Multipliers
Abstract

Cited by 1 (1 self)
When designing the reduction tree of a parallel multiplier, we can exploit a large intrinsic freedom in the interconnection order of partial products. The transition activities vary significantly for different internal partial products. In this work we propose a method for generating power-efficient parallel multipliers in which the partial products are connected so as to minimize activity. The reduction tree is designed progressively. A Simulated Annealing optimizer uses power cost numbers from a specially implemented probabilistic gate-level power estimator and selects a power-efficient solution for each stage of the reduction tree. VHDL simulation using ModelSim shows a significant reduction in the overall number of transitions. This reduction ranges from 15% up to 32% compared to randomly generated reduction trees and is achieved without any noticeable area or performance overhead.
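A generic simulated-annealing skeleton of the kind this abstract describes can be sketched as follows; the state is a permutation standing in for an interconnection order, and the cost function stands in for the probabilistic gate-level power estimate (all names and parameters here are illustrative, not the authors' tool):

```python
import math
import random

def anneal(cost, state, steps=2000, t0=1.0, alpha=0.999):
    """Minimize cost(state) over permutations by random pairwise swaps."""
    best, best_c = list(state), cost(state)
    cur, cur_c, t = list(state), best_c, t0
    for _ in range(steps):
        i, j = random.sample(range(len(cur)), 2)
        cur[i], cur[j] = cur[j], cur[i]            # propose: swap two connections
        c = cost(cur)
        if c <= cur_c or random.random() < math.exp((cur_c - c) / t):
            cur_c = c                              # accept (possibly uphill)
            if c < best_c:
                best, best_c = list(cur), c
        else:
            cur[i], cur[j] = cur[j], cur[i]        # reject: undo the swap
        t *= alpha                                 # geometric cooling
    return best, best_c
```

Occasionally accepting uphill moves at high temperature lets the search escape local minima in the stage-by-stage interconnect ordering; a toy cost such as displacement from a target ordering is enough to exercise the loop.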