Optimal Left-to-Right Binary Signed-Digit Recoding
2000
Cited by 34 (3 self)
This paper describes new methods for producing optimal binary signed-digit representations. This can be useful in the fast computation of exponentiations. Contrary to existing algorithms, the digits are scanned from left to right (i.e., from the most significant position to the least significant position). This may lead to better performance in both hardware and software.
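For orientation, here is a minimal sketch of a well-known right-to-left recoding, the non-adjacent form (NAF), which is the kind of least-significant-first scan that the abstract contrasts with. It is a textbook baseline and not the paper's left-to-right algorithm; the function name `naf_digits` is illustrative.

```python
def naf_digits(n):
    """Return the non-adjacent form of a non-negative integer.

    Digits are in {-1, 0, 1}, listed least significant first.
    This scans right to left; it is NOT the left-to-right method
    the abstract describes, only a reference point for comparison.
    """
    digits = []
    while n > 0:
        if n & 1:
            # n mod 4 == 1 -> digit +1, n mod 4 == 3 -> digit -1
            d = 2 - (n % 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def signed_digit_value(digits):
    """Evaluate a least-significant-first signed-digit string."""
    return sum(d << i for i, d in enumerate(digits))
```

For example, 7 is recoded as 8 - 1, which has two nonzero digits instead of the three in its plain binary form.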
VLSI Implementation of Discrete Wavelet Transform
1996
Cited by 19 (0 self)
This paper presents a VLSI implementation of the discrete wavelet transform (DWT). The architecture is simple, modular, and cascadable for computation of one- or multidimensional DWT. It comprises four basic units: input delay, filter, register bank, and control unit. The proposed architecture is systolic in nature and performs both high-pass and low-pass coefficient calculations with only one set of multipliers. In addition, it requires only small on-chip interface circuitry for interconnection to a standard communication bus. A detailed analysis of the effect of finite precision of data and wavelet filter coefficients on the accuracy of the DWT coefficients is presented. The architecture has been simulated in VLSI and has a hardware utilization efficiency of 87.5%. Being systolic in nature, the architecture can compute DWT at a data rate of N × 10^6 samples/sec corresponding to a clock speed of N MHz. 1. Introduction In the last decade, there has been an enormous increase in the appl...
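As a software illustration of the high-pass and low-pass coefficient calculations mentioned above, one level of a 1-D DWT can be sketched as filtering followed by keeping every second output. This is a sketch under stated assumptions: the Haar filter pair, the periodic boundary handling, and the name `dwt_level` are illustrative choices, not details of the paper's architecture.

```python
import math


def dwt_level(signal, lowpass, highpass):
    """Compute one level of a 1-D DWT with periodic extension.

    Returns (approximation, detail) coefficient lists, each half
    the length of the input. The input length is assumed even.
    """
    n = len(signal)
    approx, detail = [], []
    for i in range(0, n, 2):  # step of 2 performs the downsampling
        window = [signal[(i + k) % n] for k in range(len(lowpass))]
        approx.append(sum(c * x for c, x in zip(lowpass, window)))
        detail.append(sum(c * x for c, x in zip(highpass, window)))
    return approx, detail


# Assumed filter pair: the orthonormal Haar wavelet.
S = 1.0 / math.sqrt(2.0)
HAAR_LOW = [S, S]
HAAR_HIGH = [S, -S]
```

Both outputs for a given position read the same input window, which is why a hardware design can, in principle, share multipliers between the two filters.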
Reduced Power Dissipation Through Truncated Multiplication
In IEEE Alessandro Volta Memorial Workshop on Low Power Design, 1999
Cited by 19 (5 self)
Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added to the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations indicate that truncated parallel multipliers dissipate between 29 and 40 percent less power than standard parallel multipliers for operand sizes of 16 and 32 bits. 1: Introduction High-speed parallel multipliers are fundamental building blocks in digital signal processing systems [1]. In...
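The idea of discarding the least significant columns can be modeled in software as follows. This is a behavioral sketch only, assuming unsigned n-bit operands; the `correction` parameter stands in for the carry estimate, and its value here is a caller-supplied assumption rather than the estimate used in the paper.

```python
def truncated_multiply(a, b, n, k, correction=0):
    """Model an n-by-n unsigned multiplier that drops columns 0..k-1.

    Partial-product bits a_i * b_j with i + j < k are never summed.
    `correction` is a hypothetical constant that approximates the
    carries those columns would have produced. The result is the
    upper n bits of the product, rounded to nearest.
    """
    total = 0
    for i in range(n):
        for j in range(n):
            keep = (i + j) >= k
            if keep and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    total += correction
    total += 1 << (n - 1)  # rounding constant for the n-bit result
    return total >> n


def rounded_multiply(a, b, n):
    """Reference: full product rounded to its upper n bits."""
    return (a * b + (1 << (n - 1))) >> n
```

With `k = 0` the model reduces to the standard rounded multiplier. With `n = 8` and `k = 4` the dropped bits sum to at most 49, so the result differs from the reference by at most one unit in the last place even without a correction term.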
Design Issues In High Performance Floating Point Arithmetic Units
1996
Cited by 17 (3 self)
In recent years, computer applications have increased in computational complexity. The industry-wide use of performance benchmarks, such as SPECmarks, forces processor designers to pay particular attention to the implementation of the floating point unit, or FPU. Special-purpose applications, such as high-performance graphics rendering systems, have placed further demands on processors. High-speed floating point hardware is required to meet these increasing demands. This work examines the state of the art in FPU design and proposes techniques for improving the performance and the performance/area ratio of future FPUs. In recent FPUs, emphasis has been placed on designing ever-faster adders and multipliers, with division receiving less attention. The design space of FP dividers is large, comprising five different classes of division algorithms: digit recurrence, functional iteration, very high radix, table lookup, and variable latency. While division is an infrequent operation...
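Of the five classes listed, functional iteration is perhaps the easiest to illustrate in software. The sketch below uses the Newton-Raphson reciprocal iteration, a standard member of that class. The linear starting approximation and the assumption that the divisor is already scaled into [0.5, 1) are illustrative choices, not details taken from this work.

```python
def reciprocal_newton(d, iterations=5):
    """Approximate 1/d by Newton-Raphson functional iteration.

    Assumes d has been scaled into [0.5, 1), as a normalized
    significand would be. Each step roughly doubles the number
    of correct bits: x_{k+1} = x_k * (2 - d * x_k).
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("divisor must be scaled into [0.5, 1)")
    # Assumed linear initial estimate for d in [0.5, 1].
    x = 48.0 / 17.0 - (32.0 / 17.0) * d
    for _ in range(iterations):
        x = x * (2.0 - d * x)
    return x


def divide(a, d):
    """Divide by multiplying with the iterated reciprocal."""
    return a * reciprocal_newton(d)
```

Each iteration needs only multiplications and a subtraction, which is why this class of algorithm can reuse an existing fast multiplier.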
Design and Implementation of a 16 by 16 Low-Power Two's Complement Multiplier
In Proc. 2000 IEEE Int. Symp. Circuits and Systems, 2000
Cited by 8 (0 self)
This paper describes the design and implementation of a high-speed, low-power 16 by 16 two's complement parallel multiplier. The multiplier uses optimized radix-4 Booth encoders to generate the partial products, and an array of strategically placed (3,2), (5,3), and (7,4) counters to reduce the partial products to sum and carry vectors. The more significant bits of the product are computed from left to right using a modified Ercegovac-Lang converter. An implementation of the multiplier in 0.25 µm static CMOS technology has an area of 0.126 mm^2, a measured delay of 4.39 ns, and an average power dissipation of 0.110 mW/MHz at 2.5 Volts and 100 °C.
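Radix-4 Booth encoding, which the abstract names as the partial-product generator, can be sketched behaviorally as below. This is the textbook recoding, not the optimized encoder circuit of the paper; the function names and the requirement that the word length `n` be even are assumptions of the sketch.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's complement multiplier into radix-4 digits.

    Each digit lies in {-2, -1, 0, 1, 2} and is derived from three
    overlapping bits: d_i = -2*b[2i+1] + b[2i] + b[2i-1], with b[-1] = 0.
    Digits are returned least significant first; n must be even.
    """
    bits = y & ((1 << n) - 1)  # two's complement bit pattern of y
    digits = []
    prev = 0  # b[2i-1], zero for the first group
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x, y, n):
    """Sum the n/2 partial products d_i * x * 4**i."""
    return sum(d * x * (4 ** i) for i, d in enumerate(booth_radix4_digits(y, n)))
```

The recoding halves the number of partial products: a 16-bit multiplier yields 8 digits, each selecting 0, ±x, or ±2x.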
Integer Multiplication with Overflow Detection or Saturation
IEEE Transactions on Computers, 2000
Cited by 7 (2 self)
High-speed multiplication is frequently used in general-purpose and application-specific computer systems. These systems often support integer multiplication, where two n-bit integers are multiplied to produce a 2n-bit product. To prevent growth in word length, processors typically return the n least significant bits of the product and a flag that indicates whether or not overflow has occurred. Alternatively, some processors saturate results that overflow to the most positive or most negative representable number. This paper presents efficient methods for performing unsigned or two's complement integer multiplication with overflow detection or saturation. These methods have significantly less area and delay than conventional methods for integer multiplication with overflow detection or saturation.
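The behavior described above, returning the n least significant bits together with an overflow flag, can be modeled as follows. The sketch follows the conventional approach of forming the full product first, which is the baseline the paper improves on, not the paper's method; the function name and signature are illustrative.

```python
def multiply_with_overflow(a, b, n, signed=False):
    """Return (n-bit result, overflow flag) for an n-bit multiply.

    Conventional model: compute the full product, keep its n least
    significant bits, and flag overflow when the full product does
    not fit in n bits for the chosen number format.
    """
    product = a * b
    mask = (1 << n) - 1
    low = product & mask
    if signed:
        lo_bound = -(1 << (n - 1))
        hi_bound = (1 << (n - 1)) - 1
        overflow = not (lo_bound <= product <= hi_bound)
        if low >> (n - 1):  # reinterpret the bits as two's complement
            low -= 1 << n
    else:
        overflow = product > mask
    return low, overflow
```

For 8-bit unsigned operands, 200 × 2 = 400 does not fit, so the model returns the wrapped value 144 with the flag set.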
Mixed Swing Techniques for Low Energy/Operation Datapath Circuits
1997
Cited by 5 (0 self)
The portable communications industry’s vision of integrating a complete multimedia complex on a single die, coupled with the desktop computing industry’s vision of integrating multimedia functionality into general-purpose microprocessors, has transformed lowering the power dissipation of digital signal processing (DSP) datapath circuits into an increasingly important challenge in current and future fabrication processes. Fully static CMOS logic combined with supply voltage scaling has been widely used to lower datapath power dissipation over the last decade. However, fundamental limitations preclude device threshold voltage scaling under the constant drain-source field scaling paradigm in future deep-submicron processes, imposing limitations on voltage scaling. This has created a strong need to explore new methodologies for lowering the power dissipation of next-generation high-speed datapath circuits. This thesis investigates Mixed Swing techniques for reducing the power dissipation of static CMOS datapath operators while retaining their high performance, or
Technology Scaling Effects on Multipliers
IEEE Trans, 1996
Cited by 4 (0 self)
Since integrated circuits were invented, fabrication engineers have been able to steadily decrease the dimensions of the devices (transistors). These reductions in the minimum feature sizes have resulted in improved performance. In addition, the dimensions of the interconnect used to connect the active transistors have also scaled. The decreasing dimensions of the physical devices cause the capacitances and resistances of the different parts of the multiplier to change. Therefore, the relative delay due to each part of the multiplier changes. In addition, the different encoding schemes used to generate the partial products and the different topologies used in the reduction of the partial products affect the total latency of the multiplier. This paper examines the effects of the smaller device dimensions on multipliers. It will show that interconnect is becoming more important and that generating the partial products using an procedure provides the minimum latency for small feature size...
Integer Multiplication With Overflow Detection Or Saturation
Master's thesis, Lehigh University, 19 Memorial Dr, 2000
Cited by 2 (2 self)
1 Introduction
1.1 Multiplication
1.2 Overflow
1.3 Saturation
1.4 Thesis Overview
2 Previous Research
2.1 Unsigned Parallel Multipliers
2.1.1 Unsigned Array Multipliers
2.1.2 Unsigned Tree Multipliers
2.2 Two's Complement Multipliers
2.2.1 Two's Complement Array Multipliers
2.2.2 Two's Complement Tree Multipliers
3 Overflow Detection and Saturation for Unsigned Integer Multiplication
3.1 General Design Approach
3.2 Unsigned Arr...
Combined Unsigned and Two's Complement Saturating Multipliers
2000
Cited by 2 (1 self)
In many digital signal processing and multimedia applications, results that overflow are saturated to the most positive or most negative representable number. This paper presents efficient techniques for performing saturating n-bit integer multiplication on unsigned and two's complement numbers. Unlike conventional techniques for saturating multiplication, which compute a 2n-bit product and then examine the n most significant product bits to determine if overflow has occurred, the techniques presented in this paper compute only the (n + 1) least significant bits of the product. Specialized overflow detection units, which operate in parallel with the multiplier, determine if overflow has occurred and the product should be saturated. These techniques are applied to designs for saturating array multipliers that perform either unsigned or two's complement saturating integer multiplication, based on an input control signal. Compared to array multipliers that use conventional methods for sa...
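A behavioral model of a combined saturating multiplier is sketched below. It uses the conventional approach of forming the full product and clamping it, so it shows the intended input/output behavior rather than the (n + 1)-bit technique of the paper. The `signed` argument plays the role of the input control signal; the name `saturating_multiply` is illustrative.

```python
def saturating_multiply(a, b, n, signed):
    """Multiply two n-bit integers and saturate on overflow.

    `signed` selects two's complement (True) or unsigned (False)
    interpretation, mirroring an input control signal. Results
    outside the representable range are clamped to the nearest
    bound instead of wrapping around.
    """
    if signed:
        lo = -(1 << (n - 1))
        hi = (1 << (n - 1)) - 1
    else:
        lo = 0
        hi = (1 << n) - 1
    product = a * b  # full-precision product (conventional method)
    return max(lo, min(hi, product))
```

For 8-bit two's complement operands, -128 × -1 saturates to 127 rather than wrapping to -128.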