Reduced Power Dissipation Through Truncated Multiplication
in IEEE Alessandro Volta Memorial Workshop on Low Power Design, 1999
Cited by 26 (7 self)
Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added to the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations indicate that truncated parallel multipliers dissipate between 29 and 40 percent less power than standard parallel multipliers for operand sizes of 16 and 32 bits. 1: Introduction High-speed parallel multipliers are fundamental building blocks in digital signal processing systems [1]. In...
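The idea in this abstract can be illustrated with a small behavioural model. The following is only a sketch under stated assumptions, not the paper's design: operands are unsigned `n`-bit integers, the `k` least significant columns of the partial-product matrix are skipped, and the carry estimate is a caller-supplied constant (`correction`, a hypothetical parameter, since the abstract does not say how the estimate is formed).

```python
def rounded_multiply(a: int, b: int, n: int) -> int:
    """Reference: full product of two unsigned n-bit operands, rounded to its n upper bits."""
    return (a * b + (1 << (n - 1))) >> n


def truncated_multiply(a: int, b: int, n: int, k: int, correction: int = 0) -> int:
    """Approximate rounded_multiply while ignoring matrix columns 0..k-1.

    `correction` is an assumed constant (in units of 2**k) that stands in
    for the carries the skipped columns would have produced.
    """
    total = 0
    for i in range(n):              # bit index of a
        for j in range(n):          # bit index of b
            column = i + j
            if column >= k and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << column    # keep only the upper columns
    total += correction << k        # estimated carries from dropped columns
    total += 1 << (n - 1)           # rounding constant
    return total >> n
```

With `k = 0` the model reproduces the rounded product exactly; as `k` grows, fewer partial-product bits are formed and the result can fall short of the rounded product unless the correction term compensates.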
Viredaz, “Bit-Serial Multipliers and Squarers”
IEEE Transactions on Computers, 1994
A Compact High-Speed (31,5) Parallel Counter Circuit Based on Capacitive Threshold-Logic Gates
IEEE Journal of Solid-State Circuits, 1996
Cited by 12 (0 self)
A novel high-speed circuit implementation of the (31,5) parallel counter (i.e., population counter) based on capacitive threshold logic (CTL) is presented. The circuit consists of 20 threshold logic gates arranged in two stages, i.e., the parallel counter described here has an effective logic depth of two. The charge-based CTL gates are essentially dynamic circuits which require a periodic refresh or precharge cycle, but unlike conventional dynamic CMOS gates, the circuit can be operated in synchronous as well as in asynchronous mode. The counter circuit is implemented using conventional 1.2 µm double-poly CMOS technology, and it occupies a silicon area of about 0.08 mm². Extensive post-layout simulations indicate that the circuit has a typical input-to-output propagation delay of less than 3 ns, and the test circuit is shown to operate reliably when consecutive 31-b input vectors are applied at a rate of up to 16 Mvectors/s. With its demonstrated data processing capability of abou...
Automatic synthesis of compressor trees: reevaluating large counters
Design Automation and Test in Europe (DATE ’07)
Cited by 12 (9 self)
Despite the progress of the last decades in electronic design automation, arithmetic circuits have always received way less attention than other classes of digital circuits. Logic synthesisers, which play a fundamental role in design today, play a minor role on most arithmetic circuits, performing some local optimisations but hardly improving the overall structure of arithmetic components. Architectural optimisations have been often studied manually, and only in the case of very common building blocks such as fast adders and multi-input adders, ad hoc techniques have been developed. A notable case is multi-input addition, which is the core of many circuits such as multipliers, etc. The most common technique to implement multi-input addition is using compressor trees, which are often composed of carry-save adders (based on (3:2) counters, i.e., full adders). A large body of literature exists to implement compressor trees using large counters. However, all the large counters were built by using full and half adders recursively. In this paper we give some definite answers to issues related to the use of large counters. We present a general technique to implement large counters whose performance is much better than the ones composed of full and half adders. Also we show that it is not always useful to use larger optimised counters and sometimes a combination of various size counters gives the best performance. Our results show 15% improvement in the critical path delay. In some cases even hardware area is reduced by using our counters.
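For readers unfamiliar with the baseline this abstract argues against, the sketch below models multi-input addition built only from (3:2) counters applied bitwise, i.e., carry-save adders. It is an illustrative software model, not the paper's synthesis technique; the function names are hypothetical, operands are assumed to be non-negative integers, and the reduction order is arbitrary, whereas a hardware compressor tree would be balanced to minimise delay.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Apply a (3:2) counter (full adder) to every bit position at once.

    Returns (sum_word, carry_word) with x + y + z == sum_word + carry_word.
    """
    sum_word = x ^ y ^ z                              # per-bit sum output
    carry_word = ((x & y) | (x & z) | (y & z)) << 1   # per-bit carry, weight doubled
    return sum_word, carry_word


def compress(operands: list[int]) -> list[int]:
    """Reduce any number of operands to at most two without carry propagation."""
    pending = list(operands)
    while len(pending) > 2:
        x, y, z = pending.pop(), pending.pop(), pending.pop()
        pending.extend(carry_save_add(x, y, z))       # three words in, two words out
    return pending
```

A single carry-propagate addition of the two remaining words then gives the final sum, e.g. `sum(compress([5, 9, 12, 7, 3]))`.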
Design and Implementation of a 16 by 16 Low-Power Two's Complement Multiplier
in Proc. 2000 IEEE Int. Symp. Circuits and Systems, 2000
Cited by 9 (0 self)
This paper describes the design and implementation of a high-speed low-power 16 by 16 two's complement parallel multiplier. The multiplier uses optimized radix-4 Booth encoders to generate the partial products, and an array of strategically placed (3,2), (5,3), and (7,4) counters to reduce the partial products to sum and carry vectors. The more significant bits of the product are computed from left to right using a modified Ercegovac-Lang converter. An implementation of the multiplier in 0.25 µm static CMOS technology has an area of 0.126 mm², a measured delay of 4.39 ns, and an average power dissipation of 0.110 mW/MHz at 2.5 Volts and 100 °C.
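The radix-4 Booth encoding mentioned in this abstract can be sketched in software. This is the textbook recoding, shown here as an assumption about what "radix-4 Booth encoders" refers to; it says nothing about the paper's optimized encoder circuits. The function names are hypothetical, and `n` is assumed to be even with the multiplier representable in `n`-bit two's complement.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's complement multiplier into n/2 digits in {-2, ..., 2}.

    Digit i is b[2i-1] + b[2i] - 2*b[2i+1], with b[-1] taken as 0.
    """
    bits = y & ((1 << n) - 1)       # two's complement bit pattern of y
    digits = []
    previous = 0                    # b[2i-1]; zero for the first group
    for i in range(0, n, 2):
        low = (bits >> i) & 1
        high = (bits >> (i + 1)) & 1
        digits.append(previous + low - 2 * high)
        previous = high
    return digits


def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum the n/2 partial products d_i * x * 4**i selected by the Booth digits."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, n)))
```

The recoding halves the number of partial products relative to one per multiplier bit, which is why it is commonly paired with a counter array such as the one described above.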
Efficient Hamming Weight Comparators for Binary Vectors Based on Accumulative and Up/Down Parallel Counters
Cited by 8 (4 self)
New counting-based methods for comparing the Hamming weight of a binary vector with a constant, as well as comparing the Hamming weights of two input vectors, are proposed. It is shown that the proposed comparators are faster and simpler, both in asymptotic sense and for moderate vector lengths, compared with the best available fully digital designs. These speed and cost advantages result from a more efficient population counting, as well as the merger of counting and comparison operations, via accumulative and up/down parallel counters. Index Terms: Column compression, comparator, Hamming distance, multioperand addition, parallel counter, population count.
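The two comparison problems named in this abstract can be stated as a short functional model. The sketch below only specifies the input/output behaviour; it is a sequential software analogue and does not reproduce the parallel counter structures the paper proposes. The function names and the `width` parameter are hypothetical.

```python
def weight_at_least(x: int, k: int) -> bool:
    """Compare the Hamming weight of x with a constant: is popcount(x) >= k?

    Counting and comparison are merged: the loop stops once k ones are seen.
    """
    count = 0
    while x:
        x &= x - 1          # clear the lowest set bit
        count += 1
        if count >= k:
            return True
    return k <= 0


def compare_weights(x: int, y: int, width: int) -> int:
    """Compare the Hamming weights of two width-bit vectors.

    Models an up/down count: ones in x count up, ones in y count down.
    Returns 1, 0, or -1 for heavier, equal, or lighter x.
    """
    balance = 0
    for i in range(width):
        balance += (x >> i) & 1
        balance -= (y >> i) & 1
    return (balance > 0) - (balance < 0)
```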
Depth-Efficient Threshold Circuits for Multiplication and Symmetric Function Computation
Proc. Int'l Computing and Combinatorics Conf., LNCS, 1996
Cited by 2 (2 self)
The multiplication operation and the computation of symmetric functions are fundamental problems in arithmetic and algebraic computation. We describe unit-weight threshold circuits to perform the multiplication of two n-bit integers, which have fan-in $k$, edge complexity $O(n^{2+1/d})$, and depth $O(\log d + \log n / \log k)$, for any fixed integer $d > 0$. For a given fan-in, our constructions have considerably smaller depth (or edge complexity) than the best previous circuits of similar edge complexity (or depth, respectively). Similar results are also shown for the iterated addition operation and the computation of symmetric functions. In particular, we propose a unit-weight threshold circuit to compute the sum of $m$ n-bit numbers that has fan-in $k$, edge complexity $O(nm^{1+1/d})$, and depth $O(\log d + \log m / \log k + \log n / \log k)$. 1 Introduction The delay required to perform multiplication and iterated addition is crucial to the performance of many computationally intensive applications. T...
The amalgam compiler infrastructure
2004
Cited by 1 (0 self)
To my mother and father, for their love, support, and guidance. Acknowledgments: First and foremost, I would like to thank my adviser, Professor Nick Carter, for his guidance and support during this endeavor. Without his encouragement and ability to help me step back and look at the big picture, this thesis would not have been possible. Next, I would like to thank Derek Gottlieb and Josh Walstrom for the numerous stimulating technical discussions and for the simulation infrastructure used to gather the results for this thesis. To both Lee Baugh and Brian Greskamp, thank you for the innumerable discussions on the intermediate program representation that were invaluable to this thesis. Thanks also to Chi-Wei Wang and Chris Grier for your contributions to the benchmark suite. Many thanks to my brother, Steve, who spent far too many hours reviewing the text of this thesis. I also wish to thank my parents. Your years of support and encouragement have not gone unnoticed. Lastly, to my love, Tara, thank you for your unconditional love and daily support.
Saturating counters: Application and Design Alternatives
Proc. of the 16th IEEE Symp. on Computer Arithmetic, 2003
Cited by 1 (0 self)
We define a new class of parallel counters, Saturating Counters, which provide the exact count of the inputs that are 1 only if this count is below a given threshold. Such counters are useful in, for example, a self-test and repair unit for embedded memories in a system-on-a-chip. We describe this application and present several alternatives for the design of the saturating counter. We then compare the delay and area of the proposed design alternatives.
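The definition in this abstract can be written as a behavioural specification. This is a sketch only: the abstract does not say what the counter outputs once the threshold is reached, so the model assumes it outputs the threshold itself, and the function name is hypothetical.

```python
def saturating_count(bits: list[int], threshold: int) -> int:
    """Return the exact number of 1 inputs if it is below `threshold`.

    Assumption: once the count reaches `threshold`, the output saturates
    at `threshold` instead of continuing to count.
    """
    count = 0
    for bit in bits:
        if bit:
            count += 1
            if count >= threshold:
                return threshold    # saturated: the exact count is no longer tracked
    return count
```

For example, `saturating_count([1, 0, 1, 1, 0], 4)` gives the exact count because three ones is below the threshold, while ten ones with the same threshold saturates at 4.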
Optimal-depth threshold circuits for multiplication and related problems
Proc. 33rd Asilomar Conf. Signals, Systems, and Computers, 1999
Cited by 1 (1 self)
Multiplication is one of the most fundamental operations in arithmetic and algebraic computations. In this paper, we present depth-optimal circuits for performing multiplication, multioperand addition, and symmetric function evaluation with small size and restricted fan-in. In particular, we show that the product of two n-bit numbers can be computed using a unit-weight threshold circuit of fan-in $k$, depth 3 log_k n + log2 d log2(1+&)1 lo&) + O(1), and edge complexity $O(n^{2+1/d} \log(d + 1))$, for any integer $d > 0$. All the circuits proposed in this paper have constant depth when $\log_k n$ is a constant and are depth-optimal within small constant factors for any fan-in $k$.