Results 11–20 of 167
Signed Binary Addition Circuitry with Inherent Even Parity Outputs
, 1997
Abstract

Cited by 11 (1 self)
A signed binary (SB) addition circuit is presented that always produces an even parity representation of the sum word. The novelty of this design is that no extra check bits are generated or used. The redundancy inherent in a SB representation is further exploited to contain parity information.
Polynomial Formal Verification of Multipliers
, 1997
Abstract

Cited by 11 (4 self)
Until recently, verifying multipliers with formal methods was not feasible, even for small input word sizes. About two years ago, a new data structure, called the Multiplicative Binary Moment Diagram (*BMD), was introduced for representing arithmetic functions over Boolean variables. Based on this data structure, methods were proposed by which verification of multipliers with input word sizes of up to 256 bits has become feasible. Until now, only experimental data has been provided for these verification methods. In this paper we give a formal proof that logic verification using *BMDs is polynomially bounded in both space and time when applied to the class of Wallace-tree-like multipliers.
Layout-Aware Synthesis of Arithmetic Circuits
 Proceedings of the 39th Design Automation Conference (DAC)
, 2002
Abstract

Cited by 8 (0 self)
In deep submicron (DSM) technology, wires are equally or more important than logic components, since wire-related problems such as crosstalk and noise are much more critical in system-on-chip (SoC) design. Recently, a method [12] was proposed for generating a partial product reduction tree (PPRT) with optimal timing using bit-level adders to implement arithmetic circuits, which outperforms the current best designs. However, in conventional approaches, including [12], interconnects are not primary components to be optimized in the synthesis of arithmetic circuits, mainly due to their high integration complexity or unpredictable wire effects, resulting in unsatisfactory layouts with long, tangled wire connections. To overcome this limitation, we propose a new module generation/synthesis algorithm for arithmetic circuits utilizing carry-save-adder (CSA) modules, which not only optimizes circuit timing but also generates a much more regular interconnect topology for the final circuits. Specifically, we propose a two-step algorithm: (Phase 1: CSA module generation) an optimal-timing CSA module generation algorithm for an arithmetic expression under a general CSA timing model; (Phase 2: Bit-level interconnect refinement) we optimally refine the interconnects between the CSA modules while retaining the global CSA-tree structure produced by Phase 1. It is shown that the timing of the circuits produced by our approach is equal or very close to that of [12] in most test cases (even without including interconnect delay), while at the same time the interconnects in the layout are significantly shorter and more regular.
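For background, the carry-save adder at the heart of this synthesis approach compresses three operands into a sum word and a carry word with no carry propagation between bit positions. A minimal behavioural sketch in Python (function name is illustrative, not from the paper):

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Compress three integers into a (sum, carry) pair without
    propagating carries.

    Each bit position behaves as an independent full adder: XOR
    produces the sum bit, the majority function produces the carry
    bit, which carries weight 2 and is therefore shifted left.
    """
    s = a ^ b ^ c                                   # bitwise sum word
    carry = ((a & b) | (b & c) | (a & c)) << 1      # carry word, weight x2
    return s, carry

# The pair preserves the total: s + carry == a + b + c.
s, c = carry_save_add(13, 27, 42)
assert s + c == 13 + 27 + 42
```

Because no carry ripples across bit positions, the delay of this step is one full-adder delay regardless of word length, which is what makes CSA trees attractive for timing-driven synthesis.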
VLSI Architecture for Datapath Integration of Arithmetic over GF(2^m) on Digital Signal Processors
 in Proc. IEEE ICASSP'97
, 1997
Abstract

Cited by 8 (4 self)
This paper examines the implementation of Finite Field arithmetic, i.e. multiplication, division, and exponentiation, for any standard basis GF(2^m) with m ≤ 8 on a DSP datapath. We identify an opportunity to exploit the cells and the interconnection structure of a typical binary multiplier unit for the Finite Field operations by adding only a small overhead of logic. We develop division and exponentiation based on multiplication at the algorithm level and present a simple scheme for implementing all operations on a processor datapath.
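As a functional reference, standard-basis GF(2^m) multiplication is carry-less polynomial multiplication followed by reduction modulo the field polynomial: additions become XORs, which is why a binary multiplier's cell array can be reused with little extra logic. A sketch in Python, using GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 as an illustrative modulus (this particular field is my choice, not taken from the paper):

```python
def gf2m_mul(a: int, b: int, m: int = 8, poly: int = 0x11B) -> int:
    """Multiply two elements of GF(2^m) in standard (polynomial) basis.

    Shift-and-add multiplication where "add" is XOR (no carries);
    whenever the running multiplicand reaches degree m, it is reduced
    by XORing in the field polynomial.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # XOR in this partial product
        b >>= 1
        a <<= 1
        if a & (1 << m):         # degree reached m: reduce mod poly
            a ^= poly
    return result
```

With the AES polynomial, this reproduces the worked example from the AES specification: gf2m_mul(0x57, 0x83) yields 0xC1.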
A Compact High-Speed (31,5) Parallel Counter Circuit Based on Capacitive Threshold-Logic Gates
 IEEE Journal of Solid-State Circuits
, 1996
Abstract

Cited by 8 (0 self)
A novel high-speed circuit implementation of the (31,5) parallel counter (i.e., population counter) based on capacitive threshold logic (CTL) is presented. The circuit consists of 20 threshold logic gates arranged in two stages, i.e., the parallel counter described here has an effective logic depth of two. The charge-based CTL gates are essentially dynamic circuits which require a periodic refresh or precharge cycle, but unlike conventional dynamic CMOS gates, the circuit can be operated in synchronous as well as in asynchronous mode. The counter circuit is implemented in conventional 1.2 µm double-poly CMOS technology and occupies a silicon area of about 0.08 mm². Extensive post-layout simulations indicate that the circuit has a typical input-to-output propagation delay of less than 3 ns, and the test circuit is shown to operate reliably when consecutive 31-bit input vectors are applied at a rate of up to 16 Mvectors/s. With its demonstrated data processing capability of abou...
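Functionally, a (31,5) parallel counter maps 31 input bits to the 5-bit binary count of ones (31 is the largest count representable in 5 bits). A purely behavioural model in Python, with no claim about the CTL circuit structure (function name is illustrative):

```python
def parallel_counter_31_5(bits: list[int]) -> list[int]:
    """Behavioural model of a (31,5) parallel counter (population counter).

    Takes exactly 31 input bits and returns the population count as
    5 output bits, most significant bit first.
    """
    assert len(bits) == 31 and all(b in (0, 1) for b in bits)
    count = sum(bits)                              # population count, 0..31
    return [(count >> i) & 1 for i in range(4, -1, -1)]

# All-ones input gives the maximum count 31 = 0b11111.
assert parallel_counter_31_5([1] * 31) == [1, 1, 1, 1, 1]
```

A conventional gate-level realisation would build this from full and half adders with logarithmic depth; the abstract's point is that threshold-logic gates achieve it in an effective depth of only two.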
Automatic synthesis of compressor trees: reevaluating large counters
 Design, Automation and Test in Europe (DATE '07)
Abstract

Cited by 8 (7 self)
Despite the progress of the last decades in electronic design automation, arithmetic circuits have always received far less attention than other classes of digital circuits. Logic synthesisers, which play a fundamental role in design today, play only a minor role for most arithmetic circuits, performing some local optimisations but hardly improving the overall structure of arithmetic components. Architectural optimisations have often been studied manually, and only for very common building blocks such as fast adders and multi-input adders have ad-hoc techniques been developed. A notable case is multi-input addition, which is the core of many circuits such as multipliers. The most common technique to implement multi-input addition is compressor trees, which are often composed of carry-save adders (based on (3:2) counters, i.e., full adders). A large body of literature exists on implementing compressor trees using large counters. However, all these large counters were built by using full and half adders recursively. In this paper we give some definite answers to issues related to the use of large counters. We present a general technique to implement large counters whose performance is much better than those composed of full and half adders. We also show that it is not always useful to use larger optimised counters; sometimes a combination of counters of various sizes gives the best performance. Our results show a 15% improvement in critical path delay. In some cases hardware area is even reduced by using our counters.
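The (3:2)-counter reduction mentioned above can be modelled as repeatedly replacing any three operands with a carry-save pair until only two remain, at which point a single carry-propagate addition finishes the job. A schematic Python model of that reduction loop (my own sketch, not the paper's synthesis algorithm, which additionally considers counter sizing and timing):

```python
def compress(operands: list[int]) -> int:
    """Sum a list of integers via (3:2) counter (full adder) compression.

    Each pass replaces three operands with two: the bitwise XOR (sum
    word) and the shifted bitwise majority (carry word). The tree
    needs only one slow carry-propagate add, at the very end.
    """
    ops = list(operands)
    while len(ops) > 2:
        a, b, c = ops.pop(), ops.pop(), ops.pop()
        ops.append(a ^ b ^ c)                            # sum word
        ops.append(((a & b) | (b & c) | (a & c)) << 1)   # carry word
    return sum(ops)  # final carry-propagate addition

assert compress([5, 9, 14, 22, 1]) == 51
```

Larger counters, the subject of the paper, generalise this by compressing more than three bits per column at once, trading counter complexity against tree depth.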
Public-Key Cryptographic Processor for RSA and ECC
 Columbia University
, 2004
Abstract

Cited by 8 (1 self)
We describe a general-purpose processor architecture for accelerating public-key computations on server systems that demand high performance and the flexibility to accommodate large numbers of secure connections with heterogeneous clients, which are likely to be limited in the set of cryptographic algorithms supported. Flexibility is achieved in that the processor supports multiple public-key cryptosystems, namely RSA, DSA, DH, and ECC, arbitrary key sizes and, in the case of ECC, arbitrary curves over the fields GF(p) and GF(2^m). At the core of the processor is a novel dual-field multiplier based on a modified carry-save adder (CSA) tree that supports both GF(p) and GF(2^m). In the case of a 64-bit integer multiplier, the necessary modifications increase its size by a mere 5%. To efficiently schedule the multiplier, we implemented a multiply-accumulate instruction that combines several steps of a multiple-precision multiplication in a single operation: multiplication, carry propagation, and partial product accumulation. We have developed a hardware prototype of the cryptographic processor in FPGA technology. If implemented in current 1.5 GHz processor technology, the processor executes 5,265 RSA-1024 op/s and 25,756 ECC-163 op/s; the given key sizes offer comparable security strength. Looking at future security levels, performance is 786 op/s for RSA-2048 and 9,576 op/s for ECC-233.
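The multiply-accumulate step being fused here appears in every schoolbook multiple-precision multiplication: each inner iteration multiplies two words, adds the running accumulator word and an incoming carry, and splits the result into a low word and an outgoing carry. A word-level Python sketch assuming a 64-bit word size (the word size and helper names are illustrative):

```python
WORD = 64
MASK = (1 << WORD) - 1

def mp_mul(a_words: list[int], b_words: list[int]) -> list[int]:
    """Schoolbook multiple-precision multiply over little-endian
    64-bit word arrays.

    The line computing t is the fused multiply-accumulate step:
    partial product + accumulator word + carry, split into a result
    word (low 64 bits) and a carry for the next position.
    """
    result = [0] * (len(a_words) + len(b_words))
    for i, a in enumerate(a_words):
        carry = 0
        for j, b in enumerate(b_words):
            t = a * b + result[i + j] + carry   # fused MAC step
            result[i + j] = t & MASK
            carry = t >> WORD
        result[i + len(b_words)] += carry
    return result

def to_int(words: list[int]) -> int:
    """Recombine a little-endian word array into a Python integer."""
    return sum(w << (WORD * k) for k, w in enumerate(words))

# Cross-check against Python's arbitrary-precision arithmetic.
x, y = [MASK, 3], [7, MASK]
assert to_int(mp_mul(x, y)) == to_int(x) * to_int(y)
```

Exposing this whole step as a single instruction keeps the multiplier fed every cycle instead of stalling on separate add and carry-propagate operations.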
Optimal Carry Save Networks
Abstract

Cited by 8 (0 self)
A general theory is developed for constructing the asymptotically shallowest networks and the asymptotically smallest networks (with respect to formula size) for the carry save addition of n numbers using any given basic carry save adder as a building block. Using these optimal carry save addition networks, the shallowest known multiplication circuits and the shortest formulae for the majority function (and many other symmetric Boolean functions) are obtained. In this paper, simple basic carry save adders are described, using which multiplication circuits of depth 3.71 log n (the result of which is given as the sum of two numbers) and majority formulae of size O(n^3.21) are constructed. Using more complicated basic carry save adders, not described here, these results could be further improved. Our best bounds are currently 3.57 log n for depth and O(n^3.13) for formula size.
Multi-output Functional Decomposition with Exploitation of Don't Cares
 Proc. DATE 98
, 1998
Abstract

Cited by 7 (0 self)
Functional decomposition is an important technique in logic synthesis, especially for the design of lookup-table-based FPGA architectures.
Integer Multiplication with Overflow Detection or Saturation
 IEEE Transactions on Computers
, 2000
Abstract

Cited by 7 (2 self)
High-speed multiplication is frequently used in general-purpose and application-specific computer systems. These systems often support integer multiplication, where two n-bit integers are multiplied to produce a 2n-bit product. To prevent growth in word length, processors typically return the n least significant bits of the product and a flag that indicates whether or not overflow has occurred. Alternatively, some processors saturate results that overflow to the most positive or most negative representable number. This paper presents efficient methods for performing unsigned or two's complement integer multiplication with overflow detection or saturation. These methods have significantly less area and delay than conventional methods for integer multiplication with overflow detection or saturation.
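The overflow condition itself is simple to state: for two's complement n-bit multiplication, overflow occurs exactly when the full 2n-bit product does not fit in n signed bits, i.e. when the discarded upper half is not a pure sign extension of the returned lower half. A reference model in Python (n = 16 is an arbitrary illustrative width; the paper's contribution is a cheap hardware check, not this software recomputation):

```python
def mul_with_overflow(a: int, b: int, n: int = 16) -> tuple[int, bool]:
    """Two's complement n-bit multiply: returns (low n bits, overflow flag).

    The flag is set when the exact product falls outside the signed
    n-bit range [-2^(n-1), 2^(n-1)), i.e. when truncating to n bits
    would lose information beyond sign extension.
    """
    full = a * b                          # exact signed product (2n bits)
    low = full & ((1 << n) - 1)           # the n bits a processor returns
    fits = -(1 << (n - 1)) <= full < (1 << (n - 1))
    return low, not fits

assert mul_with_overflow(3, 4) == (12, False)
assert mul_with_overflow(300, 300)[1] is True   # 90000 exceeds 16-bit range
```

A saturating variant would, on overflow, return 2^(n-1) - 1 or -2^(n-1) according to the sign of the exact product instead of the truncated bits.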