Results 1–10 of 248
Graph-based algorithms for Boolean function manipulation
 IEEE Transactions on Computers
, 1986
Abstract

Cited by 3153 (51 self)
In this paper we present a new data structure for representing Boolean functions and an associated set of manipulation algorithms. Functions are represented by directed, acyclic graphs in a manner similar to the representations introduced by Lee [1] and Akers [2], but with further restrictions on the ordering of decision variables in the graph. Although a function requires, in the worst case, a graph of size exponential in the number of arguments, many of the functions encountered in typical applications have a more reasonable representation. Our algorithms have time complexity proportional to the sizes of the graphs being operated on, and hence are quite efficient as long as the graphs do not grow too large. We present experimental results from applying these algorithms to problems in logic design verification that demonstrate the practicality of our approach.
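The representation the abstract describes (decision graphs with a fixed variable order, and operations whose cost tracks graph size) can be sketched at toy scale. Everything below, including the tuple encoding and the mk/bdd_apply/evaluate names, is an illustrative sketch rather than Bryant's implementation:

```python
# Terminals are the Python booleans; an internal node is (var, low, high),
# with variables tested in a fixed global order x0 < x1 < ...

def mk(var, lo, hi):
    """Reduction rule: a node whose two branches coincide is redundant."""
    return lo if lo == hi else (var, lo, hi)

def var_of(f):
    return f[0] if isinstance(f, tuple) else float("inf")

def bdd_apply(op, f, g, memo=None):
    """Combine two ordered BDDs with a binary Boolean operator.

    Memoizing on pairs of subgraphs is what gives a cost proportional
    to the sizes of the graphs being operated on."""
    if memo is None:
        memo = {}
    if isinstance(f, bool) and isinstance(g, bool):
        return op(f, g)
    key = (f, g)
    if key not in memo:
        v = min(var_of(f), var_of(g))
        f0, f1 = (f[1], f[2]) if var_of(f) == v else (f, f)
        g0, g1 = (g[1], g[2]) if var_of(g) == v else (g, g)
        memo[key] = mk(v, bdd_apply(op, f0, g0, memo),
                          bdd_apply(op, f1, g1, memo))
    return memo[key]

def evaluate(f, env):
    """Follow low/high edges according to a variable assignment."""
    while isinstance(f, tuple):
        f = f[2] if env[f[0]] else f[1]
    return f

# Example: x0 AND x1, built from the single-variable graphs.
x0, x1 = (0, False, True), (1, False, True)
conj = bdd_apply(lambda a, b: a and b, x0, x1)
```

A production package would additionally hash-cons nodes in a shared table so that isomorphic subgraphs are physically shared rather than merely structurally equal.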
The Area-Time Complexity of Binary Multiplication
 Journal of the ACM
, 1981
Abstract

Cited by 32 (1 self)
ABSTRACT The problem of performing multiplication of n-bit binary numbers on a chip is considered. Let A denote the chip area and T the time required to perform multiplication. By using a model of computation which is a realistic approximation to current and anticipated LSI or VLSI technology, it is shown that AT^(2α) ≥ A0·T0^(2α)·n^(1+α) for all α ∈ [0, 1], where A0 and T0 are positive constants which depend on the technology but are independent of n. The exponent 1 + α is the best possible. A consequence of this result is that binary multiplication is "harder" than binary addition. More precisely, if (AT^(2α))_M(n) and (AT^(2α))_A(n) denote the minimum area-time complexity for n-bit binary multiplication and addition, respectively, then (AT^(2α))_M(n) / (AT^(2α))_A(n) = Ω(n^α) for 0 ≤ α ≤ 1/2 (and = Ω(n^(1/2)) for all α ≥ 1/2).
Reduced Power Dissipation Through Truncated Multiplication
 in IEEE Alessandro Volta Memorial Workshop on Low Power Design
, 1999
Abstract

Cited by 22 (7 self)
Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added with the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations indicate that truncated parallel multipliers dissipate between 29 and 40 percent less power than standard parallel multipliers for operand sizes of 16 and 32 bits.
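The column-dropping idea can be illustrated at bit level. This is a sketch under stated assumptions: unsigned operands, and a placeholder correction constant standing in for the carry estimate the paper actually proposes:

```python
def rounded_product(a, b, k):
    """Reference: exact product, rounded to the nearest multiple of 2**k (k >= 1)."""
    return (a * b + (1 << (k - 1))) >> k

def truncated_product(a, b, n, k, correction=0):
    """Sum only the partial-product bits in columns k and above of an
    n-by-n multiplication matrix, then add a correction term in place of
    the carries from the discarded columns.
    (The correction used here is a placeholder, not the paper's scheme.)"""
    acc = 0
    for i in range(n):          # bit i of operand a
        for j in range(n):      # bit j of operand b
            if i + j >= k and (a >> i) & 1 and (b >> j) & 1:
                acc += 1 << (i + j)
    return (acc + correction) >> k
```

With k = 0 nothing is discarded and the result is the exact product; as k grows, fewer adder cells are needed at the cost of a small, bounded error relative to the rounded product.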
The SNAP Project: Design of Floating Point Arithmetic Units
 In Proceedings of Arith13
, 1997
Abstract

Cited by 21 (2 self)
In recent years computer applications have increased in their computational complexity. The industry-wide usage of performance benchmarks, such as SPECmarks, and the popularity of 3D graphics applications force processor designers to pay particular attention to the implementation of the floating point unit, or FPU. This paper presents results of the Stanford subnanosecond arithmetic processor (SNAP) research effort in the design of hardware for floating point addition, multiplication and division. We show that one-cycle FP addition is achievable 32% of the time using a variable latency algorithm. For multiplication, a binary tree is often inferior to a Wallace tree designed using an algorithmic layout approach for contemporary feature sizes (0.3 µm). Further, in most cases two-bit Booth encoding of the multiplier is preferable to non-Booth encoding for partial product generation. It appears that for division, optimum area-performance is achieved using functional iteration, ...
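The two-bit (radix-4) Booth encoding mentioned above can be sketched in a few lines; booth_radix4 and booth_multiply are illustrative names, and the code models only the recoding arithmetic, not the partial-product hardware:

```python
def booth_radix4(m, n):
    """Radix-4 (two-bit) Booth recoding of an n-bit two's-complement
    multiplier m into digits in {-2, -1, 0, 1, 2}, scanning overlapping
    3-bit groups: roughly n/2 partial products instead of n."""
    if n % 2:
        n += 1                       # recode an even number of bits
    digits, prev = [], 0             # implicit 0 to the right of bit 0
    for i in range(0, n, 2):
        b0, b1 = (m >> i) & 1, (m >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits                    # m == sum(d * 4**j over digits)

def booth_multiply(x, y, n):
    """Product as a sum of shifted, possibly negated multiples of x;
    each multiple (0, ±x, ±2x) is cheap to generate in hardware."""
    return sum(d * x * 4 ** j for j, d in enumerate(booth_radix4(y, n)))
```

Note that Python's arithmetic right shift of negative integers supplies the sign extension that hardware would implement explicitly.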
Design and implementation of the MorphoSys reconfigurable computing processor
 J. Very Large Scale Integr. Signal Process. Syst.
, 2000
Abstract

Cited by 20 (0 self)
In this paper, we describe the implementation of MorphoSys, a reconfigurable processing system targeted at data-parallel and computation-intensive applications. The MorphoSys architecture consists of a reconfigurable component (an array of reconfigurable cells) combined with a RISC control processor and a high-bandwidth memory interface. We briefly discuss the system-level model, array architecture, and control processor. Next, we present the detailed design implementation and the various aspects of physical layout of different sub-blocks of MorphoSys. The physical layout was constrained for 100 MHz operation, with low power consumption, and was implemented using 0.35 µm, four-metal-layer CMOS (3.3 V) technology. We provide simulation results for the MorphoSys architecture (based on a VHDL model) for some typical data-parallel applications (video compression and automatic target recognition). The results indicate that the MorphoSys system can achieve significantly better performance for most of these applications in comparison with other systems and processors.
Zero-Value Point Attacks on Elliptic Curve Cryptosystem
 Information Security  ISC 2003, LNCS 2851
Abstract

Cited by 19 (1 self)
Several experimental results confirm that differential power analysis (DPA) breaks implementations of elliptic curve cryptosystems (ECC) on memory-constrained devices. In order to resist DPA, the parameters of the underlying curve must be randomized. We usually randomize the base point in projective coordinates, or transform all parameters to a random isomorphic curve. However, Goubin pointed out that a point (0, y) cannot be randomized by these countermeasures. Such a point is often contained in the standard curves, and we have to guard against this attack. In this paper, we propose a novel attack, called the zero-value point attack. In contrast to Goubin's attack, we use the zero-value registers in the addition formulae. Even if a point has no zero-value coordinate, the auxiliary registers might take a zero value. We investigate these zero-value registers that cannot be randomized by the above randomizations. Indeed, on elliptic curves over prime fields, we have found several points P = (x, y) which cause zero-value registers, e.g., (1) 3x^2 + a = 0, (2) 5x^4 + 2ax^2 − 4bx + a^2 = 0, (3) P is a y-coordinate self-collision point, etc. We demonstrate the standard curves that have these points. Interestingly, some conditions required for the zero-value attack depend on the explicit implementation of the addition formulae: in order to resist this type of attack, we have to take care of how the multiplications and additions in the addition formulae are assembled. Moreover, we show zero-value points for the Montgomery-type method and elliptic curves over binary fields.
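Condition (1) can be checked by brute force at toy scale. The tiny curve parameters below (p = 23, a = 11, b = 2) are made up for illustration; real targets are standardized curves over primes of 160 bits and more:

```python
def curve_points(p, a, b):
    """All affine points on y^2 = x^3 + ax + b over F_p, by exhaustion."""
    squares = {}
    for y in range(p):
        squares.setdefault(y * y % p, []).append(y)
    return [(x, y)
            for x in range(p)
            for y in squares.get((x ** 3 + a * x + b) % p, [])]

def zero_value_points(p, a, b):
    """Points where the doubling formula computes the register 3x^2 + a,
    which vanishes and therefore cannot be hidden by randomizing the
    projective representation (condition (1) of the abstract)."""
    return [(x, y) for (x, y) in curve_points(p, a, b)
            if (3 * x * x + a) % p == 0]
```

On this toy curve the point (2, 3) satisfies 3·2² + 11 ≡ 0 (mod 23), so its doubling produces a zero-valued intermediate regardless of the projective randomization applied.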
On Fast IEEE Rounding
, 1991
Abstract

Cited by 15 (2 self)
A systematic general rounding procedure is proposed for floating-point arithmetic operations. This procedure consists of two steps: constructing a rounding table and selecting a prediction scheme. Optimization guidelines are given for each step to allow hardware to be minimized. This procedure-based rounding method has the additional advantage that verification and generalization are straightforward. Constructing a rounding table involves examining the range of the result and the shifting possibilities during the normalization step of an operation, while selecting a prediction scheme depends on the details of the hardware model used. Two rounding hardware models are described. The first is shown to be identical to that reported by Santoro et al. [1]. The second is more powerful, providing solutions where the first fails. Applying this approach to the IEEE rounding modes for high-speed conventional binary multipliers reveals that round to infinity is more difficult to implement than the round to...
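The per-case decisions that such rounding tables encode reduce to inspecting the discarded low-order bits. The sketch below shows only that decision logic (an illustrative function, not the paper's table construction or prediction scheme); it drops k low bits (k ≥ 1) from an unsigned magnitude v, with a sign flag for the directed modes:

```python
def ieee_round(v, k, mode, negative=False):
    """Round magnitude v at bit position k under an IEEE rounding mode.

    q is the kept part, r the discarded k low-order bits; half marks the
    halfway point between two representable results."""
    q, r = v >> k, v & ((1 << k) - 1)
    half = 1 << (k - 1)
    if mode == "nearest-even":
        if r > half or (r == half and q & 1):   # above half, or tie to odd
            q += 1
    elif mode == "toward-zero":
        pass                                    # truncation
    elif mode == "toward-pinf":
        if r and not negative:                  # any discarded bit set
            q += 1
    elif mode == "toward-ninf":
        if r and negative:
            q += 1
    return q
```

The directed modes are the ones that depend on the sign as well as on whether any discarded bit is set, which is one reason their fast implementation differs from round-to-nearest.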
Parallel Saturating Fractional Arithmetic Units
 in 9th Great Lakes Symposium on VLSI
, 1999
Abstract

Cited by 13 (6 self)
This paper describes the designs of a saturating adder, multiplier, single MAC unit, and dual MAC unit with one-cycle latencies. The dual MAC unit can perform two saturating MAC operations in parallel and accumulate the results with saturation. Specialized saturation logic ensures that the output of the dual MAC unit is identical to the result of the operations performed serially with saturation after each multiplication and each addition.
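The saturating behavior described here is easy to model in software. The sketch below assumes 16-bit Q15 fractional operands (an assumption, since the abstract does not state the word lengths):

```python
# Q15: 16-bit two's complement interpreted as a fraction in [-1, 1).
Q15_MAX, Q15_MIN = 2 ** 15 - 1, -(2 ** 15)

def sat(v):
    """Clamp to the representable Q15 range instead of wrapping."""
    return max(Q15_MIN, min(Q15_MAX, v))

def sat_add(a, b):
    return sat(a + b)

def sat_mul_q15(a, b):
    """Fractional multiply: (a*b) >> 15. Only (-1.0) * (-1.0) overflows
    in Q15, and saturation maps it to the largest positive value."""
    return sat((a * b) >> 15)

def sat_mac(acc, a, b):
    """One saturating multiply-accumulate: saturate after the multiply
    and again after the addition, matching serial-with-saturation order."""
    return sat_add(acc, sat_mul_q15(a, b))
```

Modeling the serial order (saturate after each multiply and each add) is what the dual MAC unit's specialized logic must reproduce when the two operations run in parallel.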
Modular range reduction: A new algorithm for fast and accurate computation of the elementary functions
 Journal of Universal Computer Science
, 1995
Abstract

Cited by 13 (10 self)
A new range reduction algorithm, called Modular Range Reduction (MRR), briefly introduced by the authors in [Daumas et al. 1994], is analyzed in depth. It is used to reduce the arguments of exponential and trigonometric function algorithms to within the small range for which those algorithms are valid. MRR reduces the arguments quickly and accurately. A fast hardwired implementation of MRR operates in time O(log n), where n is the number of bits of the binary input value. For example, with MRR it becomes possible to compute the sine and cosine of a very large number accurately. We propose two possible architectures implementing this algorithm.
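What range reduction buys can be shown with ordinary floating point. The sketch below reduces the argument modulo π/2 before calling a small-range routine; it is not MRR itself (a table-based hardware algorithm), and its naive reduction loses accuracy for very large arguments, which is exactly the problem MRR addresses:

```python
import math

def sin_reduced(x):
    """sin(x) via reduction to [-pi/4, pi/4] plus a quadrant identity.

    The polynomial (here, math.sin/math.cos themselves) only ever sees a
    small argument; the quadrant index k selects the right identity."""
    k = round(x / (math.pi / 2))          # nearest multiple of pi/2
    r = x - k * (math.pi / 2)             # reduced argument
    return [math.sin, math.cos,
            lambda t: -math.sin(t),
            lambda t: -math.cos(t)][k % 4](r)
```

The weak step is computing x − k·(π/2) in working precision: when x is huge, cancellation destroys the reduced argument. MRR's contribution is performing this reduction exactly and in O(log n) hardware time.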