Translating pseudo-boolean constraints into SAT
- Journal on Satisfiability, Boolean Modeling and Computation, 2006
- Cited by 83 (2 self)
In this paper, we describe and evaluate three different techniques for translating pseudo-Boolean constraints (linear constraints over Boolean variables) into clauses that can be handled by a standard SAT-solver. We show that by applying a proper mix of translation techniques, a SAT-solver can perform on a par with the best existing native pseudo-Boolean solvers. This is particularly valuable in those cases where the constraint problem of interest is naturally expressed as a SAT problem, except for a handful of constraints. Translating those constraints to get a pure clausal problem will take full advantage of the latest improvements in SAT research. A particularly interesting result of this work is the efficiency of sorting networks to express pseudo-Boolean constraints. Although tangential to this presentation, the result gives a suggestion as to how synthesis tools may be modified to produce arithmetic circuits more suitable for SAT-based reasoning.
Keywords: pseudo-Boolean, SAT-solver, SAT translation, integer linear programming
The Sum-Absolute-Difference Motion Estimation Accelerator
- In Proceedings of the 24th Euromicro Conference
- Cited by 23 (15 self)
In this paper we investigate the Sum Absolute Difference (SAD) operation, which is frequently used by algorithms for digital motion estimation. For this operation, we propose a single vector instruction that can be performed (in hardware) on an entire block of data in parallel. We investigate possible implementations for such an instruction. Assuming a machine cycle comparable to the cycle of a two-cycle multiply, we show that for a block of 16x1 or 16x16, the SAD operation can be performed in 3 or 4 machine cycles respectively. The proposed implementation operates as follows: first, we determine in parallel which operand in each pair is the smaller. Second, we compute the absolute value of the difference of each pair by subtracting the smaller value from the larger, and finally we compute the accumulation. The operations associated with the second and third steps are performed in parallel, resulting in a multiply (accumulate) type of operation. Our approach also covers the Mean Absolute Difference (MAD) operation, except for the shifting (division) operation.
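The three steps named in this abstract (compare, subtract the smaller from the larger, accumulate) can be illustrated in software. The following is only a minimal sequential sketch, not the paper's hardware design: the function names `sad` and `mad` and the `log2_count` parameter are hypothetical, and the MAD helper assumes the block size is a power of two so that the division reduces to a shift.

```python
def sad(block_a, block_b):
    """Sum of absolute differences over two equally sized pixel sequences."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    total = 0
    for a, b in zip(block_a, block_b):
        # Step 1: decide which operand of the pair is the smaller one.
        smaller, larger = (a, b) if a < b else (b, a)
        # Step 2: subtract the smaller from the larger (never negative).
        # Step 3: accumulate the difference.
        total += larger - smaller
    return total


def mad(block_a, block_b, log2_count):
    """Mean absolute difference; assumes len(block_a) == 2 ** log2_count."""
    # The division by the block size is done as a right shift.
    return sad(block_a, block_b) >> log2_count


current = [1, 5, 3, 8]
reference = [4, 2, 3, 10]
print(sad(current, reference))     # differences are 3, 3, 0, 2
print(mad(current, reference, 2))  # block of 4 elements, so shift by 2
```

In hardware the comparisons and subtractions would run in parallel across the block; the loop here only makes the order of the steps explicit.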
The SNAP Project: Design of Floating Point Arithmetic Units
- In Proceedings of Arith-13, 1997
- Cited by 16 (1 self)
In recent years computer applications have increased in their computational complexity. The industry-wide usage of performance benchmarks, such as SPECmarks, and the popularity of 3D graphics applications force processor designers to pay particular attention to implementation of the floating point unit, or FPU. This paper presents results of the Stanford subnanosecond arithmetic processor (SNAP) research effort in the design of hardware for floating point addition, multiplication and division. We show that one cycle FP addition is achievable 32% of the time using a variable latency algorithm. For multiplication, a binary tree is often inferior to a Wallace-tree designed using an algorithmic layout approach for contemporary feature sizes (0.3 µm). Further, in most cases two-bit Booth encoding of the multiplier is preferable to non-Booth encoding for partial product generation. It appears that for division, optimum area-performance is achieved using functional iteration, ...
Reduced Power Dissipation Through Truncated Multiplication
- In IEEE Alessandro Volta Memorial Workshop on Low Power Design, 1999
- Cited by 15 (5 self)
Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added with the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations indicate that truncated parallel multipliers dissipate between 29 and 40 percent less power than standard parallel multipliers for operand sizes of 16 and 32 bits.
1: Introduction
High-speed parallel multipliers are fundamental building blocks in digital signal processing systems [1]. In...
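The truncated-multiplication idea in this abstract (drop the least significant columns of the partial-product matrix, then add an estimate of what they would have contributed) can be sketched in software. The abstract does not say how the paper estimates the dropped carries, so the constant correction below is purely an assumption for illustration: it is the average value of the dropped columns when every operand bit is equally likely to be 0 or 1. All names (`truncated_product`, `expected_dropped_sum`, `n`, `k`, `correction`) are hypothetical.

```python
def truncated_product(x, y, n, k, correction=0):
    """Sum only the partial-product bits in columns >= k of an n x n unsigned multiply."""
    kept = 0
    for i in range(n):
        for j in range(n):
            # Partial-product bit x_i * y_j sits in column i + j.
            if i + j >= k and (x >> i) & 1 and (y >> j) & 1:
                kept += 1 << (i + j)
    # Assumed correction term standing in for the estimated carries.
    return kept + correction


def expected_dropped_sum(k):
    """Average value of dropped columns 0..k-1 for random operand bits (assumes k <= n)."""
    # Column c holds c + 1 partial-product bits, each equal to 1 with probability 1/4.
    return sum((c + 1) << c for c in range(k)) // 4


x, y, n, k = 13, 11, 4, 3
exact = x * y
approx = truncated_product(x, y, n, k, expected_dropped_sum(k))
print(exact, approx)
```

With `k = 0` no column is dropped and the function returns the exact product, which gives a simple sanity check on the column bookkeeping.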
A New Design Technique for Column Compression Multipliers
- IEEE TRANSACTIONS ON COMPUTERS, 1995
- Cited by 13 (3 self)
In this paper, a new design technique for column-compression (CC) multipliers is presented. Constraints for column compression with full and half adders are analyzed and, under these constraints, considerable flexibility for implementation of the CC multiplier, including the allocation of adders and the choice of the length of the final fast adder, is exploited. Using the example of an 8 × 8 bit CC multiplier, we show that architectures obtained from this new design technique are more area efficient, and have shorter interconnections than the classical Dadda CC multiplier. We finally show that our new technique is also suitable for the design of two's complement multipliers.
The design of a high speed ASIC unit for the hash function SHA-256 (384, 512)
- DATE'04 (DESIGN AUTOMATION AND TEST IN EUROPE), 2004
- Cited by 12 (1 self)
After recalling the basic algorithms published by NIST for implementing the hash functions SHA-256 (384, 512), a basic circuit characterized by a cascade of full adder arrays is given. Implementation options are discussed and two methods for improving speed are presented: delay balancing and pipelining. An application of the former is given first, yielding a circuit that shortens the critical path by one full adder array. A pipelined version is then given, shortening the critical path by two full adder arrays. The two methods are then combined and the results obtained through hardware synthesis are presented, together with a comparison between the new circuits.
Design and Implementation of the MorphoSys Reconfigurable Computing Processor
- Journal of VLSI and Signal Processing-Systems for Signal, Image and Video Technology, 2000
- Cited by 12 (0 self)
In this paper, we describe the implementation of MorphoSys, a reconfigurable processing system targeted at data-parallel and computation-intensive applications. The MorphoSys architecture consists of a reconfigurable component (an array of reconfigurable cells) combined with a RISC control processor and a high bandwidth memory interface. We briefly discuss the system-level model, array architecture, and control processor. Next, we present the detailed design implementation and the various aspects of physical layout of different sub-blocks of MorphoSys. The physical layout was constrained for 100 MHz operation, with low power consumption, and was implemented using 0.35 µm, four metal layer CMOS (3.3 Volts) technology. We provide simulation results for the MorphoSys architecture (based on VHDL model) for some typical data-parallel applications (video compression and automatic target recognition). The results indicate that the MorphoSys system can achieve significantly better performance...
Signed Binary Addition Circuitry with Inherent Even Parity Outputs
- 1997
- Cited by 10 (1 self)
A signed binary (SB) addition circuit is presented that always produces an even parity representation of the sum word. The novelty of this design is that no extra check bits are generated or used. The redundancy inherent in a SB representation is further exploited to contain parity information.
Aspects of Systems and Circuits for Nanoelectronics
- PROCEEDINGS OF THE IEEE, 1997
- Cited by 9 (4 self)
This paper analyzes the effect of this technological progress on the design of nanoelectronic circuits and describes computational paradigms revealing novel features such as distributed storage, fault tolerance, self-organization, and local processing. In particular, linear threshold networks, the associative matrix, self-organizing feature maps, and cellular arrays are investigated from the viewpoint of their potential significance for nanoelectronics. Although these concepts have already been implemented using present technologies, the intention of this paper is to give an impression of their usefulness to system implementations with quantum-effect devices.
A comparison of three rounding algorithms for IEEE floating-point multiplication
- 1998
- Cited by 9 (2 self)
A new IEEE compliant floating-point rounding algorithm for computing the rounded product from a carry-save representation of the product is presented. The new rounding algorithm is compared with the rounding algorithms of Yu and Zyner [23] and of Quach et al. [18]. For each rounding algorithm, a logical description and a block diagram are given and the latency is analyzed. We conclude that the new rounding algorithm is the fastest rounding algorithm, provided that an injection (which depends only on the rounding mode and the sign) can be added in during the reduction of the partial products into a carry-save encoded digit string. In double precision the latency of the new rounding algorithm is 12 logic levels compared to 14 logic levels in the algorithm of Quach et al., and 16 logic levels in the algorithm of Yu and Zyner.
1. Introduction
Every modern microprocessor includes a floating-point (FP) multiplier that complies with the IEEE 754 Standard [9]. The latency of the FP multiplier...

