Bipartite Modular Multiplication
2005
Cited by 5 (0 self)
This paper proposes a new fast method for calculating modular multiplication. The calculation is performed using a new representation of residue classes modulo M that enables the splitting of the multiplier into two parts. These two parts are then processed separately, in parallel, potentially doubling the calculation speed. The upper part and the lower part of the multiplier are processed using the interleaved modular multiplication algorithm and the Montgomery algorithm, respectively. Conversions back and forth between the original integer set and the new residue system can be performed at speeds up to twice that of the Montgomery method without the need for precomputed constants. This new method is suitable for both hardware implementation and software implementation in a multiprocessor environment. Although this paper focuses on the application of the new method in the integer field, the technique used to speed up the calculation can also easily be adapted for operation in the binary extended field GF(2^m).
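The splitting idea in this abstract can be illustrated with a small Python sketch. It assumes the multiplier y is cut at bit k, the upper part is handled by a bit-serial interleaved multiplier, and the lower part by a bit-serial Montgomery multiplier. The function names, the parameters n and k, and the bit-serial formulations are assumptions made for illustration and are not taken from the paper. The two halves run one after the other here, whereas the abstract describes them as processed in parallel.

```python
def interleaved_mulmod(x, y, m, nbits):
    """MSB-first shift-and-add with reduction at each step: (x * y) mod m."""
    acc = 0
    for i in reversed(range(nbits)):
        acc = (acc << 1) % m          # shift, then reduce
        if (y >> i) & 1:
            acc = (acc + x) % m       # conditionally add the multiplicand
    return acc


def montgomery_mulmod(x, y, m, k):
    """LSB-first Montgomery step: (x * y * 2^-k) mod m, for odd m and y < 2^k."""
    acc = 0
    for i in range(k):
        if (y >> i) & 1:
            acc += x
        if acc & 1:                   # make acc even so the halving is exact
            acc += m
        acc >>= 1
    return acc % m


def bipartite_mulmod(x, y, m, n, k):
    """Hypothetical sketch: returns (x * y * 2^-k) mod m for an n-bit y.

    With y = y_hi * 2^k + y_lo:
        x * y * 2^-k = x * y_hi + x * y_lo * 2^-k   (mod m)
    The first term uses the interleaved method, the second Montgomery.
    """
    y_hi = y >> k
    y_lo = y & ((1 << k) - 1)
    hi = interleaved_mulmod(x, y_hi, m, n - k)
    lo = montgomery_mulmod(x, y_lo, m, k)
    return (hi + lo) % m
```

The result carries the factor 2^-k, so multiplying it by 2^k modulo m recovers the ordinary product x * y mod m.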
A Complexity-Effective Version of Montgomery’s Algorithm
in Workshop on Complexity Effective Designs, ISCA’02, May 2002, http://www.ee.rochester.edu:8080/~albonesi/wced02, 2002
Cited by 5 (0 self)
A new version of Montgomery’s algorithm for modular multiplication of large integers and its implementation in hardware are presented. It has been designed to meet the predominant requirements of most modern devices: small chip area and low power consumption. The algorithm is superior to the original method by a factor of 2, with respect to both area and latency. The new method has a simple structure. It requires a small amount of precomputation and storage in order to reduce the number of necessary additions by a factor of 2. Index terms: modulo multiplication, carry-save addition, Montgomery algorithm.
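One way precomputation can halve the number of additions in a bit-serial Montgomery loop is to store the sum x + m once, so that each iteration adds a single table entry instead of up to two operands. The Python sketch below shows that idea only. It is an assumption about the kind of trade-off the abstract describes, not the paper's actual algorithm, and the function name and table layout are hypothetical.

```python
def montgomery_one_add(x, y, m, k):
    """Sketch: (x * y * 2^-k) mod m with one addition per iteration (m odd).

    The table holds the four possible addends, indexed by 2 * y_i + q, where
    y_i is the current multiplier bit and q is the parity that decides
    whether the modulus must be added to keep the halving exact.
    """
    table = (0, m, x, x + m)          # x + m is the precomputed, stored value
    acc = 0
    for i in range(k):
        y_i = (y >> i) & 1
        q = (acc + y_i * x) & 1       # parity of the sum before adding m
        acc = (acc + table[2 * y_i + q]) >> 1
    return acc % m
```

The textbook loop performs up to two additions per bit (the multiplicand and then the modulus); here the choice among four addends replaces the second addition with a table lookup.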
Automatic Generation of Modular Multipliers for FPGA Applications
Cited by 2 (0 self)
Since redundant number systems allow constant-time addition, they are often at the heart of modular multipliers designed for public key cryptography (PKC) applications. Indeed, PKC involves large operands (160 to 1024 bits), and several researchers have proposed carry-save or borrow-save algorithms. However, these number systems do not take advantage of the dedicated carry logic available in modern Field Programmable Gate Arrays (FPGAs). To overcome this problem, we suggest performing modular multiplication in a high-radix carry-save number system, where a sum bit of the carry-save representation is replaced by a sum word. Two digits are then added by means of a small Carry-Ripple Adder (CRA). Furthermore, we propose an algorithm which selects the best high-radix carry-save representation for a given modulus, and generates a synthesizable VHDL description of the operator.
Modular multiplication of large integers on FPGA
in Proceedings of the 39th Asilomar Conference on Signals, Systems & Computers, IEEE Signal Processing Society, 2005
Cited by 2 (1 self)
Public key cryptography often involves modular multiplication of large operands (160 up to 2048 bits). Several researchers have proposed iterative algorithms whose internal data are carry-save numbers. This number system is unfortunately not well suited to today’s Field Programmable Gate Arrays (FPGAs) embedding dedicated carry logic. We propose to perform modular multiplication in a high-radix carry-save number system, where the sum bit of the well-known carry-save representation is replaced by a sum word. Two digits are then added by means of a small Carry-Ripple Adder (CRA). The originality of our approach is to analyze the modulus in order to select the most efficient high-radix carry-save representation.
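The high-radix carry-save representation mentioned here can be modeled in a few lines of Python. In this sketch a number is stored as w-bit sum words plus one saved carry bit per digit, and adding an operand touches each digit independently, which is what a small carry-ripple adder per digit would do in hardware. The names `to_value` and `hrcs_add`, the list layout, and the fixed word width are assumptions for illustration. Modular reduction and the paper's modulus-driven choice of representation are not modeled.

```python
def to_value(sums, carries, w):
    """Integer value of a high-radix carry-save number (digit j has weight 2^(w*j))."""
    return sum((s + c) << (w * j) for j, (s, c) in enumerate(zip(sums, carries)))


def hrcs_add(sums, carries, operand, w):
    """Add a plain integer operand to a high-radix carry-save number.

    Each digit adds its sum word, its saved incoming carry bit, and the
    matching w-bit slice of the operand. The carry out of digit j is saved
    for digit j + 1 instead of rippling through the whole number.
    """
    n = len(sums)
    assert operand >> (w * n) == 0, "operand does not fit in n digits"
    mask = (1 << w) - 1
    new_sums = [0] * n
    new_carries = [0] * n
    for j in range(n):                # digits are independent of each other
        t = sums[j] + carries[j] + ((operand >> (w * j)) & mask)
        new_sums[j] = t & mask
        if j + 1 < n:
            new_carries[j + 1] = t >> w
        else:
            assert t >> w == 0, "top digit overflow: allocate more digits"
    return new_sums, new_carries
```

Because no carry crosses more than one digit boundary per addition, the delay of one addition depends on the word width w rather than on the total operand length.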
Improving Cryptographic Architectures by Adopting Efficient Adders in their Modular Multiplication Hardware
The 9th Annual Gulf Internet Symposium, Oct. 2003, Khobar, Saudi Arabia
Cited by 1 (1 self)
This work studies and compares different modular multiplication algorithms with emphasis on the underlying binary adders. The method of interleaving multiplication and reduction, Montgomery’s method, and the high-radix method were studied using the carry-save adder, carry-lookahead adder, and carry-skip adder. Two recent implementations of the first two methods were modeled and synthesized for practical analysis. A modular multiplier following Koc’s implementation [6], based on carry-save adders with carry-skip adders in the final addition step, is expected to be fast, with a fair area requirement and reduced power consumption.
RSA encryption using extended modular arithmetic on the Quicksilver COSM adaptive computing machine
IEEE Symposium on Field Programmable Custom Computing Machines (FCCM 03), 2003
Cited by 1 (0 self)
Modular arithmetic is typically the computational bottleneck in a hardware implementation of public key cryptography algorithms. This paper focuses on an implementation of modular multiplication on the Quicksilver COSM adaptive computing machine as a runtime-reconfigurable user authentication context candidate. The design is targeted specifically to the COSM adaptive computing machine, taking into account the underlying architecture of the device.
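To see why modular multiplication dominates the cost of RSA, consider a plain left-to-right square-and-multiply exponentiation, sketched below in Python. The function name `modexp` and the multiplication counter are assumptions added for illustration, and the key values in the usage note are a small textbook example rather than anything from the paper.

```python
def modexp(base, exponent, modulus):
    """Left-to-right square-and-multiply.

    Returns (base ** exponent mod modulus, number of modular multiplications).
    Every exponent bit costs one modular squaring, and every set bit costs
    one additional modular multiplication.
    """
    result = 1
    mults = 0
    for bit in bin(exponent)[2:]:     # most significant bit first
        result = (result * result) % modulus
        mults += 1
        if bit == "1":
            result = (result * base) % modulus
            mults += 1
    return result, mults
```

With the toy key n = 3233, e = 17, d = 2753, calling `modexp(65, 17, 3233)` encrypts the message 65 and `modexp(c, 2753, 3233)` on the resulting ciphertext c recovers it. For realistic operand sizes the loop runs over hundreds or thousands of exponent bits, each requiring at least one large modular multiplication.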
Design And Implementation of a Configurable Interleaver/Deinterleaver for Turbo Codes in 3GPP Standard
During the last decade, Turbo codes have gained increasing importance in channel coding due to their good performance in error correction. One key component in Turbo codes is the interleaver/deinterleaver pair, often designed as reconfigurable coprocessors able to deal with the large variability in data length found in the newest communication standards. In this work we introduce a configurable interleaver architecture for the turbo decoder in the 3GPP standard. It is implemented under the idea of “iterative modulo computation” presented in [1] and exploited in [2], but is capable of handling all the data-length configurations. Additionally, the presented solution not only generates the interleaved addresses, but also deals with the flow of data streams through the interleaver. The architecture and FPGA implementation results are also presented. Keywords: 3GPP Turbo code; Configurable Interleaver. In Turbo codes, the interleaver is located, at the encoder side, between the two recursive systematic convolutional (RSC) encoders, as shown in Fig. 1 (Turbo coder diagram). At the decoder side, an interleaver and deinterleavers are required, as shown in Fig. 2, for the Log-MAP-based iterative turbo decoder, such as the one presented in [6].
Exploring the Design-Space for FPGA-based Implementation of RSA
In this paper we present two alternative architectures for implementing the RSA algorithm on reconfigurable hardware. Both architectures are innovative, especially with respect to the implementation of modular multiplication. With respect to the area versus time trade-off, the two solutions are at the extremes of the design space, since one adopts a word-serial approach, while the other has a fully parallel organization. Based on the analysis of these architectures for different values of the serialization factor, we explore the design space for the FPGA-based implementation of the RSA algorithm. We systematically analyze and compare the results of the two design processes with respect to two fundamental metrics, namely execution time and FPGA resource usage. We highlight the pros and cons and comment on the trade-offs of the two design alternatives.
Efficient Hardware Implementation of Fp-arithmetic for Pairing-Friendly Curves
Transaction on Computers
This paper describes a new method to speed up Fp-arithmetic in hardware for pairing-friendly curves, such as the well-known Barreto-Naehrig (BN) curves. We explore the characteristics of the modulus defined by these curves and choose curve parameters such that Fp multiplication becomes more efficient. The proposed algorithm uses Montgomery reduction in a polynomial ring combined with a coefficient reduction phase using a pseudo-Mersenne number. As an application we show that the performance of pairings on BN curves in hardware can be significantly improved, resulting in a factor 2.5 speedup compared with state-of-the-art hardware implementations.
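The reason a pseudo-Mersenne number makes reduction cheap can be shown with a short Python sketch. For a modulus p = 2^k - c with small c, the identity 2^k ≡ c (mod p) lets the high part of a value be folded back into the low part with one small multiplication and one addition. The function name and parameters below are assumptions for illustration. The sketch covers only this folding step on integers, not the paper's polynomial-ring Montgomery reduction or its BN-curve parameters.

```python
def pseudo_mersenne_reduce(x, k, c):
    """Reduce a non-negative x modulo p = 2^k - c (with 0 < c < 2^k) by folding.

    Writing x = hi * 2^k + lo and using 2^k = c (mod p) gives
    x = hi * c + lo (mod p), so no general division is needed.
    """
    p = (1 << k) - c
    mask = (1 << k) - 1
    while x >> k:                     # fold until x fits in k bits
        x = (x >> k) * c + (x & mask)
    while x >= p:                     # final conditional subtraction(s)
        x -= p
    return x
```

As a usage example under these assumptions, the well-known prime 2^255 - 19 corresponds to k = 255 and c = 19, and the product of two field elements is reduced with two or three folding passes.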