A Scalable Architecture for Modular Multiplication Based on Montgomery's Algorithm
 IEEE TRANSACTIONS ON COMPUTERS
, 2003
Cited by 41 (2 self)
This paper presents a scalable architecture for the computation of modular multiplication, based on the Montgomery multiplication (MM) algorithm. A word-based version of MM is presented and used to explain the main concepts in the hardware design. The proposed multiplier is able to work with any precision of the input operands, limited only by memory or control constraints. Its architecture gives enough freedom to select the word size and the degree of parallelism to be used, according to the available area and/or desired performance. Design trade-offs are analyzed in order to identify adequate hardware configurations for a given area or bandwidth requirement.
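For orientation, the Montgomery multiplication that this abstract builds on can be sketched in its simplest bit-serial, radix-2 form. This is not the word-based algorithm or the hardware architecture of the paper; the function name `mont_mul` and its parameters are hypothetical, and the sketch assumes an odd modulus `m < 2**n` with operands below `m`.

```python
def mont_mul(x, y, m, n):
    """Return x * y * 2^(-n) mod m for odd m < 2^n and 0 <= x, y < m.

    Illustrative bit-serial radix-2 Montgomery multiplication (an
    assumption for this sketch, not the paper's word-based variant).
    """
    s = 0
    for i in range(n):
        xi = (x >> i) & 1      # scan the multiplier one bit at a time
        s += xi * y            # conditional add of the multiplicand
        if s & 1:              # make the partial sum even by adding the
            s += m             # odd modulus, so the shift below is exact
        s >>= 1                # exact division by 2
    if s >= m:                 # the loop keeps s < 2m, so one final
        s -= m                 # subtraction suffices
    return s
```

Multiplying the result by `2**n` modulo `m` recovers `x * y mod m`, which is a convenient way to check the sketch against ordinary modular arithmetic.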
Flexible Hardware Design for RSA and Elliptic Curve Cryptosystems
 Proceedings of Topics in Cryptology - CT-RSA 2004. Lecture Notes in Computer Science
, 2004
Cited by 10 (3 self)
Abstract. This paper presents a scalable hardware implementation of both commonly used public key cryptosystems, RSA and Elliptic Curve Cryptosystem (ECC), on the same platform. The introduced hardware accelerator features a design which can be varied from very small (less than 20 Kgates), targeting wireless applications, up to a very big design (more than 100 Kgates) used for network security. In the latter option it can include a few dedicated large-number arithmetic units, each of which is a systolic array performing the Montgomery Modular Multiplication (MMM). The bound on the Montgomery parameter has been optimized to facilitate more secure ECC point operations. Furthermore, we present a new possibility for the CRT scheme which is less vulnerable to side-channel attacks.
S.: An improved unified scalable radix-2 Montgomery multiplier
 In: Proc. the 17th IEEE Symposium on Computer Arithmetic (ARITH
, 2005
Cited by 8 (2 self)
This paper describes an improved version of the Tenca-Koç unified scalable radix-2 Montgomery multiplier with half the latency for small and moderate precision operands and half the queue memory requirement. Like the Tenca-Koç multiplier, this design is reconfigurable to accept any input precision in either GF(p) or GF(2^n) up to the size of the on-chip memory. An FPGA implementation can perform 1024-bit modular exponentiation in 16 ms using 5598 4-input lookup tables, making it the fastest unified scalable design yet reported.
Using Bleichenbacher’s Solution to the Hidden Number Problem to Attack Nonce Leaks in 384-Bit ECDSA
 IACR EPRINT
, 2013
Cited by 4 (0 self)
In this paper we describe an attack against nonce leaks in 384-bit ECDSA using an FFT-based attack due to Bleichenbacher. The signatures were computed by a modern smart card. We extracted the low-order bits of each nonce using a template-based power analysis attack against the modular inversion of the nonce. We also developed a BKZ-based method for the range reduction phase of the attack, as it was impractical to collect enough signatures for the collision searches originally used by Bleichenbacher. We confirmed our attack by extracting the entire signing key using a 5-bit nonce leak from 4000 signatures.
Montgomery reduction algorithm for modular multiplication using low-weight polynomial form integers
 In Kornerup and Muller [870
Cited by 3 (0 self)
Abstract. We extend low-weight polynomial form integers (LWPFIs) presented in [5]. An LWPFI p is an integer expressed as a degree-l, monic polynomial such that $p = t^l + f_{l-1}t^{l-1} + \cdots + f_1 t + f_0$, where t can be any positive integer. In [5], the $f_i$ are limited to 0 and ±1, but here we let $|f_i| \le \xi$ for some small positive integer ξ. In modular multiplication based on LWPFI, elements in $\mathbb{Z}_p$ are expressed as polynomials in t and multiplication is performed in $\mathbb{Z}[t]/f(t)$. The coefficients must be reduced for subsequent modular multiplications. In [5], a coefficient reduction algorithm based on a division algorithm derived from the Barrett reduction algorithm is presented. In this report, we present a coefficient reduction algorithm based on the Montgomery reduction algorithm and its detailed analysis results. Bounds on the input and output of our coefficient reduction algorithm are carefully analyzed. We give conditions for eliminating the final subtractions at the end of the Montgomery reduction algorithm. In addition, we present efficient modular addition and subtraction methods using LWPFI moduli.
Parallelized radix-4 scalable Montgomery multipliers
 JOURNAL OF INTEGRATED CIRCUITS AND SYSTEMS
, 2008
Cited by 2 (1 self)
This paper describes a parallelized radix-4 scalable Montgomery multiplier implementation. The design does not require hardware multipliers, and uses parallelized multiplication to shorten the critical path. By left-shifting the sources rather than right-shifting the result, the latency between processing elements is shortened from two cycles to nearly one. Multiplexers are used to select precomputed products. Carry-save adders propagate carry bits before words are discarded. The new design can perform 1024-bit modular exponentiation in 9.4 ms and 256-bit exponentiation in 0.38 ms using 4997 Virtex2 4-input lookup tables, while consuming 30% fewer LUTs than a previous parallelized radix-4 design. This is comparable to radix-2 for long multiplies and nearly twice as fast for short ones.
Coordinate Blinding over Large Prime Fields
Cited by 1 (0 self)
Abstract. In this paper we propose a multiplicative blinding scheme for protecting implementations of a scalar multiplication over elliptic curves. Specifically, this blinding method applies to elliptic curves in the short Weierstraß form over large prime fields. The described countermeasure is shown to be a generalization of the use of random curve isomorphisms to prevent side-channel analysis, and the best configuration of this countermeasure is shown to be equivalent to the use of random curve isomorphisms. Furthermore, we describe how this countermeasure, and therefore random curve isomorphisms, can be efficiently implemented using Montgomery multiplication.
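The random curve isomorphism that the abstract refers to can be illustrated with the standard change of variables for short Weierstraß curves y^2 = x^3 + ax + b over a prime field: (x, y) maps to (u^2 x, u^3 y) and (a, b) to (u^4 a, u^6 b) for a nonzero u. The sketch below shows only this textbook isomorphism, not the paper's generalized blinding scheme; the helper names `on_curve` and `blind` and the toy parameters in the usage note are hypothetical.

```python
def on_curve(x, y, a, b, p):
    """Check whether (x, y) satisfies y^2 = x^3 + a*x + b modulo p."""
    return (y * y - (x * x * x + a * x + b)) % p == 0


def blind(x, y, a, b, u, p):
    """Map a point and its curve through the isomorphism defined by u.

    u must be nonzero modulo p; in a countermeasure it would be drawn
    at random (an assumption of this sketch).
    """
    u2 = u * u % p
    u3 = u2 * u % p
    u4 = u2 * u2 % p
    u6 = u3 * u3 % p
    return (u2 * x % p, u3 * y % p, u4 * a % p, u6 * b % p)
```

For example, with the toy curve y^2 = x^3 + 2x + 3 over GF(97) and the point (0, 10), `blind(0, 10, 2, 3, u, 97)` yields a point that lies on the correspondingly transformed curve for any nonzero `u`.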
Faster and smaller hardware implementation of XTR
 In Proceedings of SPIE, Symposium on Optics & photonics, Advanced Signal Processing Algorithms, Architectures, and Implementations
, 2006
Cited by 1 (0 self)
Modular multiplication is the core of most Public Key Cryptosystems and therefore its implementation plays a crucial role in the overall efficiency of asymmetric cryptosystems. Hardware approaches provide advantages over software in the framework of efficient dedicated accelerators. The concerns of the designers are mainly the die size, frequency, latency (throughput) and power consumption of those solutions. We show in this paper how Booth recoding, pipelining, Montgomery modular multiplication and carry-save adders offer an attractive solution for hardware modular multiplication. Although most of the techniques used here are state of the art, the combination described here is unique and particularly efficient in the context of constrained hardware design of the XTR cryptosystem. Our solution is implemented on an FPGA platform and compared with previous results. The area-time ratio is improved by around a factor of 3.
GENERALISED MERSENNE NUMBERS REVISITED
Cited by 1 (1 self)
Abstract. Generalised Mersenne Numbers (GMNs) were defined by Solinas in 1999 and feature in the NIST (FIPS 186-2) and SECG standards for use in elliptic curve cryptography. Their form is such that modular reduction is extremely efficient, thus making them an attractive choice for modular multiplication implementation. However, the issue of residue multiplication efficiency seems to have been overlooked. Asymptotically, using a cyclic rather than a linear convolution, residue multiplication modulo a Mersenne number is twice as fast as integer multiplication; this property does not hold for prime GMNs, unless they are of Mersenne’s form. In this work we exploit an alternative generalisation of Mersenne numbers for which an analogue of the above property (and hence the same efficiency ratio) holds, even at bitlengths for which schoolbook multiplication is optimal, while also maintaining very efficient reduction. Moreover, our proposed primes are abundant at any bitlength, whereas GMNs are extremely rare. Our multiplication and reduction algorithms can also be easily parallelised, making our arithmetic particularly suitable for hardware implementation. Furthermore, the field representation we propose also naturally protects against side-channel attacks, including timing attacks, simple power analysis and differential power analysis, which is essential in many cryptographic scenarios, in contrast to GMNs.
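The efficiency of reduction modulo a Mersenne number, which the abstract takes as its starting point, comes from the identity 2^k ≡ 1 (mod 2^k − 1): the high part of a value can simply be folded onto the low part. The sketch below covers only the plain Mersenne case, not GMNs or the primes proposed in the paper; the function name `reduce_mersenne` and the parameter `k` are hypothetical.

```python
def reduce_mersenne(x, k):
    """Reduce a non-negative integer x modulo 2^k - 1 without division.

    Illustrative folding reduction; assumes x >= 0 and k >= 1.
    """
    p = (1 << k) - 1
    while x >> k:                  # while x has bits at position k or above
        x = (x & p) + (x >> k)     # fold them down, using 2^k = 1 (mod p)
    return 0 if x == p else x      # p itself is congruent to 0
```

Only shifts, masks and additions are used, which is why moduli of this shape are attractive in both software and hardware.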
Parallelized Booth-Encoded Radix-4 Montgomery Multipliers
Abstract. This paper proposes two parallelized radix-4 scalable Montgomery multiplier implementations. The designs do not require precomputed hard multiples of the operands, but instead use Booth encoding to compute products. The designs use a novel method for propagating the sign bits for negative partial products. The first design right-shifts operands to reduce critical path length when using Booth encoding. The second design left-shifts operands to improve latency between processing elements and to decrease hardware usage. An FPGA implementation of the right-shifting design consumes 17% more lookup tables (LUTs) and 25% to 33% more flip-flops than a comparable non-Booth-encoded design. It performs 1024-bit modular exponentiation in 9.1 ms using 5959 LUTs and 5079 flip-flops. The left-shifting design consumes 3% fewer LUTs and 29% to 33% fewer REGs than non-Booth. Its clock speed is 25% slower than non-Booth, and it performs 1024-bit modular exponentiation in 13 ms using 4852 LUTs and 2887 flip-flops.
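The reason Booth encoding removes the need for precomputed hard multiples can be seen from the radix-4 recoding itself: every digit falls in {-2, -1, 0, 1, 2}, so each partial product is a shift and/or negation of the operand and the multiple 3 never appears. The sketch below is the textbook recoding of an unsigned integer, not the sign-propagation method or the datapath of the paper; the function name `booth_radix4_digits` is hypothetical.

```python
def booth_radix4_digits(x, nbits):
    """Recode 0 <= x < 2**nbits into radix-4 Booth digits, least significant first.

    Each digit is b[2i] + b[2i-1] - 2*b[2i+1], with b[-1] = 0 and x
    zero-extended above bit nbits - 1, so x == sum(d * 4**i).
    """
    digits = []
    prev = 0                               # bit just below the current pair
    for i in range(0, nbits + 1, 2):       # one extra pair covers the zero extension
        b0 = (x >> i) & 1
        b1 = (x >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits
```

For an 8-bit operand this produces five digits; summing `d * 4**i` over them reproduces the original value.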