Results 1-10 of 13
A scalable and unified multiplier architecture for finite fields GF(p) and GF(2^m)
 In Cryptographic Hardware and Embedded Systems (CHES 2000), LNCS
, 2000
Cited by 44 (12 self)
We describe a scalable and unified architecture for a Montgomery multiplication module which operates in both types of finite fields, GF(p) and GF(2^m). The unified architecture requires only slightly more area than that of the multiplier architecture for the field GF(p). The multiplier is scalable, which means that a fixed-area multiplication module can handle operands of any size, and also that the word size can be selected based on the area and performance requirements. We utilize the concurrency in the Montgomery multiplication operation by employing a pipelining design methodology. We also describe a scalable and unified adder module to carry out concomitant operations in our implementation of the Montgomery multiplication. The upper limit on the precision of the scalable and unified Montgomery multiplier is dictated only by the available memory to store the operands and internal results, and the module is capable of performing infinite-precision Montgomery multiplication in both types of finite fields. Key Words: Prime fields, binary extension fields, multiplication, Montgomery multiplication, scalability, hardware implementation.
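One way to see why a single datapath can serve both field types: multiplication in GF(2^m) follows the same shift-and-add pattern as integer multiplication, but with carries suppressed, so each addition becomes an XOR. The Python sketch below is illustrative only and is not the architecture from the paper. The name `gf2m_multiply` and its parameters are assumptions, and it uses a plain bit-serial multiply with interleaved reduction rather than Montgomery multiplication.

```python
def gf2m_multiply(a, b, poly, m):
    """Multiply two GF(2^m) elements held as bit vectors in Python ints.

    poly is the irreducible reduction polynomial including its x^m term;
    a and b are assumed to be already reduced (below 2**m).
    """
    result = 0
    for i in range(m):
        if (b >> i) & 1:
            result ^= a          # carry-free addition: XOR instead of +
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= poly            # reduce as soon as the degree reaches m
    return result
```

For the AES field, with reduction polynomial 0x11B and m = 8, `gf2m_multiply(0x57, 0x83, 0x11B, 8)` gives 0xC1, which matches the worked example in FIPS 197.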
A Scalable Architecture for Montgomery Multiplication
 Lecture Notes in Computer Science
, 1999
Cited by 41 (7 self)
This paper describes the methodology and design of a scalable Montgomery multiplication module. There is no limitation on the maximum number of bits manipulated by the multiplier, and the selection of the word size is made according to the available area and/or desired performance. We describe the general view of the new architecture, analyze hardware organization for its parallel computation, and discuss design tradeoffs which are useful to identify the best hardware configuration.
A Scalable Architecture for Modular Multiplication Based on Montgomery's Algorithm
 IEEE TRANSACTIONS ON COMPUTERS
, 2003
Cited by 34 (2 self)
This paper presents a scalable architecture for the computation of modular multiplication, based on the Montgomery multiplication (MM) algorithm. A word-based version of MM is presented and used to explain the main concepts in the hardware design. The proposed multiplier is able to work with any precision of the input operands, limited only by memory or control constraints. Its architecture gives enough freedom to select the word size and the degree of parallelism to be used, according to the available area and/or desired performance. Design trade-offs are analyzed in order to identify adequate hardware configurations for a given area or bandwidth requirement.
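As a rough software illustration of what a word-based Montgomery multiplication computes, the Python sketch below scans one operand a word at a time and keeps the accumulator divisible by the word base at each step. It is a generic textbook-style formulation under stated assumptions, not the algorithm or the hardware organization from the paper; the names `montgomery_multiply`, `word_bits` and `num_words` are hypothetical, and both operands are assumed to be smaller than the odd modulus.

```python
def montgomery_multiply(x, y, modulus, word_bits, num_words):
    """Return x * y * R^-1 mod modulus, where R = 2**(word_bits * num_words).

    Assumes 0 <= x, y < modulus < R and an odd modulus (Python 3.8+).
    """
    if modulus % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    base = 1 << word_bits
    mask = base - 1
    # Precomputed constant: -modulus^-1 mod 2**word_bits.
    neg_inv = (-pow(modulus, -1, base)) % base
    acc = 0
    for i in range(num_words):
        x_word = (x >> (i * word_bits)) & mask   # next word of x
        acc += x_word * y
        q = ((acc & mask) * neg_inv) & mask      # chosen so the low word cancels
        acc = (acc + q * modulus) >> word_bits   # exact division by the word base
    # The accumulator stays below 2 * modulus, so one subtraction suffices.
    return acc - modulus if acc >= modulus else acc
```

The word size only changes how many loop iterations are needed for a given operand length, which is the software analogue of trading area against cycle count.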
Systolic, Linear-Array Multiplier for a Class of Right-Shift Algorithms
, 1994
Cited by 28 (0 self)
A very simple multiplier cell is developed for use in a linear, purely systolic array forming a digit-serial multiplier for unsigned or 2's complement operands. Each cell produces two digit-product terms and accumulates these into a previous sum of the same weight, developing the product least significant digit first. Grouping two terms per cell, the ratio of active elements to latches is low, and only cells are needed for a full n by n multiply. A modulo multiplier is then developed by incorporating a Montgomery type of modulo reduction. Two such multipliers interconnect to form a purely systolic modulo exponentiator, capable of performing RSA encryption at very high clock frequencies, but with a low gate count and small area. It is also shown how the multiplier, with some simple back-end connections, can compute modular inverses and perform modular division for a power of two as modulus.
An RNS Montgomery Modular Multiplication Algorithm
, 1998
Cited by 25 (3 self)
We present a new RNS modular multiplication for very large operands. The algorithm is based on Montgomery's method adapted to mixed radix, and is performed using a Residue Number System. By choosing the moduli of the RNS system reasonably large, and implementing the system on a ring of fairly simple processors, the carry-free nature of RNS arithmetic achieves an effect corresponding to a redundant high-radix implementation. The algorithm can be implemented to run in O(n) time on O(n) processors, where n is the number of moduli in the RNS system.
1 Introduction Many cryptosystems employ modular multiplication with very large numbers [RSA78, FS86]. Different algorithms have been proposed in the literature [Bri90, Kor93, Wal93, Tak93, SV93, Oru95]. Most of them use redundant radix number systems and Montgomery's modular multiplication [Mon85]. On the other hand the Residue Number System (RNS) is also of particular interest because of the parallel and carry free nature of its arithmeti...
Design and Implementation of a Coprocessor for Cryptography Applications
, 1997
Cited by 17 (0 self)
In this paper, an ASIC suitable for cryptography applications based on modular arithmetic techniques is presented. These applications, such as digital signature (DSA) and public key encryption and decryption (RSA), use modular exponentiation as their basic operation. This ASIC works as a coprocessor with a special set of instructions specialized in dealing with high-accuracy integers, as well as in the rapid evaluation of modular multiplications and exponentiations. The algorithm, the hardware architecture, the design methodology and the results are described in detail.
1. Introduction Security has become a key issue in the world of electronic communication. Besides how fast data are transmitted, the security of these data through the communication channel arises as one of the most important problems. However, the time overhead due to data encryption and decryption should not impose a bottleneck in the communication process. Public key cryptography (RSA), as well as othe...
Modular exponentiation using parallel multipliers
 Proceedings of the 2003 IEEE International Conference on Field Programmable Technology (FPT
, 2003
Cited by 8 (0 self)
A field programmable gate array (FPGA) semi-systolic implementation of a modular exponentiation unit, suitable for use in implementing the RSA public key cryptosystem, is presented. The design is carefully matched with features of the FPGA architecture, utilizing embedded 18×18-bit multipliers on the FPGA and employing a carry-save addition scheme. Using this architecture, a 1024-bit modular exponentiation can operate at 90 MHz on a Xilinx XC2V3000-6 device and perform a 1024-bit RSA decryption in 0.66 ms with the Chinese Remainder Theorem.
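The speedup from the Chinese Remainder Theorem comes from replacing one full-size exponentiation with two half-size ones. The Python sketch below shows the standard CRT recombination in software; it is not the FPGA design from the paper, and the function name `rsa_decrypt_crt` and its parameters are assumptions chosen for illustration.

```python
def rsa_decrypt_crt(ciphertext, p, q, d):
    """Decrypt with two half-size exponentiations, then recombine (Python 3.8+).

    p and q are the distinct prime factors of the modulus and d is the
    private exponent. Real implementations precompute d_p, d_q and q_inv.
    """
    d_p = d % (p - 1)                    # exponent reduced modulo p - 1
    d_q = d % (q - 1)                    # exponent reduced modulo q - 1
    q_inv = pow(q, -1, p)                # q^-1 mod p
    m_p = pow(ciphertext, d_p, p)        # half-size exponentiation mod p
    m_q = pow(ciphertext, d_q, q)        # half-size exponentiation mod q
    h = (q_inv * (m_p - m_q)) % p        # Garner-style recombination
    return m_q + h * q
```

With the small textbook key p = 61, q = 53, e = 17, d = 2753, encrypting 65 as `pow(65, 17, 61 * 53)` and passing the result to `rsa_decrypt_crt` recovers 65.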
Modular Multiplication and Base Extensions in Residue Number Systems
 IN 15TH IEEE SYMPOSIUM ON COMPUTER ARITHMETIC
, 2001
Cited by 7 (1 self)
We present a new RNS modular multiplication for very large operands. The algorithm is based on Montgomery's method adapted to residue arithmetic. By choosing the moduli of the RNS system reasonably large, an effect corresponding to a redundant high-radix implementation is achieved, due to the carry-free nature of residue arithmetic. The actual computation in the multiplication takes place in constant time, where the unit of time is a few simple residue operations. However, it is necessary twice to convert values from one residue system into another, operations which take O(n) time on O(n) processors, where n is the number of moduli in the RNS systems. Thus these conversions are the bottlenecks of the method, and any future improvements in RNS base conversions, or the use of particular residue systems, can immediately be applied.
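The carry-free property can be illustrated with a small Python sketch: each residue channel is multiplied independently, and only the conversion back to an ordinary integer touches all channels. This is a minimal illustration under stated assumptions, not the algorithm from the paper. The helper names `to_rns`, `rns_multiply` and `from_rns` are hypothetical, the moduli are assumed pairwise coprime, and the reconstruction uses the plain Chinese Remainder Theorem rather than the base extension the abstract refers to.

```python
from math import prod


def to_rns(value, moduli):
    """Represent value by its residues modulo each (pairwise coprime) modulus."""
    return [value % m for m in moduli]


def rns_multiply(a, b, moduli):
    """Multiply channel by channel; no carries pass between channels."""
    return [(x * y) % m for x, y, m in zip(a, b, moduli)]


def from_rns(residues, moduli):
    """Recover the integer in [0, product of moduli) via the CRT (Python 3.8+)."""
    total = prod(moduli)
    result = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        result += r * partial * pow(partial, -1, m)
    return result % total
```

With moduli 13, 17, 19 and 23, whose product is 96577, multiplying the residue vectors of 123 and 456 and converting back yields 56088, the ordinary product, because it is smaller than the product of the moduli.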
Bipartite Modular Multiplication
, 2005
Cited by 3 (0 self)
This paper proposes a new fast method for calculating modular multiplication. The calculation is performed using a new representation of residue classes modulo M that enables the splitting of the multiplier into two parts. These two parts are then processed separately, in parallel, potentially doubling the calculation speed. The upper part and the lower part of the multiplier are processed using the interleaved modular multiplication algorithm and the Montgomery algorithm respectively. Conversions back and forth between the original integer set and the new residue system can be performed at speeds up to twice that of the Montgomery method without the need for precomputed constants. This new method is suitable for both hardware implementation and software implementation in a multiprocessor environment. Although this paper focuses on the application of the new method in the integer field, the technique used to speed up the calculation can also easily be adapted for operation in the binary extended field GF(2^m).
A Hardware Algorithm for Modular Multiplication/Division
 IEEE TRANSACTIONS ON COMPUTERS
, 2005
Cited by 3 (0 self)
A mixed radix-4/2 algorithm for modular multiplication/division suitable for VLSI implementation is proposed. The algorithm is based on the Montgomery method for modular multiplication and on the extended Binary GCD algorithm for modular division. Both algorithms are modified and combined into the proposed algorithm so that almost all the hardware components are shared. The new algorithm carries out both calculations using simple operations such as shifts, additions, and subtractions. The radix-2 signed-digit representation is used to avoid carry propagation in all additions and subtractions. A modular multiplier/divider based on the algorithm performs an n-bit modular multiplication/division in O(n) clock cycles, where the length of the clock cycle is constant and independent of n. The modular multiplier/divider has a linear array structure with a bit-slice feature and can be implemented with much smaller hardware than that necessary to implement both multiplier and divider separately.