Results 1 - 10
of
18
Architectural Support for Fast Symmetric-Key Cryptography
- in Proc. Intl. Conf. ASPLOS
, 2000
"... The emergence of the Internet as a trusted medium for commerce and communication has made cryptography an essential component of modern information systems. Cryptography provides the mechanisms necessary to implement accountability, accuracy, and confidentiality in communication. As demands for secu ..."
Abstract
-
Cited by 41 (0 self)
- Add to MetaCart
The emergence of the Internet as a trusted medium for commerce and communication has made cryptography an essential component of modern information systems. Cryptography provides the mechanisms necessary to implement accountability, accuracy, and confidentiality in communication. As demands for secure communication bandwidth grow, efficient cryptographic processing will become increasingly vital to good system performance. In this paper, we explore techniques to improve the performance of symmetric key cipher algorithms. Eight popular strong encryption algorithms are examined in detail. Analysis reveals the algorithms are computationally complex and contain little parallelism. Overall throughput on a high-end microprocessor is quite poor, a 600 Mhz processor is incapable of saturating a T3 communication line with 3DES (triple DES) encrypted data. We introduce new instructions that improve the efficiency of the analyzed algorithms. Our approach adds instruction set support for fast substitutions, general permutations, rotates, and modular arithmetic. Performance analysis of the optimized ciphers shows an overall speedup of 59 % over a baseline machine with rotate instructions and 74 % speedup over a baseline without rotates. Even higher speedups are demonstrated with optimized substitutions (SBOXes) and additional functional unit resources. Our analyses of the original and optimized algorithms suggest future directions for the design of high-performance programmable cryptographic processors. 1
High-Radix Montgomery Modular Exponentiation on Reconfigurable Hardware
- IEEE Transactions on Computers
, 2001
"... to appear in the IEEE Transactions on Computers It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. Central tools for achieving system security are cryptographic algorithms. This contribution proposes arithmetic architec ..."
Abstract
-
Cited by 30 (3 self)
- Add to MetaCart
to appear in the IEEE Transactions on Computers It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. Central tools for achieving system security are cryptographic algorithms. This contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs). The proposed architectures perform modular exponentiation with very long integers. This operation is at the heart of many practical public-key algorithms such as RSA and discrete logarithm schemes. We combine a high–radix Montgomery modular multiplication algorithm with a new systolic array design. The designs are flexible, allowing any choice of operand and modulus. The new architecture also allows the use of high radices. Unlike previous approaches, we systematically implement and compare several variants of our new architecture for different bit lengths. We provide absolute area and timing measures for each architecture. The results allow conclusions about the feasibility and time-space trade-offs of our architecture for implementation on commercially available FPGAs. We found that 1024 bit RSA decryption can be done in 3.1 ms with our fastest architecture.
A Scalable Architecture for Modular Multiplication Based on Montgomery's Algorithm
- IEEE TRANSACTIONS ON COMPUTERS
, 2003
"... This paper presents a scalable architecture for the computation of modular multiplication, based on the Montgomery multiplication (MM) algorithm. A word-based version of MM is presented and used to explain the main concepts in the hardware design. The proposed multiplier is able to work with any pr ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
This paper presents a scalable architecture for the computation of modular multiplication, based on the Montgomery multiplication (MM) algorithm. A word-based version of MM is presented and used to explain the main concepts in the hardware design. The proposed multiplier is able to work with any precision of the input operands, limited only by memory or control constraints. Its architecture gives enough freedom to select the word size and the degree of parallelism to be used, according to the available area and/or desired performance. Design trade offs are analyzed in order to identify adequate hardware configurations for a given area or bandwidth requirement.
A High-Performance Flexible Architecture for Cryptography
- 1717 in Lecture Notes in Computer Science
, 1999
"... . Cryptographic algorithms are more efficiently implemented in custom hardware than in software running on general-purpose processors. However, systems which use hardware implementations have significant drawbacks: they are unable to respond to flaws discovered in the implemented algorithm or to cha ..."
Abstract
-
Cited by 21 (0 self)
- Add to MetaCart
. Cryptographic algorithms are more efficiently implemented in custom hardware than in software running on general-purpose processors. However, systems which use hardware implementations have significant drawbacks: they are unable to respond to flaws discovered in the implemented algorithm or to changes in standards. In this paper we show how reconfigurable computing offers high performance yet flexible solutions for cryptographic algorithms. We focus on PipeRench, a reconfigurable fabric that supports implementations which can yield better than custom-hardware performance and yet maintains all the flexibility of software based systems. PipeRench is a pipelined reconfigurable fabric which virtualizes hardware, enabling large circuits to be run on limited physical hardware. We present implementations for Crypton, IDEA, RC6, and Twofish on PipeRench and an extension of PipeRench, PipeRench . We also describe how various proposed AES algorithms could be implemented on PipeRe...
Efficient Architectures for implementing Montgomery Modular Multiplication and RSA Modular Exponentiation on Reconfigurable Logic
- In Tenth ACM International Symposium on Field-Programmable Gate Arrays
, 2002
"... This paper presents a review of some existing architectures for the implementation of Montgomery modular multiplication and exponentiation on FPGA (Field Programmable Gate Array). Some new architectures are presented, including a pipelined architecture exploiting the maximum carry chain length of th ..."
Abstract
-
Cited by 16 (2 self)
- Add to MetaCart
This paper presents a review of some existing architectures for the implementation of Montgomery modular multiplication and exponentiation on FPGA (Field Programmable Gate Array). Some new architectures are presented, including a pipelined architecture exploiting the maximum carry chain length of the FPGA which is used to implement the modular exponentiation operation required for RSA encryption and decryption. Speed and area comparisons are performed on the optimised designs. The issues of targeting a design specifically for a reconfigurable device are considered, taking into account the underlying architecture imposed by the target technology.
Accelerating Private-Key Cryptography via Multithreading on Symmetric Multiprocessors
- Proc. IEEE Int’l Symp. Performance Analysis of Systems and Software (ISPASS 03), IEEE
, 2003
"... Achieving high performance in cryptographic processing is important due to the increasing connectivity among today's computers. Despite steady improvements in microprocessor and system performance, private-key cipher implementations continue to be slow. Irrespective of the cipher used, the main reas ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Achieving high performance in cryptographic processing is important due to the increasing connectivity among today's computers. Despite steady improvements in microprocessor and system performance, private-key cipher implementations continue to be slow. Irrespective of the cipher used, the main reason for the low performance is lack of parallelism, which fundamentally comes from encryption modes such as the Cipher Block Chaining (CBC) mode. In CBC, each plaintext block is XOR'ed with the previous ciphertext block and then encrypted, essentially inducing a tight recurrence through the ciphertext blocks. To deliver high performance while maintaining high level of security assurance in real systems, the cryptography community has proposed Interleaved Cipher Block Chaining (ICBC) mode.
An Improved Unified Scalable Radix-2 Montgomery Multiplier
"... This paper describes an improved version of the Tenca-Koç unified scalable radix-2 Montgomery multiplier with half the latency for small and moderate precision operands and half the queue memory requirement. Like the Tenca-Koç multiplier, this design is reconfigurable to accept any input precision i ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
This paper describes an improved version of the Tenca-Koç unified scalable radix-2 Montgomery multiplier with half the latency for small and moderate precision operands and half the queue memory requirement. Like the Tenca-Koç multiplier, this design is reconfigurable to accept any input precision in either GF(p) or GF(2 n) up to the size of the on-chip memory. An FPGA implementation can perform 1024-bit modular exponentiation in 16 ms using 5598 4-input lookup tables, making it the fastest unified scalable design yet reported. 1.
Hardware Implementation of a Montgomery Modular Multiplier in a Systolic Array
"... This paper describes a hardware architecture for modular multiplication operation which is efficient for bit-lengths suitable for both commonly used types of Public Key Cryptography (PKC) i.e. ECC and RSA Cryptosystems. The challenge of current PKC implementations is to deal with long numbers (160-2 ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
This paper describes a hardware architecture for modular multiplication operation which is efficient for bit-lengths suitable for both commonly used types of Public Key Cryptography (PKC) i.e. ECC and RSA Cryptosystems. The challenge of current PKC implementations is to deal with long numbers (160-2048 bits) in order to achieve system's efficiency, as well as security. RSA, still the most popular PKC, has at its root the modular exponentiation operation. Modular exponentiation consists of repeated modular multiplications, which is also the basic operation for ECC protocols. The solution proposed in this work uses a systolic array implementation and can be used for arbitrary precisions. We also present modular exponentiation based on the Montgomery's Multiplication Method (MMM).
Towards an FPGA Architecture Optimized for Public-Key Algorithms
- in The SPIE’s Symposium on Voice, Video, and Data Communications
, 1999
"... Cryptographic algorithms are constantly evolving to meet security needs, and modular arithmetic is an integral part of these algorithms, especially in the case of public-key cryptosystems. To achieve optimal system performance while maintaining physical security, it is desirable to implement cryptog ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Cryptographic algorithms are constantly evolving to meet security needs, and modular arithmetic is an integral part of these algorithms, especially in the case of public-key cryptosystems. To achieve optimal system performance while maintaining physical security, it is desirable to implement cryptographic algorithms in hardware. However, many public-key cryptographic algorithms require the implementation of modular arithmetic, specifically modular multiplication, for operands of 1024 bits in length. Additionally, algorithm agility is required to support algorithm independent protocols, a feature of most modern security protocols. Reprogrammability, particularly in-system reprogrammability, is critical in enabling the switching between cryptographic algorithms required for algorithm independent protocols. Field Programmable Gate Arrays (FPGAs) are a viable option for achieving this goal. Ideally, the targeted FPGA will have been designed with the architectural requirements for wide-oper...
modular multiplication algorithm on multi-core systems
- In Proc. IEEE Workshop on Signal Processing Systems (SIPS’07
, 2007
"... In this paper, we investigate the efficient software implementations of the Montgomery modular multiplication algorithm on a multi-core system. A HW/SW co-design technique is used to find the efficient system architecture and the instruction scheduling method. We first implement the Montgomery modul ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
In this paper, we investigate the efficient software implementations of the Montgomery modular multiplication algorithm on a multi-core system. A HW/SW co-design technique is used to find the efficient system architecture and the instruction scheduling method. We first implement the Montgomery modular multiplication on a multi-core system with general purpose cores. We then speed up it by adopting the Multiply-Accumulate (MAC) operation in each core. As a result, the performance can be improved by a factor of 1.53 and 2.15 when 256-bit and 1024-bit Montgomery modular multiplication being performed, respectively.

