Results 1-10 of 57
Security as a new dimension in embedded system design
In Proceedings of the 41st Design Automation Conference (DAC '04), 2004
Cited by 60 (4 self)
The growing number of instances of breaches in information security in the last few years has created a compelling case for efforts towards secure electronic systems. Embedded systems, which will be ubiquitously used to capture, store, manipulate, and access data of a sensitive nature, pose several unique and interesting security challenges. Security has been the subject of intensive research in the areas of cryptography, computing, and networking. However, despite these efforts, security is often misconstrued by designers as the hardware or software implementation of specific cryptographic algorithms and security protocols. In reality, it is an entirely new metric that designers should consider throughout the design process, along with other metrics such as cost, performance, and power. This paper is intended to introduce embedded system designers and design tool developers to the challenges involved in designing
A Scalable Architecture for Modular Multiplication Based on Montgomery's Algorithm
IEEE Transactions on Computers, 2003
Cited by 41 (2 self)
This paper presents a scalable architecture for the computation of modular multiplication, based on the Montgomery multiplication (MM) algorithm. A word-based version of MM is presented and used to explain the main concepts in the hardware design. The proposed multiplier is able to work with any precision of the input operands, limited only by memory or control constraints. Its architecture gives enough freedom to select the word size and the degree of parallelism to be used, according to the available area and/or desired performance. Design trade-offs are analyzed in order to identify adequate hardware configurations for a given area or bandwidth requirement.
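To make the word-based idea concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes an odd modulus m below 2^n_bits, scans the multiplier x one bit at a time, and processes y, m, and the running sum in w-bit words. The function name, the default word size, and the list-of-words layout are illustrative choices.

```python
def mont_mul_words(x, y, m, n_bits, w=8):
    """Return x * y * 2^(-n_bits) mod m, for odd m < 2^n_bits and x, y < m.

    x is scanned bit by bit; y, m and the partial sum S are handled
    as lists of w-bit words (least significant word first).
    """
    mask = (1 << w) - 1
    e = -(-(n_bits + 1) // w)          # words needed: S stays below 2*m
    Y = [(y >> (w * j)) & mask for j in range(e)]
    M = [(m >> (w * j)) & mask for j in range(e)]
    S = [0] * e
    for i in range(n_bits):
        xi = (x >> i) & 1
        # The parity of S + xi*Y decides whether m must be added
        # so that the sum becomes even and can be halved exactly.
        q = (S[0] + xi * Y[0]) & 1
        carry = 0
        for j in range(e):             # word-serial addition
            t = S[j] + xi * Y[j] + q * M[j] + carry
            S[j] = t & mask
            carry = t >> w
        for j in range(e):             # one-bit right shift across words
            nxt = S[j + 1] if j + 1 < e else carry
            S[j] = (S[j] >> 1) | ((nxt & 1) << (w - 1))
    s = sum(word << (w * j) for j, word in enumerate(S))
    return s - m if s >= m else s      # final conditional subtraction
```

Changing w changes only how many words each pass touches, not the result, which is the sense in which the word size can be traded against area or speed.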
High-Radix Design of a Scalable Modular Multiplier
In Cryptographic Hardware and Embedded Systems - CHES 2001, Ç. K. Koç and C. Paar, Eds., Lecture Notes in Computer Science, 2001
Cited by 30 (8 self)
This paper describes an algorithm and architecture based on an extension of a scalable radix-2 architecture proposed in a previous work. The algorithm is proven to be correct and the hardware design is discussed in detail. Experimental results are shown to compare a radix-8 implementation with a radix-2 design. The scalable Montgomery multiplier is adjustable to constrained areas yet able to work on any given precision of the operands. Similar to some systolic implementations, this design avoids the high load on signals that broadcast to several components, making the delay independent of the operands' precision.
Instruction Set Extensions for Fast Arithmetic in Finite Fields GF(p) and GF(2^m)
Cryptographic Hardware and Embedded Systems - CHES 2004, 2004
Cited by 18 (6 self)
Instruction set extensions are a small number of custom instructions specifically designed to accelerate the processing of a given kind of workload such as multimedia or cryptography. Enhancing a general-purpose RISC processor with a few application-specific instructions to facilitate the inner loop operations of public-key cryptosystems can result in a significant performance gain. In this paper we introduce a set of five custom instructions to accelerate arithmetic operations in finite fields GF(p) and GF(2^m). The custom instructions can be easily integrated into a standard RISC architecture like MIPS32 and require little extra hardware. Our experimental results show that an extended MIPS32 core is able to perform an elliptic curve scalar multiplication over a 192-bit prime field in 36 msec, assuming a clock speed of 33 MHz. An elliptic curve scalar multiplication over the binary field GF(2^191) takes only 21 msec, which is approximately six times faster than a software implementation on a standard MIPS32 processor.
Instruction Set Extension for Fast Elliptic Curve Cryptography Over Binary Finite Fields GF(2^m)
In Proceedings of the 14th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP 2003), 2003
Cited by 13 (7 self)
The performance of elliptic curve (EC) cryptosystems depends essentially on efficient arithmetic in the underlying finite field. Binary finite fields GF(2^m) have the advantage of “carry-free” addition. Multiplication, on the other hand, is rather costly since polynomial arithmetic is not supported by general-purpose processors. In this paper we propose a combined hardware/software approach to overcome this problem. First, we outline that multiplication of binary polynomials can be easily integrated into a multiplier datapath for integers without significant additional hardware. Then, we present new algorithms for multiple-precision arithmetic in GF(2^m) based on the availability of an instruction for single-precision multiplication of binary polynomials. The proposed hardware/software approach is considerably faster than a “conventional” software implementation and well suited for constrained devices like smart cards. Our experimental results show that an enhanced 16-bit RISC processor is able to generate a 191-bit ECDSA signature in less than 650 msec when the core is clocked at 5 MHz.
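The idea of building multiple-precision polynomial arithmetic on a single-precision instruction can be sketched as follows. This is a plain schoolbook illustration under assumptions, not the algorithms proposed in the paper: `mulgf2` is a hypothetical software stand-in for the custom instruction, and the 16-bit word size W and the helper names are chosen only for the example.

```python
W = 16  # assumed word size in bits (illustrative)

def mulgf2(a, b):
    """Software stand-in for a hypothetical single-precision instruction:
    carry-less (XOR-based) product of two binary polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # "addition" in GF(2)[x] is XOR, so no carries
        a <<= 1
        b >>= 1
    return r

def to_words(value, count):
    """Split an integer into `count` W-bit words, least significant first."""
    mask = (1 << W) - 1
    return [(value >> (W * i)) & mask for i in range(count)]

def from_words(words):
    """Reassemble a list of W-bit words into one integer."""
    return sum(word << (W * i) for i, word in enumerate(words))

def poly_mul(A, B):
    """Schoolbook multiple-precision product of two binary polynomials,
    each given as a list of W-bit words."""
    mask = (1 << W) - 1
    R = [0] * (len(A) + len(B))
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            p = mulgf2(a, b)        # at most 2W-1 bits
            R[i + j] ^= p & mask    # low half of the partial product
            R[i + j + 1] ^= p >> W  # high half; XOR needs no carry chain
    return R
```

Because partial products are combined with XOR, the accumulation loop has no carry propagation between words, which is what makes the binary-field case attractive on small processors.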
Scalable and unified hardware to compute Montgomery inverse in GF(p) and GF(2^n)
Cryptographic Hardware and Embedded Systems - CHES 2002, 4th International Workshop, 2003
Cited by 10 (4 self)
Computing the inverse of a number in finite fields GF(p) or GF(2^n) is equally important for cryptographic applications. This paper proposes a novel scalable and unified architecture for Montgomery inverse hardware that operates in both GF(p) and GF(2^n) fields. We adjust and modify a GF(2^n) Montgomery inverse algorithm to accommodate multi-bit shifting hardware, making it very similar to a previously proposed GF(p) algorithm. The architecture is intended to be scalable, which allows the hardware to compute the inverse of long-precision numbers in a repetitive way. After implementation, the unified design was compared with other designs. The unified hardware was found to be eight times smaller than another reconfigurable design, with comparable performance. Even though the unified design consumes slightly more area and is slightly slower than the scalable inverter implementations for GF(p) only, it is a practical solution whenever arithmetic in the two finite fields is needed.
An efficient and scalable radix-4 modular multiplier design using recoding techniques
Proc. Asilomar Conf. Signals, Systems, and Computers, 2003
Cited by 9 (0 self)
This paper presents the algorithm and architecture of a scalable radix-4 Montgomery multiplier. A straightforward implementation of a radix-4 design based on previously published techniques results in a poor solution. In this paper we present an algorithm and architecture for the scalable radix-4 multiplier that makes use of two types of digit recoding in order to generate an efficient solution. The word-by-word algorithm used in the multiplier gives the designer the freedom to select the level of parallelism according to the available area. Experimental results are shown to demonstrate that the proposed radix-4 Montgomery multiplier design has a better area/performance tradeoff than previous radix-2 and radix-8 scalable designs.
Hardware Implementation of a Montgomery Modular Multiplier in a Systolic Array
Cited by 8 (2 self)
This paper describes a hardware architecture for the modular multiplication operation that is efficient for bit lengths suitable for both commonly used types of Public Key Cryptography (PKC), i.e., ECC and RSA cryptosystems. The challenge of current PKC implementations is to deal with long numbers (160-2048 bits) in order to achieve system efficiency as well as security. RSA, still the most popular PKC, has at its root the modular exponentiation operation. Modular exponentiation consists of repeated modular multiplications, which is also the basic operation for ECC protocols. The solution proposed in this work uses a systolic array implementation and can be used for arbitrary precisions. We also present modular exponentiation based on the Montgomery Multiplication Method (MMM).
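The statement that modular exponentiation reduces to repeated modular multiplications can be illustrated with a short software sketch. It is a generic left-to-right square-and-multiply loop over a bit-serial Montgomery product, assuming an odd modulus; it does not model the systolic array, and the function names are illustrative.

```python
def mont_product(a, b, m, n):
    """Return a * b * 2^(-n) mod m for odd m < 2^n and a, b < m."""
    s = 0
    for i in range(n):
        s += ((a >> i) & 1) * b   # add b when bit i of a is set
        if s & 1:
            s += m                # make the sum even without changing it mod m
        s >>= 1                   # exact division by 2
    return s - m if s >= m else s

def mont_exp(base, exp, m):
    """Return base^exp mod m (m odd, m > 1) using only Montgomery products."""
    n = m.bit_length()
    r2 = pow(2, 2 * n, m)                    # R^2 mod m with R = 2^n
    x = mont_product(base % m, r2, m, n)     # base in Montgomery form
    acc = mont_product(1, r2, m, n)          # 1 in Montgomery form
    for bit in bin(exp)[2:]:                 # most significant bit first
        acc = mont_product(acc, acc, m, n)   # square
        if bit == "1":
            acc = mont_product(acc, x, m, n) # multiply
    return mont_product(acc, 1, m, n)        # leave Montgomery form
```

For example, `mont_exp(7, 65537, 1000003)` can be checked against Python's built-in `pow(7, 65537, 1000003)`.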
A Performance Evaluation of ARM ISA Extension for Elliptic Curve Cryptography Over Binary Finite Fields
In Proceedings of the Sixteenth Symposium on Computer Architecture and High Performance Computing - SBCPAD 2004, Foz do Iguaçu
Cited by 8 (1 self)
In this paper, we present an evaluation of a possible ARM instruction set extension for Elliptic Curve Cryptography (ECC) over binary finite fields GF(2^m). The use of elliptic curve cryptography is becoming common in the embedded domain, where its reduced key size at a security level equivalent to standard public-key methods (such as RSA) allows for power consumption savings and more efficient operation. The ARM processor was selected because it is widely used for embedded system applications. We developed an ECC benchmark set with three widely used public-key algorithms: Diffie-Hellman for key exchange, the digital signature algorithm, and the ElGamal method for encryption/decryption. We analyzed the major bottlenecks at the function level and evaluated the performance improvement obtained by introducing some simple architectural support in the ARM ISA. Results of our experiments show that the use of a word-level multiplication instruction over the binary field allows for an average 33% reduction of the total number of dynamically executed instructions, while execution time improves by the same amount when projective coordinates are used.
An efficient reconfigurable multiplier architecture for Galois field
In Proceedings of the Microelectronics Journal
Cited by 7 (1 self)
This paper describes an efficient architecture of a reconfigurable bit-serial polynomial basis multiplier for the Galois field GF(2^m), where 1 < m ≤ M. The value m, the degree of the irreducible polynomial, can be changed, so the multiplier can be configured and programmed. The value of M determines the maximum size that the multiplier can support. The advantages of the proposed architecture are (i) the high order of flexibility, which allows an easy configuration for different field sizes, and (ii) the low hardware complexity, which results in small area. By using the gated clock technique, a significant reduction of the total multiplier power consumption is achieved.
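As a software illustration of what a bit-serial polynomial basis multiplier computes, the sketch below processes one bit of the second operand per iteration and takes the field degree m and the irreducible polynomial as run-time parameters. It is an assumption-laden model of the arithmetic only: the function name and the MSB-first ordering are choices made here, and the paper's circuit structure and clock gating are not represented.

```python
def gf2m_mul_bitserial(a, b, poly, m):
    """Multiply a and b in GF(2^m), polynomial basis, one bit of b per step.

    poly is the irreducible polynomial as an integer including its x^m term
    (for example 0x11B for x^8 + x^4 + x^3 + x + 1); a and b must be < 2^m.
    """
    acc = 0
    for i in reversed(range(m)):   # most significant bit of b first
        acc <<= 1                  # multiply the accumulator by x
        if (acc >> m) & 1:
            acc ^= poly            # reduce when degree m is reached
        if (b >> i) & 1:
            acc ^= a               # conditionally add a (XOR in GF(2))
    return acc
```

The same routine serves different field sizes by changing only m and poly, for example `gf2m_mul_bitserial(0x57, 0x83, 0x11B, 8)` for GF(2^8) or `gf2m_mul_bitserial(2, 8, 0x13, 4)` for GF(2^4).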