Faster and Timing-Attack Resistant AES-GCM. IACR Cryptology ePrint Archive, Report 2009/129, 2009

Cited by 24 (3 self)
Abstract. We present a bitsliced implementation of AES encryption in counter mode for 64-bit Intel processors. Running at 7.81 cycles/byte on a Core 2, it is up to 25% faster than previous implementations, while simultaneously offering protection against timing attacks. In particular, it is the only cache-timing-attack resistant implementation offering competitive speeds for stream as well as for packet encryption: for 576-byte packets, we improve performance over previous bitsliced implementations by more than a factor of 2. We also report more than 30% improved speeds for lookup-table based Galois/Counter mode authentication, achieving 11.51 cycles/byte for authenticated encryption. Furthermore, we present the first constant-time implementation of AES-GCM that has a reasonable speed of 22.19 cycles/byte, thus offering a full suite of timing-analysis resistant software for authenticated encryption. Keywords: AES, Galois/Counter mode, cache-timing attacks, fast implementations
Batch binary Edwards
In Crypto 2009, volume 5677 of LNCS, 2009

Cited by 19 (8 self)
Abstract. This paper sets new software speed records for high-security Diffie-Hellman computations, specifically 251-bit elliptic-curve variable-base-point scalar multiplication. In one second of computation on a $200 Core 2 Quad Q6600 CPU, this paper’s software performs 30000 251-bit scalar multiplications on the binary Edwards curve d(x + x^2 + y + y^2) = (x + x^2)(y + y^2) over the field F_2[t]/(t^251 + t^7 + t^4 + t^2 + 1) where d = t^57 + t^54 + t^44 + 1. The paper’s field-arithmetic techniques can be applied in much more generality but have a particularly efficient interaction with the completeness of addition formulas for binary Edwards curves. Keywords. Scalar multiplication, Diffie–Hellman, batch throughput, vectorization, Karatsuba, Toom, elliptic curves, binary Edwards curves, differential addition, complete addition formulas
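The curve equation and field quoted in this abstract can be checked numerically. The following is a minimal Python sketch, not the paper's batched, vectorized arithmetic: it stores field elements as integers whose bits are polynomial coefficients, and the names `gf_mul` and `on_curve` are illustrative assumptions. It is a plain schoolbook multiply and is not constant-time.

```python
# Field F_2[t]/(t^251 + t^7 + t^4 + t^2 + 1); bit i of an int is the coefficient of t^i.
M = 251
MODULUS = (1 << 251) | (1 << 7) | (1 << 4) | (1 << 2) | 1
# Curve parameter d = t^57 + t^54 + t^44 + 1, as given in the abstract.
D = (1 << 57) | (1 << 54) | (1 << 44) | 1


def gf_mul(a, b):
    """Multiply two field elements (illustrative helper, not constant-time)."""
    # Carry-less schoolbook product: addition of polynomials over F_2 is XOR.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    # Reduce modulo the field polynomial, clearing the highest bits first.
    for i in range(r.bit_length() - 1, M - 1, -1):
        if (r >> i) & 1:
            r ^= MODULUS << (i - M)
    return r


def on_curve(x, y):
    """Check d(x + x^2 + y + y^2) == (x + x^2)(y + y^2)."""
    u = x ^ gf_mul(x, x)  # x + x^2
    v = y ^ gf_mul(y, y)  # y + y^2
    return gf_mul(D, u ^ v) == gf_mul(u, v)
```

The points (0, 0) and (1, 1) satisfy the equation trivially, because x + x^2 vanishes for x in {0, 1}; a point such as (t, 0) does not.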
High-speed high-security signatures

Cited by 14 (4 self)
Abstract. This paper shows that a $390 mass-market quad-core 2.4GHz Intel Westmere (Xeon E5620) CPU can create 109000 signatures per second and verify 71000 signatures per second on an elliptic curve at a 2^128 security level. Public keys are 32 bytes, and signatures are 64 bytes. These performance figures include strong defenses against software side-channel attacks: there is no data flow from secret keys to array indices, and there is no data flow from secret keys to branch conditions.
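The two side-channel rules stated here (no secret-dependent array indices, no secret-dependent branches) can be illustrated with a table lookup that reads every entry and selects the wanted one with a bit mask. This is a generic sketch of the idea, not code from the paper; `eq_mask` and `ct_lookup` are hypothetical names, entries are assumed to fit in 32 bits, and Python itself gives no timing guarantees, so it only models what a C or assembly implementation would do.

```python
MASK32 = 0xFFFFFFFF


def eq_mask(a, b):
    """Return 0xFFFFFFFF if a == b, else 0, for 0 <= a, b < 2**32, without branching."""
    diff = a ^ b
    # diff - 1 is negative only when diff == 0; shifting a negative Python int
    # right gives -1, which masks to all ones.
    return ((diff - 1) >> 32) & MASK32


def ct_lookup(table, secret_index):
    """Read table[secret_index] while touching every entry in a fixed order."""
    result = 0
    for i, entry in enumerate(table):
        # The loop index i is public; the secret only enters through the mask.
        result |= entry & eq_mask(i, secret_index)
    return result
```

The memory access pattern and the control flow depend only on the table length, which is the property the abstract describes.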
NEON crypto

Cited by 11 (4 self)
Abstract. NEON is a vector instruction set included in a large fraction of new ARM-based tablets and smartphones. This paper shows that NEON supports high-security cryptography at surprisingly high speeds; normally data arrives at lower speeds, giving the CPU time to handle tasks other than cryptography. In particular, this paper explains how to use a single 800MHz Cortex A8 core to compute the existing NaCl suite of high-security cryptographic primitives at the following speeds: 5.60 cycles per byte (1.14 Gbps) to encrypt using a shared secret key, 2.30 cycles per byte (2.78 Gbps) to authenticate using a shared secret key, 527102 cycles (1517/second) to compute a shared secret key for a new public key, 650102 cycles (1230/second) to verify a signature, and 368212 cycles (2172/second) to sign a message. These speeds make no use of secret branches and no use of secret memory addresses.
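The Gbps and per-second figures in this abstract follow directly from the cycle counts and the 800MHz clock. A small Python sketch of that arithmetic, with hypothetical helper names `gbps` and `ops_per_second`:

```python
CLOCK_HZ = 800_000_000  # the 800MHz Cortex A8 core named in the abstract


def gbps(cycles_per_byte):
    """Throughput in gigabits per second for a given cycles/byte cost."""
    bytes_per_second = CLOCK_HZ / cycles_per_byte
    return bytes_per_second * 8 / 1e9


def ops_per_second(cycles_per_op):
    """Whole operations completed per second for a given cycle count."""
    return CLOCK_HZ // cycles_per_op


print(round(gbps(5.60), 2))    # encryption
print(round(gbps(2.30), 2))    # authentication
print(ops_per_second(527102))  # shared-secret computation
print(ops_per_second(650102))  # signature verification
print(ops_per_second(368212))  # signing
```

The per-second counts are rounded down, which matches the figures quoted in the abstract.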
Kummer strikes back: new DH speed records
In Cryptology ePrint Archive, Report 2014/134, 2014

Cited by 2 (0 self)
Abstract. This paper introduces high-security constant-time variable-base-point Diffie–Hellman soft ...
Investigating the potential of custom instruction set extensions for SHA-3 candidates on a 16-bit microcontroller architecture. Cryptology ePrint Archive, Report 2012/050, 2012

Cited by 1 (1 self)
Abstract. In this paper, we investigate the benefit of instruction set extensions for software implementations of all five SHA-3 candidates. To this end, we start from optimized assembly code for a common 16-bit microcontroller instruction set architecture. By themselves, these implementations provide reference for complexity of the algorithms on 16-bit architectures, commonly used in embedded systems. For each algorithm, we then propose suitable instruction set extensions and implement the modified processor core. We assess the gains in throughput, memory consumption, and the area overhead. Our results show that with less than 10% additional area, it is possible to increase the execution speed on average by almost 40%, while reducing memory requirements on average by more than 40%. In particular, the Grøstl algorithm, which was one of the slowest algorithms in previous reference implementations, ends up being the fastest implementation by some margin, once minor (but dedicated) instruction set extensions are taken into account.
McBits: fast constant-time code-based cryptography

Cited by 1 (0 self)
Abstract. This paper presents extremely fast algorithms for code-based public-key cryptography, including full protection against timing attacks. For example, at a 2^128 security level, this paper achieves a reciprocal decryption throughput of just 60493 cycles (plus cipher cost etc.) on a single Ivy Bridge core. These algorithms rely on an additive FFT for fast root computation, a transposed additive FFT for fast syndrome computation, and a sorting network to avoid cache-timing attacks.
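A sorting network avoids cache-timing leaks because its sequence of compare-exchange steps depends only on the input length, never on the data. The sketch below uses an odd-even transposition network with a branch-free compare-exchange; this network is chosen for brevity and is not claimed to be the one used in the paper. The names are hypothetical, values are assumed to lie in [0, 2^31), and Python offers no real timing guarantees.

```python
MASK32 = 0xFFFFFFFF


def compare_exchange(a, b):
    """Return (min, max) of a and b without branching, for 0 <= a, b < 2**31."""
    # Top bit of (b - a) mod 2^32 is 1 exactly when b < a.
    gt = ((b - a) & MASK32) >> 31
    mask = -gt & MASK32          # all ones if a swap is needed, else zero
    t = (a ^ b) & mask
    return a ^ t, b ^ t


def sort_network(values):
    """Sort with a fixed comparator schedule (odd-even transposition, n rounds)."""
    v = list(values)
    n = len(v)
    for rnd in range(n):
        # Even rounds compare pairs (0,1),(2,3),...; odd rounds (1,2),(3,4),...
        for i in range(rnd % 2, n - 1, 2):
            v[i], v[i + 1] = compare_exchange(v[i], v[i + 1])
    return v
```

Every pair of positions is visited in the same order for any input of a given length, so the memory addresses touched reveal nothing about the values being sorted.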
XBX Benchmarking Results January 2012
We benchmarked many implementations of all remaining SHA-3 candidate algorithms on several platforms. The benchmarking method used in this report is called XBX, short for eXternal Benchmarking eXtension, an extension of the SUPERCOP-eBASH framework [7] that allows benchmarking small devices. For details on how XBX works, please see [3]. The main sources of candidate implementations were SUPERCOP, the
Curve41417: Karatsuba revisited
Abstract. This paper introduces constant-time ARM Cortex-A8 ECDH software that (1) is faster than the fastest ECDH option in the latest version of OpenSSL but (2) achieves a security level above 2^200 using a prime above 2^400. For comparison, this OpenSSL ECDH option is not constant-time and has a security level of only 2^80. The new speeds are achieved in a quite different way from typical prime-field ECC software: they rely on a synergy between Karatsuba’s method and choices of radix smaller than the CPU word size.
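Karatsuba's method replaces one of the four half-size products with a few additions, and a radix narrower than the machine word leaves headroom so that sums such as a0 + a1 need not be carried immediately. The Python sketch below shows the technique on limb vectors; the radix 2^26 and the 16-limb layout are illustrative assumptions, not the paper's parameters, and Python's unbounded integers mean the headroom is only conceptual here.

```python
RADIX_BITS = 26   # assumed limb size, chosen only for illustration
LIMBS = 16        # assumed limb count (16 * 26 = 416 bits)


def to_limbs(x):
    """Split a non-negative integer below 2^416 into LIMBS limbs of RADIX_BITS bits."""
    mask = (1 << RADIX_BITS) - 1
    return [(x >> (RADIX_BITS * i)) & mask for i in range(LIMBS)]


def from_limbs(limbs):
    """Recombine limbs (possibly wider than RADIX_BITS) into an integer."""
    return sum(c << (RADIX_BITS * i) for i, c in enumerate(limbs))


def schoolbook(a, b):
    """Plain quadratic product of two coefficient vectors."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def karatsuba(a, b):
    """Product of two equal-length coefficient vectors using three half-size products."""
    n = len(a)
    assert n == len(b)
    if n % 2 or n <= 2:
        return schoolbook(a, b)
    h = n // 2
    a0, a1, b0, b1 = a[:h], a[h:], b[:h], b[h:]
    z0 = karatsuba(a0, b0)
    z2 = karatsuba(a1, b1)
    # (a0 + a1)(b0 + b1): the limb sums are left uncarried.
    zm = karatsuba([x + y for x, y in zip(a0, a1)],
                   [x + y for x, y in zip(b0, b1)])
    out = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        out[i] += z0[i]
        out[i + h] += zm[i] - z0[i] - z2[i]
        out[i + 2 * h] += z2[i]
    return out
```

For example, `from_limbs(karatsuba(to_limbs(x), to_limbs(y)))` equals `x * y` for any x and y below 2^416; reduction modulo a prime is deliberately left out of this sketch.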