T.: Exploiting the Power of GPUs for Asymmetric Cryptography
, 2008
Cited by 11 (0 self)
Abstract. Modern Graphics Processing Units (GPUs) now exceed conventional Central Processing Units (CPUs) by far in both performance and gate count. Many modern computer systems include, besides a CPU, such a powerful GPU, which sits idle most of the time and could be used as a cheap and instantly available co-processor for general-purpose applications. In this contribution, we focus on the efficient realisation of the computationally expensive operations in asymmetric cryptosystems on such off-the-shelf GPUs. More precisely, we present improved and novel implementations employing GPUs as accelerators for RSA and DSA cryptosystems as well as for Elliptic Curve Cryptography (ECC). Using a recent Nvidia 8800GTS graphics card, we are able to compute 813 modular exponentiations per second for RSA or DSA-based systems with 1024-bit integers. Moreover, our design for ECC over the prime field P-224 achieves a throughput of 1412 point multiplications per second.
Ultra High Performance ECC over NIST Primes on Commercial FPGAs
- In Proceedings of CHES
, 2008
Cited by 7 (3 self)
Abstract. Elliptic Curve Cryptosystems (ECC) have gained increasing acceptance in practice due to their significantly smaller bit size of the operands compared to other public-key cryptosystems. Since their computational complexity is often lower than in the case of RSA or discrete logarithm schemes, ECC are often chosen for high-performance public-key applications. However, despite a wealth of research regarding high-speed software and high-speed FPGA implementation of ECC since the mid 1990s, providing truly high-performance ECC on readily available (i.e., non-ASIC) platforms remains an open challenge. This holds especially for ECC over prime fields, which are often preferred over binary fields due to standards in Europe and the US. This work presents a new architecture for an FPGA-based ultra high performance ECC implementation over prime fields. Our architecture makes intensive use of the DSP blocks in modern FPGAs, which are embedded arithmetic units actually intended to accelerate digital signal processing algorithms. We describe a novel architecture and algorithms for performing ECC arithmetic and describe the actual implementation of standard compliant ECC based on the NIST primes P-224 and P-256. We show that ECC on Xilinx’s Virtex-4 SX55 FPGA can be performed at a rate of more than 37,000 point multiplications per second. Our architecture outperforms all single-chip hardware implementations over prime fields in the open literature by a wide margin.
Enhancing COPACOBANA for Advanced Applications in Cryptography and Cryptanalysis
Cited by 1 (1 self)
Cryptanalysis of symmetric and asymmetric ciphers is a challenging task due to the enormous amount of computation involved. To tackle this computational complexity, special-purpose hardware is usually considered the best approach. We have built a massively parallel cluster system (COPACOBANA) based on low-cost FPGAs as a cost-efficient platform primarily targeting cryptanalytical operations that demand high computational effort but have low communication and memory requirements. However, some parallel applications in the field of cryptography are too complex for low-cost FPGAs and also require at least moderate communication and memory facilities. This holds particularly for arithmetic-intensive applications as well as those with a highly complex data flow. In this contribution, we describe a novel architecture for a more versatile and reliable COPACOBANA capable of hosting advanced cryptographic applications such as high-performance digital signature generation according to the Elliptic Curve Digital Signature Algorithm (ECDSA) and integer factorization based on the Elliptic Curve Method (ECM). In addition, the new cluster design even allows running further supercomputing applications beyond the field of cryptography.
Offline Submission with RSA Time-Lock Puzzles
Cited by 1 (1 self)
Abstract. We introduce a non-interactive RSA time-lock puzzle scheme whose level of difficulty can be arbitrarily chosen by artificially enlarging the public exponent. Solving a puzzle for a message m means for Bob to encrypt m with Alice’s public puzzle key by repeated modular squaring. The number of squarings to perform determines the puzzle complexity. This puzzle is non-parallelizable. Thus, the solution time cannot be shortened significantly by employing many machines, and it varies only slightly across modern CPUs. Alice can quickly verify the puzzle solution by decrypting the ciphertext with a regular private key operation. Our main contribution is an offline submission protocol which enables an author who is currently offline to commit to his document before the deadline by continuously solving an RSA puzzle based on that document. When regaining Internet connectivity, he submits his document along with the puzzle solution, which is a proof of the timely completion of the document. We have implemented a platform-independent tool performing all parts of our offline submission protocol: puzzle benchmark, issuing a time-lock RSA certificate, solving a puzzle and finally verifying the solution for a submitted document. Two other applications we propose for RSA time-lock puzzles are trial certificates from a well-known CA and a CEO disclosing the signing private key to his deputy.
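The repeated-squaring idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's scheme: it shows only the generic principle that computing x^(2^t) mod n takes t sequential squarings when the factorization of n is unknown, while the key owner can first reduce the exponent modulo φ(n). The function names, the toy primes and the parameter t are assumptions chosen for illustration, and real moduli would be far larger.

```python
def solve_by_squaring(x, t, n):
    """Compute x^(2^t) mod n with t sequential modular squarings.

    Each squaring needs the previous result, so the loop cannot be
    split across several machines.
    """
    y = x % n
    for _ in range(t):
        y = (y * y) % n
    return y


def shortcut_with_trapdoor(x, t, p, q):
    """Compute the same value quickly, knowing the primes p and q.

    Assumes gcd(x, p*q) == 1, so that Euler's theorem allows the
    exponent 2^t to be reduced modulo phi(n) = (p-1)*(q-1).
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    reduced_exponent = pow(2, t, phi)
    return pow(x, reduced_exponent, n)


# Toy parameters (hypothetical, far too small for real use).
p, q = 1009, 1013
x, t = 123456, 5000
slow = solve_by_squaring(x, t, p * q)
fast = shortcut_with_trapdoor(x, t, p, q)
print(slow == fast)
```

The solver's cost grows linearly with t, whereas the holder of p and q pays for only two short modular exponentiations regardless of t.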
Non-Parallelizable and Non-Interactive Client Puzzles from Modular Square Roots
Cited by 1 (1 self)
Abstract. Denial of Service (DoS) attacks aiming to exhaust the resources of a server by overwhelming it with bogus requests have become a serious threat. Protocols that rely on public key cryptography and perform expensive authentication handshakes may be an especially easy target. Client puzzles are a well-known countermeasure against DoS attacks. The victimized server demands that clients commit computing resources before it processes their requests. To get service, a client must solve a cryptographic puzzle and submit the right solution. Existing client puzzle schemes have some drawbacks. They are either parallelizable, coarse-grained or can be used only interactively. In the case of interactive client puzzles, where the server poses the challenge, an attacker might mount a counterattack on the clients by injecting fake packets containing bogus puzzle parameters. In this paper we introduce a novel scheme for client puzzles which relies on the computation of square roots modulo a prime. Modular square root puzzles are non-parallelizable, i.e., the solution cannot be obtained faster than scheduled by distributing the puzzle to multiple machines or CPU cores, and they can be employed both interactively and non-interactively. Our puzzles provide polynomial granularity and compact solution and verification functions. Benchmark results demonstrate the feasibility of our approach to mitigate DoS attacks on hosts in 1 or even 10 GBit networks. In addition, we show how to raise the efficiency of our puzzle scheme by introducing a bandwidth-based cost factor for the client. Keywords: client puzzles, Denial of Service (DoS), network protocols, authentication, computational puzzles
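The asymmetry between solving and verifying a modular square root puzzle can be sketched in Python as follows. This is an illustration under stated assumptions, not the construction from the paper: it assumes a prime p with p ≡ 3 (mod 4), for which a square root of a quadratic residue a is a^((p+1)/4) mod p, and the function names and parameters are hypothetical.

```python
def solve_puzzle(a, p):
    """Return a square root of a modulo the prime p.

    Assumes p % 4 == 3. The exponentiation takes roughly log2(p)
    sequential modular squarings, which is the client's work.
    """
    if p % 4 != 3:
        raise ValueError("this sketch only handles primes with p % 4 == 3")
    r = pow(a, (p + 1) // 4, p)
    if (r * r) % p != a % p:
        raise ValueError("a is not a quadratic residue modulo p")
    return r


def verify_solution(a, r, p):
    """Check a claimed root with a single modular squaring (server side)."""
    return (r * r) % p == a % p


# Hypothetical challenge: squaring a known value guarantees that a root exists.
p = 2**127 - 1            # a Mersenne prime with p % 4 == 3
secret = 987654321
challenge = pow(secret, 2, p)
root = solve_puzzle(challenge, p)
print(verify_solution(challenge, root, p))
```

In this sketch the difficulty scales with the bit length of p, since the solver's exponent has about as many bits as the modulus while the verifier always performs one multiplication.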
Establishing Dedicated Functions on FPGA Devices for High-Performance Cryptography
Abstract. This work presents a unique design approach to implement standardized symmetric and asymmetric cryptosystems on modern FPGA devices. While most other FPGA implementations optimize cryptosystems on an algorithmic level so that they are optimally placed in the generic logic, our primary goal is to shift as many cryptographic operations as possible into specific hard cores that have become available on modern reconfigurable devices. Such dedicated functions provide, for example, large blocks of memory or accelerated arithmetic functions for digital signal processing applications. Using these dedicated functions, we present specific design approaches that enable a performance for the symmetric AES block cipher (FIPS 197) of up to 55 GBit/s and a throughput of more than 30,000 scalar multiplications per second for asymmetric Elliptic Curve Cryptography over NIST’s P-224 prime (FIPS 186-3).

