## On software parallel implementation of cryptographic pairings (2008)

Venue: | In Selected Areas in Cryptography – SAC 2008, number 5381 in Lecture Notes in Computer Science |

Citations: | 12 - 0 self |

### BibTeX

@INPROCEEDINGS{Grabher08onsoftware,

author = {P. Grabher and J. Großschädl},

title = {On software parallel implementation of cryptographic pairings},

booktitle = {In Selected Areas in Cryptography – SAC 2008, number 5381 in Lecture Notes in Computer Science},

year = {2008},

pages = {34--49},

publisher = {Springer}

}

### Abstract

A significant amount of research has focused on methods to improve the efficiency of cryptographic pairings; in part this work is motivated by the wide range of applications for such primitives. Although numerous hardware accelerators for pairing evaluation have used parallelism within extension field arithmetic to improve efficiency, similar techniques have not been examined in software thus far. In this paper we focus on parallelism within one pairing evaluation (intra-pairing), and parallelism between different pairing evaluations (inter-pairing). We identify several methods for exploiting such parallelism (extending previous results in the context of ECC) and show that it is possible to accelerate pairing evaluation by a significant factor in comparison to a naive approach.

### Citations

559 | Short signatures from the Weil pairing
- Lynn, Shacham
- 2001
Citation Context ...$R = \prod_{i=0}^{n-1} e(P_i, Q_i)$, actually capitalising on the parallelism between disjoint pairings is less well examined. This is despite the fact that numerous instances exist, verification of BLS signatures [9] to name one, where this could be useful. Identifying parallelism in algorithms for the pairing and constituent arithmetic is only the first step: in order to exploit said parallelism, one must have e... |

413 | Modular Multiplication Without Trial Division
- Montgomery
- 1985
Citation Context ...r (i.e. non-SIMD) as well as SIMD (i.e. MMX, SSE) instruction sets. Both implementations have in common that the modular multiplication (resp. squaring) operation is realised via Montgomery reduction [34]. The inversion is performed using the Extended Euclidean Algorithm (EEA). 4.1 Field Arithmetic with the IA32/IA64 Instruction Set The IA32 architecture provides an add-with-carry instruction (adc) an... |

376 | The state of elliptic curve cryptography - Koblitz, Menezes, et al. |

291 | Efficient Algorithms for Pairing-Based Cryptosystems
- Barreto, Kim, et al.
- 2002
Citation Context ... topic, see the excellent description by Scott [38]. In short, improvement of seminal but unpublished work by Miller [33] resulted in the first practical algorithms for evaluation of the Tate pairing [5, 19]. These results were further optimised by Duursma and Lee [15] who developed an inexpensive, closed form for specific parameterisations later improved by Kwon [29]. Their techniques were generalised a... |

149 | Pairing-friendly elliptic curves of prime order
- Barreto, Naehrig
- 2006
Citation Context ...ptic curve $E(\mathbb{F}_p)$ whose order $n$ is divisible by some large prime $r$. Let $k$, the embedding degree of the curve, be the smallest positive integer such that $r \mid p^k - 1$. A Barreto-Naehrig curve or BN-curve [6] of the form $E(\mathbb{F}_p) : y^2 = x^3 + b$ where $b \neq 0$, satisfies these requirements. In particular, such a curve has prime order, i.e. $r = n$, and embedding degree $k = 12$. Additionally, the trace, curve order... |

142 | Implementing the Tate pairing
- Galbraith, Harrison, et al.
- 2002
Citation Context ... topic, see the excellent description by Scott [38]. In short, improvement of seminal but unpublished work by Miller [33] resulted in the first practical algorithms for evaluation of the Tate pairing [5, 19]. These results were further optimised by Duursma and Lee [15] who developed an inexpensive, closed form for specific parameterisations later improved by Kwon [29]. Their techniques were generalised a... |

130 | Efficient pairing computation on supersingular abelian varieties
- Barreto, Galbraith, et al.
Citation Context ...timised by Duursma and Lee [15] who developed an inexpensive, closed form for specific parameterisations later improved by Kwon [29]. Their techniques were generalised and extended to produce the Eta [4] and Ate [23] pairings, currently considered the fastest means of evaluation. However, as well as the pairing itself, one depends on lower-level algorithms for arithmetic in the fields $\mathbb{F}_p$, $\mathbb{F}_{p^{k/2}}$ and F... |

89 | The Eta Pairing Revisited
- Hess, Smart, et al.
- 2006
Citation Context ...uursma and Lee [15] who developed an inexpensive, closed form for specific parameterisations later improved by Kwon [29]. Their techniques were generalised and extended to produce the Eta [4] and Ate [23] pairings, currently considered the fastest means of evaluation. However, as well as the pairing itself, one depends on lower-level algorithms for arithmetic in the fields $\mathbb{F}_p$, $\mathbb{F}_{p^{k/2}}$ and $\mathbb{F}_{p^k}$. Previous ... |

86 | Tate pairing implementation for hyperelliptic curves $y^2 = x^p - x + d$
- Duursma, Lee
- 2003
Citation Context ...provement of seminal but unpublished work by Miller [33] resulted in the first practical algorithms for evaluation of the Tate pairing [5, 19]. These results were further optimised by Duursma and Lee [15] who developed an inexpensive, closed form for specific parameterisations later improved by Kwon [29]. Their techniques were generalised and extended to produce the Eta [4] and Ate [23] pairings, curr... |

77 | Pairing-based cryptography at high security levels
- Koblitz, Menezes
- 2005
Citation Context ...hereof uses the information at its sole risk and liability. ⋆⋆ The work described in this paper has been supported in part by EPSRC grant EP/E001556/1. ...realisation of said algorithms; see for example [27, 20, 13]. One can readily identify two types of parallelism within these algorithms and within pairing based cryptosystems more generally: that within a single pairing evaluation (intra-pairing) or between se... |

75 | Analyzing and comparing Montgomery multiplication algorithms
- Koç, Acar, et al.
- 1996
Citation Context ...erformance at the expense of a slight increase in code footprint. Algorithm 3 shows the Coarsely Integrated Operand Scanning (CIOS) method for calculating the Montgomery product $Z = A \cdot B \cdot 2^{-n} \bmod M$ [28]. The $n$-bit operands $A$, $B$, $M$ are represented by arrays of $s$ single-precision $w$-bit words. The algorithm has a nested loop structure with two inner loops; the first contributes to the calculation of th... |

60 | A fast new DES implementation in software
- Biham
- 1997
Citation Context ...tially mask) the bits so they are aligned at the same index ready for combination via a native, component-wise XOR. The technique of bit-slicing, proposed by Biham for efficient implementation of DES [8], offers a way to reduce the associated overhead. Instead of representing the $w$-bit value $x$ as one machine word, we represent $x$ using $w$ machine words where word $i$ contains $x_i$ aligned at the same fixed... |

46 | Efficient arithmetic in finite field extensions with application in elliptic curve cryptography
- Bailey, Paar
Citation Context ...he flexibility of ECC parameterisations helps somewhat in resolving this problem. One might view specific field representations such as Residue Number Systems (RNS) and Optimal Extension Fields (OEF) [3] as more suitable for vectorisation; parameterisation and parallel implementation over $\mathbb{F}_{2^n}$ has also been effective [7] since carries are essentially eliminated by the nature of arithmetic. Motivated b... |

28 | High security pairing-based cryptography revisited
- Granger, Page, et al.
- 2006
Citation Context ...hereof uses the information at its sole risk and liability. ⋆⋆ The work described in this paper has been supported in part by EPSRC grant EP/E001556/1. ...realisation of said algorithms; see for example [27, 20, 13]. One can readily identify two types of parallelism within these algorithms and within pairing based cryptosystems more generally: that within a single pairing evaluation (intra-pairing) or between se... |

22 | Efficient hardware for the Tate pairing calculation in characteristic three
- Kerins, Marnane, et al.
Citation Context ...ropriate groups; our focus is on parallelism within algorithms for the pairing and constituent arithmetic. Efficient implementations of pairings in hardware have used this feature to great effect; see [26] for an example design where extension field arithmetic is realised using several parallel computational units to reduce latency. In the second case, the aim is to compute all $n$ pairings $R_i = e(P_i, Q_i)$... |

20 | Implementing cryptographic pairings over Barreto-Naehrig curves
- Devegili, Scott, et al.
- 2007
Citation Context ...$(p^k - 1)/n$ of $\mathbb{F}_p$ can be parameterised by $x$ as follows: $t(x) = 6x^2 + 1$, $n(x) = 36x^4 - 36x^3 + 18x^2 - 6x + 1$, $p(x) = 36x^4 - 36x^3 + 24x^2 - 6x + 1$. We closely follow the excellent description of Devegili et al. [14] who show that by selecting $x = -6917529027641089837$ for example, one specifies a 256-bit value $p$ and associated curve where $n$ is of low Hamming weight. Selecting such an $x$ makes the notation $t(x)$, fo... |

20 | IDEA: A Cipher for Multimedia Architectures? Selected Areas in Cryptography ’98, LNCS 1556, Henk Meijer, Eds
- Lipmaa
- 1998
Citation Context ...trade-off between provision of computational resources and their utilisation. SWAR style instruction sets have been successfully used to accelerate kernels in symmetric cryptography; for example [11, 10, 31, 36, 32]. Although exploiting parallelism within point multiplication in vanilla Elliptic Curve Cryptography (ECC) is possible [2, 25], vectorisation of public-key cryptography is often more problematic. ... |

15 |
Efficient Tate Pairing Computation for Elliptic Curves over Binary Fields
- Kwon
- 2005
(Show Context)
Citation Context ...or evaluation of the Tate pairing [5, 19]. These results were further optimised by Duursma and Lee [15] who developed an inexpensive, closed form for specific parameterisations later improved by Kwon =-=[29]-=-. Their techniques were generalised and extended to produce the Eta [4] and Ate [23] pairings, currently considered the fastest means of evaluation. However, as well as the pairing itself, one depends... |

14 | Performance Analysis and Parallel Implementation of Dedicated
- Nakajima, Matsui
- 2003
Citation Context ...trade-off between provision of computational resources and their utilisation. SWAR style instruction sets have been successfully used to accelerate kernels in symmetric cryptography; for example [11, 10, 31, 36, 32]. Although exploiting parallelism within point multiplication in vanilla Elliptic Curve Cryptography (ECC) is possible [2, 25], vectorisation of public-key cryptography is often more problematic. ... |

12 | A Design for Parallel Architectures
- Bosselaers, Govaerts, et al.
- 1997
Citation Context ...trade-off between provision of computational resources and their utilisation. SWAR style instruction sets have been successfully used to accelerate kernels in symmetric cryptography; for example [11, 10, 31, 36, 32]. Although exploiting parallelism within point multiplication in vanilla Elliptic Curve Cryptography (ECC) is possible [2, 25], vectorisation of public-key cryptography is often more problematic. ... |

9 | PLX: A fully subword-parallel instruction set architecture for fast scalable multimedia processing
- Lee, Fiskiran
- 2002
Citation Context ...h, enhancements over SSE3 such as the pshufb instruction help to reduce said overhead but the instruction set still lacks features which could improve performance of our results. For example, the PLX [30] processor eases the issue of shuffles between sub-words by including odd and even multiplication, i.e. both $r_{2i+1 \ldots 2i+0} = x_{2i+0} \cdot y_{2i+0}$ and $r_{2i+1 \ldots 2i+0} = x_{2i+1} \cdot y_{2i+1}$ for $i \in \{0, 1\}$. Another impr... |

8 | Multiplication and squaring on pairing-friendly fields. Cryptology ePrint Archive
- Devegili, hÉigeartaigh, et al.
- 2006
Citation Context ...hereof uses the information at its sole risk and liability. ⋆⋆ The work described in this paper has been supported in part by EPSRC grant EP/E001556/1. ...realisation of said algorithms; see for example [27, 20, 13]. One can readily identify two types of parallelism within these algorithms and within pairing based cryptosystems more generally: that within a single pairing evaluation (intra-pairing) or between se... |

6 | Fast elliptic curve multiplications with SIMD operations - Izu, Takagi - 1998 |

5 | Elliptic curve cryptography on embedded multicore systems. Design Automation for Embedded Systems
- Fan, Sakiyama, et al.
- 2008
Citation Context ...one on each core. The use of multi-core processors is an emerging research topic in the context of cryptographic implementation, for example Fan et al. investigate modular multiplication [16] and ECC [17] on this type of platform. Intra-pairing parallelism is clearly possible at the field arithmetic level as evidenced by related hardware based approaches [26]. In software however, the overhead of thre... |

5 | On the power of bitslice implementation on
- Matsui, Nakajima

4 | Vector Implementation of Multiprecision Arithmetic
- Crandall, Klivington
Citation Context ... Motivated by application in RSA as well as ECC, there is a similar effort to accelerate arithmetic in $\mathbb{F}_p$ (or more exactly modulo some integer $p$). Work by Acar [1] and reports by Intel [24] and Apple [12] all investigate the use of SIMD parallelism for implementing multi-precision integer arithmetic. Acar states that his implementation of RSA on a processor with an MMX instruction set runs significant... |

3 | High-Speed Algorithms & Architectures For Number-Theoretic Cryptosystems
- Acar
- 1997
Citation Context ... eliminated by the nature of arithmetic. Motivated by application in RSA as well as ECC, there is a similar effort to accelerate arithmetic in $\mathbb{F}_p$ (or more exactly modulo some integer $p$). Work by Acar [1] and reports by Intel [24] and Apple [12] all investigate the use of SIMD parallelism for implementing multi-precision integer arithmetic. Acar states that his implementation of RSA on a processor wit... |

3 | Elliptic curve arithmetic using SIMD
- Aoki, Hoshino, et al.
- 2001
Citation Context ...to accelerate kernels in symmetric cryptography; for example [11, 10, 31, 36, 32]. Although exploiting parallelism within point multiplication in vanilla Elliptic Curve Cryptography (ECC) is possible [2, 25], vectorisation of public-key cryptography is often more problematic. Consider two $n$-bit multi-precision integers $x$ and $y$ represented by $l = \lceil n/w \rceil$ machine words where $x_i$ denotes the $i$-th such w-bi... |

3 | Optimizing a Fast Stream Cipher for VLIW
- Clapp
- 1997

3 | Parallel cryptographic arithmetic using a redundant Montgomery representation
- Page, Smart
Citation Context ... approach outlined above. This seems to have been first investigated by Montgomery in the context of ECM based factoring [35] and then rediscovered and applied in the context of RSA by Page and Smart [37]. Following the example above, the basic idea is that instead of representing an $l$-word multi-precision integer $x$ by packing the digits $x_i$ into one SWAR vector, we slice the digits into $l$ separate SWA... |

2 | Montgomery modular multiplication algorithm on multi-core systems
- Fan, et al.
- 2007
Citation Context ...in parallel, one on each core. The use of multi-core processors is an emerging research topic in the context of cryptographic implementation, for example Fan et al. investigate modular multiplication [16] and ECC [17] on this type of platform. Intra-pairing parallelism is clearly possible at the field arithmetic level as evidenced by related hardware based approaches [26]. In software however, the ove... |

2 | Short programs for functions on curves. Available at http://crypto.stanford.edu/miller, 1986.
- Miller
Citation Context ...e most significant in terms of efficiency; for an overview of the evolution of this topic, see the excellent description by Scott [38]. In short, improvement of seminal but unpublished work by Miller [33] resulted in the first practical algorithms for evaluation of the Tate pairing [5, 19]. These results were further optimised by Duursma and Lee [15] who developed an inexpensive, closed form for speci... |

1 | Efficient Galois Arithmetic on SIMD Architectures
- Bhaskar, Dubey, et al.
Citation Context ...entations such as Residue Number Systems (RNS) and Optimal Extension Fields (OEF) [3] as more suitable for vectorisation; parameterisation and parallel implementation over $\mathbb{F}_{2^n}$ has also been effective [7] since carries are essentially eliminated by the nature of arithmetic. Motivated by application in RSA as well as ECC, there is a similar effort to accelerate arithmetic in $\mathbb{F}_p$ (or more exactly modulo ... |

1 | Vector Microprocessors for Cryptography
- Fournier
- 2007
Citation Context ... both $r_{2i+1 \ldots 2i+0} = x_{2i+0} \cdot y_{2i+0}$ and $r_{2i+1 \ldots 2i+0} = x_{2i+1} \cdot y_{2i+1}$ for $i \in \{0, 1\}$. Another improvement would be provision of hardware support for add-with-carry via vector-carry registers; Fournier [18] investigates this approach within the context of a dedicated vector processor. The upcoming SSE5 instruction set offers an alternative approach by departing from purely 3-address instructions by addi... |

1 | Using Streaming SIMD Extensions (SSE2) to Perform Big Multiplications
- Intel Corporation
- 2000
Citation Context ... of arithmetic. Motivated by application in RSA as well as ECC, there is a similar effort to accelerate arithmetic in $\mathbb{F}_p$ (or more exactly modulo some integer $p$). Work by Acar [1] and reports by Intel [24] and Apple [12] all investigate the use of SIMD parallelism for implementing multi-precision integer arithmetic. Acar states that his implementation of RSA on a processor with an MMX instruction set r... |

1 | Vectorization of the Elliptic Curve Method. Available at: ftp://ftp.cwi.nl/pub/pmontgom/ecmvec.psl.gz
- Montgomery
Citation Context ... digit-sliced SWAR, represents the digit based analogy of the bit based slicing approach outlined above. This seems to have been first investigated by Montgomery in the context of ECM based factoring [35] and then rediscovered and applied in the context of RSA by Page and Smart [37]. Following the example above, the basic idea is that instead of representing an $l$-word multi-precision integer $x$ by pack... |

1 | Implementing Cryptographic Pairings. Available at: ftp://ftp.computing.dcu.ie/pub/resources/crypto/pairings.pdf
- Scott
Citation Context ...high-level algorithms that relate to the pairing itself are clearly the most significant in terms of efficiency; for an overview of the evolution of this topic, see the excellent description by Scott [38]. In short, improvement of seminal but unpublished work by Miller [33] resulted in the first practical algorithms for evaluation of the Tate pairing [5, 19]. These results were further optimised by Duursma and Lee [15] who developed an inexpensive, closed form for speci... |