- •Guide to Elliptic Curve Cryptography
- •Contents
- •List of Algorithms
- •List of Tables
- •List of Figures
- •Acronyms
- •Preface
- •1 Introduction and Overview
- •1.1 Cryptography basics
- •1.2.3 Elliptic curve systems
- •1.3 Why elliptic curve cryptography?
- •1.4 Roadmap
- •2 Finite Field Arithmetic
- •2.2.1 Addition and subtraction
- •2.2.2 Integer multiplication
- •2.2.3 Integer squaring
- •2.2.4 Reduction
- •2.2.5 Inversion
- •2.3.1 Addition
- •2.3.2 Multiplication
- •2.3.3 Polynomial multiplication
- •2.3.4 Polynomial squaring
- •2.3.5 Reduction
- •2.4.1 Addition and subtraction
- •2.4.2 Multiplication and reduction
- •2.4.3 Inversion
- •3 Elliptic Curve Arithmetic
- •3.1 Introduction to elliptic curves
- •3.1.2 Group law
- •3.1.3 Group order
- •3.1.4 Group structure
- •3.2.1 Projective coordinates
- •3.3 Point multiplication
- •3.3.1 Unknown point
- •3.3.2 Fixed point
- •3.3.3 Multiple point multiplication
- •3.4 Koblitz curves
- •3.4.1 The Frobenius map and the ring Z[τ ]
- •3.4.2 Point multiplication
- •3.6 Point multiplication using halving
- •3.6.1 Point halving
- •3.6.3 Point multiplication
- •3.7 Point multiplication costs
- •4 Cryptographic Protocols
- •4.1 The elliptic curve discrete logarithm problem
- •4.2.3 Determining the number of points on an elliptic curve
- •4.4 Signature schemes
- •4.4.1 ECDSA
- •4.4.2 EC-KCDSA
- •4.5.1 ECIES
- •4.5.2 PSEC
- •4.6.1 Station-to-station
- •4.6.2 ECMQV
- •5 Implementation Issues
- •5.1 Software implementation
- •5.1.1 Integer arithmetic
- •5.1.5 Timings
- •5.2 Hardware implementation
- •5.3 Secure implementation
- •5.3.1 Power analysis attacks
- •5.3.2 Electromagnetic analysis attacks
- •5.3.4 Fault analysis attacks
- •5.3.5 Timing attacks
- •A.1 Irreducible polynomials
- •A.2 Elliptic curves
- •A.2.2 Random elliptic curves over F2m
- •A.2.3 Koblitz elliptic curves over F2m
- •C.1 General-purpose tools
- •C.2 Libraries
- •Bibliography
- •Index
3.3. Point multiplication |
109 |
where v(2w − 1) points P[v ][u] for v [0, v − 1] and u [1, 2w − 1] are precomputed. A point multiplication algorithm based on this method is expected to require approxi-
|
− |
|
≈ wv − |
|
w − |
2− |
|
|||
mately e |
|
1 |
|
t |
|
1 point doublings and ( |
t |
|
1) 2w w |
1 point additions. Algorithms |
|
|
|
|
|
3.44 and 3.45 are the cases v = 1 and v = 2, respectively.
3.3.3Multiple point multiplication
One method to potentially speed the computation of k P + l Q is simultaneous multiple point multiplication (Algorithm 3.48), also known as Shamir’s trick. If k and l are t-bit numbers, then their binary representations are written in a 2×t matrix known as the exponent array. Given width w, the values i P + j Q are calculated for 0 ≤ i, j < 2w . At each of t/w steps, the accumulator receives w doublings and an addition from the table of values i P + j Q determined by the contents of a 2×w window passed over the exponent array; see Figure 3.7.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
w bits |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
k P |
= |
K d−1 |
· · · |
|
K i |
|
|
· · · |
|
K 0 |
P |
|
Precomputation |
|
|
|
|
|
|||||||||||||||
|
|
|
|
|
|
|
Ld−1 |
|
Li |
|
|
|
L0 |
Q |
|
(2 |
2w |
− 1 points) |
|
|
|
|
|
|||||||||||||||
|
|
|
|
|
l Q = |
· · · |
|
|
|
· · · |
|
|
|
|
|
|
|
|
||||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
0P |
+ |
1Q |
|
|
|
|
|
|||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
lookup |
|
|
|
|
|
|
|
|
|
. |
|
|
|
|
|
||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
R ← 2w R + |
(K i P + Li Q) |
|
|
(2w −1) P+(2w −1)Q |
|
|
|
|
|
|||||||||||||||||||
|
|
|
|
|
|
Figure 3.7. Simultaneous point multiplication accumulation step. |
|
|
|
|||||||||||||||||||||||||||||
* |
Algorithm 3.48 has an expected running time of approximately |
+ |
|
− |
+ |
|||||||||||||||||||||||||||||||||
|
· |
|
− |
|
− |
|
− |
|
− |
|
|
+ |
|
|
|
− |
|
− |
|
|
− |
|
+ |
+ * |
22w |
|
|
− |
|
|||||||||
|
(3 |
|
22(w |
|
1) |
|
2w |
|
1 |
|
1) A |
|
(22(w |
|
|
1) |
|
2w |
|
1)D |
|
|
22w −1 |
d |
1 A |
|
|
(d |
|
1)w D , |
||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||||||||||||||||
and requires storage for 22w − 1 points. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
(3.30) |
||||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Algorithm 3.48 Simultaneous multiple point multiplication
INPUT: Window width w, k = (kt−1, . . . , k0)2, l = (lt−1, . . . , l0)2, P, Q E(Fq ).
OUTPUT: k P + l Q.
1.Compute i P + j Q for all i, j [0, 2w − 1].
2.Write k = (K d−1, . . . , K 1, K 0) and l = (Ld−1, . . . , L1, L0) where each K i , Li is a bitstring of length w, and d = t/w .
3.R ← ∞.
4.For i from d − 1 downto 0 do
4.1R ← 2w R.
4.2R ← R + (K i P + Li Q).
5.Return(R).
110 3. Elliptic Curve Arithmetic
Algorithm 3.48 can be improved by use of a sliding window. At each step, placement of a window of width at most w is such that the right-most column is nonzero. Precomputation storage is reduced by 22(w−1) − 1 points. The improved algorithm is expected to have t/(w + (1/3)) point additions in the evaluation stage, a savings of
approximately 9% (in evaluation stage additions) compared with Algorithm 3.48 for w {2, 3}.
Joint sparse form
If k and l are each written in NAF form, then the expected number of zero columns in the exponent array increases, so that the expected number of additions in the evaluation stage of a suitably modified Algorithm 3.48 (processing one column at a time) is 5t/9. The expected number of zero columns can be increased by choosing signed binary expansions of k and l jointly. The joint sparse form (JSF) exponent array of positive integers k and l is characterized by the following properties.
1.At least one of any three consecutive columns is zero.
2.Consecutive terms in a row do not have opposite signs.
3.If k j+1k j = 0 then l j+1 = 0 and l j = 0. If l j+1l j = 0 then k j+1 = 0 and k j = 0.
The representation has minimal weight among all joint signed binary expansions, where the weight is defined to be the number of nonzero columns.
Example 3.49 (joint sparse form) |
The following table gives exponent arrays for k = |
|||||||||||||||||
53 and l = 102. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||
|
|
|
|
binary |
|
|
|
|
NAF |
|
|
|
joint sparse form |
|
||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
k = 53 |
0 |
1 |
1 0 |
1 0 |
1 |
0 1 |
0 |
−1 0 |
1 |
0 1 |
1 |
0 0 |
−1 0 |
−1 |
−1 |
|
|
|
l = 102 |
1 |
1 |
0 0 |
1 1 |
0 |
1 |
0 |
−1 |
0 1 |
0 |
−1 0 |
1 |
1 0 |
1 0 |
−1 |
0 |
|
|
weight |
|
|
6 |
|
|
|
|
|
8 |
|
|
|
|
5 |
|
|
|
If Algorithm 3.48 is modified to use JSF, processing a single column in each iteration, then t/2 additions (rather than 5t/9 using NAFs) are required in the evaluation stage. Algorithm 3.50 finds the joint sparse form for integers k1 and k2 . Although it is written in terms of integer operations, in fact only simple bit arithmetic is required; for example, evaluation modulo 8 means that three bits must be examined, and ki /2 discards the rightmost bit.
3.3. Point multiplication |
111 |
Algorithm 3.50 Joint sparse form
INPUT: Nonnegative integers k1 and k2, not both zero.
OUTPUT: JSF(k2, k1), the joint sparse form of k1 and k2.
1.l ← 0, d1 ← 0, d2 ← 0.
2.While (k1 + d1 > 0 or k2 + d2 > 0) do
2.11 ← d1 + k1, 2 ← d2 + k2.
2.2For i from 1 to 2 do
If i is even then u ← 0; Else
u ← i mods 4.
If i ≡ ±3 (mod 8) and 3−i ≡ 2 (mod 4) then u ← − u. kli ← u.
2.3 For i from 1 to 2 do
If 2di = 1 + kli then di ← 1 − di . ki ← ki /2 .
2.4 l ← l + 1. |
|
k |
1 |
|
, . . . , k |
1 |
|
3. Return JSF(k2 , k1) |
= |
|
l−1 |
|
0 . |
||
|
kl2 |
1 , . . . , k02 |
|
||||
|
|
|
− |
|
|
|
|
Interleaving
The simultaneous and comb methods process multiple point multiplications using precomputation involving combinations of the points. Roughly speaking, if each precomputed value involves only a single point, then the associated method is known as interleaving.
In the calculation of k j Pj for points Pj and integers k j , interleaving allows different methods to be used for each k j Pj , provided that the doubling step can be done jointly. For example, width-w NAF methods with different widths can be used, or some point multiplications may be done by comb methods. However, the cost of the doubling is determined by the maximum number of doublings required in the methods for k j Pj , and hence the benefits of a comb method may be lost in interleaving.
Algorithm 3.51 is an interleaving method for computing |
v |
k j P |
j |
, where a width- |
||||||
j=1 |
||||||||||
j |
w j |
− |
1 |
|
|
|
|
|||
w j NAF is used on k . Points i Pj |
jfor odd i < 2 |
|
are |
calculated in a precomputation |
||||||
|
|
|
|
|
|
|||||
phase. The expansions NAFw j (k |
) are processed jointly, left to right, with a single |
doubling of the accumulator at each stage; Figure 3.8 illustrates the case v = 2. The algorithm has an expected running time of approximately
*|{ : |
|
v |
|
|
− 1) A+ |
|
|
v |
|
|
+ |
|
|
− |
+ |
|
|
|
|||||
|
j > 2}|D + j 1(2 |
*1≤ j≤v l j D + j 1 w j + 1 |
|
||||||||
j |
w |
|
w j |
2 |
|
|
max |
|
l j |
A |
(3.31) |
= |
|
|
|
|
= |
|
|||||
|
|
|
|
|
|
|
|
|
|
v |
2w j −2 points. |
where l j denotes the length of NAFw j (k j ), and requires storage for j=1 |
112 |
3. Elliptic Curve Arithmetic |
|
|
|
|
|
|
|
|
|
|
||||||||||
|
|
|
|
|
|
NAFw j (k j ) |
|
|
|
|
|
|
|
|
|
|
|||||
k |
1 |
|
1 |
|
· · · |
|
|
1 |
· · · |
1 |
P1 |
|
|
Precomputation |
|||||||
|
P1 = |
kt |
|
|
|
ki |
k0 |
|
(2 |
w |
2 |
w |
2 |
points) |
|||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||
k2 P2 = |
kt2 |
|
· · · |
|
|
ki2 |
· · · |
k02 |
P2 |
|
1− |
|
+ 2 2− |
|
|||||||
|
|
|
|
1P1 |
|
|
|
|
1P2 |
||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
3P |
|
|
|
|
3P |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||
|
|
|
|
|
|
|
|
lookup |
|
|
|
|
|
|
. 1 |
|
|
|
|
. 2 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
. |
|
|
|
|
. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
. |
|
|
|
|
. |
|
|
|
|
Q ← 2Q + |
ki1 P1 + ki2 P2 |
|
|
(2w1−1−1) P1 |
(2w2−1−1) P2 |
Figure 3.8. Computing k1 P1 + k2 P2 using interleaving with NAFs. The point multiplication accumulation step is shown for the case v = 2 points. Scalar k j is written in width-w j NAF form.
Algorithm 3.51 Interleaving with NAFs
INPUT: v, integers k j , widths w |
|
and points P |
, 1 |
≤ |
j |
≤ |
v. |
|
|
|
|
|
|
||||||||||||||
OUTPUT: |
|
v |
k j |
Pj |
|
|
|
j |
|
|
|
|
j |
|
|
|
|
|
|
|
|
|
|
||||
|
|
|
j=1 |
|
|
|
|
{ |
|
|
|
− |
1 |
|
|
|
≤ |
|
≤ |
|
|
|
|
|
|
|
|
1. |
Compute i P |
j |
for i |
1, 3, . . . , 2w j |
|
1 , 1 |
j |
v. |
|
|
|
|
|
|
|||||||||||||
|
|
|
|
|
|
|
|
|
− } |
|
|
|
|
|
|
|
|
||||||||||
2. |
Use Algorithm 3.30 to compute NAF |
|
(k j ) |
|
|
l j −1 k j |
2i , 1 |
|
j |
|
v. |
||||||||||||||||
3. |
Let l = maxj {l j |
: 1 ≤ j |
≤ v}. |
|
|
w j |
|
|
= |
i=0 |
i |
|
≤ |
|
≤ |
|
|||||||||||
4. |
Define ki |
= 0 for l j ≤ i < l, 1 ≤ j ≤ v. |
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||||||
5. |
Q ← ∞. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||
6. |
For i from l − 1 downto 0 do |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||||
|
6.1 |
Q ← 2Q. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||
|
6.2 |
For j from 1 to v do |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||||
|
|
If kij |
= 0 then |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||
|
|
|
|
If kij |
> 0 then Q ← Q + kij |
Pj ; |
|
|
|
|
|
|
|
|
|
|
|
|
|||||||||
7. |
|
|
|
Else Q ← Q − kij Pj . |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||||
Return(Q). |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||
Note 3.52 (comparison with simultaneous methods) |
Consider the calculation of k P + |
l Q, where k and l are approximately the same bitlength. The simultaneous sliding and interleaving methods require essentially the same number of point doublings regardless of the window widths. For a given w, simultaneous sliding requires 3 · 22(w−1) points of storage, and approximately t/(w + (1/3)) point additions in the evaluation stage, while interleaving with width 2w + 1 on k and width 2w on l requires the same amount of storage, but only (4w + 3)t/(4w2 + 5w + 2) < t/(w + (1/2)) additions in evaluation. Interleaving may also be preferable at the precomputation phase, since operations involving a known point P may be done in advance (encouraging the use of a wider width for NAFw(k)), in contrast to the joint computations required in the simultaneous method. Table 3.6 compares operation counts for computing k P + l Q in the case that P (but not Q) is known in advance.
3.3. Point multiplication |
113 |
In the case that storage for precomputation is limited to four points (including P and Q), interleaving with width-3 NAFs or use of the JSF give essentially the same performance, with interleaving requiring one or two more point doublings at the precomputation stage. Table 3.6 gives some comparative results for small window sizes.
method |
w |
storage |
additions |
|
doubles |
|
Alg 3.48 |
1 |
3 |
1 + 3t/4 ≈ 1 |
+ .75t |
t |
|
Alg 3.48 |
2 |
15 |
9 + 15t/32 ≈ 9 |
+ .47t |
2 + t |
|
Alg 3.48 with sliding |
2 |
12 |
9 + 3t/7 ≈ 9 |
+ .43t |
2 + t |
|
Alg 3.48 with NAF |
|
4 |
2 + 5t/9 |
≈ 2 |
+ .56t |
t |
Alg 3.48 with JSF |
|
4 |
2 + t/2 |
≈ 2 |
+ .5t |
t |
interleave with 3-NAF |
3, 3 |
2+2 |
1 + t/2 |
≈ 1 |
+ .5t |
1 + t |
interleave with 5-NAF & 4-NAF 5, 4 |
8+4 |
3 + 11t/30 |
≈ 3 |
+ .37t |
1 + t |
Table 3.6. Approximate operation counts for computing k P +l Q, where k and l are t-bit integers. The precomputation involving only P is excluded.
Interleaving can be considered as an alternative to the comb method (Algorithm 3.44) for computing k P. In this case, the exponent array for k is processed using
interleaving (Algorithm 3.51), with k j given by k = w= k j 2( j−1)d and points Pj
j 1
given by Pj = 2( j−1)d P, 1 ≤ j ≤ w, where d is defined in Algorithm 3.44. Table 3.7 compares the comb and interleaving methods for fixed storage.
method |
rows |
storage |
additions |
doubles |
|
|
|
|
|
|
|
comb |
2 |
3 |
3t/8 ≈ .38t |
t/2 |
|
interleave (3, 3) |
2 |
4 |
t/4 ≈ .25t |
t/2 |
|
comb |
4 |
15 |
15t/64 ≈ .23t |
t/4 |
|
comb (two-table) |
3 |
14 |
7t/24 ≈ .29t |
t/6 |
|
interleave (4, 4, 4, 4) |
4 |
16 |
t/4 ≈ .25t |
t/4 |
|
interleave (4, 4, 4, 3, 3) |
5 |
16 |
11t/50 ≈ .22t |
t/5 |
|
comb |
5 |
31 |
31t/160 |
≈ .19t |
t/5 |
comb (two-table) |
4 |
30 |
15t/64 |
≈ .23t |
t/8 |
interleave (5, 5, 5, 4, 4) |
5 |
32 |
9t/50 |
≈ .18t |
t/5 |
interleave (5, 5, 4, 4, 4, 4) |
6 |
32 |
17t/90 |
≈ .19t |
t/6 |
Table 3.7. Approximate operation counts in comb and interleaving methods for computing k P, P known in advance. The bitlength of k is denoted by t. The interleaving methods list the widths used on each row in calculating the NAF.