- •Guide to Elliptic Curve Cryptography
- •Contents
- •List of Algorithms
- •List of Tables
- •List of Figures
- •Acronyms
- •Preface
- •1 Introduction and Overview
- •1.1 Cryptography basics
- •1.2.3 Elliptic curve systems
- •1.3 Why elliptic curve cryptography?
- •1.4 Roadmap
- •2 Finite Field Arithmetic
- •2.2.1 Addition and subtraction
- •2.2.2 Integer multiplication
- •2.2.3 Integer squaring
- •2.2.4 Reduction
- •2.2.5 Inversion
- •2.3.1 Addition
- •2.3.2 Multiplication
- •2.3.3 Polynomial multiplication
- •2.3.4 Polynomial squaring
- •2.3.5 Reduction
- •2.4.1 Addition and subtraction
- •2.4.2 Multiplication and reduction
- •2.4.3 Inversion
- •3 Elliptic Curve Arithmetic
- •3.1 Introduction to elliptic curves
- •3.1.2 Group law
- •3.1.3 Group order
- •3.1.4 Group structure
- •3.2.1 Projective coordinates
- •3.3 Point multiplication
- •3.3.1 Unknown point
- •3.3.2 Fixed point
- •3.3.3 Multiple point multiplication
- •3.4 Koblitz curves
- •3.4.1 The Frobenius map and the ring Z[τ ]
- •3.4.2 Point multiplication
- •3.6 Point multiplication using halving
- •3.6.1 Point halving
- •3.6.3 Point multiplication
- •3.7 Point multiplication costs
- •4 Cryptographic Protocols
- •4.1 The elliptic curve discrete logarithm problem
- •4.2.3 Determining the number of points on an elliptic curve
- •4.4 Signature schemes
- •4.4.1 ECDSA
- •4.4.2 EC-KCDSA
- •4.5.1 ECIES
- •4.5.2 PSEC
- •4.6.1 Station-to-station
- •4.6.2 ECMQV
- •5 Implementation Issues
- •5.1 Software implementation
- •5.1.1 Integer arithmetic
- •5.1.5 Timings
- •5.2 Hardware implementation
- •5.3 Secure implementation
- •5.3.1 Power analysis attacks
- •5.3.2 Electromagnetic analysis attacks
- •5.3.4 Fault analysis attacks
- •5.3.5 Timing attacks
- •A.1 Irreducible polynomials
- •A.2 Elliptic curves
- •A.2.2 Random elliptic curves over F2m
- •A.2.3 Koblitz elliptic curves over F2m
- •C.1 General-purpose tools
- •C.2 Libraries
- •Bibliography
- •Index
3.3. Point multiplication |
103 |
(kt−1kt−2 · · · kt− j kt−( j+1) kt−( j+2) · · · k1k0)2 P |
||
|
|
|
↓↓
|
[l P, (l + 1) P] → |
[2l P, l P + (l + 1) P], |
if kt−( j+1) = 0 |
|
|
||||
|
|
|
|
[l P + (l + 1) P, 2(l + 1) P], if kt−( j+1) = 1 |
|
|
|||
Figure |
3.4. One |
iteration |
in Montgomery point |
multiplication. After j iterations, |
the |
||||
x-coordinates of |
l P and (l + 1) P are |
known for |
l = (kt−1 · · · kt− j )2. Iteration j + 1 re- |
||||||
quires |
a doubling and an |
addition to |
find the x-coordinates of l P and (l + 1) P |
for |
l = (kt−1 · · · kt−( j+1))2.
Equation (3.24) is derived using the addition formula for computing the x-coordinate x2 of (k + 1) P from k P = (x1, y1) and P = (x, y).
Algorithm 3.40 is presented using standard projective coordinates (see §3.2.1); only the X- and Z-coordinates of points are computed in steps 1 and 2. The approximate running time is
6m M + (1I + 10M). |
(3.25) |
One advantage of Algorithm 3.40 is that it does not have any extra storage requirements. Another advantage is that the same operations are performed in every iteration of the main loop, thereby potentially increasing resistance to timing attacks and power analysis attacks (cf. §5.3).
Algorithm 3.40 Montgomery point multiplication (for elliptic curves over F2m )
INPUT: k = (kt−1, . . . , k1, k0)2 with kt−1 = 1, P = (x, y) E(F2m ).
OUTPUT: k P.
1.X1 ← x, Z1 ← 1, X2 ← x4 + b, Z2 ← x2. {Compute ( P, 2P)}
2.For i from t − 2 downto 0 do
2.1If ki = 1 then
T ← Z1, Z1 ← (X1 Z2 + X2 Z1)2, X1 ← x Z1 + X1 X2T Z2.
T ← X2, X2 ← X24 + bZ24, Z2 ← T 2 Z22.
2.2 Else
T ← Z2, Z2 ← (X1 Z2 + X2 Z1)2, X2 ← x Z2 + X1 X2 Z1 T .
T ← X1, X1 ← X14 + bZ14, Z1 ← T 2 Z12.
3. x3 ← X1/Z1.
4. y3 ← (x+ X1/Z1)[(X1+x Z1)(X2+x Z2) + (x2+y)(Z1 Z2)](x Z1 Z2)−1 + y. 5. Return(x3, y3).
3.3.2 Fixed point
If the point P is fixed and some storage is available, then the point multiplication operation k P can be accelerated by precomputing some data that depends only on P.
104 3. Elliptic Curve Arithmetic
For example, if the points 2P, 22 P, . . ., 2t−1 P are precomputed, then the right-to-left binary method (Algorithm 3.26) has expected running time (m/2) A (all doublings are eliminated).1
Fixed-base windowing methods
Brickell, Gordon, McCurley and Wilson proposed the following refinement to the simple method of precomputing every multiple 2i P. Let (Kd−1, . . . , K1, K0)2w be the
base-2w representation of k, where d |
= |
t/w |
, and let Q |
|
= i:Ki = j |
2wi P for each j , |
||||||||||
1 ≤ j ≤ 2w − 1. Then |
|
|
|
|
|
|
|
|
j |
|
||||||
d−1 |
2w −1 |
|
j 2wi P = |
2w −1 |
|
|
||||||||||
|
|
|
|
|
||||||||||||
k P = i |
= |
0 Ki (2wi P) = |
j |
= |
1 j i |
Ki |
= |
j |
= |
1 |
j Q j |
|
||||
|
|
|
|
: |
|
|
|
|
|
|
|
|
= Q2w −1 + (Q2w −1 + Q2w −2) + · · · + (Q2w −1 + Q2w −2 + · · · + Q1).
Algorithm 3.41 is based on this observation. Its expected running time is approximately
(2w + d − 3) A |
(3.26) |
where d = t/w and t ≈ m.
Algorithm 3.41 Fixed-base windowing method for point multiplication
INPUT: Window width w, d = t/w , k = (Kd−1, . . . , K1, K0)2w , P E(Fq ).
OUTPUT: k P.
1.Precomputation. Compute Pi = 2wi P, 0 ≤ i ≤ d − 1.
2.A ← ∞, B ← ∞.
3.For j from 2w − 1 downto 1 do
3.1For each i for which Ki = j do: B ← B + Pi . {Add Q j to B}
3.2A ← A + B.
4.Return( A).
Algorithm 3.42 modifies Algorithm 3.41 by using NAF(k) instead of the binary representation of k. In Algorithm 3.42, NAF(k) is divided into {0, ±1}-strings Ki each of the same length w:
NAF(k) = Kd−1 · · · K1 K0.
Since each Ki is in non-adjacent form, it represents an integer in the interval [−I, I ] where I = (2w+1 − 2)/3 if w is even, and I = (2w+1 − 1)/3 if w is odd. The expected
running time of Algorithm 3.42 is approximately |
|
|||
|
2w+1 |
+ d − 2 A |
(3.27) |
|
3 |
||||
|
|
|
where d = (t + 1)/w .
1Recall the following notation: t is the bitlength of k, and m = log2 q . Also, we assume that t ≈ m.
3.3. Point multiplication |
105 |
Algorithm 3.42 Fixed-base NAF windowing method for point multiplication
INPUT: Window width w, positive integer k, P E(Fq ).
OUTPUT: k P.
1.Precomputation. Compute Pi = 2wi P, 0 ≤ i ≤ (t + 1)/w .
2.Use Algorithm 3.30 to compute NAF(k) = l−1 k 2i .
i=0 i
3.d ← l/w .
4.By padding NAF(k) on the left with 0s if necessary, write (kl−1 , . . . , k1, k0) =
Kd−1 · · · K1 K0 where each Ki is a {0, ±1}-string of length d.
5.If w is even then I ← (2w+1 − 2)/3; else I ← (2w+1 − 1)/3.
6.A ← ∞, B ← ∞.
7.For j from I downto 1 do
7.1For each i for which Ki = j do: B ← B + Pi . {Add Q j to B}
7.2For each i for which Ki = − j do: B ← B − Pi . {Add −Q j to B}
7.3A ← A + B.
8.Return( A).
Note 3.43 (selection of coordinates) If field inversion is sufficiently expensive, then projective coordinates will be preferred for one or both of the accumulators A and B in Algorithms 3.41 and 3.42. In the case of curves over prime fields, Table 3.3 shows that Chudnovsky coordinates for B and Jacobian coordinates for A is the preferred selection if projective coordinates are used, in which case Algorithm 3.42 has mixed Chudnovsky-affine additions at steps 7.1 and 7.2, and mixed Jacobian-Chudnovsky addition at step 7.3.
Fixed-base comb methods
Let d = t/w . In the fixed-base comb method (Algorithm 3.44), the binary representation of k is first padded on the left with dw − t 0s, and is then divided into w bit strings each of the same length d so that
k = K w−1 · · · K 1 K 0.
The bit strings K j are written as rows of an exponent array
K 0
.
.
.
K w
.
..
K w−1
|
|
. |
|
|
|
|
|
K 0 |
|
|
|
|
d−1 |
|
|
|
|
. |
|
= |
− |
|
||
|
|
. |
|
|
|
|
|
. |
|
|
. |
1 |
||
|
|
Kdw |
||
|
|
|
|
|
|
|
K w |
1 |
|
|
|
|
d −1 |
|
|
|
|
. |
|
|
− |
|
· · · |
. |
|
|
|
|
. |
|
|
· · · |
|
|
. |
|
|
|
|
K00 |
|
|
|
|
kd−1 |
|
k0 |
|
|
|||||||
|
. |
|
|
|
|
. |
|
|
|
|
|
. |
|
|
|
|
· · · |
. |
|
= |
+ − |
· · · |
|
|
. |
|
|
||||||
|
|
|
|
|
. |
|
|
|
|
|
|
|
|
|||
|
. |
|
|
|
|
. |
|
|
|
|
|
. |
|
|
|
|
|
. |
|
|
. |
|
|
|
|
|
. |
|
|
||||
|
K0w |
|
|
|
k(w |
1)d 1 |
|
|
kw d |
|
||||||
|
K w |
1 |
|
|
|
k |
|
|
|
|
k |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
. |
|
|
|
||||
|
. |
|
|
|
|
. |
|
|
|
|
|
|
|
|
||
· · · |
0 − |
|
|
|
|
− |
|
· · · |
|
|
− |
|
||||
|
|
|
|
|
wd |
|
1 |
|
|
(w |
|
|
1)d |
|
106 3. Elliptic Curve Arithmetic
whose columns are then processed one at a time. In order to accelerate the computation, the points
[aw−1, . . . , a2, a1, a0] P = aw−12(w−1)d P + · · · + a222d P + a12d P + a0 P
are precomputed for all possible bit strings (aw−1, . . . , a1, a0).
Algorithm 3.44 Fixed-base comb method for point multiplication
INPUT: Window width w, d = t/w , k = (kt−1, . . . , k1, k0)2, P E(Fq ).
OUTPUT: k P.
1.Precomputation. Compute [aw−1, . . . , a1, a0] P for all bit strings (aw−1, . . . , a1, a0) of length w.
2. By padding k on the left with 0s if necessary, write k = K
where each K j is a bit string of length d. Let Kij denote the ith bit of K j .
3.Q ← ∞.
4.For i from d − 1 downto 0 do
4.1Q ← 2Q.
4.2Q ← Q + [Kiw−1, . . . , Ki1, Ki0] P.
5.Return(Q).
The expected running time of Algorithm 3.44 is |
− |
|
|
|||||||
|
2w |
− |
|
|
+ |
|
|
|
||
|
2w − 1 |
d |
|
1 |
A |
|
(d |
|
1)D. |
(3.28) |
|
|
|
|
|
For w > 2, Algorithm 3.44 has approximately the same number of point additions as point doubles in the main loop. Figure 3.5 illustrates the use of a second table of precomputation in Algorithm 3.45, leading to roughly half as many point doubles as point additions.
Algorithm 3.45 Fixed-base comb method (with two tables) for point multiplication
INPUT: Window width w, d = t/w , e = d/2 , k = (kt−1, . . . , k0)2, P E(Fq ).
OUTPUT: k P.
1.Precomputation. Compute [aw−1, . . . , a1, a0] P and 2e[aw−1, . . . , a1, a0] P for all bit strings (aw−1, . . . , a1, a0) of length w.
2.By padding k on the left with 0s if necessary, write k = K w−1 · · · K 1 K 0, where each K j is a bit string of length d. Let Kij denote the ith bit of K j .
3.Q ← ∞.
4.For i from e − 1 downto 0 do
4.1Q ← 2Q.
4.2Q ← Q + [Kiw−1, . . . , Ki1, Ki0] P + 2e[Kiw+−e1, . . . , Ki1+e, Ki0+e] P.
5.Return(Q).
3.3. Point multiplication |
107 |
|
|
|
|
|
|
|
|
|
|
w ×d exponent array |
|||
|
|
|
|
|
|
|
|
|
|
−−−−−−−−−−−−→ |
|||
|
Kd0−1 · · · |
Ki0+e |
|
· · · Ke0 |
Ke0−1 · · · |
||||||||
|
. |
|
|
|
|
|
. |
|
|
. |
. |
|
|
. |
· · · |
|
|
. |
· · · |
. |
. |
· · · |
|||||
|
|
− |
|
|
+ |
. |
− |
||||||
|
. |
|
|
|
|
|
. |
|
|
. |
|
||
K w−1 |
K w−1 |
|
|
K w−1 |
K w−1 |
||||||||
|
d |
|
1 |
|
|
|
|
i e |
|
|
e |
e |
1 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Precomp |
|
|
|
|
|
|
|
|
|
|
|
|
|
(2w −1 elements) |
|
|
lookup |
|
|
|
|
|
|||||
|
|
|
|
|
|
2e[aw−1, . . . , a0]P
|
|
|
|
|
|
. |
|
|
. |
|
|
|
|
||
|
Ki0 |
· · · |
K00 |
|
|||
|
. |
|
|
|
|
. |
|
|
i |
· · · |
0 |
||||
|
. |
|
|
|
|
. |
|
|
K w−1 |
|
|
|
|
K w−1 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Precomp |
|
|
lookup |
|
|
|
|
(2w −1 elements) |
|
|
|
|
|
|
|||
|
|
|
|
|
[aw−1, . . . , a0]P |
||
|
|
|
|
Q |
← |
2Q |
+ |
2e[K w−1, . . . , K 0 |
|
]P |
+ |
[K w−1, . . . , K 0]P |
|
|
|||||||||
|
|
|
|
i+e |
|
i+e |
|
|
|
i |
i |
|
|
|||||||
Figure 3.5. One iteration in Algorithm 3.45. The w |
×d exponent array is |
processed left-to-right in |
||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
e |
[aw−1, . . . , a0] P |
|||||||||
e = d/2 iterations to find k P. Precomputation finds [aw−1, . . . , a0] P and 2 |
||||||||||||||||||||
for all w-bit values (aw−1, . . . , a0), where [aw−1, . . . , a0] = aw−12(w−1)d + · · · + a12d + a0. |
||||||||||||||||||||
The expected running time of Algorithm 3.45 is approximately |
|
|
||||||||||||||||||
|
|
|
|
|
2w |
|
− |
|
|
+ |
|
|
− |
|
|
|
|
|||
|
|
|
|
|
|
2w − 1 |
2e |
1 |
|
A |
|
|
(e |
|
|
1)D. |
|
(3.29) |
||
|
|
|
|
|
|
|
|
|
|
|
|
|
For a fixed w, Algorithm 3.45 requires twice as much storage for precomputation as Algorithm 3.44. For a given amount of precomputation, Algorithm 3.45 is expected to outperform Algorithm 3.44 whenever
2w−1(w − 2) |
≥ |
A |
, |
2w − w − 1 |
|
||
D |
where w is the window width used in Algorithm 3.44 (and hence width w − 1 is used with Algorithm 3.45). As an example, LD coordinates in the binary field case give A/ D ≈ 2, requiring (roughly) w ≥ 6 in Algorithm 3.44 in order for the two-table method to be superior. For the NIST curves over prime fields, A/D ≈ 1.4 with Jacobian coordinates and S = .8M, requiring w ≥ 4.
Note 3.46 (Algorithm 3.45 with simultaneous addition) If storage for an additional e points (which depend on k) can be tolerated, then the values
T |
K w−1, . . . , K 1 |
, K 0 |
] |
P |
+ |
2e |
K w−1, . . . , K 1 |
, K 0 |
] |
P, 0 |
≤ |
i < e, |
||
i ← [ |
i |
i |
i |
|
[ |
i+e |
i+e |
i+e |
|
|
at step 4.2 of Algorithm 3.45 can be determined in a (data-dependent) precomputation phase. The strategy calculates the points Ti in affine coordinates, using the method of simultaneous inversion (Algorithm 2.26) to replace an expected e = (1− 1/2w )2e field inverses with one inverse and 3(e − 1) field multiplications.
108 3. Elliptic Curve Arithmetic
If Q is maintained in projective coordinates, then e mixed-coordinate additions at step 4.2 are replaced by e simultaneous additions in the new precomputation phase. With the coordinates discussed in §3.2, this translates into the following approximate field operation counts.
|
|
|
e additions in E(F2m ) |
e additions in E(F pm ), p > 3 |
|
mixed-coordinate |
simultaneous |
mixed-coordinate simultaneous |
|
|
|
8e M |
I + (5e − 3)M |
8e M + 3e S I + (5e − 3)M + e S |
For curves of practical interest from §3.2 over fields where I/M is expected to be small (e.g., binary fields and OEFs), a roughly 15% reduction in point multiplication time is predicted.
Note 3.47 (comb methods) Algorithms 3.44 and 3.45 are special cases of exponentiation methods due to Lim and Lee. For given parameters w and v, a t-bit integer
k is |
written as an exponent array of w × d |
bits where d = t/w , as illustrated |
in Figure 3.6. A typical entry Kvw consists |
of the e = d/v bits of k given by |
|
Kvw |
= (kl+e−1 , . . . , kl+1 , kl ) where l = dw + ev (with zeros replacing some entries |
|
if v |
= v − 1 and v d). |
|
K 0v−1
...
w Kvw−1
...
K w−1
v−1
· · · (Kv0 ,e |
− |
1, . . . , Kv0 ,0) |
· · · |
K00 |
||||||
|
|
|
|
. |
|
|
|
. |
||
|
|
|
|
|
. |
|
|
|
. |
|
· · · (Kvw,e |
|
|
. |
|
|
|
. |
|||
− |
1, . . . , Kvw,0) |
· · · |
K0w |
|||||||
|
|
|
|
. |
|
|
|
. |
||
|
|
|
|
|
. |
|
|
|
. |
|
|
|
|
|
|
. |
|
|
|
. |
|
· · · |
(K w−1 |
|
, . . . , K w−1) |
· · · |
K w−1 |
|||||
|
v ,e−1 |
|
|
v ,0 |
|
0 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
e= d/v bits |
|
|
|
|
|
||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
d= t/w bits |
|
|
|
|
|
||||||
Figure 3.6. The |
exponent array in Lim-Lee combing methods. Given parameters w and v, a |
t-bit |
||||||||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||
integer k is written as a w×d bit array for d = t/w . Entries Kvw |
have e = d/v bits. |
|
||||||||||||||||||||||||
If K w denotes the integer formed from the bits in row w , then |
|
|||||||||||||||||||||||||
|
|
w−1 |
|
|
|
|
|
|
|
|
w−1 v−1 |
2dw P |
|
|||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||||||||||
|
k P = w |
= |
0 K w 2dw P = w |
= |
0 v |
= |
0 Kvw 2ev |
|
||||||||||||||||||
|
|
w−1 |
v−1 |
|
e−1 |
|
|
|
|
|
|
|
|
|
|
|
|
|||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||||||||||
|
= w |
= |
0 v |
= |
0 e |
= |
0 Kvw,e 2e 2ev 2dw P |
|
|
|
||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||
|
|
e−1 v−1 |
|
|
|
w−1 |
|
|
|
|
|
|
|
|
|
|||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||||||
|
= e |
= |
0 2e v |
= |
0 2ev |
w |
= |
0 Kvw,e 2dw P |
|
|
|
|||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
P[v ][Kv ,e ]