Добавил:

Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Санкт-Петербургский государственный университет телекоммуникаций им. проф. М.А. Бонч-Бруевича

Предмет:

Профессионально-ориентированный английский язык

Файл:

Guide to Elliptic Curve Cryptography.pdf

Скачиваний:

Добавлен:

15.03.2015

Размер:

4.58 Mб

Скачать

☆

<<< < Предыдущая 22 23 24 25 26 27 28 29 30 31 32 33 34 35 3637 / 4637 38 39 40 41 42 43 44 45 46 > Следующая >>>

5.1. Software implementation

219

for inline assembler to cooperate with surrounding code is a design weakness compared with that of GNU C or the Intel compilers, which have the additional advantage that they can be used on Unix-like systems.

5.1.5Timings

Selected ﬁeld operation timings are presented for Intel Pentium family processors and the Sun UltraSPARC, commonly used in workstations. The NIST recommended binary and prime ﬁelds (§A.2) are the focus, although some data for an OEF (§2.4) is presented for comparison.

It is acknowledged that timings can be misleading, and are heavily inﬂuenced by the programmer’s talent and effort (or lack thereof), compiler selection, and the precise method of obtaining the data. The timings presented here should be viewed with the same healthy dose of skepticism prescribed for all such data. Nonetheless, timings are essential for algorithm analysis, since rough operation counts are often insufﬁcient to capture platform characteristics. For the particular timings presented here, there has generally been independent “sanity check” data available from other implementations.

Tables 5.3–5.5 give basic comparisons for the NIST recommended binary and prime ﬁelds, along with a selected OEF. Inversion and multiplication times for binary ﬁelds on two platforms appear in Table 5.6, comparing compilers, inversion algorithms, and 32-bit vs 64-bit code. The 64-bit code on the Intel Pentium III is via special-purpose registers. These capabilities were extended in the Pentium 4, and Table 5.7 includes timings for prime ﬁeld multiplication via these registers along with an approach using ﬂoating-point registers.

Field arithmetic comparisons

Timings for the smallest of the NIST recommended binary and prime ﬁelds, along with an OEF, are presented in Table 5.3. Speciﬁcally, these are the binary ﬁeld F2163

with reduction polynomial f (z) = z163 + z7 + z6 + z3 + 1, the prime ﬁeld F p192 with p192 = 2192 − 264 − 1, and the OEF F p6 with prime p = 231 − 1 and reduction poly-

nomial f (z) = z6 − 7. Realistic branch misprediction penalties are obtained using a sequence of pseudo randomly generated ﬁeld elements, and the timings include framework overhead such as function calls. The Intel compiler version 6 along with the Netwide Assembler (NASM) were used on an Intel Pentium III running the Linux 2.2 operating system.

Algorithms for binary ﬁelds were coded entirely in C except for a one-line assembler fragment used in polynomial degree calculations in inversion. Assembly coding may be required in prime ﬁelds and OEFs in order to use hardware multipliers producing a 64-bit product from 32-bit input, and to directly access the carry bit, both of which are essential to performance in conventional methods. The ﬁrst of the F p192 columns in Table 5.3 gives timings for code written primarily in C. For most entries, a signiﬁ-

220	5. Implementation Issues

		F2163	a	F p192	F(231−1)6
		F2163	F p192	F p192	F(231−1)6

	Addition	0.04	0.18	0.07	0.06
	Reduction	0.11b	0.25c	0.11c
	Fast reduction	0.11b	0.25c	0.11c	N/A
	Barrett reduction (Algorithm 2.14)	N/A	1.55d	0.49	N/A
	Multiplication (including fast reduction)	1.30e	0.57d,f	0.42f	0.40g
	Squaring (including fast reduction)	0.20h	—	0.36i	0.32g
	Inversion	10.5j	58.3k	25.2k	2.9l
	I/M	8.1	102.3	60.0	7.3

	aCoded primarily in C. bAlgorithm 2.41. cAlgorithm 2.27. dUses a 32×32 multiply-and-add.
	eAlgorithm 2.36. fAlgorithm 2.10. gExample 2.56. hAlgorithm 2.39.
	iAlgorithm 2.13. jAlgorithm 2.48. kAlgorithm 2.22. lAlgorithm 2.59.

Table 5.3. Timings (in µs) for ﬁeld arithmetic on an 800 MHz Intel Pentium III. The binary ﬁeld F2163 = F2[z]/(z163 + z7 + z6 + z3 + 1) and the prime ﬁeld F p192 for p192 = 2192 − 264 − 1 are from the NIST recommendations (§A.2). The rightmost column is the optimal extension ﬁeld F p6 = F p[z]/(z6 − 7) for prime p = 231 − 1.

cant penalty is seen relative to the timings with assembly. However, the multiplication routine uses an in-line assembly fragment for a 32×32 multiply with a three-word accumulation. If reduction is excluded, the time is very close to that obtained with the assembly language version, an indication that the Intel compiler handles insertion of short in-line assembly fragments well.

Reduction Barrett reduction does not exploit the special form of the NIST prime, and the entries can be interpreted as rough cost estimates of reduction with a random 192-bit prime. In contrast to special primes, this estimate shows that reduction is now a very signiﬁcant part of ﬁeld multiplication timings, encouraging the use of Montgomery (§2.2.4) and other multiplication methods. Signiﬁcant performance degradation in the C version of the fast reduction algorithm is largely explained by the many conditionals in the clumsy handling of carry.

OEF The OEF F(231−1)6 in the rightmost column of Table 5.3 is roughly the same size as F p192 . The multiplication is accomplished with an accumulation method (Example 2.56) resembling the method used in F p192 , and the resulting times are comparable. As expected, inversion is signiﬁcantly faster for the OEF.

NIST ﬁelds Tables 5.4 and 5.5 provide timings for the NIST recommended binary and prime ﬁelds. Note that optimizations in the larger ﬁelds were limited to techniques employed for F2163 and F p192 . In particular, Karatsuba-Ofman methods were not competitive in our tests on this platform for the smaller ﬁelds, but were not examined carefully in the larger ﬁelds.

		5.1.	Software implementation			221

	F2163	F2233	F2283	F2409	F2571

Addition	0.04	0.04	0.04	0.06	0.07
Reduction (Algorithms 2.41–2.45)	0.11	0.13	0.19	0.14	0.33
Multiplication (Algorithm 2.36)	1.30	2.27	2.92	5.53	10.23
Squaring (Algorithm 2.39)	0.20	0.23	0.32	0.31	0.56

Inversion (Algorithm 2.48)	10.5	18.6	28.2	53.9	96.4
I/M	8.1	8.2	9.7	9.8	9.4

Table 5.4. Timings (in µs) for binary ﬁeld arithmetic on an 800 MHz Intel Pentium III, including

reduction to canonical form. The ﬁelds are from the NIST recommendations (§A.2) with reduction polynomials z163 + z7 + z6 + z3 + 1, z233 + z74 + 1, z283 + z12 + z7 + z5 + 1, z409 + z87 + 1, and z571 + z10 + z5 + z2 + 1, respectively.

	F p192	F p224	F p256	F p384	F p521

Addition	0.07	0.07	0.08	0.10	0.10
Reduction (Algorithms 2.27–2.31)	0.11	0.12	0.30	0.38	0.20

Multiplication (Algorithm 2.10)	0.42	0.52	0.81	1.47	2.32
Squaring (Algorithm 2.13)	0.36	0.44	0.71	1.23	1.87

Inversion (Algorithm 2.22)	25.2	34.3	44.3	96.3	163.8
I/M	60.0	70.0	54.7	65.5	70.6

Table 5.5. Timings (in µs) for prime ﬁeld arithmetic on an 800 MHz Intel Pentium III, in-

cluding reduction to canonical form. The ﬁelds are from the NIST recommendations (§A.2)

with p192 = 2192 − 264 − 1, p224 = 2224 − 296 + 1, p256 = 2256 − 2224 + 2192 + 296 − 1, p384 = 2384 − 2128 − 296 + 232 − 1, and p521 = 2521 − 1.

Multiplication and inversion in binary ﬁelds

In point multiplication on elliptic curves (§3.3), the cost of ﬁeld inversion relative to ﬁeld multiplication is of particular interest. This section presents estimates of the ratio for the NIST binary ﬁelds (where the ratio is expected to be relatively small) for two platforms. The three inversion methods discussed in §2.3.6 are compared, along with timings for 32-bit and 64-bit code. The results also show signiﬁcant differences among the compilers used.

Table 5.6 gives comparative timings on two popular platforms, the Intel Pentium III and Sun UltraSPARC IIe. Both processors are capable of 32and 64-bit operations, although only the UltraSPARC is 64-bit. The 64-bit operations on the Pentium III are via the single-instruction multiple-data (SIMD) registers, introduced on the Pentium MMX (see Table 5.1). The inversion methods are the extended Euclidean algorithm (EEA) in Algorithm 2.48, the binary Euclidean algorithm (BEA) in Algorithm 2.49, and the almost inverse algorithm (AIA) in Algorithm 2.50. The example ﬁelds are

taken from the NIST recommendations, with reduction polynomials f (z) = z163 + z7 + z6 + z3 + 1 and f (z) = z233 + z74 + 1. Both allow fast reduction, but only the latter

is favourable to the almost inverse algorithm. Field multiplication based on the comb

222	5. Implementation Issues

		Pentium III (800 MHz)			SPARC (500 MHz)
		32-bit		64-bit	32-bit		64-bit
	Algorithm	gcc	icc	mmx	gcc	cc	cc


	Arithmetic in F2163
	multiplication	1.8	1.3	.7	1.9	1.8	.9
	Euclidean algorithm	10.9	10.5	7.1	21.4	14.8	—
	binary Euclidean algorithm	20.7	16.0	—	16.8	14.9	10.6
	almost inverse	16.4	15.2	—	22.6	15.2	—
	I/M	6.1	8.1	9.8	8.8	8.2	12.1

	Arithmetic in F2233
	multiplication	3.0	2.3	—	4.0	2.9	1.7
	Euclidean algorithm	18.3	18.8	—	45.5	25.7	—
	binary Euclidean algorithm	36.2	28.9	—	42.0	34.0	16.9
	almost inverse	22.7	20.1	—	36.8	24.7	—
	I/M	6.1	8.2	—	9.2	8.5	9.9

Table 5.6. Multiplication and inversion times for the Intel Pentium III and Sun UltraSPARC IIe. The compilers are GNU C 2.95 (gcc), Intel 6 (icc), and Sun Workshop 6U2 (cc). The 64-bit “multimedia” registers were employed for the entries under “mmx.” Inversion to multiplication (I/M) uses the best inversion time.

method (Algorithm 2.36) appears to be fastest on these platforms. A width-4 comb was used, and the times include reduction. Other than the MMX code and a one-line assembler fragment for EEA, algorithms were coded entirely in C.

Some table entries are as expected, for example, the relatively good times for almost inverse in F2233 . Other entries illustrate the signiﬁcant differences between platforms or compilers on a single platform. Apparent inconsistencies remain in Table 5.6, but we believe that the fastest times provide meaningful estimates of inversion and multiplication costs on these platforms.

Division The timings do not make a very strong case for division using a modiﬁcation of the BEA (§2.3.6). For the 32-bit code, unless EEA or AIA can be converted to efﬁciently perform division, then only the entry for F2163 on the SPARC supports use of BEA-like division. Furthermore, the ratio I/M is at least 8 in most cases, and hence the savings from use of a division algorithm would be less than 10%. With such a ratio, elliptic curve methods will be chosen to reduce the number of inversions, so the savings on a point multiplication k P would be signiﬁcantly less than 10%.

On the other hand, if afﬁne-only arithmetic is in use in a point multiplication method based on double-and-add, then a fast division would be especially welcomed even if I/M is signiﬁcantly larger than 5. If BEA is the algorithm of choice, then division has essentially the same cost as inversion.

Implementation notes General programming considerations for the implementations used here are covered in §5.1.4. In particular, to obtain acceptable multiplication times

5.1. Software implementation

223

with gcc on the Sun SPARC, code was tuned to be more “gcc-friendly.” Limited tuning for gcc was also performed on the inversion code. Optimizing the inversion code is tedious, in part because rough operation counts at this level often fail to capture processor or compiler characteristics adequately.

Multimedia registers The Intel Pentium family (all but the original and the Pentium Pro) and AMD processors possess eight 64-bit “multimedia” registers that were employed for the timings in the column marked “mmx.” Use of these capabilities for ﬁeld arithmetic is discussed in §5.1.3.

EEA Algorithm 2.48 requires polynomial degree calculations. On the SPARC, degree was found by binary search and table lookup, once the nonzero word of interest is located. On the Pentium, a bit scan instruction (bsr) that ﬁnds the position of the most signiﬁcant bit in a word was employed via in-line assembly, resulting in an improvement of approximately 15% in inversion times.

The code tracks the lengths of u and v using t fragments of similar code, each fragment corresponding to the current “top” of u and v. Here, t was chosen to be the number of words required to represent ﬁeld elements.

BEA Algorithm 2.49 was implemented with a t-fragment split to track the lengths of u and v efﬁciently. Rather than the degree calculation indicated in step 3.3, a simpler comparison on the appropriate words was used.

AIA Algorithm 2.50 allows efﬁcient tracking of the lengths of g1 and g2 (in addition to the lengths of u and v). A total of t2 similar fragments of code were used, a significant amount of code expansion unless t is small. As with BEA, a simple comparison replaces the degree calculations. Note that only the reduction polynomial for F2233 is favourable to the almost inverse algorithm.

Prime ﬁeld multiplication methods

For prime ﬁelds, traditional approaches for ﬁeld multiplication are often throttled by limitations of hardware integer multipliers and carry propagation. Both the UltraSPARC and the Pentium family processors suffer from such limitations. The Intel Pentium 4 is in fact much slower (in terms of processor cycles) in some operations than the preceding generation of Pentium processors. As an example, ﬁeld multiplication in F p224 using Algorithm 2.10 with code targeted at the Pentium II/III appears in Table 5.5 (from a Pentium III) and Table 5.7 (from a Pentium 4). Despite a factor 2 clock speed advantage for the Pentium 4, the timing is in fact slower than obtained on the Pentium III.

Karatsuba-Ofman Methods based on Karatsuba-Ofman do not appear to be competitive with classical methods on the Pentium II/III for ﬁelds of this size. Table 5.7 includes times on the Pentium 4 using a depth-2 approach outlined in Example 2.12.

<<< < Предыдущая 22 23 24 25 26 27 28 29 30 31 32 33 34 35 3637 / 4637 38 39 40 41 42 43 44 45 46 > Следующая >>>

Соседние файлы в предмете Профессионально-ориентированный английский язык

#
15.03.20151.37 Mб56Google Secrets - How To Get A Top 10 Ranking.pdf
#
15.03.2015114.88 Кб27Google Spamguide.pdf
#
15.03.2015135.12 Кб17GRC.com Denial Of Service Attack.pdf
#
15.03.2015830.4 Кб25Groupware & Security.pdf
#
15.03.2015590.68 Кб46GSM and GPRS.pdf
#
15.03.20154.58 Mб58Guide to Elliptic Curve Cryptography.pdf
#
15.03.201512.39 Mб75Hack Proofing Linux.pdf
#
15.03.20155.14 Mб57Hackers Beware.pdf
#
15.03.2015534.15 Кб44Hackers Desk Reference.pdf
#
15.03.201510.83 Mб142Hacking GPS.pdf
#
15.03.201512.74 Mб80Hacking Windows XP.pdf