
Question: Is there a binary vector v_0 ∈ V such that v_0 S = B?

In general, any NPC problem consists of easy instances, which are solvable in polynomial time, and difficult ones, for which there is no polynomial-time algorithm unless P = NP. When an intractable problem is used in a cryptographic design to thwart some possible attacks, it is essential to make sure that almost all instances actually used are difficult ones.

2.3.9 Probabilistic Computations

As mentioned before, the efficiency of algorithms depends on the encoding scheme used to represent instances and on the model of computation. As there is no substantial room for improvement with any reasonable encoding scheme, the only way to increase the efficiency of algorithms is to apply a different model of computation.

Probabilistic methods in computation are mostly used to simulate large systems that work in a probabilistic environment. By their nature, probabilistic computations do not guarantee "error-free" results. Still, when the error rate can be made as small as required, probabilistic computations may be an attractive alternative.

The reader should be warned that there is no consensus about the definitions of the probabilistic algorithms discussed below. Our definitions are in line with those accepted by Brassard and Bratley [60] and Gill [202].

A Monte Carlo algorithm is a probabilistic algorithm that solves a decision problem. A yes-biased Monte Carlo algorithm never makes mistakes on "yes" instances. If the algorithm handles a "no" instance, it may make a mistake with probability ε. A no-biased algorithm correctly solves "no" instances and makes mistakes on "yes" instances with probability ε.

A Las Vegas algorithm is a probabilistic algorithm which, for any instance of the problem, either gives the correct answer with probability 1 - ε or fails with probability ε. If the algorithm fails, it returns "no answer."
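To see the difference in code, here is a minimal Python sketch of a Las Vegas-style procedure in the above sense; the toy search problem, the array contents, and the bound on the number of trials are illustrative assumptions, not taken from the text.

```python
import random

def las_vegas_find_one(bits, max_trials=20):
    """Return an index i with bits[i] == 1, or None ("no answer") if all
    random probes fail. Any index returned is guaranteed to be correct;
    only the probability of giving up depends on chance (about
    (1/2)**max_trials when half of the entries are 1)."""
    for _ in range(max_trials):
        i = random.randrange(len(bits))
        if bits[i] == 1:
            return i      # always a correct answer
    return None           # "no answer"

data = [0, 1] * 1000      # half of the positions hold a 1
print(las_vegas_find_one(data))
```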

Primality testing (the PRIM problem) computed on a DTM requires sub-exponential time. It can be run faster, in fact in polynomial time, if we are ready to accept a small chance of a mistake in the computations. The probabilistic algorithm for solving PRIM returns either "prime" or "composite." If the tested integer is prime, the algorithm never makes mistakes. If, however, the tested integer is composite, it returns "prime" with some probability ε < 1. The algorithm can be repeated n times for n independent inputs. If the algorithm consistently answers "prime," we can assume that the integer is prime. The probability of a mistake (i.e. that the integer is in fact composite) is ε^n.
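The sketch below illustrates a Monte Carlo primality test of exactly this kind, using the Fermat test with independently repeated rounds. The book does not name a particular algorithm; the Fermat test is chosen here only for brevity, and a production implementation would normally use the Miller–Rabin test instead, since the Fermat test can be fooled by the rare Carmichael numbers.

```python
import random

def probably_prime(n, rounds=20):
    """Fermat-style Monte Carlo test: a "composite" answer is always correct,
    while a "prime" answer may be wrong (for non-Carmichael composites the
    error probability is at most about (1/2)**rounds)."""
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:   # Fermat witness: n is certainly composite
            return False            # "composite"
    return True                     # "prime" (with high probability)

print(probably_prime(2**127 - 1))   # a known Mersenne prime -> True
print(probably_prime(2**127 + 1))   # composite (divisible by 3) -> False
```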

2.3.10 Quantum Computing

A new paradigm in computation with the explosive potential to revolutionize the theory and practice of computing is quantum computing. Unlike classical computers, quantum computers are based on the rules of quantum mechanics. The idea of quantum computing emerged in the early 1980s when Richard Feynman [181] asked about the suitability of classical computers for simulating physics. He also gave an abstract model of a quantum simulator. In 1985 David Deutsch [150] gave a model of a universal quantum computer. The power of the quantum computer comes from the phenomenon of quantum parallelism. Roughly speaking, a classical bit can be either 0 or 1, while a quantum bit is a superposition of both 0 and 1. Thus a register of n quantum bits can be seen as a collection of 2^n potential states existing at the same time. A confirmation of the extraordinary power of quantum computers came when Peter Shor showed that the factoring and discrete logarithm problems are computable on a quantum computer in polynomial time! This development has a dramatic impact on the RSA cryptosystem, as the security of RSA depends upon the intractability of factoring.

The crucial issue related to the quantum computer is its implementation. We can build some components, such as negation gates and 2-bit quantum gates. A full general-purpose quantum computer is still far away. We may expect that some specialized quantum computers, including a factoring engine, will be easier to implement. Readers who would like to find out more about the topic are referred to [59, 244, 534].

2.4 Elements of Information Theory

It would be difficult to discuss any matter concerning cryptography without referring to the fundamental precepts of information theory. Claude Shannon, who is seen as the father of this discipline, published his seminal work [467] in 1948, in which he formulated the principles of reliable communication over a noisy channel. One year later Shannon extended his information-theoretic approach to secrecy systems [468] and provided the greater portion of the theoretical


foundation for modern cryptography. The principal tools of secure communication across a channel are codes and ciphers. A code is a fixed, predetermined "dictionary" in which every valid message has an equivalent encoded message called a codeword. Coding theory addresses the "noisy channel" problem: by selecting a suitable code, if a message M is distorted to M' during transmission, the error can be detected and, hopefully, corrected to recover the original message. Ciphers, on the other hand, are a general method of transforming messages into a format whose meaning is not apparent.

2.4.1 Entropy

An information source is one of the basic components of any communication and secrecy system. It generates messages which are later transmitted over a communication channel. In most cases, a probabilistic model of the information source is adequate. So the source is represented by a random variable S with the collection of source states (also called messages) S = {s_1, s_2, ..., s_k} and associated probabilities P(S = s_i) = p(s_i) for each state i = 1, ..., k. The entropy of a discrete message source is defined as

H(S) = \sum_{i=1}^{k} p(s_i) \log_2 \frac{1}{p(s_i)}.    (2.23)

Each log_2(p(s_i)^{-1}) term represents the number of bits needed to encode the message s_i optimally. When all the messages are equally likely, i.e. p(s_1) = p(s_2) = ... = p(s_k) = 1/k, then H(S) = log_2 k. If k = 2^n, then n bits are needed to encode each message. The value of H(S) ranges between its maximum value log_2 k and its minimum of zero, attained when there is a single message with probability 1. Note that when there is only one message, there is no choice of messages. The entropy of a source H(S) also measures its uncertainty, in that it indicates the number of bits of information that must be acquired to recover the message. The uncertainty of a message cannot exceed log_2 k bits, where k is the number of possible messages.¹

¹ If a message is known to contain a marital status, either married or single, then the uncertainty is only one bit, since there are only two possibilities for the first character, and once this is determined the message can be recovered. If the message were a student number, then the uncertainty would be greater than one bit but would not exceed log_2 k bits.
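Equation (2.23) translates directly into a few lines of Python; the probability lists below are illustrative examples only, not taken from the text.

```python
from math import log2

def entropy(probs):
    """H(S) = sum of p(s_i) * log2(1 / p(s_i)) over all messages with p > 0."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # 4 equally likely messages -> 2.0 bits
print(entropy([1.0]))                      # a single certain message  -> 0.0 bits
```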

Consider a random variable that takes on two values s_1 and s_2 with probabilities

p(s_1) = ε   and   p(s_2) = 1 - ε.

What is the maximum entropy of the random variable, and how does the entropy behave as a function of ε? First of all, we apply the definition of entropy to obtain

H(S) = -\sum_{i=1}^{2} p(s_i) \log_2 p(s_i) = -ε \log_2 ε - (1 - ε) \log_2 (1 - ε).

As H(S) is a function of ε, we find its derivative:

\frac{dH(S)}{dε} = -\log_2 ε + \log_2 (1 - ε).

Clearly,

\frac{dH(S)}{dε} = 0   for   ε = \frac{1}{2}.

As the second derivative,

\frac{d^2 H(S)}{dε^2} = -\frac{1}{\ln 2} \left( \frac{1}{ε} + \frac{1}{1 - ε} \right),

is negative for ε = 1/2, H(S) can have its maximum at ε = 1/2 unless it has its maximum at ε = 0 or ε = 1. We calculate the values of H(S) at these points:

H(S)|_{ε=0} = \lim_{ε \to 0} \left[ ε \log_2 \frac{1}{ε} + (1 - ε) \log_2 \frac{1}{1 - ε} \right]

and

H(S)|_{ε=1} = \lim_{ε \to 1} \left[ ε \log_2 \frac{1}{ε} + (1 - ε) \log_2 \frac{1}{1 - ε} \right].

Now, as lim_{ε→0} (ε log_2 ε) = 0, we get H(S)|_{ε=0} = H(S)|_{ε=1} = 0. In other words, the maximum entropy of a random variable with just two values is attained for the uniform distribution p(s_1) = p(s_2) = 1/2, and then H(S) = log_2 2 = 1 (Figure 2.5).

Fig. 2.5. The entropy function y = H(x)
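As a quick numerical check of this conclusion (a sketch, not part of the original text), evaluating the binary entropy function on a grid of values of ε confirms that the maximum of one bit is reached at ε = 1/2, as plotted in Figure 2.5.

```python
from math import log2

def binary_entropy(eps):
    """H(eps) = -eps*log2(eps) - (1-eps)*log2(1-eps), with H(0) = H(1) = 0."""
    if eps in (0.0, 1.0):
        return 0.0                 # limit value, since x*log2(x) -> 0 as x -> 0
    return -eps * log2(eps) - (1 - eps) * log2(1 - eps)

grid = [i / 100 for i in range(101)]
best = max(grid, key=binary_entropy)
print(best, binary_entropy(best))  # 0.5 1.0
```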

Let S and X be two random variables defined over the sets S = {s_1, ..., s_k} and X = {x_1, ..., x_n}, respectively. Assume that for every value x_j ∈ X, we know the conditional probabilities p(s_i | x_j) = P(S = s_i | X = x_j). Now we can define the entropy of S conditional on x_j as H(S | x_j) = -\sum_{i=1}^{k} p(s_i | x_j) \log_2 p(s_i | x_j). The conditional entropy

H(S | X) = \sum_{j=1}^{n} H(S | x_j) P(X = x_j)

is also called the equivocation and characterizes the uncertainty about S knowing X. The following properties hold for the entropy:



1. H(S | X) ≤ H(S) – any side information X never increases the entropy of S.

2. H(S, X) = H(S) + H(X | S) = H(X) + H(S | X) – the joint entropy of the pair (S, X) is the sum of the uncertainty in X and the uncertainty in S provided X is known.

3. If S and X are independent random variables, then H(S, X) = H(S) + H(X).

4. If (X_1, ..., X_n) is a collection of random variables, then H(X_1, ..., X_n) = H(X_1) + H(X_2 | X_1) + ... + H(X_n | X_1, ..., X_{n-1}).

The properties can be easily proven and the proofs are left to the reader as an exercise.
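For instance, the second property, H(S, X) = H(X) + H(S | X), can be checked numerically for any concrete joint distribution; the 2 × 2 table below is an arbitrary illustrative choice rather than an example from the book.

```python
from math import log2

def H(probs):
    """Entropy of a probability list, as in equation (2.23)."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# joint[s][x] = P(S = s, X = x); an arbitrary example distribution
joint = [[0.125, 0.375],
         [0.250, 0.250]]

p_x = [joint[0][x] + joint[1][x] for x in range(2)]             # marginal of X
H_joint = H([joint[s][x] for s in range(2) for x in range(2)])  # H(S, X)
# equivocation: H(S | X) = sum over x of P(X = x) * H(S | X = x)
H_S_given_X = sum(p_x[x] * H([joint[s][x] / p_x[x] for s in range(2)])
                  for x in range(2))

print(round(H_joint, 10) == round(H(p_x) + H_S_given_X, 10))    # True
```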

2.4.2 Huffman Codes

Assume we have a discrete source represented by the random variable S. Clearly, H(S) specifies the average number of bits necessary to represent messages from the set S = {s_1, ..., s_k}. Can we encode messages in such a way that their average length is as short as possible and, hopefully, equal to H(S)? The Huffman algorithm gives the answer to this question.

Huffman code – produces an optimal binary representation of messages of the source defined by S = {s_1, ..., s_k} with their probabilities p(s_1), ..., p(s_k) (recursive algorithm).

H1. Recursive step: Assume that there are j messages (j ≤ k). Order the messages according to their occurrence probabilities, so that the ordered collection of messages is x_1, ..., x_j with p(x_1) ≤ p(x_2) ≤ ... ≤ p(x_j). Suppose further that the encodings of the original messages s_1, ..., s_k have partial codes b_1, ..., b_k; at the beginning, all b_i are empty. Choose the first two messages and merge them, creating a new message x_{1,2} with occurrence probability p(x_{1,2}) = p(x_1) + p(x_2). If s_i has been merged into x_1, put the prefix 0 on its encoding, i.e. b_i becomes 0‖b_i. If s_i has been merged into x_2, put the prefix 1 on its encoding, i.e. b_i becomes 1‖b_i. Otherwise the encodings are unchanged. Note that ‖ stands for concatenation. Repeat the algorithm for the collection of messages x_{1,2}, x_3, ..., x_j.

H2. Stopping step: There are only two messages x_1 and x_2 left. Create the final codes by putting 0 at the front of all encodings of messages which have been merged into x_1, and 1 otherwise.

Consider a source S = {s_1, s_2, s_3, s_4} with corresponding probabilities 1/8, 1/2, 1/8, 1/4. The ordered messages (s_1, s_3, s_4, s_2) have probabilities 1/8, 1/8, 1/4, 1/2. The partial encodings become b_1 = 0 and b_3 = 1 when s_1 and s_3 are merged into x_1, which occurs with probability 1/4. The messages (x_1, s_4, s_2) are already in the required order, so x_2 is a merge of x_1 and s_4. The partial encodings are now b_1 = 00, b_3 = 01 and b_4 = 1. Finally x_2 is merged with s_2 and the codes are: b_1 = 000, b_3 = 001, b_4 = 01, b_2 = 1. It is easy to check that H(S) = 14/8 = 1.75 and that the average length of the Huffman code is L = 3 · (1/8) + 3 · (1/8) + 2 · (1/4) + 1 · (1/2) = 1.75. The code is illustrated in Figure 2.6.

Fig. 2.6. An example of a Huffman code
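A compact Python sketch of steps H1–H2, run on the example source above. The heap-based bookkeeping and the particular 0/1 labels are implementation choices, so the individual codewords may differ from the hand construction, but the codeword lengths and the average length of 1.75 bits are the same.

```python
import heapq

def huffman(probs):
    """probs: dict mapping message -> probability; returns dict message -> code."""
    heap = [(p, i, {m: ""}) for i, (m, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)                            # tie-breaker so dicts are never compared
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)    # the two least probable groups
        p2, _, codes2 = heapq.heappop(heap)
        merged = {m: "0" + c for m, c in codes1.items()}        # prefix 0 for the first group
        merged.update({m: "1" + c for m, c in codes2.items()})  # prefix 1 for the second
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

source = {"s1": 1/8, "s2": 1/2, "s3": 1/8, "s4": 1/4}
codes = huffman(source)
average = sum(source[m] * len(codes[m]) for m in source)
print(codes, average)    # average length 1.75 bits, equal to H(S)
```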


2.4.3 Redundancy of the Language

Any natural language can be treated as a message source with a complex structure. Entropy is a convenient tool that can be used to specify the probabilistic behavior of the language. Let S^(k) be a random variable defined over the set S × S × ... × S (k times), with the corresponding probability distribution for sequences of length k, where |S| = N indicates the number of letters in the alphabet. The rate of the language S^(k) for messages of length k is defined as

r_k = \frac{H(S^{(k)})}{k},

which denotes the average number of bits of information per character. For English, when k is large, r_k has been estimated to lie between 1.0 and 1.5 bits/letter. The absolute rate of a language is the maximum number of bits of information that could be encoded in each character, assuming that all combinations of characters are equally likely. If there are |S| = N letters in the alphabet, then the absolute rate is given by R = log_2 N, which is the maximum entropy of the individual characters. For a 26-character alphabet this is 4.7 bits/letter. The actual rate of English is much less, as it is highly redundant, like all natural languages. Redundancy stems from the underlying structure of a language. In particular, certain letters and combinations of letters occur frequently, while others have a negligible likelihood of occurring (e.g. in English the letters e, t, and a occur very frequently, as do the pairs, or digrams, th and en, while z and x occur less frequently).²

² In coding theory, redundancy refers to that portion of a codeword that is used to transmit check symbols, so as to allow error detection and possible correction. This portion of the codeword contains no information.

The redundancy of a language with rate r is defined as D = R - r. When r = 1 and R = 4.7, the ratio D/R shows that English is about 79 percent redundant.

Consider a language that consists of the 26 letters of the set S = {A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z} = {s_1, s_2, ..., s_26}. Suppose the language is characterized by the following sequence of probabilities:

P(s_1) = 1/2,   P(s_2) = 1/4,
P(s_i) = 1/64    for i = 3, 4, 5, 6, 7, 8, 9, 10,
P(s_i) = 1/128   for i = 11, ..., 26.

The entropy of our single-letter language is:

r_1 = H(S) = \sum_{i=1}^{26} P(s_i) \log_2 \frac{1}{P(s_i)}
    = \frac{1}{2} \log_2 2 + \frac{1}{4} \log_2 4 + 8 \left( \frac{1}{64} \log_2 64 \right) + 16 \left( \frac{1}{128} \log_2 128 \right) = 2.625.

Now the language has N = 26 letters, so R = log_2 26 = 4.7.

In other words, the redundancy of the language calculated for single-letter probabilities is

D_1 = R - r_1 = 4.7 - 2.625 = 2.075.
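The single-letter figures can be reproduced directly; the snippet below simply re-evaluates r_1, R and D_1 for the probabilities listed above.

```python
from math import log2

probs = [1/2, 1/4] + [1/64] * 8 + [1/128] * 16   # the 26 letter probabilities (sum to 1)
r1 = sum(p * log2(1 / p) for p in probs)          # 2.625
R = log2(26)                                      # about 4.70
print(r1, R, R - r1)                              # D1 = R - r1, about 2.075
```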

This example applies only to a language structure that is described by the probability distribution of single letters. This description should be treated as a very first approximation of the language's statistical structure. As a matter of fact, a natural language's structure is far more complex, and the second approximation of a language's structure gives a better picture of its statistical properties.

Let our language be defined as in the previous example and let its conditional probability distribution be given as follows:

P(s'_{i+1} | s_i) = P(s'_{i+2} | s_i) = 1/2   for i = 1, ..., 24,
P(s'_26 | s_25) = P(s'_1 | s_25) = 1/2,
P(s'_1 | s_26) = P(s'_2 | s_26) = 1/2,

where P(s'_{i+1} | s_i) denotes the conditional probability that the second letter is s_{i+1} provided that the first letter is s_i.

Now we calculate the probability distribution of two-letter sequences:

P(s_1, s'_2) = P(s'_2 | s_1) P(s_1) = 1/4,
P(s_1, s'_3) = P(s'_3 | s_1) P(s_1) = 1/4,
P(s_2, s'_3) = P(s'_3 | s_2) P(s_2) = 1/8,
P(s_2, s'_4) = P(s'_4 | s_2) P(s_2) = 1/8,
P(s_i, s'_{i+1}) = P(s'_{i+1} | s_i) P(s_i) = 1/128   for i = 3, ..., 10,
P(s_i, s'_{i+2}) = P(s'_{i+2} | s_i) P(s_i) = 1/128   for i = 3, ..., 10,
P(s_i, s'_{i+1}) = P(s'_{i+1} | s_i) P(s_i) = 1/256   for i = 11, ..., 24,
P(s_i, s'_{i+2}) = P(s'_{i+2} | s_i) P(s_i) = 1/256   for i = 11, ..., 24,
P(s_25, s'_26) = P(s_25, s'_1) = P(s_26, s'_1) = P(s_26, s'_2) = 1/256.

All other probabilities are equal to zero and, in this case, the entropy of two-letter language sequences is equal to:

H(S^{(2)}) = \sum_{i,j=1}^{26} P(s_i, s'_j) \log_2 \frac{1}{P(s_i, s'_j)}
           = 2 \left( \frac{1}{4} \log_2 4 \right) + 2 \left( \frac{1}{8} \log_2 8 \right) + 16 \left( \frac{1}{128} \log_2 128 \right) + 32 \left( \frac{1}{256} \log_2 256 \right)
           = 3.625.

Consider, for the moment, the entropy H(S) from the previous example and compare it to H(S^(2)). We can immediately state that H(S^(2)) - H(S) = 1.

This equation means that, having the first letter, we can obtain the second one using one bit only. This results from the fact that there are two equally probable candidates. For example, if the first letter is s_3 = C, then the second letter may be either s'_4 = D or s'_5 = E.

Returning to our example, we calculate the rate of the language for messages of length 2, namely,

r_2 = \frac{1}{2} H(S^{(2)}) = 1.8125.

As the absolute rate of our language is fixed and depends only on the number of letters, the redundancy D_2 is:

D_2 = R - r_2 = 4.7 - 1.8125 ≈ 2.9.

We can now state that the language considered is about 60 percent redundant.
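The two-letter figures can be checked the same way by listing the non-zero joint probabilities derived above (a sketch; using the unrounded R = log_2 26 gives a D_2 of about 2.89 rather than exactly 2.9).

```python
from math import log2

# non-zero two-letter probabilities: 2 pairs of 1/4, 2 of 1/8, 16 of 1/128, 32 of 1/256
pairs = [1/4] * 2 + [1/8] * 2 + [1/128] * 16 + [1/256] * 32   # sums to 1
H2 = sum(p * log2(1 / p) for p in pairs)   # H(S^(2)) = 3.625
r2 = H2 / 2                                # 1.8125
print(H2, r2, log2(26) - r2)               # D2, about 2.89
```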


We note that the more redundant a language is, the stronger the statistical relations between the letters in a sequence. On the other hand, if a language has no redundancy, then occurrences of subsequent letters are statistically independent.

For a natural language, we can easily calculate the entropy of a single letter, r_1 = H(S). The entropy r_2 = H(S^(2))/2 of two-letter words can also be found relatively easily. Unfortunately, the amount of calculation needed for r_n = H(S^(n))/n grows exponentially as a function of n. So the real redundancy of the language, determined by

r_∞ = \lim_{n \to \infty} \frac{H(S^{(n)})}{n},

is estimated using several of the earlier evaluated entropies.

2.4.4 Key Equivocation and Unicity Distance

Consider the encryption system shown in Figure 2.7. The cryptosystem consists of three basic components: the message source, the key generator, and the encryption block. The message source is characterized by the random variable M and describes the statistical properties of the language generated by the source. The set of all characters of the language alphabet is M. The key generator selects keys randomly, with uniform probability, from the whole set K. Once chosen, the key stays fixed for some time. The encryption block uses a publicly known algorithm to encrypt messages into cryptograms under the control of the secret key. The set of possible cryptograms is denoted by C.

The sender applies the cryptosystem (or cipher) for n consecutive messages and produces n corresponding cryptograms (ciphertexts). An enemy cryptan-

Fig. 2.7. Graphical presentation of the interrelations between messages and cryptograms in a binary cryptosystem
