Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Jones N.D.Partial evaluation and automatic program generation.1999

.pdf
Скачиваний:
9
Добавлен:
23.08.2013
Размер:
1.75 Mб
Скачать

Three mini-languages 51

datatype lambda =

 

 

 

Int of int

 

(* Constant

*)

| Var of string

 

(* Variable

*)

| Abs of string * lambda

 

(* Abstraction

*)

| Apply of lambda * lambda

(* Application

*)

| Op of string * lambda list

(* Base application *)

| If of lambda * lambda * lambda

(* Conditional

*)

and value =

 

 

 

Numb of int

 

 

 

| Closure of lambda * (string list * value list)

 

fun lookup (x, (n::ns, v::vs)) = if n = x then v

 

 

else lookup(x, (ns, vs))

 

fun eval (Int n,

env) = Numb n

 

| eval (Var x,

env) = lookup(x, env)

 

| eval (Abs(x,e),

env) = Closure(Abs(x,e), env)

 

| eval (Apply(e,f),

env) =

 

 

let val f1 = eval(f, env)

 

 

val Closure(Abs(x,e1), (ns, vs)) = eval(e, env)

 

in eval (e1, (x::ns, f1::vs)) end

 

 

|eval (Op("+",[e1, e2]), env) = let val Numb v1 = eval (e1, env)

val Numb v2 = eval (e2, env)

in Numb (v1

+ v2) end

 

 

 

 

| eval (If (e,f,g),

env) =

 

 

 

case eval(e, env) of

 

 

 

 

(Numb

1)

=> eval(f, env)

(*

1 is true

*)

| (Numb

_)

=> eval(g, env)

(*

non-1 is false *)

fun interpret e = eval (e, ([], []))

Figure 3.1: Interpreter for call-by-value lambda calculus.

Environments represent delayed substitution

An obvious approach would be to program the reduction rules and substitution directly in ML, but this is rather ine cient. Computing [N/x]M by substituting N into M at every -reduction is impractical, since it may build extremely large expressions by repeatedly copying N.

A better alternative is to use an environment to map the free variables of an expression to their values. This amounts to a `delayed' substitution: instead of computing [N/x]M explicitly, we represent it by a pair

(M, env[x7!N])

where the updated environment env[x7!N] records the fact that x will, when

52 Programming Languages and Interpreters

referenced, be replaced by N. Such a pair of an expression and an environment is called a closure, and is widely used in practical implementations of functional languages. Note that the substitution is not actually done; but the environment remembers that it is to be e ectuated when variable x is used in M.

As a consequence the interpreter will manipulate two types of values: numbers and closures. Whenever a reduction ( x.M)(N) ) [N/x]M is to be performed, the evaluator updates the environment and then evaluates M in the new environment. All references to x within M are looked up in the environment.

The environment is represented by a pair of a name list and a value list, such that valuei is the value of the variable called namei :

([name1,. . .,namen],[value1,. . .,valuen])

The interpreter written in ML is shown in Figure 3.1. In the conditional, 1 is used for `true' and all other numbers for `false'. The interpreter performs call-by-value reduction of the lambda calculus; namely, in the Apply branch, ML evaluates the binding let val f1 = eval(f, env) of the argument f1 before the lambda body e1.

The interpreter will fail with an error on a lambda expression which attempts to apply a number, or add a closure to something, or determine the truth value of a closure. This is a kind of graceless dynamic typechecking. Also, it will fail (in lookup) if an unbound variable is referenced, so it works only for programs which are closed lambda expressions: those without free variables.

3.3.2An interpreter for rst-order recursion equations

Syntax

Here we describe a language of call-by-value recursion equation systems. The function de ned by the equation system is the one given by the rst equation.

hProgrami

::=

hEquationi, . . . , hEquationi

Function de nitions

hEquationi

::=

hFuncNamei (hVarlisti) = hExpri

hVarlisti

::=

hVari, . . . , hVari

Formal parameters

hExpri

::=

hConstanti

Constant

 

j

hVari

Variable

 

j

if hExpri then hExpri else hExpri

 

j

hFuncNamei (hArglisti)

Function application

 

j

hOpi hExpri . . . hExpri

Base application

hArglisti

::=

hExpri, . . . , hExpri

Argument expressions

As in the lambda calculus, we leave unspeci ed the set of possible constant values hConstanti and the base functions hOpi operating on them. However, note that the interpreter in Figure 3.2 implements only integer constants. We allow expressions to be written using customary precedence and associative rules, in x notation, etc.

Three mini-languages 53

datatype expr =

 

 

 

Int of int

 

(* Constant

*)

| Var of string

 

(* Variable

*)

| If of expr * expr * expr

(* Conditional

*)

| Call of string * expr list

(* Function call

*)

| Op of string * expr list

(* Base application

*)

fun lookup (x, (n::ns, v::vs)) =

 

if x = n then v else lookup(x, (ns, vs))

 

fun eval (Int n,

env, pgm) = n

 

| eval (Var x,

env, pgm) = lookup(x, env)

 

| eval (Call(f, exps),

env, pgm) =

 

let val vals = evlist(exps,env,pgm)

 

val (vars, exp) = lookup(f, pgm)

 

in eval(exp, (vars, vals), pgm) end

 

| eval (Op("+",[e1,e2]), env, pgm) =

(* similar for *,-,... *)

eval(e1,env,pgm) + eval(e2,env,pgm)

 

 

| eval (If(e,f,g),

env, pgm) =

 

 

case eval(e, env, pgm) of

 

 

1 => eval(f, env, pgm)

(* 1 is true

*)

| _ => eval(g, env, pgm)

(* non-1 is false *)

and evlist ([],

_, _) = []

 

| evlist (e::es,

env, pgm) =

 

 

eval(e, env, pgm) :: evlist(es, env, pgm)

fun interpret (pgm, args) =

let val (_, (vars, exp)::_) = pgm in eval(exp, (vars, args), pgm) end

Figure 3.2: Interpreter for call-by-value recursion equations.

Scope

In an equation

f(x1,. . . ,xn) = expression

the scope of x1,. . . ,xn is the expression. This means that variables x1,. . . ,xn bound on the left hand side can be used only inside the expression. Moreover, these are the only variables that can be used in the expression. The variables x1,. . . ,xn must all be distinct.

Call-by-name and call-by-value in recursion equations

The distinction between call-by-value and call-by-name (see Section 3.2.5) applies to recursion equations as well as to the lambda expressions. An example where termination behaviour di ers is:

54 Programming Languages and Interpreters

h(x,y) = if y 1 then y else h(h(x+1,y),y-2)

Using call-by-name, h(1,2) evaluates to 0, but using call-by-value it is unde ned.

Operational semantics

Figure 3.2 shows an interpreter for recursion equations, written in ML. A program is represented as a pair of lists. The rst is a list of function names [f1,. . . ,fn] and the second is a list of corresponding pairs ([xi1,. . . ,xiai ], bodyi ), where [xi1,. . . ,xiai ] are the parameters of function fi and bodyi is its body.

In ML the type of a program can be described by

type funcname = string type var = string

type program = (funcname list) * ((var list * expr) list)

The interpreter passes round the entire program in parameter pgm; this allows tond the de nition of a called function. The interpreter uses an environment env just as the previous interpreter did.

Remarks

1.A function arguments are evaluated by call-by-value. Namely, in the Call branch, ML evaluates the binding let val vals = evlist(exps,env,pgm) of the argument values before the function body exp. A call-by-name semantics can also be de ned, by letting the environment bind variables to closures that contain unevaluated argument expressions; these (sometimes called thunks or suspensions) are evaluated when needed, e.g. to perform a test or as arguments of a base function that demands evaluated arguments (e.g. addition).

2.We have not allowed functions as expression values or function arguments, but this is easily accommodated using closures, as in Section 3.3.1.

3.The interpreter fails if an unbound variable is referenced.

3.3.3An interpreter for ow chart programs

Flow charts form a simple imperative language much closer to traditional machine architectures than the functional languages just discussed. Their essential characteristic is that computation proceeds sequentially by execution of a series of commands, each of which updates some component of an implicit program state. This state usually consists of the values of certain registers (accumulators, index registers, etc.) together with a store or memory which maps variables or their corresponding storage locations to the values currently stored in them.

Three mini-languages 55

Syntax

Following is a grammar for ow chart programs:

hProgrami

::=

read hVari, . . . , hVari; hBasicBlocki+

hBasicBlocki

::=

hLabeli: hAssignmenti hJumpi

hAssignmenti

::=

hVari := hExpri;

hJumpi

::=

goto hLabeli;

 

 

j

if hExpri goto hLabeli else hLabeli;

 

j

return hExpri;

 

hExpri

::=

hConstanti

 

 

j

hOpi hExpri . . .

hExpri

hLabeli

::=

any identi er or number

The set hConstanti of constants and the set hOpi of base functions are left unspec- i ed, and again the interpreter in Figure 3.3 implements only integer constants. The values of the variables mentioned in the initial read statement are supplied by the input. The values of all other (numeric) variables are initially zero.

Example

A program to compute the greatest common divisor of natural numbers x and y:

read x, y;

1:if x = y goto 7 else 2

2:if x < y goto 5 else 3

3:x := x - y goto 1

5:y := y - x goto 1

7: return x

Operational semantics

As for the previous two languages we give the semantics for ow chart programs operationally by an interpreter written in ML. As before we use an abstract program syntax. The store behaves much like the environment in the previous interpreter, but with a di erence: it must also be possible to change the value bound to a variable, i.e. to update an existing store. Equivalence with the mathematical semantics given in the next section can be proven by induction on the length of a computation.

The kernel of the interpreter in Figure 3.3 is the command execution function run, which returns the program's nal answer, provided it terminates. The rst argument is the current `point of control', and the second argument is the command to be executed. If control is not transferred, then the command is performed, the store is updated, and the instruction counter is incremented. In all cases, the function nth is used to nd the next command to be executed.

As seen in function lookup, the initial value of a non-input variable is 0. The

56 Programming Languages and Interpreters

datatype expr =

 

 

 

Int of int

(* Integer constant

*)

| Var of string

(* Variable

*)

| Op of string * expr list

(* Base application

*)

and command =

 

 

 

Goto of int

(* Goto command

*)

| Assign of string * expr

(* Assignment command

*)

| If of expr * int * int

(* If-then-else command *)

| Return of expr

(* Return command

*)

and program = Read of string list * command list

 

fun nth (c::cs, 1) = c

 

 

 

| nth (c::cs, n) = nth(cs, n-1)

 

 

fun lookup (x, ([], []))

= 0

(* Initial value *)

|lookup (x, (n::ns, v::vs)) =

if x = n then v else lookup(x, (ns, vs))

fun update (([], []), x, w)

= ([x], [w])

|update ((n::ns, v::vs), x, w) = if x = n then (n::ns, w::vs)

else let val (ns1, vs1) = update((ns,vs), x, w) in (n::ns1, v::vs1) end

fun

eval (Int n,

s) =

n

|

eval (Var x,

s) =

lookup(x, s)

|

eval (Op("+",[e1,e2]), s) =

eval(e1, s) + eval(e2, s)

fun

run (l, Goto n,

s, p)

= run(n, nth (p, n), s, p)

|run (l, Assign(x, e), s, p) =

let val s1 = update(s, x, eval(e, s))

in run(l+1, nth(p, l+1), s1, p) end

| run (l, If(e,m,n),

s, p) =

if eval(e, s) = 0

 

then run(m, nth(p, m), s, p) else run(n, nth(p, n), s, p)

|

run (l, Return e,

s, p) = eval(e, s)

fun interpret (pgm, args) =

 

let val Read (vars, cmds) = pgm

 

val

(c1 :: _)

= cmds

 

val

store

= (vars, args)

in run(1, c1, store, cmds) end

Figure 3.3: Interpreter for ow chart language.

xk . It

Three mini-languages 57

interpreter fails if a jump to an unde ned address is attempted.

The intention is that the value of run(lab, cmd, s, program) is the nal output resulting from executing program with store s, beginning at command cmd which has label lab.

3.3.4Mathematical semantics of ow charts

Suppose one is given a ow chart program p with input variables x1 . . .

consists of a read-statement read x1 . . . xk; followed by a sequence of labelled basic blocks: l0:bb0 l1:bb1 . . . ln:bbn. If the program has variables x1, . . . , xm (m k) then a store v can be represented by an m-tuple of values v = (v1, . . . , vm).

The program's store (memory) is a mapping from variables to their current values. Input to a program p is a list of values d = (v1 . . . vk), initially bound to x1 . . . xk. All the non-input variables xk+1, . . . , xm have initial value 0, so the initial store is the tuple: (v1, . . . , vk, 0, . . . , 0).

Base functions must have no side e ects. Assignments, conditionals, and jumps are executed in the usual way, and return exp terminates execution, yielding the value of exp as the value of the program execution [[p]]L d .

The meaning of each basic block bb` is a store transformer w`: Valuem ! Valuem computing the e ect of assignments on the store. If bb` = a1; . . .; an; <Jump>, the store transformer w` is de ned by:

w`(v) = (t[[an]] . . . t[[a1]]) v

t[[xj := e]] v = (v1, . . . , vj,1, eval[[e]]v, vj+1, . . . , vm) where v = (v1, . . ., vm)

The control function c`: Valuem ! fl0, . . . , lng [ Value returns either the label of the basic block to be executed after bb`, or the program's nal result in case a return-statement is executed. The function c` is de ned by:

c`(

 

) =

eval[[e]](w`(

 

))

if bb` = . . . ; return e

v

v

=

lj

if bb` = . . . ; goto lj

=

lj

if bb` = . . . ; if e goto lj

 

 

 

 

 

 

and eval[[e]](w`(

 

)) = true

 

 

 

 

 

 

v

=

lk

if bb` = . . . ; if e goto lj

 

 

 

 

 

 

and eval[[e]](w`(

 

)) = false

 

 

 

 

 

 

v

else lk

else lk

A ( nite) computation is a ( nite) sequence

(pp0, v0) ! (pp1, v1) ! . . . ! (ppi, vi) !(ppi+1, vi+1) !

where v0 holds the values d of the input variables x1, . . . , xk, and pp0 = l0. If ppi = l`, then vi+1 = w`(vi), and ppi+1 = c`(vi). Finite computations end with ppt 2 Value for some t.

58 Programming Languages and Interpreters

This de nes the standard evaluation of a ow chart program. If it terminates, the value computed is [[p]]L d = ppt, else [[p]]L d = ?.

3.4Compiling compilers

3.4.1Compiler bootstrapping, an example of self-application

The term bootstrapping comes from the phrase `to pull oneself up by one's bootstraps' and refers to the use of compilers to compile themselves. The technique is widely used in practice, including industrial applications. Suppose we have extended a known language S to a larger one called S0, such that S-programs S0-programs. Suppose also that the extension is conservative, so every S program has the same semantics in both languages. Finally, assume that we already have a compiler from source language S to target language T, and that the compiler is available both in source form h 2 S-programs and in target form t 2 T-programs:

high-level compiler h 2

S

-

T

 

 

 

S

 

 

 

 

 

 

 

 

 

 

 

low-level compiler t 2

S

-

T

 

 

 

 

 

 

T

 

 

 

 

 

The two versions of the compiler must agree, that is, [[h]]S = [[t]]T. Then h and t can be used to create a compiler from S0 to T as follows:

1.Extend the existing compiler h into a compiler h0 2 S-programs for S0. This

must be a conservative extension of h, so that [[[[h]]S p ]]T = [[[h0]]S p ]]T for all S-programs p:

high-level compiler h0 2

S0

-

T

 

 

 

S

 

 

 

 

 

 

 

 

2. Now use t on h0 to obtain an S0 compiler t10 in target language form:

 

S0

-

 

T

 

S0

-

T

 

 

 

 

 

high-level compiler h0 2

S

 

S

-

T

T

3 low-level compiler t10

 

 

 

 

 

 

 

 

 

 

 

 

low-level compiler t 2

 

T

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Compiling compilers 59

3.

Use t10 to obtain an S0 compiler t20 in target language form:

 

 

 

 

 

 

 

 

 

 

 

 

 

S0

-

 

T

 

S0

-

T

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

high-level compiler h0 2

S

 

S0

-

T

T

3 low-level compiler t20

 

 

 

 

 

 

 

 

 

 

 

 

 

 

low-level compiler t10 2

T

 

 

 

 

 

 

 

 

 

 

 

 

 

 

4.

Finally, use t20 to obtain an S0 compiler t30 in target language form:

 

 

 

 

 

 

 

 

 

 

 

S0

-

 

T

 

S0

-

T

 

 

 

 

 

 

 

high-level compiler h0 2

S

 

S0

-

T

T

3 low-level compiler t30

 

 

 

 

 

 

 

 

 

 

 

 

 

 

low-level compiler t20

2

T

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

These runs can be written more concisely in the notation for program execution introduced in Chapter 1:

t10

=

[[t]]

 

h0

 

 

 

T

 

 

t20

=

[[t10

]]

h0

t30

 

[[t20

 

T

h0

=

]]

 

 

 

 

T

 

Now t10 and t20 (and t30) are semantically equivalent since they are all obtained from the same source program, h0.

[[t10]] =

[[[[h]] h0 ]]

T

 

 

 

 

by de nition of t10

T

S

 

 

 

 

 

 

 

 

=

[[[[h0]]

h0 ]]

 

 

 

 

 

since h0 is a conservative extension of h

 

S

 

h0

T

 

h0

 

 

 

=

[[[[[[t]]

 

]]

T

]]

T

since t is a compiler from S to T

 

T

 

 

 

 

 

 

=

[[[[t10]]

 

 

h0 ]]

T

 

 

by de nition of t10

 

T

 

 

 

 

 

 

=

[[t20]]

 

 

 

 

 

 

 

 

 

by de nition of t20

 

T

 

 

 

 

 

 

 

 

 

 

Note that t10 and t20 may not be textually identical since they were produced by two di erent compilers, t and t10, and it is quite possible that the extended language S0 may require di erent target code than S.

However, it should be clear that t20 and t30 are textually identical since the compilers used to compile them are semantically equivalent | at least, if one assumes the extended compiler h0 to be correct. If they are not textually identical, then h0 (or, less likely, t) must be wrong.

Note that bootstrapping involves self-application in the sense that (compiled versions of) h0 are used to compile h0 itself. Moreover, self-application is useful: it gives a simple way to prove h0 wrong.

60 Programming Languages and Interpreters

3.4.2Compiler generation from interpreters

Later in this book we shall see that it is possible to convert an interpreter into a compiler:

S

 

S

-

L

,!

 

L

 

L

 

 

 

 

 

 

 

 

 

by partial evaluation. As already argued in the introduction, this is interesting for several reasons:

In practice, interpreters are smaller, easier to understand, and easier to debug than compilers.

An interpreter is a (low-level form of) operational semantics, and so can serve as a de nition of a programming language.

The question of compiler correctness is completely avoided, since the compiler will always be faithful to the interpreter from which it was generated.

3.5The central problems of compilation

Above we have seen how to interpret, that is, evaluate, programs in three rather di erent mini-languages. The remainder of this book shows how to specialize, that is, partially evaluate, programs in these and more complex languages.

Specialization of an interpreter with respect to a source program amounts to compilation, as already argued in Chapter 1. While a very promising direction for automating compilation, partial evaluation has not yet successfully accomplished all the tasks done by traditional handwritten compilers. Those that have been achieved with considerable success include:

1.Removal of as much interpretation overhead as possible.

2.Generation of target code in which as little computation as possible is done at run-time. Ideally, one wishes to execute at run-time only those computations `that the programmer ordered when he wrote the program'.

3.Going from an interpreter's general-purpose implementation mechanisms to target code that is tailored to execute only one particular source program.

Some achievements of traditional handwritten compilers are, even though generally well-understood, as yet not fully automated. Following are some challenges on which work is currently being done: