Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Constantinides G.A., Cheung P.Y.K., Luk W. - Synthesis and optimization of DSP algorithms (2004)(en).pdf
Скачиваний:
20
Добавлен:
15.08.2013
Размер:
1.54 Mб
Скачать

6.3 Optimum Solutions

117

The cost function used for each resource is given in terms of its wordlength in (6.3), where k R is a technology-dependent constant representing the relative cost of adder and multiplier implementations.

cost(r) =

k

·· r, r N

N × N

(6.3)

 

p

q, r = (p, q)

 

 

Example 6.4. As a motivational example, consider again the sequencing graph representing data-dependencies shown in Fig. 6.2. An area-optimal schedule, binding and word-length selection for this sequencing graph is illustrated in Fig. 6.3 for the case k = 1 and no operation pipelining. This resource allocation consists of two adders: one of 25-bits and one of 19-bits, and three multipliers: one is a 19 × 17-bit multiplier, one a 33 × 17-bit multiplier, and one a 40 × 12bit multiplier. The graphical matrix illustrates which resource is being used by which operation at which time step.

Note that in Fig. 6.3 resources can perform operations up to the wordlength of the resource, even if implementation in a larger resource leads to a longer latency than a ‘tight-fitting’ resource would require. For example operation m4 is implemented in a resource of latency 7 cycles, although its word-length only requires a 22 × 16-bit multiplier which would take only 5 cycles to complete. This ‘stretching’ of operations that are not on the critical path can conceivably lead to significantly reduced area, by exposing possibilities for resource sharing.

6.3 Optimum Solutions

Integer Linear Programming (ILP) [GN72] has been used in high-level synthesis for some time [HLH91, Ach93, LP93, DeM94, LMD94]. This section presents an extension to these ILP formulations in order to solve Problem 6.3 [CCL00c]. Formulation as an ILP is useful from an analytical perspective, because it formalizes the problem and its constraints. In addition, for small problem instances, ILP solvers such as lp solve [Sch97] may be used to obtain globally optimum solutions to the synthesis problem. These optimum solutions are valuable references for comparison with heuristic approaches.

6.3.1 Resources, Instances and Control Steps

Before presenting the ILP formulation of Problem 6.3, it is necessary to define certain quantities and notations, to be used in the following sections.

The starting point for the ILP approach is a sequencing graph P (V, D) and a target overall latency constraint λ. The latency constraint corresponds to a user-specified upper bound on the number of clock cycles which may elapse

118 6 Scheduling and Resource Binding

 

 

Time Step

 

Resource Type

 

25

19

(19,17)

(33,21)

(40,12)

0

a2

1

 

2

 

3

 

4

 

5

 

6

 

7

 

8

 

9

 

10

 

11

 

12

 

13

 

14

a4

15

 

16

a3

17

 

m1

m2

a1

m3

m4

 

m 5

 

 

 

Fig. 6.3. An optimum scheduling, resource binding, and word-length selection for the sequencing graph illustrated in Fig. 6.2

between the start of the first operation in the sequencing graph, and the end of the last operation in the sequencing graph.

Let Vm = {v V : type(v) = mult} and similarly let Va = {v V : type(v) = add}.

Any resource of the correct type, and large enough in word-length, can perform an operation. For example a resource type (p, q) can perform any p × q -bit multiplication, so long as p ≤ p and q ≤ q. However the searchspace for area-e cient implementations may be trimmed significantly by observing that area-optimal resource bindings will only ever use the resource word-length that is just large enough to cover all operations assigned to that

6.3 Optimum Solutions

119

resource. For an adder to which operations V V have been assigned, this corresponds to a word-length of max bA(a). For a multiplier to which op-

erations V

 

 

a V

 

 

V have been assigned, this corresponds to a word-length of

tion

 

"

#

 

"

#

 

 

( max π

bM

max π

 

bM (m) ), where π ( ) and π ( ) are the projec-

m V 1

(m) , m V

2

 

1

·

2 ·

 

operators (6.4).

 

 

 

 

 

 

 

 

 

 

π1

(p, q) = p

 

(6.4)

 

 

 

 

 

π2(p, q) = q

 

 

 

 

 

 

 

 

There are therefore only certain resource types which can arise from the optimal sharing of resources between operations. Let RA(a) denote the set of adders which could implement the addition a Va and RM (m) denote the set of multipliers which could implement the multiplication m Vm. Then RA(a) and RM (m) are given by (6.5) and (6.6) respectively. Together, these resource types form a resource-set R(v) for each operation v V (6.7). R denotes the set of all such resource types (6.8).

RA(a) = {p bA(Va) : p ≥ bA(a)}

(6.5)

RM (m) = {(p, q)| (p, b) bM (Vm), (c, q) bM (Vm) :

M

(m)}

p ≥ c q ≥ b p ≥ d q ≥ e where (d, e) = b

 

RM (v), v Vm

 

 

R(v) = RA(v), v Va

 

 

$

 

 

R = R(v)

 

 

v V

 

 

(6.6)

(6.7)

(6.8)

An upper bound I(r) may be placed on the number of instances of each resource type that could arise. For an adder resource, there can be as many instances of a w-bit adder as there are w-bit addition operations (6.9). For a multiplier resource, each p × q-bit resource can only arise due to resource sharing of a p × b-bit and a c × q-bit multiplication with p ≥ c and q ≥ b. The number of these pairings is bounded by (6.10).

I(r) = |{a Va : bA(a) = r}|, for r R(Va)

(6.9)

M

 

I(p, q) = min { |{m Vm : q ≥ e where (p, e) = b M(m)}|,

(6.10)

|{m Vm : p ≥ d where (d, q) = b (m)}| } ,

for (p, q) R(Vm)

From (6.2) it is possible to define the maximum latency max(v) and minimum latency min(v) of each operation v V according to (6.11, 6.12).

120 6 Scheduling and Resource Binding

min(v) =

min L(r)

(6.11)

 

r R(v)

 

max(v) =

max L(r)

(6.12)

 

r R(v)

 

In order to bound the possible execution control steps of each operation, it is necessary to utilize as-soon-as-possible (ASAP) and as-late-as-possible (ALAP) scheduling [Par99], provided in Algorithms ASAP and ALAP for completeness.

Algorithm 6.1 Algorithm ASAP

Input: A sequencing graph P (V, D) and a latency (v) for each operation v V Output: A schedule S : V → N {0}

begin

mark(v) false for all v V foreach v V : (v , v) D do

S(v) 0 mark(v) true

end foreach do

foreach v V : (v , v) D, mark(v ) = true do

S(v) max {S(v ) + (v )}

(v ,v) D

mark(v) true end foreach

while v V : mark(v) = false end

Algorithm 6.2 Algorithm ALAP

Input: A sequencing graph P (V, D), latency constraint λ, and a latency (v) for each operation v V

Output: A schedule S : V → N {0} begin

mark(v) false for all v V foreach v V : (v, v ) D do

S(v) ← λ − (v) mark(v) true

end foreach do

foreach v V : (v, v ) D, mark(v ) = true do

S

(v)

min

{S

(v )

(v)

 

(v,v ) D

} −

 

mark(v) true end foreach

while v V : mark(v) = false end