- •Contents
- •1 Introduction
- •1.1 Objectives
- •1.2 Overview
- •2 Background
- •2.1 Digital Design for DSP Engineers
- •2.1.2 The Field-Programmable Gate Array
- •2.1.3 Arithmetic on FPGAs
- •2.2 DSP for Digital Designers
- •2.3 Computation Graphs
- •2.4 The Multiple Word-Length Paradigm
- •2.5 Summary
- •3 Peak Value Estimation
- •3.1 Analytic Peak Estimation
- •3.1.1 Linear Time-Invariant Systems
- •3.1.2 Data-range Propagation
- •3.2 Simulation-based Peak Estimation
- •3.3 Hybrid Techniques
- •3.4 Summary
- •4 Word-Length Optimization
- •4.1 Error Estimation
- •4.1.1 Word-Length Propagation and Conditioning
- •4.1.2 Linear Time-Invariant Systems
- •4.1.3 Extending to Nonlinear Systems
- •4.2 Area Models
- •4.3.1 Convexity and Monotonicity
- •4.4 Optimization Strategy 1: Heuristic Search
- •4.5 Optimization Strategy 2: Optimum Solutions
- •4.5.1 Word-Length Bounds
- •4.5.2 Adders
- •4.5.3 Forks
- •4.5.4 Gains and Delays
- •4.5.5 MILP Summary
- •4.6 Some Results
- •4.6.1 Linear Time-Invariant Systems
- •4.6.2 Nonlinear Systems
- •4.6.3 Limit-cycles in Multiple Word-Length Implementations
- •4.7 Summary
- •5 Saturation Arithmetic
- •5.1 Overview
- •5.2 Saturation Arithmetic Overheads
- •5.3 Preliminaries
- •5.4 Noise Model
- •5.4.1 Conditioning an Annotated Computation Graph
- •5.4.2 The Saturated Gaussian Distribution
- •5.4.3 Addition of Saturated Gaussians
- •5.4.4 Error Propagation
- •5.4.5 Reducing Bound Slackness
- •5.4.6 Error estimation results
- •5.5 Combined Optimization
- •5.6 Results and Discussion
- •5.6.1 Area Results
- •5.6.2 Clock frequency results
- •5.7 Summary
- •6 Scheduling and Resource Binding
- •6.1 Overview
- •6.2 Motivation and Problem Formulation
- •6.3 Optimum Solutions
- •6.3.1 Resources, Instances and Control Steps
- •6.3.2 ILP Formulation
- •6.4 A Heuristic Approach
- •6.4.1 Overview
- •6.4.2 Word-Length Compatibility Graph
- •6.4.3 Resource Bounds
- •6.4.4 Latency Bounds
- •6.4.5 Scheduling with Incomplete Word-Length Information
- •6.4.6 Combined Binding and Word-Length Selection
- •6.5 Some Results
- •6.6 Summary
- •7 Conclusion
- •7.1 Summary
- •7.2 Future Work
- •A.1 Sets and functions
- •A.2 Vectors and Matrices
- •A.3 Graphs
- •A.4 Miscellaneous
- •A.5 Pseudo-Code
- •References
- •Index
6.3 Optimum Solutions |
117 |
The cost function used for each resource is given in terms of its wordlength in (6.3), where k R is a technology-dependent constant representing the relative cost of adder and multiplier implementations.
cost(r) = |
k |
·· r, r N |
N × N |
(6.3) |
|
p |
q, r = (p, q) |
|
|
Example 6.4. As a motivational example, consider again the sequencing graph representing data-dependencies shown in Fig. 6.2. An area-optimal schedule, binding and word-length selection for this sequencing graph is illustrated in Fig. 6.3 for the case k = 1 and no operation pipelining. This resource allocation consists of two adders: one of 25-bits and one of 19-bits, and three multipliers: one is a 19 × 17-bit multiplier, one a 33 × 17-bit multiplier, and one a 40 × 12bit multiplier. The graphical matrix illustrates which resource is being used by which operation at which time step.
Note that in Fig. 6.3 resources can perform operations up to the wordlength of the resource, even if implementation in a larger resource leads to a longer latency than a ‘tight-fitting’ resource would require. For example operation m4 is implemented in a resource of latency 7 cycles, although its word-length only requires a 22 × 16-bit multiplier which would take only 5 cycles to complete. This ‘stretching’ of operations that are not on the critical path can conceivably lead to significantly reduced area, by exposing possibilities for resource sharing.
6.3 Optimum Solutions
Integer Linear Programming (ILP) [GN72] has been used in high-level synthesis for some time [HLH91, Ach93, LP93, DeM94, LMD94]. This section presents an extension to these ILP formulations in order to solve Problem 6.3 [CCL00c]. Formulation as an ILP is useful from an analytical perspective, because it formalizes the problem and its constraints. In addition, for small problem instances, ILP solvers such as lp solve [Sch97] may be used to obtain globally optimum solutions to the synthesis problem. These optimum solutions are valuable references for comparison with heuristic approaches.
6.3.1 Resources, Instances and Control Steps
Before presenting the ILP formulation of Problem 6.3, it is necessary to define certain quantities and notations, to be used in the following sections.
The starting point for the ILP approach is a sequencing graph P (V, D) and a target overall latency constraint λ. The latency constraint corresponds to a user-specified upper bound on the number of clock cycles which may elapse
118 6 Scheduling and Resource Binding |
|
|
||
Time Step |
|
Resource Type |
|
|
25 |
19 |
(19,17) |
(33,21) |
(40,12) |
0 |
a2 |
|
1 |
||
|
||
2 |
|
|
3 |
|
|
4 |
|
|
5 |
|
|
6 |
|
|
7 |
|
|
8 |
|
|
9 |
|
|
10 |
|
|
11 |
|
|
12 |
|
|
13 |
|
|
14 |
a4 |
|
15 |
||
|
||
16 |
a3 |
|
17 |
||
|
m1 |
m2 |
a1 |
m3
m4 |
|
m 5 |
|
|
|
Fig. 6.3. An optimum scheduling, resource binding, and word-length selection for the sequencing graph illustrated in Fig. 6.2
between the start of the first operation in the sequencing graph, and the end of the last operation in the sequencing graph.
Let Vm = {v V : type(v) = mult} and similarly let Va = {v V : type(v) = add}.
Any resource of the correct type, and large enough in word-length, can perform an operation. For example a resource type (p, q) can perform any p × q -bit multiplication, so long as p ≤ p and q ≤ q. However the searchspace for area-e cient implementations may be trimmed significantly by observing that area-optimal resource bindings will only ever use the resource word-length that is just large enough to cover all operations assigned to that
6.3 Optimum Solutions |
119 |
resource. For an adder to which operations V V have been assigned, this corresponds to a word-length of max bA(a). For a multiplier to which op-
erations V |
|
|
a V |
|
|
|||
V have been assigned, this corresponds to a word-length of |
||||||||
tion |
|
" |
# |
|
" |
# |
|
|
( max π |
bM |
max π |
|
bM (m) ), where π ( ) and π ( ) are the projec- |
||||
m V 1 |
(m) , m V |
2 |
|
1 |
· |
2 · |
||
|
operators (6.4). |
|
|
|
|
|
||
|
|
|
|
|
π1 |
(p, q) = p |
|
(6.4) |
|
|
|
|
|
π2(p, q) = q |
|
||
|
|
|
|
|
|
|
There are therefore only certain resource types which can arise from the optimal sharing of resources between operations. Let RA(a) denote the set of adders which could implement the addition a Va and RM (m) denote the set of multipliers which could implement the multiplication m Vm. Then RA(a) and RM (m) are given by (6.5) and (6.6) respectively. Together, these resource types form a resource-set R(v) for each operation v V (6.7). R denotes the set of all such resource types (6.8).
RA(a) = {p bA(Va) : p ≥ bA(a)} |
(6.5) |
RM (m) = {(p, q)| (p, b) bM (Vm), (c, q) bM (Vm) : |
M |
(m)} |
p ≥ c q ≥ b p ≥ d q ≥ e where (d, e) = b |
|
|
RM (v), v Vm |
|
|
R(v) = RA(v), v Va |
|
|
$ |
|
|
R = R(v) |
|
|
v V |
|
|
(6.6)
(6.7)
(6.8)
An upper bound I(r) may be placed on the number of instances of each resource type that could arise. For an adder resource, there can be as many instances of a w-bit adder as there are w-bit addition operations (6.9). For a multiplier resource, each p × q-bit resource can only arise due to resource sharing of a p × b-bit and a c × q-bit multiplication with p ≥ c and q ≥ b. The number of these pairings is bounded by (6.10).
I(r) = |{a Va : bA(a) = r}|, for r R(Va) |
(6.9) |
M |
|
I(p, q) = min { |{m Vm : q ≥ e where (p, e) = b M(m)}|, |
(6.10) |
|{m Vm : p ≥ d where (d, q) = b (m)}| } , |
for (p, q) R(Vm)
From (6.2) it is possible to define the maximum latency max(v) and minimum latency min(v) of each operation v V according to (6.11, 6.12).
120 6 Scheduling and Resource Binding
min(v) = |
min L(r) |
(6.11) |
|
r R(v) |
|
max(v) = |
max L(r) |
(6.12) |
|
r R(v) |
|
In order to bound the possible execution control steps of each operation, it is necessary to utilize as-soon-as-possible (ASAP) and as-late-as-possible (ALAP) scheduling [Par99], provided in Algorithms ASAP and ALAP for completeness.
Algorithm 6.1 Algorithm ASAP
Input: A sequencing graph P (V, D) and a latency (v) for each operation v V Output: A schedule S : V → N {0}
begin
mark(v) ← false for all v V foreach v V : (v , v) D do
S(v) ← 0 mark(v) ← true
end foreach do
foreach v V : (v , v) D, mark(v ) = true do
S(v) ← max {S(v ) + (v )}
(v ,v) D
mark(v) ← true end foreach
while v V : mark(v) = false end
Algorithm 6.2 Algorithm ALAP
Input: A sequencing graph P (V, D), latency constraint λ, and a latency (v) for each operation v V
Output: A schedule S : V → N {0} begin
mark(v) ← false for all v V foreach v V : (v, v ) D do
S(v) ← λ − (v) mark(v) ← true
end foreach do
foreach v V : (v, v ) D, mark(v ) = true do
S |
(v) |
← |
min |
{S |
(v ) |
(v) |
|
(v,v ) D |
} − |
|
mark(v) ← true end foreach
while v V : mark(v) = false end