Добавил:

fench Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Казанский национальный исследовательский технологический университет

Предмет:

Химия

Файл:

Constantinides G.A., Cheung P.Y.K., Luk W. - Synthesis and optimization of DSP algorithms (2004)(en).pdf

Скачиваний:

Добавлен:

15.08.2013

Размер:

1.54 Mб

Скачать

☆

<<< < Предыдущая 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 3536 / 4036 37 38 39 40 > Следующая >>>

6.5 Some Results

141

3.Extract the subset Vb V of nodes on the bound critical path which could complete within the latency constraint (6.31)

4.if Vb = do

return failure case

5.Find Vp Vb of nodes maximizing the measure (6.32)

6.if v Vp : L(π1(R(v))) < max(v) do

Select one such node v Vp else do

Select an arbitrary node v Vp end if

7. H ← H \ {{v, r} H : L(r) = max(v)} end

Example 6.19. Fig. 6.10 illustrates an example reﬁnement phase corresponding to the data ﬂow graph introduced in Fig. 6.2 and reproduced in Fig. 6.10(a) for convenience. Nodes that are on the computation critical path with respect to the data ﬂow graph P (V, D) are highlighted in Fig. 6.10(a), Vc = {a2, m3, m4, m5, a1, a3}. This data ﬂow graph has been scheduled and resourcebound in Fig. 6.10(b). The portions of node execution time between min(v) and max(v) have been shaded in the ﬁgure.

The augmented data ﬂow graph P (V, D Db) obtained from timeabutment edges Db is illustrated in Fig. 6.10(c) and consists of a single extra edge from operation m2 to operation m5. The resulting change in critical path is signiﬁcant. The bound critical path is given by Vb = {m1, m2, m5, a3}.

For λ = 18, the lowest achievable latency constraint, the subset Vb is given by Vb = {m1, m2}. The heuristic measures described above may then be applied to decide which of these two nodes is to have its upper bound latency reduced.

6.5 Some Results

Before considering in detail the performance of the methods discussed in this chapter, it is instructive to follow through the sequencing graph of Fig. 6.2 which has been used as an example throughout this chapter where appropriate. Both optimal and heuristic schedules, resource allocations, bindings and wordlength selections have been performed for this example in order to explore the area / latency tradeo achievable. Fig. 6.11 plots these results. Not all ILP results are shown, due to excessive ILP solver execution time. The ILP and heuristic solutions are identical for all cases except λ = 26, where there is a slight di erence caused by the presence of an extra adder in the heuristic solution.

Figure 6.12 illustrates the results for the benchmark circuits introduced in Section 4.6. For comparison, not only are the optimal (ILP) results and

142	6 Scheduling and Resource Binding
	m1				ADD	MULT	MULT
	m1	a			25	(40,17)	(33,21)
	MULT	2
	MULT	ADD		0	a2
	(18,15)	ADD		0
	(18,15)	20		2
		20
						m 1
				4		m 1	m3
				6			m3
m2		m3		8
m2		m3		10
MULT		MULT		10	a4
(19,17)		(33,21)		12	a4	m 2
(19,17)		(33,21)		12		m 2	m4
				14			m4
	m4	m5		14
	m4	m5	a4	16
	MULT	MULT	a4	16
	MULT	MULT	ADD
	(22,16)	(40,12)	ADD	18
	(22,16)	(40,12)	22	18	a1	m5
			22	20	a1
				20
	a1	a3		22
	a1	a3		24
	ADD	ADD		24	a3
	19	25		26	a3
				26

(a) sequencing graph			(b) initial schedule
m1		a
MULT		2
MULT		ADD
(18,15)		ADD
(18,15)		20
		20
				critical
m2	m3			node
m2	m3
MULT	MULT
(19,17)	(33,21)
				min
m4	m5	a4	max	operation
m4	m5		operation	latency
MULT	MULT		operation	latency
MULT	MULT	ADD	latency
(22,16)	(40,12)	ADD	latency
(22,16)	(40,12)	22
		22
a1		a3
ADD		ADD
19		25

Fig. 6.10. Example use of the bound critical path for word-length reﬁnement

the heuristic results provided, but also the solutions corresponding to word- length-blind scheduling followed by an optimal resource binding [CCL00b]. All three sets of results are only provided for the IIR ﬁlter, the polyphase ﬁlter bank and the RGB to YCrYb convertor, due to the excessive execution time of both the ILP solver and the optimal binding.

These results illustrate that for the benchmark circuits, the heuristic presented in this chapter provides a signiﬁcant improvement in area (between −16% and 46%, average 15%) over a two-stage approach of scheduling and then binding, even when the binding is optimal. The heuristic results have between 0% and 62% (average 14%) worse area than the optimum combined solution, for cases where the optimum is known.

In order to fully characterize the heuristic performance, further results have been obtained using artiﬁcially generated examples. For statistically signiﬁcant data on solution quality, 200 random sequencing graphs have been

6.5 Some Results

143

	1600
	1500				heuristic synthesis
	1500				optimal (ILP) synthesis
	1400
	1300
area
resulting	1200
resulting
	1100
	1000
	900
	800	20	25	30	35	40	45
	15	20	25	30	35	40	45
				latency constraint λ

Fig. 6.11. Design-space exploration for the simple sequencing graph of Fig. 6.2

generated for each (problem size |V |, latency constraint λ) pair, using a variant of the TGFF algorithm [DRW98]. The ﬁrst set of detailed quality results that has been collected, measures the variation of the heuristic solution quality with both the problem size and the tightness of the supplied latency constraint. For each sequencing graph, the minimum possible latency constraint λmin has been found using ASAP scheduling, from which latency constraints have been generated corresponding to a 0% to 30% relaxation of λmin. Algorithm ArchSynth has then been executed on the graph / latency constraint combination. The resulting area has been normalized with respect to the optimal solution resulting from [CCL00b] where operations may only share resources when the implemented resource has latency no longer than the minimum required to implement the given operations. These results are plotted in Fig. 6.13, as a percentage area penalty for using the approach of [CCL00b] over the heuristic presented in this chapter. Each point represents the mean of two hundred representative designs.

The results illustrate that for designs with even a small ‘slack’ in terms of latency constraints, signiﬁcant area improvements of up to 30% can be made by performing the scheduling, binding, and word-length selection in the intertwined manner proposed. The area improvements come from increased resource sharing due to implementing small word-length operations in larger

144 6 Scheduling and Resource Binding

700
							optimal (ILP)
							heuristic
650							optimal binding from [CCL00b]
650
600
550
500
area										area
450
400
350
300
250	13.5	14	14.5	15	15.5	16	16.5	17	17.5	18
13	13.5	14	14.5	15	15.5	16	16.5	17	17.5	18
					λ

7500
					heuristic
					optimal binding from [CCL00b]
7000
6500
6000
5500
5000
4500
4000
3500
3000	8	9	10	11	12	13	14
7	8	9	10	11	12	13	14
				λ

(a) IIR

(b) FIR

	850
								heuristic
	800
	750
	700
area	650							area
	600
	550
	500
	450	15	16	17	18	19	20	21
	14	15	16	17	18	19	20	21
					λ

750
							heuristic
700
650
600
550
500
450
400	15	16	17	18	19	20	21
14	15	16	17	18	19	20	21
				λ

(d) DCT2

	200
								optimal (ILP)
	190							heuristic
								optimal binding from [CCL00b]
	180
	170
	160
area	150										area
area											area
	140
	130
	120
	110
	100	5.5	6	6.5	7	7.5	8	8.5	9	9.5	10
	5	5.5	6	6.5	7	7.5	8	8.5	9	9.5	10
						λ

170
160							optimal (ILP)
							heuristic
							optimal binding from [CCL00b]
150
140
130
120
110
100
90
80	13.5	14	14.5	15	15.5	16	16.5	17	17.5	18
13	13.5	14	14.5	15	15.5	16	16.5	17	17.5	18
					λ

(e) PFB

(f) RGB–YCrCb

Fig. 6.12. Design-space exploration for the benchmark circuits of Section 4.6

6.5 Some Results

145

Area penalty (%)

−5 30

20			20
15
			15
Latency constraint	10
		10
			No. operations
	5
		5

Fig. 6.13. Variation with number of operations and latency constraint of area penalty for [CCL00b] over the proposed heuristic

word-length resources with longer latency. Even for relatively small graphs, area improvements of tens of percent are possible.

Fig. 6.14 illustrates the increase in implementation area from using the heuristic presented in Section 6.4 over the optimum combined problem presented in Section 6.3. These results can only be provided for modest problem size and a minimum latency constraint λ = λmin, as the ILP solution execution time scales rapidly with problem size and as the latency constraint is relaxed.

The variation of minimum-latency execution time with problem size for 200 graphs using the ILP model (solved with lp solve [Sch97] on a Pentium III 450MHz) and the heuristic algorithm (implemented in C on the same machine) is shown in Fig. 6.15, illustrating the polynomial complexity of the heuristic against the exponential complexity of the ILP. Over the range of 1 to 10 operations, the relative increase in area ranges from 0% to 16% whereas the ILP solution takes between one and three orders of magnitude greater time to execute. An important point not brought out by these results is the scaling of execution time with overall latency constraint. The number of variables in the ILP model scales with the latency constraint, making the execution time highly dependent on this parameter. This is illustrated in Table 6.1 for 200 9-operation sequencing graphs. By contrast, the execution time of Algorithm ArchSynth does not scale with the latency constraint. Thus the

146 6 Scheduling and Resource Binding

	18
	16
	14
(%)	12
optimum	12
optimum	10
over
premium	8
premium
Area	6
Area
	4
	2
	0	2	3	4	5	6	7	8	9	10
	1	2	3	4	5	6	7	8	9	10
					Number of operations

Fig. 6.14. Variation of area premium for Algorithm ArchSynth over the optimum solution

one to three orders of magnitude illustrated in Fig. 6.15 are under conditions most favourable to the ILP-based solution.

Table 6.1. Variation of execution time for 200 graphs with λ/λmin for heuristic and ILP solution

λ/λmin	heuristic (secs)	ILP (mins:secs)
1.00	3.02	2:07.09
1.05	3.51	4:05.21
1.10	3.73	15:55.56


1.15	3.52	>30:00.00
1.15	3.52	>30:00.00

The results presented clearly indicate the practical nature of the heuristic presented in this chapter, whose execution results in a speedup of up to three orders of magnitude over ILP solution, even for modest graph sizes, at an area-penalty of between 0% and 16%.

<<< < Предыдущая 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 3536 / 4036 37 38 39 40 > Следующая >>>

Соседние файлы в предмете Химия

#
15.08.20133 Mб83Chivers T. - A Guide to Chalcogen-Nitrogen Chemistry (2005)(en).pdf
#
15.08.201332.51 Mб61Clayden J. - Organic chemistry (Oxford, 2000).pdf
#
15.08.20136.12 Mб48Cohen M.F., Wallace J.R. - Radiosity and realistic image synthesis (1995)(en).pdf
#
15.08.20137.79 Mб13Computational Methods for Protein Folding.pdf
#
15.08.201314.81 Mб12Computational methods in surface and colloid science 2000 vol. 89.djvu
#
15.08.20131.54 Mб20Constantinides G.A., Cheung P.Y.K., Luk W. - Synthesis and optimization of DSP algorithms (2004)(en).pdf
#
15.08.2013112.29 Кб12Covalent analogues of DNA base-pairs and triplets . Part 2[c] Synthesis and cytostatic activity of bis(purin-6-yl)acetylenes, -diacetylenes and related compounds.pdf
#
15.08.2013124.64 Кб11Covalent analogues of DNA base-pairs and triplets IV+. Synthesis of trisubstituted benzenes bearing purine and[s]or pyrimidine rings by cyclotrimerization of 6-ethynylpurines and[s]or 5-ethynyl-1,3-dimethyluracil.pdf
#
15.08.20133.56 Mб84Cycloaddition Reactions in Organic Synthesis.pdf
#
15.08.20138.31 Mб26D. Harvey. Modern Analytical Chemistry. 2000.djvu
#
15.08.2013192.98 Кб13Dankwardt A. - Immunochemical assays in pesticide analysis(en).pdf