- •Contents
- •1 Introduction
- •1.1 Objectives
- •1.2 Overview
- •2 Background
- •2.1 Digital Design for DSP Engineers
- •2.1.2 The Field-Programmable Gate Array
- •2.1.3 Arithmetic on FPGAs
- •2.2 DSP for Digital Designers
- •2.3 Computation Graphs
- •2.4 The Multiple Word-Length Paradigm
- •2.5 Summary
- •3 Peak Value Estimation
- •3.1 Analytic Peak Estimation
- •3.1.1 Linear Time-Invariant Systems
- •3.1.2 Data-range Propagation
- •3.2 Simulation-based Peak Estimation
- •3.3 Hybrid Techniques
- •3.4 Summary
- •4 Word-Length Optimization
- •4.1 Error Estimation
- •4.1.1 Word-Length Propagation and Conditioning
- •4.1.2 Linear Time-Invariant Systems
- •4.1.3 Extending to Nonlinear Systems
- •4.2 Area Models
- •4.3.1 Convexity and Monotonicity
- •4.4 Optimization Strategy 1: Heuristic Search
- •4.5 Optimization Strategy 2: Optimum Solutions
- •4.5.1 Word-Length Bounds
- •4.5.2 Adders
- •4.5.3 Forks
- •4.5.4 Gains and Delays
- •4.5.5 MILP Summary
- •4.6 Some Results
- •4.6.1 Linear Time-Invariant Systems
- •4.6.2 Nonlinear Systems
- •4.6.3 Limit-cycles in Multiple Word-Length Implementations
- •4.7 Summary
- •5 Saturation Arithmetic
- •5.1 Overview
- •5.2 Saturation Arithmetic Overheads
- •5.3 Preliminaries
- •5.4 Noise Model
- •5.4.1 Conditioning an Annotated Computation Graph
- •5.4.2 The Saturated Gaussian Distribution
- •5.4.3 Addition of Saturated Gaussians
- •5.4.4 Error Propagation
- •5.4.5 Reducing Bound Slackness
- •5.4.6 Error estimation results
- •5.5 Combined Optimization
- •5.6 Results and Discussion
- •5.6.1 Area Results
- •5.6.2 Clock frequency results
- •5.7 Summary
- •6 Scheduling and Resource Binding
- •6.1 Overview
- •6.2 Motivation and Problem Formulation
- •6.3 Optimum Solutions
- •6.3.1 Resources, Instances and Control Steps
- •6.3.2 ILP Formulation
- •6.4 A Heuristic Approach
- •6.4.1 Overview
- •6.4.2 Word-Length Compatibility Graph
- •6.4.3 Resource Bounds
- •6.4.4 Latency Bounds
- •6.4.5 Scheduling with Incomplete Word-Length Information
- •6.4.6 Combined Binding and Word-Length Selection
- •6.5 Some Results
- •6.6 Summary
- •7 Conclusion
- •7.1 Summary
- •7.2 Future Work
- •A.1 Sets and functions
- •A.2 Vectors and Matrices
- •A.3 Graphs
- •A.4 Miscellaneous
- •A.5 Pseudo-Code
- •References
- •Index
108 5 Saturation Arithmetic
|
|
(a) uniform / sim |
|
|
(b) wl optimized / sim |
|
800 |
(d) comb optimized |
|
(e) uniform / l1 |
|
|
|
|
|
|
(f) wl optimized / l1 |
|
750 |
|
area |
|
|
resulting system |
700 |
|
650 |
|
|
|
|
|
|
600 |
|
|
550 |
10−7 |
|
10−8 |
|
|
|
specified error variance |
Fig. 5.17. Comparison of di erent design approaches for trading-o system area and error, for a 4th order lowpass elliptic IIR filter
strated that significant area reductions can be achieved for the narrow bandpass filter example. Fig. 5.18 illustrates how the area consumption of second order autoregressive filters varies with the location of their complex conjugate pole on the z-plane, for fixed error specification. Compared to 1 scaling, area savings have been achieved using the techniques developed in this chapter for systems with poles with magnitude greater than approximately 0.9.
5.6.2 Clock frequency results
As noted in Section 5.2, there are also significant timing overheads associated with the use of saturation arithmetic. While Algorithm CombOptAlg does not explicitly consider circuit speed, it is instructive to place the points on Fig. 5.16 on a speed / area design-space diagram. This is shown in Fig. 5.19, where the short simulation run results are used for representation of simulation-scaled systems. There are ten graphs, corresponding to the ten error variance specifications in Fig. 5.16. Speed estimates are obtained from Altera MaxPlus II [MAX] on the fully placed and routed design in an Altera Flex10kRC240-3 device. A Pareto-optimal point [DeM94] is a point in
5.6 Results and Discussion |
109 |
450 |
|
|
|
|
|
|
(a) comb optimized |
|
|
|
(b) uniform / l1 |
|
|
|
(c) wl optimized / l1 |
400 |
|
|
|
350 |
|
|
|
area (#LCs) |
|
|
|
300 |
|
|
|
250 |
|
|
|
200 |
|
|
|
10−3 |
10−2 |
10−1 |
100 |
distance of poles from |z|=1
Fig. 5.18. Variation of system area with pole location
the design space that is not dominated in all design objectives by any other design-space point. The Pareto optimal points in Fig. 5.19 are joined by solid lines.
The results of Fig. 5.19 demonstrate that although the di erence in area is small between short-run simulation-based scaling word-length-optimized systems and those resulting from Algorithm CombOptAlg, there is a significant speed di erence. The source of this consistent speedup, averaging 27.6% over uniform word-length and 23.7% over optimized word-length structures, is illustrated in Fig. 5.20 where the saturator locations and degrees are illustrated for a single optimization example.
Comparing Figs. 5.20(a) and (b), simulation-based scaling has resulted in a large number of low degree saturators. In contrast the optimized saturators are few in number, but are generally of higher degree. Although aiming to reduce system implementation area, Algorithm CombOptAlg has also resulted in significant speedup over simulation-based approaches by using only a small number of saturators. The Cauchy-Schwartz bound tends to drive the solution towards using fewer saturators in order to minimize the potential error crosscorrelation e ects.