Figure 5.74 Tools and objects for animation
[Figure labels: Basic Animated Texture, Animated 2D Mesh, Simple Face, Simple FBA, Core, Binary Shape, Scalable still texture, 2D dynamic mesh (uniform topology), 2D dynamic mesh (Delaunay topology), Facial Animation Parameters, Body Animation Parameters]
5.7.2 The Core Studio Profile
The Core Studio object is intended for distribution of studio-quality video (for example between production studios) and adds support for Sprites and P-VOPs to the Simple Studio tools. Sprite coding is modified by adding extra sprite control parameters that closely mimic the properties of ‘real’ video cameras, such as lens distortion. Motion compensation and motion vector coding in P-VOPs are modified for compatibility with the MPEG-2 syntax; for example, motion vectors are predictively coded using the MPEG-2 method rather than the usual MPEG-4 median prediction method.
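The difference between the two styles of motion vector prediction can be illustrated with a minimal sketch. It assumes that the median method takes the component-wise median of three neighbouring vectors (left, above, above-right) and that the MPEG-2 style method simply reuses the previously coded vector. The function names and sample vectors are hypothetical, and details such as predictor resets and vector range limits are omitted.

```python
from statistics import median


def median_predictor(left, above, above_right):
    """Component-wise median of three neighbouring motion vectors (assumed neighbours)."""
    return (
        median([left[0], above[0], above_right[0]]),
        median([left[1], above[1], above_right[1]]),
    )


def previous_vector_predictor(previous):
    """Simplified MPEG-2 style predictor: reuse the previously coded vector."""
    return previous


def residual(mv, predictor):
    """Difference between the actual vector and its prediction (the value that is coded)."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


# Hypothetical vectors (x, y) for the current block and its neighbours.
left, above, above_right = (4, -2), (6, 0), (5, 3)
current = (5, 1)

mvd_median = residual(current, median_predictor(left, above, above_right))
mvd_previous = residual(current, previous_vector_predictor(left))

print("median prediction residual:", mvd_median)
print("previous-vector prediction residual:", mvd_previous)
```

In both cases only the residual is entropy coded, so the choice of predictor affects the size of the coded differences but not the reconstructed vectors.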
5.8 CODING SYNTHETIC VISUAL SCENES
For the first time in an international standard, MPEG-4 introduced the concept of ‘hybrid’ synthetic and natural video objects for visual communication. According to this concept, some applications may benefit from using a combination of tools from the video coding community (designed for coding of ‘real world’ or ‘natural’ video material) and tools from the 2D/3D animation community (designed for rendering ‘synthetic’ or computer-generated visual scenes).
MPEG-4 Visual includes several tools and objects that can make use of a combination of animation and natural video processing (Figure 5.74). The Basic Animated Texture and Animated 2D Mesh object types support the coding of 2D meshes that represent shape and motion, together with still texture that may be mapped onto a mesh. A tool for representing and coding 3D Mesh models is included in MPEG-4 Visual Version 2 but is not yet part of any profile. The Face and Body Animation tools enable a human face and/or body to be modelled and coded [8].
It has been shown that animation-based tools have potential applications to very low bit rate video coding [9]. However, in practice, the main application of these tools to date has been in coding synthetic (computer-generated) material. As the focus of this book is natural video coding, these tools will not be covered in detail.
5.8.1 Animated 2D and 3D Mesh Coding
A 2D mesh is made up of triangular patches and covers the 2D plane of an image or VO. Deformation or motion between VOPs can be modelled by warping the triangular patches. A 3D