Analog Digital Video
- 2. Course Content
Introduction to Video
• Basic Concepts & Formats
• Introduction to Multimedia coding
• Lossy Compression
• Basic Video CODEC
• Standardization Landscape
• Components
• File Formats
• AVI, MPEG4 FF, MKV
• Codecs
• H264, VP6, WMV / VC-1, VP8
Copyright © 2008 LOGTEL Yossi Cohen
- 3. Course Content
• Delivery methods
• RTP Streaming
• Progressive Download
• HTML5 Video
• HTTP Streaming
- 5. Agenda
Basic Video Concepts
Color Spaces
Interlacing
Video Connections (Component, S-Video)
Image compression
Introduction to video compression
- 6. 4.2 Color Models in Images
Color models and spaces are used to store, display,
and print images.
RGB Color Model for CRT Displays
We expect to be able to use 8 bits per color channel
for color that is accurate enough.
However, in fact we have to use about 12 bits per
channel to avoid an aliasing effect in dark image areas:
contour bands that result from gamma correction.
For images produced from computer graphics, we
store integers proportional to intensity in the frame
buffer, so we should have a gamma correction LUT
between the frame buffer and the CRT.
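A gamma correction LUT of the kind mentioned above could be sketched as follows. This is a minimal illustration, assuming 8-bit frame-buffer values and a display gamma of 2.2; the name `build_gamma_lut` and the gamma value are assumptions, not taken from the slides.

```python
def build_gamma_lut(gamma=2.2, levels=256):
    """Map linear-intensity codes to gamma-corrected codes (gamma value is assumed)."""
    top = levels - 1
    # Pre-compensate with the inverse power so the CRT response cancels out.
    return [round(top * (code / top) ** (1.0 / gamma)) for code in range(levels)]

lut = build_gamma_lut()
# Applying the LUT is a plain table lookup per frame-buffer value.
corrected = [lut[v] for v in (0, 16, 64, 255)]
```

Dark input codes are spread over a wider output range, which is why limited bit depth shows up as contour bands in dark areas.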
- 7. Color matching
How can we compare
colors so that the
content creators and
consumers know what
they are seeing?
Many different ways exist,
including the CIE
chromaticity diagram
- 8. Video Color Transforms
Largely derived from older analog methods of coding
color for TV. Luminance is separated from color
information.
YIQ is used to transmit TV signals in North America and
Japan. This coding also makes its way into VHS video
tape coding in these countries since video tape
technologies also use YIQ.
In Europe, video tape uses the PAL or SECAM codings,
which are based on TV that uses a matrix transform
called YUV.
Finally, digital video mostly uses a matrix transform
called YCbCr that is closely related to YUV
- 9. Color Models in Video
• Largely derive from older analog methods of coding
color for TV. Luminance is separated from color
information.
• A matrix transform, YIQ, is used to transmit NTSC TV signals
in North America and Japan. This coding also
makes its way into VHS video tape coding in these
countries since video tape technologies also use YIQ.
• In Europe, video tape uses the PAL or SECAM
codings, which are based on TV that uses a matrix
transform called YUV.
• Finally, digital video mostly uses a matrix transform
called YCbCr that is closely related to YUV.
- 11. YUV Color Model
• YUV codes a luminance signal (for gamma-corrected
signals) equal to Y, the "luma".
• Chrominance refers to the difference between a color
and a reference white at the same luminance (U and V).
The transform is:
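For reference, a commonly quoted form of the YUV transform is given here, assuming gamma-corrected R', G', B' values in [0, 1]. The scale factors 0.492 and 0.877 are the usual analog-TV values and are an assumption about which variant the slide showed.

```latex
\begin{aligned}
Y' &= 0.299\,R' + 0.587\,G' + 0.114\,B' \\
U  &= 0.492\,(B' - Y') \\
V  &= 0.877\,(R' - Y')
\end{aligned}
```

For a gray pixel R' = G' = B', so both differences vanish and U = V = 0.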
- 13. YIQ Color Model
YIQ is used in NTSC color TV broadcasting.
Again, gray pixels generate zero (I, Q)
chrominance signal.
I and Q are a rotated version of U and V.
The transform is:
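The relation between YIQ and YUV can be sketched numerically. This is an illustration only: it assumes the common scaled YUV definition (factors 0.492 and 0.877) and a 33 degree rotation, and the function names are hypothetical.

```python
import math

def rgb_to_yuv(r, g, b):
    """Gamma-corrected R'G'B' in [0, 1] -> (Y', U, V), common scaled form (assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, 0.492 * (b - y), 0.877 * (r - y)

def rgb_to_yiq(r, g, b, angle_deg=33.0):
    """Obtain I and Q by rotating the (U, V) pair by 33 degrees."""
    y, u, v = rgb_to_yuv(r, g, b)
    a = math.radians(angle_deg)
    i = v * math.cos(a) - u * math.sin(a)
    q = v * math.sin(a) + u * math.cos(a)
    return y, i, q

gray = rgb_to_yiq(0.5, 0.5, 0.5)  # chrominance vanishes for gray
red = rgb_to_yiq(1.0, 0.0, 0.0)   # I and Q close to 0.596 and 0.211
```

The values for pure red match the first column of the usual YIQ matrix to three decimals, which is a quick consistency check on the rotation.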
- 14. YCbCr Color Model
1. The Rec. 601 standard for digital video uses
another color space, YCbCr, which is closely
related to the YUV transform.
2. The YCbCr transform is used in JPEG image
compression and MPEG video compression.
For 8-bit coding:
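A sketch of the 8-bit conversion, assuming the commonly quoted Rec. 601 studio-range coefficients (Y in 16..235, Cb and Cr in 16..240) and R'G'B' inputs in [0, 1]. The function name is hypothetical.

```python
def rgb_to_ycbcr_601(r, g, b):
    """R'G'B' in [0, 1] -> 8-bit (Y, Cb, Cr), Rec. 601 studio range (assumed variant)."""
    y = 16 + 65.481 * r + 128.553 * g + 24.966 * b
    cb = 128 - 37.797 * r - 74.203 * g + 112.0 * b
    cr = 128 + 112.0 * r - 93.786 * g - 18.214 * b
    return round(y), round(cb), round(cr)

black = rgb_to_ycbcr_601(0.0, 0.0, 0.0)
white = rgb_to_ycbcr_601(1.0, 1.0, 1.0)
red = rgb_to_ycbcr_601(1.0, 0.0, 0.0)
```

Black and white land at the ends of the luma range with neutral chroma (128), which is the same "gray gives zero chrominance" property as in YUV, shifted by an offset.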
- 16. Component Video
High-end solution: three separate video signals are used
for the R, G, and B planes.
Each color channel is sent as a separate video signal.
(a) Most computer systems use Component Video, with
separate signals for R, G, and B.
(b) Provides the best color reproduction since there is
no "crosstalk" between the three channels.
(c) Component video requires more bandwidth than
composite or S-Video, and good synchronization of the
three components.
- 17. Composite Video
• Color ("chrominance") and intensity ("luminance")
signals are mixed into a single carrier wave.
a) Chrominance is a composition of two color
components (I and Q, or U and V).
b) In NTSC TV, e.g., I and Q are combined into a
chroma signal, and a color subcarrier is then
employed to put the chroma signal at the high-
frequency end of the signal shared with the
luminance signal.
c) The chrominance and luminance components can
be separated at the receiver end and then the two
color components can be further recovered.
- 18. Composite Video
d) When connecting to TVs or VCRs, Composite
Video uses only one wire and video color signals
are mixed, not sent separately. The audio and
sync signals are additions to this one signal.
Since color and intensity are wrapped into the
same signal, some interference between the
luminance and chrominance signals is inevitable.
- 19. S-Video
Uses two wires, one for luminance and another for a
composite chrominance signal.
As a result, there is less crosstalk between the color
information and the grayscale information.
Humans are able to differentiate spatial resolution
in grayscale images with a much higher acuity than for the
color part of color images.
We can therefore send less color detail, since we can
only see fairly large blobs of color.
- 20. VIDEO SCANNING
•Interlacing
•De-Interlacing
- 21. Analog Video Scanning Process
An analog signal f(t) samples a time-varying image. So-called
"progressive" scanning traces through a complete
picture (a frame) row-wise for each time interval.
In TV, and in some monitors and multimedia standards as
well, another system, called "interlaced" scanning, is used:
a) The odd-numbered lines are traced first, and then the
even-numbered lines are traced. This results in "odd" and
"even" fields; two fields make up one frame.
b) In fact, the odd lines (starting from 1) end up at the
middle of a line at the end of the odd field, and the even
scan starts at a half-way point.
- 22. Q R : horizontal Trace. V P : vertical trace
- 23. Interlacing effects
• Because of interlacing, the odd and even lines
are displaced in time from each other. This is
generally not noticeable except when very fast
action is taking place on screen, when blurring
may occur.
• For example, in the video in Fig. 5.2, the
moving helicopter is blurred more than is the
still background.
- 25. de-Interlace
Since it is sometimes necessary to change the frame rate,
resize, or even produce stills from an interlaced source
video, various schemes are used to "de-interlace" it.
a) The simplest de-interlacing method consists of
discarding one field and duplicating the scan lines of
the other field. The information in one field is lost
completely using this simple technique.
b) Other, more complicated methods that retain
information from both fields are also possible.
Analog video uses a small voltage offset from zero to indicate
"black", and another value such as zero to indicate the
start of a line. For example, we could use a "blacker-than-black"
zero signal to indicate the beginning of a line.
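The simplest de-interlacing method in (a) can be sketched as below. This is an illustration, assuming a frame stored as a list of rows; `deinterlace_line_double` and `keep_field` are hypothetical names.

```python
def deinterlace_line_double(frame, keep_field=0):
    """Keep one field (every second row) and duplicate each of its rows."""
    field = frame[keep_field::2]      # rows of the kept field
    out = []
    for row in field:
        out.append(list(row))         # original scan line
        out.append(list(row))         # duplicate fills the discarded line
    return out[:len(frame)]           # trim if the frame height is odd

frame = [[0, 0], [1, 1], [2, 2], [3, 3]]
top = deinterlace_line_double(frame, keep_field=0)
bottom = deinterlace_line_double(frame, keep_field=1)
```

The output has the original height, but half of the vertical detail is gone, which is the information loss the slide refers to.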
- 26. NTSC Video
NTSC (National Television System Committee) TV
standard is mostly used in North America and Japan. It
uses the familiar 4:3 aspect ratio (i.e., the ratio of picture
width to its height) and uses 525 scan lines per frame at 30
frames per second (fps); more exactly, 29.97 fps.
a) NTSC follows the interlaced scanning system, and each
frame is divided into two fields, with 262.5 lines/field.
b) Thus the horizontal sweep frequency is 525 × 29.97
≈ 15,734 lines/sec, so that each line is swept out in 63.6 µs.
c) Since the horizontal retrace takes 10.9 µs, this leaves
52.7 µs for the active line signal during which image data
is displayed (see Fig. 5.3).
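The timing figures above follow from simple arithmetic, reproduced here as a small check. The variable names are illustrative; the input numbers are the ones quoted on the slide.

```python
LINES_PER_FRAME = 525
FRAME_RATE = 29.97      # frames per second, as used on the slide
H_RETRACE_US = 10.9     # horizontal retrace time in microseconds

line_rate = LINES_PER_FRAME * FRAME_RATE      # scan lines per second
line_time_us = 1e6 / line_rate                # duration of one scan line
active_time_us = line_time_us - H_RETRACE_US  # time left for picture data
lines_per_field = LINES_PER_FRAME / 2         # two interlaced fields per frame
```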
- 27. NTSC
NTSC video is an analog signal with no fixed horizontal
resolution. Therefore one must decide how many times to
sample the signal for display: each sample corresponds
to one pixel output.
A “pixel clock" is used to divide each horizontal line of
video into samples. The higher the frequency of the pixel
clock, the more samples per line there are.
Different video formats provide different numbers of
samples per line, as listed in Table 5.1.
- 29. NTSC Color Modulation
NTSC uses the YIQ color model, and the technique of quadrature
modulation is employed to combine (the spectrally overlapped part of) I (in-phase)
and Q (quadrature) signals into a single chroma signal C:
$C = I\cos(F_{sc}t) + Q\sin(F_{sc}t)$ (5.1)
This modulated chroma signal is also known as the color subcarrier, whose
magnitude is $\sqrt{I^2 + Q^2}$, and phase is $\arctan(Q/I)$. The frequency of C is
$F_{sc} \approx 3.58$ MHz.
The NTSC composite signal is a further composition of the luminance signal Y
and the chroma signal as defined below:
$\text{composite} = Y + C = Y + I\cos(F_{sc}t) + Q\sin(F_{sc}t)$ (5.2)
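The magnitude and phase statement can be checked numerically: I cos(wt) + Q sin(wt) equals A cos(wt - p) with A = sqrt(I^2 + Q^2) and p = arctan(Q/I). The sketch below assumes w = 2*pi*F_sc with F_sc = 3.579545 MHz (rounded to 3.58 MHz on the slide); function names are hypothetical.

```python
import math

F_SC = 3.579545e6  # NTSC color subcarrier in Hz (assumed exact value)

def chroma(i, q, t, f_sc=F_SC):
    """Quadrature-modulated chroma signal C at time t (seconds)."""
    w = 2 * math.pi * f_sc
    return i * math.cos(w * t) + q * math.sin(w * t)

def magnitude_phase(i, q):
    """Amplitude and phase of the subcarrier; atan2 handles all four quadrants."""
    return math.hypot(i, q), math.atan2(q, i)

amp, phase = magnitude_phase(0.3, -0.4)
```

The amplitude corresponds to color saturation and the phase to hue, which is why phase errors in transmission show up as hue shifts.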
- 30. PAL
PAL (Phase Alternating Line) is a TV standard widely
used in Western Europe, China, India, and many other
parts of the world.
PAL uses 625 scan lines per frame, at 25
frames/second, with a 4:3 aspect ratio and interlaced
fields.
(a) PAL uses the YUV color model. It uses an 8 MHz
channel and allocates a bandwidth of 5.5 MHz to Y, and
1.8 MHz each to U and V. The color subcarrier
frequency is $f_{sc} \approx 4.43$ MHz.
(b) In order to improve picture quality, chroma signals
have alternate signs (e.g., +U and -U) in successive
scan lines, hence the name "Phase Alternating Line".
- 31. PAL
(c) This facilitates the use of a (line rate) comb filter at the
receiver: the signals in consecutive lines are averaged so
as to cancel the chroma signals (that always carry
opposite signs) for separating Y and C and obtaining high
quality Y signals.
- 32. Video Worlds
Intro to Media Coding
Image and Video
Speech
Audio
- 33. Compression
Compression – Representing information with
fewer bits than the original information
Lossless Compression – The original information
can be recovered exactly from the compressed
information. Examples: LZ-family techniques
such as ZIP and gzip.
Lossy Compression – The decompressed information is not
identical to the original. Examples:
MP3, JPEG etc.
Lossy compression is often MODEL Based
Compression
- 34. Compression terms
Encoder – Module which compresses the
information
Decoder – Module which decompresses the
information
CODEC – (en)COder / DECoder
Channel – the medium through which the information is
passed, for example an ADSL line or a disk
Encoder -> Channel (e.g. disk) -> Decoder
- 35. Model Based Compression
[Block diagram: Pre-processing -> Model-based transform -> Quantize / Prioritize -> Reorder -> Entropy coding, with a "Lossless compression" label and a bit rate control block.]
- 36. Human Visual System
The human eye has two basic light receptors:
Rods – Light intensity receptors
Cones – Colored light receptors
- 37. The Human Eye
Rods Concentration >> Cones Concentration
Green Discrimination << Red, Blue
Discrimination
Low Frequency > High Frequency
- 38. Image Coding Model Based transformations
RGB (3 equally quantized colors) ->
YUV (Light Intensity + two color channels)
Pixel based domain -> Frequency domain
- 39. Speech coding
In speech coding, the vocal tract is used as a
model:
- 40. Audio / Music Coding
In general Audio Coding, the ear is used as a
model:
Frequencies -> Frequency bands
Masking and Temporal Masking are used
- 41. Basic Image and Video coding
Definitions
Where to lose information: color & frequency
- 42. What is a digital image?
Audio PCM
One 1-D array of
samples
BMP Image
Three 2-D arrays of
numbers representing
Red, Green and Blue
values
- 43. Image Compression? Why?
Image size = 720*580 pixels
3 image layers (RGB) = 720*580*3 samples
8 bits per sample = 720*580*3*8
= 10,022,400 bits
Lots of bits for one Lena
- 45. Color based decimation
Our eyes have better resolution and scaling
for luminance than for color.
Compress color by using the 4:2:0 method
- 46. Counting the bits
How much can we save by color
compression?
RGB 24-bit color representation: 3 * image size.
4:2:0 YUV representation: (1 + 2 * 1/4) * image size = 1.5 * image size.
Compression ratio is 2 !!
Actual saving is bigger due to different Y and
UV quantization.
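The bit count can be reproduced directly. A small sketch, assuming 8 bits per sample and even frame dimensions; the function names are illustrative.

```python
def frame_bits_rgb(width, height, bits=8):
    """Three full-resolution planes: R, G and B."""
    return width * height * 3 * bits

def frame_bits_yuv420(width, height, bits=8):
    """Full-resolution Y plus U and V subsampled by 2 in both directions."""
    chroma_samples = (width // 2) * (height // 2)
    return (width * height + 2 * chroma_samples) * bits

rgb_bits = frame_bits_rgb(720, 580)
yuv_bits = frame_bits_yuv420(720, 580)
ratio = rgb_bits / yuv_bits
```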
- 47. Linear Transform
If the signal is formatted as a vector, a linear transform can
be formulated as a matrix-vector product that transforms
the signal into a different domain.
Examples:
K-L Transform
Discrete Fourier Transform
Discrete cosine transform
Discrete wavelet transform
Energy compaction property:
The transformed signal vector has few, large coefficients and
many nearly zero small coefficients. These few large
coefficients can be encoded efficiently with few bits while
retaining the majority of energy of the original signal.
- 48. Block-based Image Coding
Block-based image coding scheme:
partitions the entire image into 8 by 8 or 16 by 16 (or other
size) blocks.
The coding algorithm is applied to individual
blocks independently.
Advantages:
Parallel processing can be applied to process individual
blocks in parallel.
Redundant information is in close proximity (like cache)
- 49. Transform - DCT
The DCT transforms the data from pixel
intensities to frequency coefficients.
Low frequencies are important; high frequencies are
less so.
The 8x8 inverse DCT (IDCT) is:
$f(m,n) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,F(u,v)\cos\frac{(2m+1)u\pi}{16}\cos\frac{(2n+1)v\pi}{16}, \quad 0 \le m,n \le 7$
where $C(0) = 1/\sqrt{2}$ and $C(k) = 1$ for $k > 0$.
(You'll get lunch even if you don't remember
the IDCT formula above)
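A direct, unoptimized sketch of the 8x8 DCT and IDCT pair, assuming the standard JPEG-style normalization with C(0) = 1/sqrt(2). It is meant to illustrate energy compaction, not to be a fast implementation; all names are hypothetical.

```python
import math

N = 8

def _c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def _basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))

def dct2(block):
    """Forward 8x8 DCT: pixel values f(m, n) -> coefficients F(u, v)."""
    return [[0.25 * _c(u) * _c(v) * sum(block[m][n] * _basis(m, u) * _basis(n, v)
                                        for m in range(N) for n in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse 8x8 DCT: coefficients F(u, v) -> pixel values f(m, n)."""
    return [[0.25 * sum(_c(u) * _c(v) * coeffs[u][v] * _basis(m, u) * _basis(n, v)
                        for u in range(N) for v in range(N))
             for n in range(N)] for m in range(N)]

flat = [[100] * N for _ in range(N)]
flat_coeffs = dct2(flat)  # all energy ends up in the DC term F(0, 0)
ramp = [[m * N + n for n in range(N)] for m in range(N)]
ramp_back = idct2(dct2(ramp))
```

A flat block produces a single non-zero coefficient, which is the extreme case of "few large coefficients, many near-zero ones".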
- 51. AC Coefficients
AC coefficients are first weighted with a quantization
matrix:
C(i,j)/q(i,j) = Cq(i,j)
Then quantized.
Then they are scanned in a zig-zag order into a 1D
sequence to be subject to AC Huffman encoding.
Zig-zag scan order:
 1  2  6  7 15 16 28 29
 3  5  8 14 17 27 30 43
 4  9 13 18 26 31 42 44
10 12 19 25 32 41 45 54
11 20 24 33 40 46 53 55
21 23 34 39 47 52 56 61
22 35 38 48 51 57 60 62
36 37 49 50 58 59 63 64
Question: Given an 8 by 8 array, how to convert it into a
vector according to the zig-zag scan order? What is the
algorithm?
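One possible answer to the question, as a sketch: walk the anti-diagonals i + j = s and alternate direction on each one. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col of the diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run bottom-left to top-right
        for i in rows:
            order.append((i, s - i))
    return order

def zigzag_scan(block):
    """Flatten a square block into a 1-D list following the zig-zag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]

# Rebuild the rank table from the slide: rank[i][j] is the 1-based scan position.
rank = [[0] * 8 for _ in range(8)]
for position, (i, j) in enumerate(zigzag_order(8), start=1):
    rank[i][j] = position
```

Scanning this way places the low-frequency coefficients first, so the long runs of zeros from quantized high frequencies end up together at the tail of the vector.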
- 57. JPEG Image Coding Algorithms
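[Block diagram: 8x8 block -> DCT -> Q (quantization matrix). The DC coefficient goes through DPCM and then DC Huffman coding; the AC coefficients go through the zig-zag scan and then AC Huffman coding. Both Huffman coders use code books.]
JPEG Encoding Process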
- 58. Generalization of JPEG Coding
Transform (Color, Frequency) -> Quantize -> Reorder -> Entropy Coding
JPEG Encoding Process
- 60. Video Coding
Video coding is often implemented as encoding
a sequence of images. Motion compensation
is used to exploit temporal redundancy
between successive frames.
Examples: MPEG-1, MPEG-2, MPEG-4,
H.263, H.263+, H.264
Existing video coding standards are based on
JPEG image compression as well as motion
compensation.
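Motion estimation for one block can be sketched as an exhaustive block-matching search using the sum of absolute differences (SAD). This is an illustration only: real encoders use faster search strategies, and the names and search range here are assumptions.

```python
def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between an n x n block in cur and one in ref."""
    return sum(abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
               for y in range(n) for x in range(n))

def best_motion_vector(cur, ref, by, bx, n=4, search=2):
    """Full search: return (cost, dy, dx) of the best match in the reference frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = by + dy, bx + dx
            if 0 <= ry <= height - n and 0 <= rx <= width - n:
                cost = sad(cur, ref, by, bx, ry, rx, n)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best

# Synthetic test frames: the current frame is the reference moved down 1 and right 2.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[(y - 1) % 8][(x - 2) % 8] for x in range(8)] for y in range(8)]
match = best_motion_vector(cur, ref, by=2, bx=2)
```

The motion vector points from the block in the current frame to its best match in the reference frame; only the vector and the (here zero) residue would need to be coded.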
- 61. Video Coding Standardization Scope
Only restrictions on the Bitstream, Syntax, and
Decoder are standardized:
Permits the optimization of encoding
Permits complexity reduction
Provides no guarantees on quality
- 62. Video Encoding
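[Block diagram: the predicted frame xp(t) is subtracted from the current frame x(t) to give the residue r(t), which passes through DCT -> Q -> VLC -> Buffer to form the bit stream; buffer control feeds back to Q. In the reconstruction loop, Q-1 and IDCT produce the reconstructed residue r^(t), which is added to xp(t) to give the reconstructed current frame x^(t), stored in the frame buffer. Motion estimation & compensation uses x(t) and x^(t-1) to produce xp(t) and the motion vectors.]
This is a simplified block diagram where the encoding of intra coded frames is not shown.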
- 63. Video Encoding
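[Block diagram: generalized form of the encoder. Color transform, then the predicted frame xp(t) is subtracted, followed by Frequency transform -> Q -> Reorder -> Entropy coding, with buffer control. In the reconstruction loop, Q-1 and the inverse transform Tf-1 produce the reconstructed residue r^(t), which is added to xp(t) to give the reconstructed current frame x^(t), stored in the frame buffer. Motion estimation & compensation uses x(t) and x^(t-1) to produce xp(t) and the motion vectors.]
This is a simplified block diagram where the encoding of intra coded frames is not shown.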
- 64. Forward Motion Estimation
[Figure: the current frame (blocks 1 to 16) is constructed from different parts of the reference frame.]
- 65. Video sequence : Tennis frame 0, 1
[Figure: previous frame and current frame]
- 66. Frame Difference
Frame Difference :frame 0 and 1
- 67. What is motion estimation?
[Figure: motion vector field of frame 1]
- 68. What is motion compensation ?
[Figure: motion compensated frame]
- 69. Motion Compensated Frame Difference
[Figure: motion compensated frame difference (frames 0 and 1), shown next to the plain frame difference (frames 0 and 1)]
- 71. Frame Types
Three types of frames:
Intra (I): the frame is coded as if it is an image
Predicted (P): predicted from an I or P frame
Bi-directional (B): forward and backward predicted
from a pair of I or P frames.
A typical frame arrangement is:
I1 B1 B2 P1 B3 B4 P2 B5 B6 I2
P1 is forward-predicted from I1, and P2 from P1. B1, B2 are
interpolated from I1 and P1; B3, B4 are interpolated
from P1, P2; and B5, B6 are interpolated from P2, I2.
New Coding standards added other frame types:
SP, SI, D
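Because a B frame needs both of its reference frames before it can be decoded, frames are typically coded in a different order from the display order. A minimal sketch of that reordering, assuming frame names start with their type letter; `coding_order` is a hypothetical helper.

```python
def coding_order(display_order):
    """Move each run of B frames after the I or P frame that follows it."""
    out, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)   # wait for the next anchor frame
        else:
            out.append(frame)         # anchor (I or P) goes first
            out.extend(pending_b)     # then the B frames that depend on it
            pending_b = []
    out.extend(pending_b)             # trailing B frames, if any
    return out

display = ["I1", "B1", "B2", "P1", "B3", "B4", "P2", "B5", "B6", "I2"]
coded = coding_order(display)
```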
- 74. Chronological evolution of Video Coding Standards
[Timeline, 1990 to 2003]
ITU-T VCEG: H.261 (1990), H.263 (1995/96), H.263+ (1997/98), H.263++ (2000)
Joint ITU-T / ISO/IEC: MPEG-2 (H.262) (1994/95), H.264 (MPEG-4 Part 10) (2002)
ISO/IEC MPEG: MPEG-1 (1993), MPEG-4 v1 (1998/99), MPEG-4 v2 (1999/00), MPEG-4 v3 (2001)
- 75. ITU Standards
H.261
Early standard
Compressed data rate, n*64 Kbps (was created for ISDN
connections, remember it's an ITU standard)
Resolution
QCIF 176x144, CIF 352x288
H.263
Supports a wider range of bit-rates, from below 64 Kbps and up
Error recovery and performance improvements over H.261
Resolution
SQCIF, QCIF, CIF, 4CIF 704x576, 16CIF 1408x1152
www.dsp-ip.com
- 76. ITU Standards
H.264
Improved H.263
Arithmetic coding
Dynamic block size (not only 8x8)
(Much) better results than MPEG-4 Part 2
Tradeoff – computational overhead.
- 77. ITU Standards
ITU standard evolution over the years
H.261 -> H.262 (MPEG-2) -> H.263 -> H.264 -> What's next?
- 78. ISO MPEG Standards
MPEG-1: CD Compression (X1)
MPEG-2: Television Broadcast quality
MPEG-4: Multimedia & Systems standard
MPEG-7: Meta-Data description
MPEG-21: Standard for the creation,
distribution and consumption of Multimedia
(mainly DRM, IPMP).
- 79. Data virtualization in ISO standards
The evolution of ISO standards from pixel description to
object description, manipulation and rights
Object rights: MPEG-21
Object descriptors: MPEG-7
Object coding: MPEG-4
Image coding: MPEG-1/2
- 80. MPEG-1
A standard for storage and retrieval of audio and
video, (1992)
Up to 1.5 Mbps
P-frame, Predictive-coded frames
requires info from previous I or P frames
B-frames, Bi-directionally predictive coded frames
requires previous and following frames
D-frame, DC-coded frames
Consists of lowest frequency of an image
Used for fast forward and fast reverse modes
- 81. MPEG-2
A standard for high-quality video and digital
television, (1994)
2-100 Mbps
Coding similar to MPEG-1
Several profiles and levels for different
resolutions and qualities
Enhanced audio, (multiple channels)
- 82. MPEG-4
Designed for multimedia, (v1 Oct.1998)
Coding of both natural and synthetic audio-
visual data
Improved efficiency, (object based)
Error robustness
Many more MM features
- 83. Why ISO adopted ITU technology
Comparison of compression formats (CIF, 30 Hz)
[Plot: quality (Y-PSNR, 25 to 38 dB) versus bit-rate (0 to 3500 kbit/s) for JVT/H.26L, MPEG-4, MPEG-2 and H.263]
- 85. MPEG History
The Moving Picture Experts Group was founded in
January 1988 by Leonardo Chiariglione together with
around 15 experts in compression technology
Creator of numerous standards like MPEG-1, MPEG-2,
MPEG-4, MPEG-7, MPEG-21 etc.
The Group has not limited its scope to only "pictures":
sound was not forgotten (e.g. MPEG-1 Layer 3)
The industry quickly adopted the MPEG standards
(Philips, Samsung, Intel, Sony etc)
MPEG has given birth to a number of technologies
we now take for granted: DVD and Digital TV
(MPEG-2), MP3 (MPEG-1 L3)
- 86. MPEG-2
In 1994, MPEG published ISO/IEC 13818,
also known as MPEG-2
MPEG-2 was the standard adopted by DVD
and Digital TV
MPEG-2 is designed for video compression
between 1.5 and 15 Mbps for SD
MPEG-2 streams come in 2 forms: Program
Stream and Transport Stream
- 88. MPEG2- Systems
Defines storage, transport and control
of MPEG-2 streams
- 89. Model for MPEG-2 Systems
- 90. MPEG-2 Program Stream
Similar to MPEG-1 Systems Multiplex
Combines one or more Packetised Elementary
Streams (PES), which have a common time-
base, into a single stream
Designed for use in relatively error-free
environments and suitable for applications
which may involve software processing
Program stream packets may be of variable and
relatively great length
Variable length / error free: what's the
connection?
- 91. MPEG-2 Transport Stream
Combines one or more Packetized Elementary
Streams (PES) with one or more independent
time bases into a single stream (sometimes
called multiplex)
Elementary streams sharing a common time-
base form a program
Designed for use in environments where errors
are likely, such as storage or transmission in
lossy or noisy media
The transport stream is made of packets with a
fixed length of 188 bytes. Why?
What is the header overhead in a 188-byte
packet?
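As a hint for the overhead question: the fixed transport stream packet header is 4 bytes, starting with the sync byte 0x47. The sketch below parses a few header fields and computes the minimum overhead; it ignores the optional adaptation field, and the function name is hypothetical.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4   # fixed header; an optional adaptation field may add more
SYNC_BYTE = 0x47

def parse_ts_header(packet):
    """Extract a few fields from the 4-byte transport stream packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte TS packet")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],   # 13-bit packet identifier
        "continuity_counter": packet[3] & 0x0F,
    }

sample_packet = bytes([0x47, 0x40, 0x11, 0x17]) + bytes(184)  # synthetic packet
header = parse_ts_header(sample_packet)
overhead_percent = 100 * TS_HEADER_SIZE / TS_PACKET_SIZE
```

With no adaptation field the header costs roughly 2% of each packet; short fixed-size packets also make it easy to resynchronize after an error, since the sync byte recurs every 188 bytes.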
- 94. MPEG-2 Audio
Backwards compatible - defines extensions:
MultiChannel coding
5 channel audio (L, R, C, LS, RS)
Multilingual coding
7 multilingual channels
Lower sampling frequencies (LSF)
Optional Low Frequency Enhancement (LFE) -
Bass
- 96. File Formats
- 97. Agenda
Intro to file formats
Second Generation formats
RIFF: AVI, WAV
Third Generation Containers
MPEG4 FF
MKV
- 98. File Format Segmentation
File Formats
3rd Generation: Object Based, XML Based
2nd Generation: Media Muxer
1st Generation: Raw / Proprietary
- 100. 2ND Generation Files features
Multiple media tracks in the same file
Identification of codec
Usually by FourCC
Interleaving
- 101. 2nd Generation File Formats
2nd Generation FF
RIFF: WAV, AVI
ASF: WMA, WMV
MPEG2: MP2PS, MP2TS, VOB
FLV
- 103. AVI Overview
AVI files use the AVI RIFF format (like WAV)
Introduced by Microsoft in 1992
File is divided into:
Streams – Audio, Video, Subtitles
Blocks ("Chunks")
- 104. Blocks / Chunks
A RIFF file logical unit
Chunks are identified by four letters (FOUR-CC)
A RIFF file has two mandatory sub-chunks and
one optional sub-chunk
Mandatory chunks:
hdrl – File header
movi – Media data
Optional chunk:
idx1 – Index
RIFF ('AVI '
      LIST ('hdrl'
            'avih'(<Main AVI Header>)
            LIST ('strl' ... )
            . . . )
      LIST ('movi' . . . )
      ['idx1'<AVI Index>]
     )
*This order is fixed
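The chunk layout above can be walked with a few lines of code. This sketch assumes the usual RIFF conventions (four-character code, 32-bit little-endian size, data padded to an even length) and uses a synthetic in-memory file; `make_chunk` and `top_level_chunks` are hypothetical helpers.

```python
import struct

def make_chunk(fourcc, data=b""):
    """Build a RIFF chunk: FOURCC, little-endian size, data, pad byte if odd."""
    pad = b"\x00" if len(data) % 2 else b""
    return fourcc + struct.pack("<I", len(data)) + data + pad

def top_level_chunks(riff_bytes):
    """Return the form type and the names of the chunks directly inside RIFF."""
    magic, size, form = struct.unpack_from("<4sI4s", riff_bytes, 0)
    if magic != b"RIFF":
        raise ValueError("not a RIFF file")
    pos, end, names = 12, 8 + size, []
    while pos + 8 <= end:
        fourcc, chunk_size = struct.unpack_from("<4sI", riff_bytes, pos)
        # A LIST chunk is named by the list type stored in its first 4 data bytes.
        name = riff_bytes[pos + 8:pos + 12] if fourcc == b"LIST" else fourcc
        names.append(name.decode("ascii"))
        pos += 8 + chunk_size + (chunk_size & 1)
    return form.decode("ascii"), names

# Synthetic AVI skeleton: header list, empty movie list, empty index.
hdrl = make_chunk(b"LIST", b"hdrl" + make_chunk(b"avih", bytes(56)))
movi = make_chunk(b"LIST", b"movi")
idx1 = make_chunk(b"idx1")
avi = make_chunk(b"RIFF", b"AVI " + hdrl + movi + idx1)
form_type, chunk_names = top_level_chunks(avi)
```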
- 105. AVI main header
RIFF 'AVI ' - Identifies the file as RIFF file.
LIST 'hdrl' - Identifies a chunk containing sub-
chunks that define the format of the data.
'avih' - Identifies a chunk containing general
information about the file. Includes:
dwMicrosecPerFrame - Time between frames
dwMaxBytesPerSec – number of bytes per second
the player should handle
dwReserved1 - Reserved
dwFlags - Contains any flags for the file.
- 106. Example - headers
[Figure: annotated hex dump of an AVI file header, labelling chunk ID, chunk size, format, time between frames, data rate, flags, total number of frames, initial frame, streams, frame width (320), frame height, reserved fields, the stream header, and the junk chunk identifier with its padding size.]
- 107. Example – data chunks
Audio data chunk
(stream 01)
video data chunk
(stream 00)
- 108. AVI Summary
Advantages
Includes both audio and video
Index-able
Disadvantage
Not suited for progressive download
Very rigid format
Insufficient support for: seeking, metadata,
multi-reference frames
- 110. Why “Fix it”?
2nd Generation Formats are missing:
Metadata
Separate from Media
Info on angle, language, Synchronization
Versioning
Better Streaming Support
Reduce CPU per stream
Better seeking support
Better parsing
XML
Atom Based
- 111. Main Attributes
File format is not just a Video / Audio multiplexer
Separation between
Media – Audio, Video, Images, Subtitles
Metadata – Indexing, frame length, Tags
- 112. 3rd Generation File Formats
3rd Generation
XML Based: Matroska (MKV)
Object Based: MOV, MPEG4 FF, Fragmented MPEG4 FF, 3GPP FF
- 114. MP4 File Format
File Structuring Concepts
Separate the media data from descriptive (meta)
data.
Support the use of multiple files.
Support for hint tracks:
support of real time streaming over any protocol
- 115. Separate Metadata and Media
Key meta-information is compact
The type of media present
Time-scales
Timing
Synchronization points etc.
Enables
Random access
Inspection, composition, editing etc.
Simplified update
- 116. Multiple file support
Use URLs to ‘point to’ media
Distinct from URLs in MPEG-4 Systems
URLs use file-access service
e.g. file://, http://, ftp:// etc.
Permits assembly of composition without
requiring data-copy
Referenced files contain only media
Meta-data all in ‘main’ file
- 117. Logical File Structure
Presentation (‘movie’) contains
Tracks which contain
Samples
- 118. Physical Structure—File
Succession of objects (atoms, boxes)
Exactly one Meta-data object
Zero or more media data object(s)
Free space etc.
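The "succession of objects" can be illustrated with a small box walker. This is a sketch, assuming the usual box header of a 32-bit big-endian size (which includes the header) followed by a four-character type; `make_box` and `iter_boxes` are hypothetical helpers and the file contents are synthetic.

```python
import struct

def make_box(box_type, payload=b""):
    """Build a box: big-endian 32-bit size (header included) + 4-char type + payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def iter_boxes(data):
    """Yield (type, offset, size) for each box found back-to-back in data."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:                       # 64-bit size follows the type
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:                     # box extends to the end of the data
            size = len(data) - pos
        if size < header:
            raise ValueError("corrupt box size")
        yield box_type.decode("ascii"), pos, size
        pos += size

moov = make_box(b"moov", make_box(b"trak") + make_box(b"trak"))  # meta-data object
movie_file = moov + make_box(b"mdat", bytes(10)) + make_box(b"free")
top_level = [(t, size) for t, _, size in iter_boxes(movie_file)]
tracks = [t for t, _, _ in iter_boxes(movie_file[8:len(moov)])]  # children of moov
```

Because every object announces its own size, a parser can skip objects it does not understand, which is what makes the format extensible.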
- 119. Example Layout
[Diagram: the moov box, Movie (meta-data), holds one trak box for the video track and one trak box for the audio track; the mdat box, Media Data, holds the samples (frames).]
- 120. Meta-data tables
Sample Timing
Sample Size and position
Synchronization (random access) points, priority
etc.
Temporal/physical order de-coupled
May be aligned for optimization
Permits composition, editing, re-use etc. without re-
write
Tables are compacted
- 121. Multi-protocol Streaming support
Two kinds of track
Media (Elementary Stream) Tracks
Sample is Access Unit
Protocol ‘hint’ tracks
Sample tells server how to build protocol transmission
unit (packet, protocol data unit etc.)
- 122. Track types
Visual—’description’ formats
MPEG4
JPEG2000
Audio—’description’ formats
MPEG4 compressed tracks
‘Raw’ (DV) audio
Other MPEG-4 tracks
Hint Tracks (streaming)
- 123. Track Structure
Sample pointers (time, position)
Sample description(s)
Track references
Dependencies, hint-media links
Edit lists
Re-use, time-shifting, ‘silent’ intervals etc.
- 124. Hint Tracks
May include media (ES) data by ref.
Only ‘extra’ protocol headers etc. added to hint
tracks — compact
Make SL, RTP headers as needed
May multiplex data from several tracks
Packetization/fragmentation/multiplex through
hint structures
Timing is derived from media timing
- 125. Hint track structure
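[Diagram: the moov box, Movie (meta-data), holds a trak box for the video track and a trak box for the hint track; the mdat box, Sample Data, holds the media samples (frames) and the hint samples, each hint sample carrying a header and a pointer to a frame.]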
- 126. Extensibility
Other media types.
Non-SC29 sample descriptions (e.g. other video).
Non-SC29 track types (e.g. laboratory instrument
trace).
Copyright notice (file or track level) etc.
General object extensions (GUIDs).
- 127. Advantages
Compatibility
Files can be played by other companies' players.
Real Player with Envivio plug-in.
Windows Media Player etc.
Files can be streamed by other companies' streaming
servers
Darwin Streaming Server.
Quick Time Streaming Server.
- 128. Single File-Multiple data types
No need to do an export process for files: one
file type is used for storage of video, audio,
events, continuous telemetry data from sensors
and JPEG images in one file.
Audio
Metadata
Video
JPEG1
JPEG1
Sensor continuous data
Events
- 129. Single file playback
All video tracks of a site could be stored in one
file. In order to view many cameras in a
synchronized manner, the MPEG-4 file format
can hold all the views of multiple cameras in one
file.
Audio
Metadata
Video cam 1
Video cam 2
Video cam …
Video cam N
- 130. Skimming
Skimming – shortening a long movie to its
interesting points, much like creating a “promo”.
For example skimming a surveillance movie of
two hours to 2 minutes where there is movement
and people are entering the building.
MPEG-4 FF enables the creation of skims within
the file through the use of edit-list (part of the
standard) without overhead.
- 131. MKV FILE FORMAT
XML Based File-Format
- 132. MKV - File Format
Container file format for videos, audio tracks,
pictures and subtitles all in one file.
Announced in Dec. 2002 by Steve Lhomme.
Based on Binary XML format called EBML
(Extensible Binary Meta Language)
Complete Open-Standard format. (Free for
personal use).
Source is licensed under GNU L-GPL.
- 133. MKV - Specifications
Can contain chapter entries of video streams
Allows fast in-file seeking.
Metadata tags are fully supported.
Multiple streams container in a single file.
Modular – Can be expanded to company special
needs.
Can be streamed over HTTP, FTP, etc.
- 134. MKV Support software & hardware
Players:
All Player, BS.Player, DivX Player, GStreamer-based
players, VLC media player, xine, Zoom Player, MPlayer,
Media Player Classic, ShowTime and many more
Media Centers:
Boxee, DivX connected, Media Portal, PS3 Media
Server, Moovida, XBMC etc.
Blu-Ray Players:
Samsung, LG and Oppo.
Mobile Players:
Archos 5 android device, Cowon A3 and O2.
- 135. MKV - EBML in details
A binary format for representing data in XML-like
format.
Using specific XML tags to define stream
properties and data.
MKV conforms to the rules of EBML by defining
a set of tags.
Segment , Info, Seek, Block, Slices etc.
Uses 3 lacing mechanisms to reduce the overhead of
small data blocks (usually frames):
Xiph, EBML or fixed-size lacing.
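EBML stores each element as an ID, a data size and the data, where ID and size are variable-length integers: the number of leading zero bits in the first byte gives the length. A minimal decoding sketch follows; the function name `read_vint` and its parameters are hypothetical, and integers longer than 8 bytes are not handled.

```python
def read_vint(buf, pos=0, keep_marker=False):
    """Decode one EBML variable-length integer starting at buf[pos].

    Returns (value, length_in_bytes). Element IDs keep the length marker
    bit (keep_marker=True); data sizes drop it.
    """
    first = buf[pos]
    if first == 0:
        raise ValueError("integers longer than 8 bytes are not handled here")
    length = 1
    mask = 0x80
    while not (first & mask):  # count leading zero bits
        mask >>= 1
        length += 1
    if pos + length > len(buf):
        raise ValueError("truncated integer")
    value = first if keep_marker else first & (mask - 1)
    for b in buf[pos + 1 : pos + length]:
        value = (value << 8) | b
    return value, length
```

For example, the four bytes 1A 45 DF A3 that open a Matroska file decode, with the marker kept, to the EBML header ID 0x1A45DFA3.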
- 136. MKV – Simple representation
Type                   Description
Header                 Version info, EBML type (matroska in our case).
Meta Seek Information  Optional. Allows fast seeking of other level 1 elements in the file.
Segment Information    File information: title, unique file ID, part number, next file ID.
Track                  Basic information about the track: resolution, sample rate, codec info.
Chapters               Predefined seek points in the media.
Clusters               Video and audio frames for each track.
Cueing Data            Stores cue points for each track. Allows fast in-track seeking.
Attachment             Any other file related to this one (subtitles, album covers, etc.).
Tagging                Tags that relate to the file and to each track (similar to MP3 ID3 tags).
- 137. MKV – Streaming
Matroska supports two types of streaming.
File Access
Used for reading a file locally or from a remote web
server.
Prone to reading and seeking errors.
Causes buffering issues on slow servers.
Live Streaming
Usually over HTTP or other TCP based protocol.
Special streaming structure – no Meta seek, Cues,
Chapters or attachments are allowed.
- 138. File Format Summary - Trends
Metadata is important
Simple metadata or XML
Separated from media
Forward compatibility
Do not crash on a data entry that is not understood
Progressive download oriented
Multi-bitrate oriented
Fragmentation -> Lower granularity
Self contained File fragments
CDN-ability
- 139. Video Codecs
[Figure: MP4 file layout. The moov box (movie meta-data) holds one trak box for the video track and one for the audio track; the mdat box (media data) holds the samples (frames).]
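The moov/trak/mdat layout in the figure is built from boxes, each starting with a 32-bit big-endian size and a four-character type. The sketch below walks such boxes in a byte string; `iter_boxes` and `make_box` are hypothetical helper names, and the demo bytes are synthetic rather than a playable file.

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each box found at the top level of data."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # the box extends to the end of the data
            size = len(data) - pos
        if size < header:
            raise ValueError("corrupt box size")
        yield box_type.decode("ascii"), data[pos + header : pos + size]
        pos += size


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic example: a moov box with two trak boxes, followed by mdat.
demo = make_box(b"moov", make_box(b"trak") + make_box(b"trak")) + make_box(b"mdat", b"\x00" * 4)
```

Calling `iter_boxes` again on the payload of `moov` lists the nested `trak` boxes, which is how the tree in the figure is traversed.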
- 140. Why Advance? MPEG-2 Works.
Coding efficiency
Packetization
Robustness
Scalable profiles
Internet requires Interaction
Scalable & On demand
Fast-Forward / Fast Rewind / Random Access
Stream switching
Multi-bitrate, multi-resolution / multi-screen
- 142. Codec discussion
Internet and video codec
Standard codecs – MPEG-4 Part 2 and H.264
Non standard codecs
Sorenson Spark
VP6
WMV9
VC-1
VP8
- 144. H.264 Terminology
The following terms are used interchangeably:
H.26L
“JVT CODEC”
The “AVC” or Advanced Video Coding
Proper Terminology going forward:
MPEG-4 Part 10 (Official MPEG Term)
ISO/IEC 14496-10 AVC
H.264 (Official ITU Term)
- 145. H264 Standard ideas
“Block” size: fixed -> variable
Slice
Block
Block order/scanning -> different orders
Zig-zag, Flexible Macroblock Order
Additional spatial prediction -> Intra prediction
Inter prediction: 1 frame only -> multiple frames
P and B pictures
Multiple reference frames
- 146. H264 Standard Ideas
Pixel interpolation
Motion vectors
In-loop Deblocking filter
Improved Entropy coding
- 147. New Features of H.264 - summarized
SP, SI - Additional picture types
NAL (Network Abstraction Layer)
CABAC - Additional entropy coding mode
¼ & 1/8-pixel motion vector precision
In-loop de-blocking filter
B-frame prediction weighting
4×4 integer transform
Multi-mode intra-prediction
NAL - Coding and transport layers separation
FMO - Flexible MacroBlock ordering.
- 149. Profiles and Levels
Profiles: Baseline, Main, and Extended (X)
Baseline: Progressive, Videoconferencing &
Wireless
Main: esp. Broadcast
Extended: Mobile network
Wireless <> Mobile
- 151. Baseline Profile
Baseline profile is the minimum
implementation
No CABAC, 1/8 MC, B-frame, SP-slices
15 levels
Resolution, capability, bit rate, buffer, reference #
Built to match popular international production
and emission formats
From QCIF to D-Cinema
Progressive (not interlaced)
I and P slices types
- 152. Baseline Profile
1/4-sample Inter prediction
Deblocking filter, Redundant slices
VLC-based entropy coding (no CABAC)
4:2:0 chroma format
Flexible Macroblock Ordering (FMO)
Arbitrary Slice Order (ASO)
The decoder processes slices in an arbitrary order as
they arrive.
The decoder does not have to wait for all slices to be
properly arranged before it starts processing them.
This reduces the processing delay at the decoder.
- 153. Baseline Profile
FMO: Flexible Macroblock Ordering
With FMO, macroblocks are coded according to a
macroblock allocation map that groups, within a given
slice, macroblocks from spatially different locations
in the frame.
Enhances error resilience
Redundant slices:
allow the transmission of duplicate slices.
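A macroblock allocation map can be pictured as a grid that assigns each macroblock to a slice group. The sketch below uses a simple two-group checkerboard as an illustration (a simplified stand-in for the map types the standard defines; the function names are hypothetical) and shows why this helps error resilience: if one group is lost, every lost macroblock still has received neighbours to conceal from.

```python
def checkerboard_map(mb_cols, mb_rows):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def surviving_neighbours(mb_map, x, y, lost_group):
    """Count the 4-connected neighbours of (x, y) that are NOT in the lost group."""
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= ny < len(mb_map) and 0 <= nx < len(mb_map[0]):
            if mb_map[ny][nx] != lost_group:
                count += 1
    return count


mb_map = checkerboard_map(4, 3)
```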
- 154. H.264 Profiles & Levels - Main
All Baseline features Plus
Interlace
B slice types (bi directional reference )
CABAC
Weighted prediction
All features included in the Baseline profile
except:
Arbitrary Slice Order (ASO)
Flexible Macroblock Order (FMO)
Redundant Slices
- 155. Main Profile
CABAC
Good performance (bit rate reduction) by
Selecting models by context
Adapting estimates by local statistics
Arithmetic coding increases computational complexity
Costs roughly 10%~20% of the total decoder
execution time at medium bitrate
Average bit-rate saving over CAVLC 10-15%
- 156. Extended Profile
All Baseline features plus
Interlace
B slice types
Weighted prediction
- 157. Frame structure
Slices:
A picture is split into 1 or several slices.
Slices are self contained.
Slices are a sequence of MB.
MacroBlocks [MB]
Basic syntax & processing unit.
Contains 16x16 luma samples and
2 x 8x8 chroma samples.
MB within a slice depend on each other.
MB can be further partitioned.
- 159. Scanning order of residual blocks
For an Intra 16x16 MB, the block labeled -1,
containing the DC coefficients, is transmitted first.
Luma residual blocks 0-15 are transmitted next.
Blocks 16 & 17 contain a 2x2 array of chroma DC
coefficients.
Chroma residual blocks 18-25 are sent last.
- 160. Variable block size
Slices
A picture split into 1 or several slices
Slices are a sequence of macroblocks
Macroblock
Contains 16x16 luminance samples and
two 8x8 chrominance samples
Macroblocks within a slice depend on
each other
Macroblocks can be further partitioned
Slice 0
Slice 1
Slice 2
- 161. Basic Macroblock Coding Structure
[Figure: H.264 encoder block diagram. The input video signal is split into 16x16-pixel macroblocks. Blocks: coder control, transform / scaling / quantization, embedded decoder (scaling & inverse transform, de-blocking filter), intra-frame prediction, motion compensation, intra/inter switch, motion estimation, entropy coding. Signals: control data, quantized transform coefficients, motion data, output video signal.]
- 162. Motion Compensation
[Figure: H.264 encoder block diagram with the motion compensation block highlighted. Various block sizes and shapes: macroblock types 16x16, 16x8, 8x16, 8x8; 8x8 sub-types 8x8, 8x4, 4x8, 4x4.]
- 163. Tree structured Motion Compensation
[Figure: H.264 encoder block diagram with the motion compensation block highlighted. Macroblock types 16x16, 16x8, 8x16, 8x8; 8x8 sub-types 8x8, 8x4, 4x8, 4x4.]
Motion vector accuracy 1/4 (6-tap filter)
- 164. Variable block size
Block sizes of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and
4x4 are available.
Using seven different block sizes can translate into
bit rate savings of more than 15% as compared to
using only a 16x16 block size.
Mode 1: 1 16x16 block
Mode 2: 2 16x8 blocks
Mode 3: 2 8x16 blocks
Mode 4: 4 8x8 blocks
Mode 5: 8 8x4 blocks
Mode 6: 8 4x8 blocks
Mode 7: 16 4x4 blocks
- 165. How to select the partition size?
The partition size that minimizes the coded
residual and motion vectors
- 166. The Trade-off
A large partition size (e.g. 16x16, 16x8, 8x16) requires a small
number of bits to signal the choice of motion vector and the
partition type.
However, the motion compensated residual may contain a
significant amount of energy in frame areas with high detail.
A small partition size (e.g. 8x4, 4x4 etc.) may give a lower energy
residual after motion compensation but requires a large number
of bits to signal the motion vectors and the choice of partition.
The choice of partition size therefore has a significant impact on
compression performance.
In general, a large partition size is appropriate for
homogeneous areas of the frame and a small partition size may
be beneficial for detailed areas.
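One common way to express this trade-off is a Lagrangian cost, distortion plus lambda times bits, minimized over the candidate partitions. The sketch below assumes that cost model; the function name, the lambda value and all distortion and bit figures are made up for illustration.

```python
def best_partition(candidates, lam):
    """Pick the partition with the smallest cost D + lam * R.

    candidates maps a partition name to (residual distortion, bits needed
    for motion vectors and partition signalling).
    """
    return min(candidates, key=lambda k: candidates[k][0] + lam * candidates[k][1])


# Hypothetical measurements for a flat area and for a detailed area.
flat_area = {"16x16": (120, 12), "8x8": (110, 40), "4x4": (105, 130)}
detailed_area = {"16x16": (2400, 12), "8x8": (900, 40), "4x4": (300, 130)}
```

With lam = 5 the flat area keeps the 16x16 partition because the extra vectors buy almost no residual reduction, while the detailed area moves to 4x4.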
- 167. Interpolation
Quarter sample luma interpolation in 2 steps:
1. Half sample positions are obtained by applying a
6 tap filter with tap values (1,-5,20,20,-5,1):
b=round((E-5F+20G+20H-5I+J)/32)
2. Quarter sample positions are obtained by averaging
samples at integer and half sample positions.
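The two interpolation steps can be sketched with integer arithmetic, assuming 8-bit samples. Rounding is done by adding 16 before the shift by 5 (division by 32); the function names are hypothetical.

```python
def clip8(v):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, v))


def half_sample(E, F, G, H, I, J):
    """Half-sample value between G and H using the taps (1,-5,20,20,-5,1)."""
    return clip8((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_sample(p, q):
    """Quarter-sample value: rounded average of two neighbouring samples."""
    return (p + q + 1) >> 1
```

For a step edge (0, 0, 0, 255, 255, 255) the half sample lands at 128, and the quarter sample between the integer sample 0 and that half sample is 64.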
- 168. Chroma Interpolation
Chroma interpolation is 1/8
sample accurate since luma
motion is ¼ sample accurate.
Fractional chroma sample
positions are obtained using the
equation:
- 169. Inter prediction modes
MVs for neighboring partitions are often highly
correlated.
So we encode MVDs instead of MVs
MVD = MV – MVp, where MVp is the predicted MV
¼ pixel accurate motion compensation
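A small sketch of motion vector differencing, assuming the predictor is the component-wise median of three neighbouring partitions' vectors (the usual case in H.264; some partition shapes use other rules). Vectors are (x, y) pairs in quarter-sample units, and the function names are hypothetical.

```python
def median_predictor(a, b, c):
    """Component-wise median of three neighbouring motion vectors."""
    return tuple(sorted(comp)[1] for comp in zip(a, b, c))


def mv_difference(mv, mvp):
    """MVD = MV - MVp, the value that is actually entropy coded."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


mvp = median_predictor((4, -2), (6, 0), (5, -8))
mvd = mv_difference((6, -2), mvp)
```

Because neighbouring vectors are correlated, the difference is usually small and costs fewer bits than the vector itself.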
- 170. Multiple Reference Frames
[Figure: H.264 encoder block diagram with the motion estimation and motion compensation blocks highlighted: multiple reference frames for motion compensation.]
- 172. Intra prediction modes
4x4 luminance prediction modes
0 (Vertical), 1 (Horizontal), 2 (DC), 3 (Diagonal down/left), 4 (Diagonal down/right),
5 (Vertical-right), 6 (Horizontal-down), 7 (Vertical-left), 8 (Horizontal-up)
Mode 2 (DC)
Predict all pixels from
(A+B+C+D+I+J+K+L+4)/8, or from
(A+B+C+D+2)/4 or (I+J+K+L+2)/4 when only the
above or only the left neighbours are available
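The DC, vertical and horizontal modes can be sketched for a 4x4 block as follows. Here `above` stands for the neighbours A to D and `left` for I to L; the function names are hypothetical, and the fallback value 128 when no neighbour exists assumes 8-bit video.

```python
def predict_dc(above=None, left=None):
    """Mode 2: fill the 4x4 block with the rounded mean of the neighbours."""
    if above is not None and left is not None:
        dc = (sum(above) + sum(left) + 4) >> 3
    elif above is not None:
        dc = (sum(above) + 2) >> 2
    elif left is not None:
        dc = (sum(left) + 2) >> 2
    else:
        dc = 128  # assumed mid-grey default for 8-bit samples
    return [[dc] * 4 for _ in range(4)]


def predict_vertical(above):
    """Mode 0: copy the row above down into every row."""
    return [list(above) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1: copy each left neighbour across its row."""
    return [[p] * 4 for p in left]
```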
- 174. Intra prediction modes
Intra 16x16 luminance and 8x8 chrominance prediction modes
- 175. Inter prediction modes
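Chrominance pixel interpolation
Quarter chrominance pixels are interpolated by taking
weighted averages, by distance, from the new pixel to the
four surrounding original pixels A, B (top row) and C, D
(bottom row). dx and dy are the horizontal and vertical
offsets of the new pixel from A, and s is the sample spacing.
$$V = \frac{(s-dx)(s-dy)A + dx(s-dy)B + (s-dx)\,dy\,C + dx\,dy\,D + s^2/2}{s^2}$$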
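The weighted average can be sketched in integer arithmetic, assuming s = 8 so that dx and dy are offsets in eighths of a sample. The function name is hypothetical.

```python
def chroma_sample(A, B, C, D, dx, dy, s=8):
    """Bilinear interpolation between four chroma samples.

    A, B are the top-left and top-right samples, C, D the bottom-left and
    bottom-right ones; dx, dy are offsets from A in units of 1/s sample.
    """
    total = (
        (s - dx) * (s - dy) * A
        + dx * (s - dy) * B
        + (s - dx) * dy * C
        + dx * dy * D
        + (s * s) // 2  # rounding term s^2/2
    )
    return total // (s * s)
```

With dx = dy = 0 the result is simply A; with dx = dy = 4 all four samples are weighted equally.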
- 176. Deblocking filter
Deblocking filter:
Improves subjective visual and objective quality of
the decoded picture
Significantly superior to post filtering
Filtering affects the edges of 4x4 block structure
Highly content adaptive filtering procedure mainly
removes blocking artifacts and does not
unnecessarily blur the visual content
Filtering strength depends on the inter/intra coding
mode, motion and coded residuals.
- 178. Deblocking filter
Deblocking filter: Highly compressed decoded inter picture
1) Without Filter 2) with H264/AVC Deblocking
- 180. Entropy coding
Entropy coding methods:
CABAC - Discussed
UVLC
H.264 offers a single Universal VLC (UVLC) table for
all symbols
CAVLC
CAVLC (Context-Adaptive Variable Length Coding)
Probability distribution is static
Code words must have an integer number of bits (low
coding efficiency for highly peaked pdfs)
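The single UVLC table is an Exp-Golomb code: a value n is written as the binary form of n+1, preceded by as many zeros as that binary form has bits minus one. A minimal sketch for unsigned values, using strings of '0'/'1' for readability (the function names are hypothetical):

```python
def ue_encode(n):
    """Unsigned Exp-Golomb code word for n >= 0, as a bit string."""
    bits = bin(n + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def ue_decode(bitstr):
    """Decode the first Exp-Golomb code word in a bit string."""
    zeros = len(bitstr) - len(bitstr.lstrip("0"))
    return int(bitstr[zeros : 2 * zeros + 1], 2) - 1
```

Every code word has a whole number of bits (1, 3, 5, ...), which illustrates the limitation noted above: a symbol with probability close to 1 still costs at least one bit.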
- 181. CABAC: Technical Overview
Context modeling: chooses a model conditioned on past observations
Binarization: maps non-binary symbols to a binary sequence
Adaptive binary arithmetic coder (probability estimation + coding engine):
uses the provided model for the actual encoding and updates the model
(feedback: update probability estimation)
- 182. Complexity of codec design
Codec design includes much higher complexity (memory &
computation) – rough guess 2-3x decoding power increase
relative to MPEG4, 4-5x encoding
Problem areas:
Smaller block sizes for motion compensation (cache access
issues)
Longer filters for motion compensation (more memory
access)
Multi-frame motion compensation (more memory for
reference frame storage)
More segmentations of macroblock to choose from (more
searching in the encoder)
More methods of predicting intra data (more searching)
Arithmetic coding (adaptivity, computation on output bits)
- 184. Summary
New key features are:
Enhanced motion compensation
Small blocks for transform coding
Improved de-blocking filter
Enhanced entropy coding
Substantial bit-rate savings (up to 50%) relative to
other standards for the same quality
The complexity of the encoder triples that of the
prior ones
The complexity of the decoder doubles that of the
prior ones
- 185. Sorenson Spark video Codec
H263 variant
Low footprint (code size) ~100K
Good performance for 2002
Quality SPARK vs Optimal MPEG (H263+)
20-30% less efficient
SPARK Quality RT vs Offline
RT has Considerably lower quality due to processing
power and RT (delay) constraints
- 186. Sorenson Spark - 2
Does Not support:
Arithmetic coding
Advanced prediction
B-frames
Features
De-blocking filter mode
UMV - Unrestricted Motion Vector mode
Arbitrary frame dimensions
Supported by FFMPEG
D – Frames
- 187. D-Frames
D (Disposable) frames
One way prediction
Provides flexible bit-rate: I-D-P-D-P-D-P
D-frames used only when feeding a flash
communication server
- 188. On2 TrueMotion VP6
Features
Compressed I-frames (Intra-compression makes use
of spatial predictors)
unidirectional predicted frames (P-frames)
Multiple reference P-frames
8x8 iDCT-class transform (4x4 in VP7)
improved quantization strategy (preserves image
details)
Advanced Entropy Coding
- 189. VP6 Features
Entropy Coding
various techniques are used based on complexity and
frame size including:
VLC
Context modeled binary coding (like H264 CABAC)
Bit Rate Control
To reach the requested data rate, VP6 adjusts
Quantization levels
Encoded frame dimensions
Entropy Coding
Drop frames
- 190. VP6 motion prediction
Motion Vectors
One vector per MacroBlock (16x16)
or
4 vectors for each block (8x8)
Quarter pel motion compensation support
Unrestricted motion compensation support
Two reference frames:
The previous frame
or
Previously bookmarked frame
- 191. VP6 vs H264
VP6 is much simpler than H.264
Requires fewer CPU resources for decoding & encoding
Code size is considerably smaller.
Simpler means less efficient? NO! Techniques
used:
Mix of adaptive sub-pixel motion estimation
Better prediction of low-order frequency coefficients
Improved quantization strategy
de-blocking and de-ringing filters
Enhanced context based entropy coding
- 192. 720p High Profile H.264 vs VP7
PSNR graphs are used for comparative analysis of
compression quality. Each line represents the encode
quality on a given clip at multiple datarates. The highest
line represents the codec with the best quality.
In this case VP7 clearly is better than x264.
[Figure: PSNR graph for the "Alexander Trailer" clip. Vertical axis: PSNR, 42.5 to 47 (this axis represents quality; higher is better). Horizontal axis: datarate in kilobits per second, 1400 to 4400. Two lines: VP7 (top) and x264 High profile (lower).]
Tips for reading this kind of graph (a PSNR graph):
Pick any point on the top line, in this case VP7.
Draw a line straight across until you intersect the lower
line, in this case x264 (i.e. keep the quality / PSNR constant).
Draw a line straight down from each of the two points to the
datarate axis. The crossing point tells you the datarate at
that point.
What this means:
On this clip VP7 at 2750 kbps has the same quality / PSNR as
x264 High profile at 3620 kbps, i.e. you'd need about 30%
higher datarate to get the same quality out of x264 that you
got from VP7.
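For reference, the quantity on the quality axis can be computed as 10·log10(peak² / MSE) between the original and decoded samples, and the reading procedure boils down to a ratio of two datarates at equal PSNR. The sketch below assumes 8-bit samples given as flat lists; the function names are hypothetical.

```python
import math


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def extra_datarate_percent(better_kbps, worse_kbps):
    """How much more datarate the weaker codec needs for the same PSNR."""
    return 100 * (worse_kbps - better_kbps) / better_kbps
```

Applied to the two datarates read off the graph, `extra_datarate_percent(2750, 3620)` gives a little over 30%.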
- 193. VP6 vs. H264
There is a difference between the codec
technology and a codec implementation.
- 194. On2 VP7
Not open source
Non-standard royalties model
Better video quality than H264
Used by:
Part of EVD – China standard for HD-DVD
Skype Beta (V 2.0)
Flash Player
- 195. Windows Media
Windows media is a format used by Microsoft for
encoding and distributing Audio and Video.
Windows Media has two types of media:
Windows Media Audio (WMA)
Windows Media video (WMV)
Windows Media Video
A modified version of MPEG-4
Codec versioning started at version 7, for
Windows Media Player 7, and then evolved through
versions 8-10
- 196. Windows Media 9 - VC1 Format
Microsoft submitted its Version 9 codec to the Society
of Motion Picture and Television Engineers (SMPTE) for
approval as an international standard. SMPTE
reviewed the submission under the draft name "VC-1".
This codec is also used to distribute high definition video
on standard DVDs in a format Microsoft has branded as
WMV HD. This WMV HD content can be played back on
computers or compatible DVD players.
The trial version of the standard was published by
SMPTE in September 2005
WMV9 was approved by SMPTE in April 2006
- 198. Before we start
VP8's goal is NOT to deliver the best video
quality at any given bitrate
VP8 was designed as a mobile video decoder
and should be examined in this context:
VP8 vs H.264 Baseline profile
- 199. Google VP8
Last month, at Google I/O (its developer
conference), Google released VP8 as open
source
VP8 is a lightweight video codec developed by
On2.
VP8 provides quality which is the same as or higher
than H.264 Baseline profile
VP8 memory requirements are lower than those of
H.264 Baseline profile
After optimization, VP8 might have better MIPS
performance than H.264 Baseline profile
- 200. Genealogy
VP8 is part of a well-known codec family
VP3 was released to open source to become
XIPH Theora
VP6 is used in Flash video
VP7 is used in Skype
Motivation:
“No Royalties” CODEC
[Figure: family tree of the On2 codecs: VP3 (Theora), VP6, VP7, VP8]
- 201. ADOPTION – WHO USES IT?
Software
Hardware
Platform & Publishers
- 202. Software Adoption
Android, Anystream, Collabora
Corecodec, Firefox, Adobe Flash
Google Chrome, iLinc,
Inlet, Opera, ooVoo
Skype, Sorenson Media
Theora.org, Telestream, Wildform.
- 203. Hardware Adoption
AMD, ARM, Broadcom
Digital Rapids, Freescale
Harmonic ,Logitech, ViewCast
Imagination Technologies, Marvell
NVIDIA, Qualcomm, Texas Instruments
VeriSilicon, MIPS
- 204. Platforms and Publishers
Brightcove
Encoding.com
HD Cloud
Kaltura
Ooyala
YouTube
Zencoder
- 206. Adaptive Loop Filter
Improved loop filter provides better quality &
performance in comparison to H.264
Source: On2
- 207. Golden Frames
Golden frames enable better decoding of the
background, which is used for prediction in later
frames
Could be used as resync-point:
Golden frame can reference an I frame
Could be hidden (not for display)
Source: On2
- 208. Decoding efficiency
CABAC is an H.264 feature which improves
coding efficiency but consumes many CPU
cycles
VP8 has better entropy coding than H.264, which
leads to relatively lower CPU consumption under
the same conditions
• Decoding efficiency is
important for smooth
operation and long battery
life in netbooks and mobile
devices
Source: On2
- 209. Resolution up-scaling & downscaling
Supported by the decoder
Encoder could decide dynamically (RT
applications) to lower resolution in case of low
bit rate and let the decoder scale.
Remove decision from the application
No need for an I frame
- 210. VP8 BASICS
Definitions
Bitstream structure
Frame structure
- 211. Definitions
Frame – same as H.264
Segment – Parallel to a slice in H.264. MBs in the
same segment use the same settings, such as:
Probabilistic encoder/decoder settings
De-blocking filter settings
Partition – block of byte aligned compressed
video bits.