Video Compression, Part 3-Section 2, Some Standard Video Codecs

Dr. Mohieddin Moradi
mohieddinmoradi@gmail.com
1
Dream
Idea
Plan
Implementation

Section I
– ISO/IEC JTC 1/SC 29 Structure and MPEG
– ITU-T structure and VCEG (Video Coding Experts Group or Visual Coding Experts Group)
– A Generic Interframe Video Encoder
– H.261 Video Coding Standard
– MPEG-1 Video Coding Standard
– MPEG-2 Video Coding Standard
Section II
– MPEG-2 Transport and Program Streams
– H.263 Video Coding Standard
– H.263+ Video Coding Standard
– H.263++ Video Coding Standard
– Bit-rate (R) and Distortion (D) in Video Coding
2
Outline

3
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC

− Created in 1992
• 300 members, >35 countries - www.dvb.org
• Promotion of open standards for Digital TV broadcasting
− Principal Recommendations
• Physical Layer
− Satellite: DVB-S, DVB-S2
− Cable: DVB-C
− Terrestrial: DVB-T, DVB-T2
− Mobile: DVB-H, DVB-SH
• Signalling
− Service information: DVB-SI
− Service synchronization: DVB-SAD
• Protection
− DVB-CAS, DVB-CSA
− Smartcard interface: DVB-CI, DVB-CI+
4
DVB

DVB-C
(QAM)
DVB-T
(COFDM)
DVB-T
(COFDM)
DVB-S
(QPSK)
DVB-S
(QPSK)
Multiplexing
Scrambling
MPEG-2
Coding
Descrambling
MPEG-2
Decoding
Satellite
Terrestrial
Cable
DVB-C
(QAM)
Scrambling
Key
DVB
Demultiplexing
Descrambl.
Key
MPEG-2
Video Audio Data Data Audio Video
DVB Systems
5

…bits bits bits ...
Video or Audio
Elementary Stream (ES)
PES Packet
Header Payload
Packetized
Elementary Stream (PES)
Time stamps
TS Packet (188 bytes)
Header Payload
MPEG 2
Transport Stream (TS)
PID
TS Header, contains PID and clock
PES Header
Rule: Every elementary stream gets its own (Packet ID) PID
The MPEG Transport Stream
6

Processing of The Streams in The STB
Tuner/
Demod
MPEG2
Demux
Video
Decomp.
Audio
Decomp.
System
Memory
Processor
• 6 TV
• 20 Radio
• Service Information
QAM
OFDM
A/D
A/D
MPEG2-TS: 40 Mbit/s, e.g.:
188
188
MPEG2-TS
PID Header Payload
DEMUX
queues
PID 1
PID 2
section
section
QPSK
7

8
Digital Terrestrial TV - Layers
. . . provide clean interface points. . . .
Picture
Layer
Multiple Picture Formats
and Frame Rates
1920 x 1080
1280 x 720
50,25, 24 Hz
Transmission
Layer
7 MHz
COFDM / 8-VSB, VHF/UHF TV Channel
Video
Compression
Layer
MPEG-2
compression
syntax
MP@ML
or
MP@HL
Data
Headers
Motion
Vectors
Chroma and Luma
DCT Coefficients
Variable Length Codes
Transport
Layer MPEG-2
packets
Video packet, Video packet, Audio packet, Aux data
Packet Headers Flexible delivery of data

9
Digital Television Encode Layers
Delivery System
Bouquet Multiplexer
Program 2 Program 3
Service
Mux
Other Data
Control Data
Program Association Table (PAT)
Picture
Coding
Audio
Coding
Data
Coding
MPEG-2
or AC-3MPEG-2
Control
Data
Video Data Sound
Modulator & Transmitter
Error Protection
Control Data
188-byte packets, MPEG Transport Data Stream
Program 1 Multiplexer
MPEG Transport
Stream MuxControl Data
Program Map Table (PMT)
PES PES PES

10
Digital Television Decode Layers
Audio
Decoder
Data
Decoder
Picture
Decoder
MPEG
or AC-3MPEG-2
Demodulator & Receiver Error
Control
Delivery System
DataMon
Speakers
MPEG Transport Stream
De-Multiplexer
MPEG
DeMux
Transport
Stream

11
− MPEG-2 Container formats (a file format that can contain data compressed by standard codecs)
• TS: Transport Stream (Multiplexed A/V PES and User Data)
• PS: Program Stream
− PES: Packetized Elementary Stream, Audio or Video
− ES: Elementary Streams-Compressed Data
Video
Data
Audio
Data
Elementary
Streams
Video
Encoder
Audio
Encoder
Packetizer
Packetizer
ES
ES
Video
PES
Program
Stream
MUX
Transport
Stream
MUX
Audio
PES
PS: Program Stream
TS: Transport Stream
MPEG-2 Video System Standard
For noisier environments
such as terrestrial
broadcast channels
For an error-free
environment such as
Digital Storage Media
(DSM)

12
MPEG-2 Packetized Elementary Stream (PES)
MPEG-2 Video
Video ES (Elementary Stream)
I0 P3 B1 B2 P6 B3 B4 I9 B7 B8 P12 B B P B B I
I0 B1 B2 P3 B4 B5 P6 B7 B8
Video Frames Frame Frame Frame Frame Frame Frame Frame Frame Frame
MPEG-2 System
Subband Samples
Side
Information
Sync, System
Info. and CRC
Ancillary
Data Field
Audio ES (Elementary Stream)
MPEG-2 System
Audio Tracks
frame frame frame frame frame frame
frame frame frame frame frame frame
MPEG-2 Audio
Video PES (Packetized Elementary Stream)
Audio PES (Packetized Elementary Stream)

Output from MPEG-2 System Encoder:
13
MPEG -2 System
Processor
Elementary Stream (ES):
- Digital Control Stream
- Digital Audio (compressed)
- Digital Video (compressed)
- Digital Data
PES Packet has 6 bytes Protocol Header
• 3 bytes start code
• 1 byte stream ID
– 110x xxxx audio stream number x xxxx
– 1110 yyyy video stream number yyyy
– 1111 0010 DSM-CC (Digital Storage Media Command and Control) control packet
• 2 bytes length field
PES packet layout (field widths in bits):
Packet Start Code Prefix (24) | Stream ID (8) | PES Packet Length (16) | PES Header (optional) | PES packet data
(PES Packet Length is a 16-bit field, so at most 65,535 bytes can follow the 6-byte protocol header)
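The 6-byte protocol header described above can be read with a few lines of Python. This is a minimal sketch, not a full PES parser: the function name `parse_pes_header` and the returned dictionary keys are illustrative choices, and only the three stream ID patterns listed on the slide are classified.

```python
def parse_pes_header(data: bytes) -> dict:
    """Read the 6-byte PES protocol header (start code, stream ID, length)."""
    if len(data) < 6:
        raise ValueError("need at least 6 bytes")
    if data[0:3] != b"\x00\x00\x01":
        raise ValueError("missing packet start code prefix 0x000001")
    stream_id = data[3]
    length = int.from_bytes(data[4:6], "big")  # 16-bit length field
    if stream_id & 0xE0 == 0xC0:       # 110x xxxx
        kind = f"audio stream {stream_id & 0x1F}"
    elif stream_id & 0xF0 == 0xE0:     # 1110 yyyy
        kind = f"video stream {stream_id & 0x0F}"
    elif stream_id == 0xF2:            # 1111 0010
        kind = "DSM-CC"
    else:
        kind = "other"
    return {"stream_id": stream_id, "kind": kind, "pes_packet_length": length}
```

For example, the bytes `00 00 01 E0 01 00` describe video stream 0 with a length field of 256.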

14
PES Packet Syntax Diagram
Field widths in bits are given in parentheses.
− PES packet: Packet Start Code Prefix (24), Stream ID (8), PES Packet Length (16), optional PES header, PES packet data bytes
− Optional PES header: '10' (2), PES Scrambling Control (2), PES Priority (1), Data Alignment Indicator (1), Copyright (1), Original or Copy (1), 7 flags (8), PES Header Length (8), optional fields, stuffing bytes (0xFF)
− Optional fields: PTS and DTS (33 each), ESCR (42), ES Rate (22), DSM Trick Mode (8), Additional Copy Info (7), Previous PES CRC (16), PES Extension
− PES Extension: 5 flags, then optional fields: PES Private Data (128), Pack Header Field (8), Program Packet Sequence Counter (16), P-STD Buffer (16), PES Extension Length (7), PES Extension Data

Packetized Elementary Stream
• The basic stream format for video, audio, data, ..
• PES offers a mechanism to carry conditional access information
• PES can be scrambled and also assigned priority
• PES can carry time references: PTS and DTS
• The largest data size within a PES packet is 64k Bytes.
PES Indicators
• PES_Priority - Indicates priority of the current PES packet.
• PES_Scrambling_Control - Defines whether scrambling is used, and the chosen scrambling method.
• Data_alignment_indicator - Indicates if the payload starts with a video or audio start code.
• Copyright information - Indicates if the payload is copyright protected.
• Original_or_copy - Indicates if this is the original ES
15

PES Optional Field
− Presentation Time Stamp (PTS) and possibly a Decode Time Stamp (DTS)
• For audio/video streams, these time stamps may be used to synchronize a set of elementary
streams and control the rate at which they are replayed by the receiver.
− Elementary Stream Clock Reference (ESCR)
− Elementary Stream Rate - Rate at which the ES was encoded.
− Trick Mode - indicates the video/audio is not the normal ES, e.g. after DSM-CC has signaled a replay.
− Copyright Information - set to 1 to indicate a copyrighted ES.
− CRC - this may be used to monitor errors in the previous PES packet
− PES Extension Information - may be used to support MPEG-1 streams.
16

It is the central structure used in both PS and TS Streams; results from packetizing continuous streams of
compressed audio or video
− PES packets can carry two timestamps
1. Decoding Time Stamp (DTS) – this tells the decoder when the data in the packet should be
decoded.
2. Presentation Time Stamp (PTS) – this tells the decoder when the decoded data should be displayed.
− The systems part specifies that the decoder must contain a System Time Clock (STC).
• When a decoder’s STC is equal to a packet’s DTS, the data in the packet is decoded.
• When the STC is equal to a packet’s PTS, the decoded data is sent to the display device (e.g. graphics
card or sound card).
• The state of the encoder’s clock is placed in the stream at regular intervals. This synchronises the
decoder with the encoder.
17

− Packetising the continuous streams of compressed video and audio bitstreams (elementary streams or ES)
generates PES packets.
− A typical method of transmitting elementary stream data from a video or audio encoder is to first create
PES packets from the elementary stream data and then to encapsulate these PES packets inside Transport
Stream (TS) packets or Program Stream (PS) packets.
− The TS packets can then be multiplexed and transmitted using broadcasting techniques, such as those
used in ATSC and DVB.
− Simply stringing together PES packets from the various encoders with other packets containing necessary
data to generate a single bitstream generates a programme stream.
− A transport stream consists of packets of fixed length containing 4 bytes of header followed by 184 bytes
of data, where the data are obtained by segmenting the PES packets.
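The segmentation of a PES packet into 188-byte transport packets can be sketched as follows. This is an illustration under stated assumptions rather than a reference multiplexer: the function name `pes_to_ts` is hypothetical, no PCR or scrambling is handled, and the last, partly filled packet is padded with an adaptation field of stuffing bytes, which is how MPEG-2 Systems keeps the packet length fixed (the slides only state the 4 + 184 byte split).

```python
TS_PACKET_SIZE = 188
TS_PAYLOAD_SIZE = 184  # 188 bytes minus the 4-byte header


def pes_to_ts(pes: bytes, pid: int, first_cc: int = 0) -> list[bytes]:
    """Split one PES packet into fixed-length TS packets on a single PID."""
    packets = []
    cc = first_cc & 0x0F
    for offset in range(0, len(pes), TS_PAYLOAD_SIZE):
        chunk = pes[offset:offset + TS_PAYLOAD_SIZE]
        pusi = 1 if offset == 0 else 0          # payload unit start indicator
        stuffing = TS_PAYLOAD_SIZE - len(chunk)  # bytes missing in last packet
        afc = 0b11 if stuffing else 0b01         # adaptation field control
        header = bytes([
            0x47,                                # sync byte
            (pusi << 6) | ((pid >> 8) & 0x1F),
            pid & 0xFF,
            (afc << 4) | cc,
        ])
        adaptation = b""
        if stuffing:
            af_len = stuffing - 1                # length byte itself counts as 1
            adaptation = bytes([af_len])
            if af_len > 0:
                # one flags byte (all zero) followed by 0xFF stuffing
                adaptation += b"\x00" + b"\xFF" * (af_len - 1)
        packets.append(header + adaptation + chunk)
        cc = (cc + 1) & 0x0F                     # 4-bit continuity counter
    return packets
```

A 400-byte PES packet, for instance, yields three TS packets carrying 184, 184 and 32 payload bytes.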
18

19
MPEG-2 Transport Stream (TS)
Multiplexing
Subsystem
Multiplexer
TransportAudio
Compression
Digital
Modulation
Error
Correction
Encoder
Video
Compression
Video
Ancillary data
Audio
Transmission
Subsystem
Control data
Mixer
Video
Subsystem
Audio
Subsystem
ES
ES
ES
ES
TS
ES
Packetizer
ES
Packetizer
ES
Packetizer
PES
PES
PES
PES


Program 1
Video 1
PES
Program 2 video 2
PES
Audio 1
PES
Transport Stream
188 Bytes
MPEG-2 Transport Stream (TS) Formation
21

MPEG-2 Transport Stream
Packetizer Packetizer Packetizer Packetizer Packetizer Packetizer Packetizer
Video
Encoder
Audio
Encoder
Video
Encoder
Audio
Encoder
Video
Encoder
Audio
Encoder
Packetizer Packetizer
Program 1
Video_1 Audio_1 Data_1
Program 2 Program 3
Video_2 Audio_2 Data_2 Video_3 Audio_3 Data_3
TRANSPORT MUX
TP1_1 TP2_1 TP1_2 TP2_2 TP3_1 TP1_3 TP2_3 TP3_2
Transport Stream
TP3_3
TP1_1 TP1_2 TP1_3 TP2_1 TP2_2 TP2_3 TP3_1 TP3_2 TP3_3
Transport MuxTransport Mux Transport Mux
MPEG-2 Transport Stream (TS) Formation
22

23
MPEG-2 Transport Stream (TS) Packet
Video
Audio
Teletext
(DVB) SI
Cond. Access
IP Packets
Private Data
Applications
App. Info
Time
Division
Multiplexing
(TDM)
MPEG-2 packets can contain
− Video, Audio, Teletext, Data streaming (13818-1)
− DSM-CC (Digital Storage Media Command and Control): data carousel, object carousel, SI-tables, etc. (13818-6)
− DVB Data Piping
1 TS Packet
(188 Bytes)
Payload
PES / Section / Piped Data
(( 184-n) Byte )
Header with
PID
( 3 byte )
Adaptation
Field
( n byte )
Sync
( 1 byte )
TS Packets

It significantly differs from MPEG-1.
• It offers robustness for noisy channels
• It offers the ability to assemble multiple programmes into a single stream.
• It uses fixed-length packets of size 188 bytes with a new header syntax.
• This packet can be segmented into four 47-byte pieces to be accommodated in the payload of four ATM
cells, with the AAL1 adaptation scheme.
• It is therefore more suitable for hardware processing and for error correction schemes, such as those
required in television broadcasting, satellite/cable TV and ATM networks.
24

The transport stream uses a fixed packet length (188 bytes)
• This allows easy decoder/encoder synchronisation.
• It also allows error correction codes to be inserted.
Transport Streams can contain packets from a number of Programs
• These can be different TV channels or maybe an EPG.
• Each elementary stream of a program has a unique Packet ID (PID) placed in the packet header.
• The decoder can discard packets of other programs by checking the PID.
25

− The multiple programmes with independent time bases can be multiplexed in one transport stream.
− The transport stream also allows
• Synchronous multiplexing of programmes
• Fast access to the desired programme for channel hopping
• Multiplexing of programmes with clocks unrelated to transport clock
• Correct synchronization of elementary streams for playback.
• Control of the decoder buffers during start-up and playback for both constant and variable bit rate
(VBR) programmes.
26

TS packet (188 bytes) = TS header (4 bytes: sync byte plus 3 header bytes) + optional adaptation field + TS payload (up to 184 bytes, carrying PES 1, PES 2, …, PES N)
TS header fields (widths in bits):
− Sync Byte (8)
− Transport Error Indicator (1)
− Payload Unit Start Indicator (1)
− Transport Priority (1)
− PID (13)
− Scrambling Control (2)
− Adaptation Field Control (2)
− Continuity Counter (4)
Adaptation field (widths in bits):
− Adaptation Field Length (8)
− Discontinuity Indicator (1)
− Random Access Indicator (1)
− ES Priority Indicator (1)
− 5 flags (5), selecting the optional fields
Optional fields: PCR (42), Original PCR (OPCR) (42), Splice Countdown (8), Private Data Length (8) followed by private data, Adaptation Field Extension Length (8) followed by 3 flags and the adaptation field extension
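The 4-byte TS header can be unpacked with plain bit masks. The sketch below assumes the field order and widths shown in the header layout (8, 1, 1, 1, 13, 2, 2, 4 bits); the function name `parse_ts_header` and the dictionary keys are illustrative.

```python
def parse_ts_header(packet: bytes) -> dict:
    """Unpack the 4-byte header of one 188-byte transport stream packet."""
    if len(packet) != 188:
        raise ValueError("a TS packet is exactly 188 bytes")
    if packet[0] != 0x47:
        raise ValueError("sync byte 0x47 not found")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),
        "payload_unit_start": bool(b1 & 0x40),
        "transport_priority": bool(b1 & 0x20),
        "pid": ((b1 & 0x1F) << 8) | b2,            # 13-bit packet identifier
        "scrambling_control": (b3 >> 6) & 0x03,
        "adaptation_field_control": (b3 >> 4) & 0x03,
        "continuity_counter": b3 & 0x0F,
    }
```

A receiver would typically call this on every packet and keep only those whose `pid` belongs to the selected programme.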
27

PID numbers for Program Specific Information (PSI) used for Service Information (SI)
0x0000 PAT Program Association Table
0x0001 CAT Conditional Access Table
0x0002 TSDT Transport Stream Description Table, EI DVB
0x0003-0x000F reserved
0x0010 NIT, ST Network Information Table, Stuffing Table
0x0011 SDT, BAT, ST Service Description Table, Bouquet Association Table, Stuffing Table
0x0012 EIT, ST Event Information Table, Stuffing Table
0x0013 RST, ST Running Status Table, Stuffing Table
0x0014 TDT, TOT, ST Time and Date Table, Time Offset Table, Stuffing Table
0x0015 Network synchronization
0x0016-0x001D reserved for future use
0x001E DIT Discontinuity Information Table
0x001F SIT Selection Information Table
188 bytes
Sync
1 byte
Header
3 bytes
Optional
Adaptation Field
X bits
Payload
184 bytes
PID
Packet Identifier
13 bits
28

PID
− Indicates where the data goes
• Allows filtering of packet for non viewed programs
− Does not indicate PES/section or coding type
− Reserved PID
• Some PSI data
• Program Association Table (PAT)
• Conditional Access Table (CAT)
• Transport Stream Description Table (TSDT)
• User-reserved: Other standard bodies (DVB, ATSC, …)
PSI
− Multiplex description
− Program description
− Stream Description
29
Packet ID (PID) and Program Specific Information (PSI)

30
Program # 100 – PMT PID 1025
Program # 200 – PMT PID 1026
Program # 100
Video PID – 501 – MPEG-2 Video
Audio PID (English) – 502 – MPEG-2 Audio
Audio PID (Spanish) – 503 – MPEG-2 Audio
Program # 200
Video PID – 601 – AVC Video
Audio PID (English) – 602 – AAC Audio
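The two-step lookup a receiver performs (PAT first, then the PMT of the chosen programme) can be mimicked with dictionaries. The sketch uses the programme numbers and PIDs of the example above; the data structures and the function name `pids_to_keep` are illustrative only, since real tables are carried as binary sections.

```python
PAT_PID = 0x0000

# PAT: programme number -> PID carrying that programme's PMT
pat = {100: 1025, 200: 1026}

# PMTs, keyed by their own PID: elementary stream PID -> stream description
pmts = {
    1025: {501: "MPEG-2 Video",
           502: "MPEG-2 Audio (English)",
           503: "MPEG-2 Audio (Spanish)"},
    1026: {601: "AVC Video",
           602: "AAC Audio (English)"},
}


def pids_to_keep(program_number: int) -> set[int]:
    """Return every PID the demultiplexer must pass for one programme."""
    pmt_pid = pat[program_number]        # step 1: PAT gives the PMT PID
    stream_pids = set(pmts[pmt_pid])     # step 2: PMT lists the stream PIDs
    return {PAT_PID, pmt_pid} | stream_pids
```

Packets on any other PID can be discarded without being decoded.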
MPEG-2 Signaling Tables

31
Network Information
Bouquet Association
Service Description
Event Information
Running Status
Time & Date
Stuffing

• Identifies a multiplex (ID 16 bits) (The PAT is sent with the well-known PID value of 0x0000)
• Lists all programs (Lists the PIDs of tables describing each program)
─ Program Number (16 bit)
─ PID carrying PMT
• If PID= 0, NIT
• Defines the set of PIDs associated with a program, e.g. audio, video, ...
• PID carrying the PCR
─ Not always a media stream !
• Program Descriptors
─ Protection systems, interactive apps …
• Lists all streams
─ PID: where stream data is carried in the multiplex
─ streamType: type of media compression
─ Stream descriptors
• Language, coding parameters, demux parameters, …
Program Association Table (PAT)
Other PacketsAudio Packet
Video Packet
51 51 51 6664 0 150 101

CAT - Conditional Access Table
− Defines type of scrambling used and PID
values of transport streams which contain
the conditional access management and
entitlement information (EMM).
TSDT- Transport Stream Description Table
− Contains descriptors relating to the overall
transport stream
34

NIT - Network Information Table
− It contains details of the bearer network (network
topology) used to transmit the MPEG multiplex,
including the carrier frequency
Service Description Table (SDT)
− Multiplex Description (channel names, …)
− Editorial description of the services in a TS
− Service names and ancillary services
Event Information Table (EIT)
− Electronic Program Guide for present and
following shows
Time and Date Table (TDT)
− Current date and time, UTC (used to synchronize
STB system time)
35
MPEG-2 Signaling Table (DVB, Mandatory)

Bouquet Association Table (BAT)
− Commercial operator description and services
− Several commercial operators may sell the same
services
Running Status Table (RST)
Stuffing Table (ST)
Time Offset Table (TOT)
− Local offset by region (used to synchronize STB
system time)
Application Information Table (AIT)
− Interactive App signaling (MHP, HbbTV,…)
− Application type
IP/MAC Notification Table (INT)
− IP Transport
36
MPEG-2 Signaling Tables (DVB, Optional)

− Scrambling may happen:
• At PES payload level
• At some sections payload level
• At TS packet level
− Most common use case
− PES headers are scrambled
− Exceptions
• PAT: required to get list of programs
• PMT: required to get protection system used
• NIT/TSDT (Network Information Table / Transport Stream Description Table): infrastructure management
37
Scrambling in MPEG-2 TS

AV Synchronization
− Want audio and video streams to be played back in sync with each other
− Video stream contains “Presentation Time Stamps (PTS)”
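− MPEG-2 time stamps (PTS/DTS) are counted in units of a 90 kHz clock (the 27 MHz system clock divided by 300)
• Good for both 25 and 30 fps (3600 and 3000 ticks per frame)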
− Each program carries a clock
• Program Clock Reference (PCR)
– PCR timestamps are sent with data by sender
• PES Timestamps relate to this clock
− Receiver uses PLL to synchronize clocks
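Why 90 kHz suits both frame-rate families can be checked with exact arithmetic: one frame period is a whole number of ticks at 25, 30 and also 30000/1001 (about 29.97) fps. A short sketch, with illustrative function names:

```python
from fractions import Fraction

PTS_CLOCK_HZ = 90_000  # units in which PTS and DTS are counted


def ticks_per_frame(frame_rate: Fraction) -> Fraction:
    """Number of 90 kHz ticks between two consecutive frames."""
    return Fraction(PTS_CLOCK_HZ) / frame_rate


def pts_to_seconds(pts: int) -> Fraction:
    """Convert a PTS or DTS value to seconds."""
    return Fraction(pts, PTS_CLOCK_HZ)
```

With these definitions, `ticks_per_frame(Fraction(25))` is 3600, `ticks_per_frame(Fraction(30))` is 3000, and `ticks_per_frame(Fraction(30000, 1001))` is 3003.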
38
MPEG-2 TS Timing

PCR field layout (6 bytes, most significant bit first):
− Program Clock Reference (PCR) base (33 bits): the intended time, in 90 kHz clock ticks, of the arrival at the input of the decoder of the fourth byte of this structure.
− Reserved (6 bits)
− PCR extension (9 bits): additional resolution, in 27 MHz clock ticks. PCR = 300*base + ext
OPCR field layout (same structure):
− Original PCR (OPCR) base (33 bits): it should not be modified by any multiplexer or decoder; used for recovery of a single-program PCR from a multi-program Transport Stream.
− Reserved (6 bits)
− Original PCR extension (9 bits)
PCR fields in the adaptation field: PCR (42 bits), Original PCR (OPCR) (42 bits)
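The relation PCR = 300*base + ext can be applied directly to the six bytes of a PCR field. The sketch assumes the usual packing of a 33-bit base, 6 reserved bits and a 9-bit extension; `build_pcr_field` is a hypothetical helper included only to produce test input.

```python
SYSTEM_CLOCK_HZ = 27_000_000


def parse_pcr(field: bytes) -> int:
    """Return the PCR in 27 MHz ticks from a 6-byte PCR field."""
    if len(field) < 6:
        raise ValueError("a PCR field is 6 bytes long")
    value = int.from_bytes(field[:6], "big")
    base = value >> 15          # top 33 bits, counted at 90 kHz
    ext = value & 0x1FF         # bottom 9 bits, counted at 27 MHz (0..299)
    return base * 300 + ext


def build_pcr_field(base: int, ext: int) -> bytes:
    """Pack base and extension, setting the 6 reserved bits to 1."""
    value = (base << 15) | (0x3F << 9) | ext
    return value.to_bytes(6, "big")
```

A base of 90000 with extension 0 therefore corresponds to 27,000,000 ticks, i.e. exactly one second of the 27 MHz system clock.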
PCR (Program Clock Reference)
39

Program Association Table (PAT)
Other PacketsAudio Packet
Video Packet
Packet header includes a unique
Packet ID (PID) for each stream
PAT lists PIDs for program map tables
Network Info=10
Prog 1 = 150
Prog 2 = 301
Prog 3 = 511
etc.
Program guides
Subtitles
Multimedia data
Internet Packets
etc.
PMT lists PID associated with a particular program
Video = 51
Audio (English) = 64
Audio (French) = 66
Subtitle = 101
etc.
51 51 51 6664 0 150 101
40

MPEG-2 Example Transport Stream Packet
41
Example Transport Stream Packet
188 Bytes
Header
Flags
• Transport Error Indicator
• Payload Unit Start Indicator
• Transport Priority
• Transport Scrambling Control
Important PIDs
• 0x0000 – PAT PID
• 0x1FFF – “Null PID” gives space
for VBR
Continuity Counter (CC)
• 4-bit per-PID sequence #
• Helps detect packet loss
Adaptation Field (optional)
• Can carry range of other info
• PCR, splice point flags
• Transport of private data
Example Transport Stream
0x47
(sync) Flags
PID
(Packet ID)
More
Flags
CC Adaptation
Field
Data Payload
PID
0
CC
3
PAT
Data
PID
601
CC
11
PID
602
CC
7
PID
0x1FFF
NULL PID
601
CC
12
PID
602
CC
8
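Packet-loss detection with the continuity counter amounts to checking that each PID's 4-bit counter advances by one, modulo 16. The sketch below is deliberately simplified and the function name is illustrative: it skips the null PID, and it ignores the special cases of packets without payload and of duplicated packets, which a real analyser would have to handle.

```python
NULL_PID = 0x1FFF


def find_discontinuities(packets):
    """packets: iterable of (pid, continuity_counter) in arrival order.

    Returns a list of (pid, expected_cc, received_cc) for every jump.
    """
    last_cc = {}
    errors = []
    for pid, cc in packets:
        if pid == NULL_PID:
            continue                      # counter is not meaningful here
        if pid in last_cc:
            expected = (last_cc[pid] + 1) & 0x0F   # 4-bit wrap-around
            if cc != expected:
                errors.append((pid, expected, cc))
        last_cc[pid] = cc
    return errors
```

Applied to the example stream above (PID 601 going from CC 11 to 12, PID 602 from 7 to 8), the function reports no errors.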

MPEG-2/DVB PID Allocation
− Program Association Table (PAT)
• always has PID = 0 (zero)
− Conditional Access Table (CAT)
• always has PID = 1
− Event Information Table (EIT)
• always has PID = 18 (0x0012)
− Program Map Tables (PMTs)
• have the PIDs specified in the PAT
− The audio, video, PCR, subtitle, teletext etc PIDs for all
programs are specified in their respective PMTs
MPEG-2/DVB PID Allocation
Table                      PID value
PAT                        0x0000
CAT                        0x0001
TSDT                       0x0002
Reserved                   0x0003 – 0x000F
NIT, ST                    0x0010
SDT, BAT, ST               0x0011
EIT, ST                    0x0012
RST, ST                    0x0013
TDT, TOT, ST               0x0014
Network Synchronization    0x0015
Reserved                   0x0016 – 0x001B
Inband signaling           0x001C
Measurement                0x001D
DIT                        0x001E
SIT                        0x001F
42

Increase resilience to transmissions errors
− Redundancy
− Reed Solomon 255/191, 25% redundant
− Each RS column is sent in a section
− FEC aggregation is in another table
• Can be ignored
• Does not interfere with MPE
Without modifying existing implementations
− No modification on MPE (Multi Protocol Encapsulation) sections
• Each MPE+IP on a section
• Aggregation of IP datagrams in memory
43
DVB MPE-FEC

44
Data over DVB
− Data piping
• raw transport on a PID
− Data streaming
• send in PES packets
− DSM-CC Data carousel
• Transport on sections
− Object Carousel
• Data Carousel + file system
− Multi Protocol Encapsulation (MPE)
• IP datagram over TS
Application

Program 0 PID=16
Program 1 PID=22
Program 2 PID=33
… …
Program M PID=55
PMT (Program Map Table)
for Program 1
CAT (Conditional Access Table) (PID=1)
NIT (Network Information Table)
(always Program 0, PID=16)
NIT is considered private data by ISO
Table section ID assigned by system (NIT)
Table section ID always set to 0x01 (CAT)
Table section ID always set to 0x02 (PMT)
Table section ID always set to 0x00 (PAT)
Stream 1 PCR 31
Stream 2 Video 1 54
Stream 3 Audio 1 48
Stream 3 Audio 2 49
… … …
Stream k Data K 66
PAT (Program Association Table) (PID=0)
CA Section 1 (Program 1) EMM PID(99)
… …
CA Section k (Program k) EMM PID(x)
Private Section 1 NIT Info.
… …
Private Section k NIT Info.
0 PAT 22 Prog 1. PMT 33 Prog 2. PMT 99 Prog 1 EMM 31 Prog 1 PCR 48 Prog 1 Audio 1 54 Prog 1 Video 1 109 Prog 2 EMM
Multiple-Program MPEG-2 Transport Stream:
PMT (Program Map Table)
for Program 2
Stream 1 PCR 41
Stream 2 Video 1 19
Stream 3 Audio 1 81
Stream 3 Audio 2 82
… … …
Stream k Data K 88
MPEG-2 / DVB PSI (Program Specific Information) Structure

46
Transport Multiplexing & Decoding
Transport
Stream
Demultiplex
and Decoder
Clock
Control
Video
Decoder
Channel
Specific
Decoder
Audio
Decoder
Decoded
Video
Decoded
Audio
Transport stream
containing one or
multiple programs
Transport
Stream
Demultiplex
and Decoder
Channel
Specific
Decoder
Transport Stream
with single program
Program Stream ≠ Transport Stream
Channel
Channel

47
Transport Stream Decoder
Multiplex
Buffer
Video
Decoder
Transport
Buffer
Re-order
Buffer
Decoded
Video
Decoded
Audio
ES Stream
Buffer
Multiplex
Buffer
Transport
Buffer
ES Stream
Buffer
Multiplex
Buffer
Transport
Buffer
ES Stream
Buffer
Audio
Decoder
System
Info.
Decoder
System
Control

− At the receiver, the transport streams are decoded by a transport demultiplexer (which includes a clock
extraction mechanism), unpacketised by a depacketiser and sent to audio and video decoders for
decoding.
− The decoded signals are sent to the receiver buffer and presentation unit, which outputs them to a display
device and a speaker at the appropriate time.
− Similarly, if the programme streams are used, they are decoded by the programme stream demultiplexer
and depacketiser and sent to the audio and video decoders.
− The decoded signals are sent to the respective buffer to await presentation.
− Also similar to MPEG-1 systems, the information about systems timing is carried by the clock reference field
in the bitstream that is used to synchronise the decoder Systems Time Clock (STC).
− Presentation Time Stamps (PTS), which are also carried by the bitstream, control the presentation of the
decoded output.
48

− For a payload of around 19 Mb/s
• 1 HDTV service - sport & high action
• 2 HDTV services - both film material
• 1 HDTV + 1 or 2 SDTV non action/sport
• 3 SDTV for high action & sport video
• 6 SDTV for film, news & soap operas
• However you do not get more for nothing.
− More services means less quality
49
Examples of DVB Data Containers
Single HDTV
program
HDTV 1
SDTV 1
SDTV 2
SDTV 3
SDTV 4
SDTV 5
Multiple SDTV
programs
SDTV 1
HDTV 1
Simulcast HDTV &
SDTV
Channel bandwidth can be used in different ways


51
Program Stream Structure (Simplified)

Program Stream (PS)
− It is similar to the MPEG-1 systems stream but uses a modified syntax and new functions to support
advanced functionalities (e.g. scalability).
− It provides compatibility with MPEG-1 systems (an MPEG-2 decoder should be capable of decoding an MPEG-1
bitstream).
− Like MPEG-1 systems streams, programme streams typically employ long, variable-length
packets. Such packets are well suited to software-based processing and error-free transmission
environments (such as storage on disk).
− The packet sizes are usually 1–2 kbytes long, chosen to match the disc sector sizes (typically 2 kbytes).
− However, packet sizes as long as 64 kbytes are also supported.
52
MPEG-2 Systems

53
MPEG-2 Systems
Program Stream (PS)
− It includes features not supported by MPEG-1 systems.
• Scrambling of data
• Assignment of different priorities to packets
• Information to assist alignment of elementary stream packets
• Indication of copyright
• Indication of fast forward, fast reverse and other trick modes for storage devices.
• An optional field in the packets is provided for testing the network performance
• Optional numbering of a sequence of packets is used to detect lost packets.


− H.263 standardization effort started Nov 1993 (finalization:1995)
− The primary goal in the H.263 standard codec was coding of video at low or very low bit rates (less than 64
kbps) for applications such as mobile networks, public switched telephone network (PSTN) and the
narrowband Integrated Services Digital Network (ISDN).
− Later on, the codec was found so attractive that higher resolution pictures could also be coded at
relatively low bit rates.
− The standard recommends operation on five standard pictures of the CIF family, known as sub-QCIF,
QCIF, CIF, 4CIF and 16CIF.
− The H.263+ (H.263 Ver. 2) was the first set of extensions to this family, which was intended for near-term
standardisation of enhancements of H.263 video coding algorithms for real-time telecommunications.
− Work on improving the encoding performance was an ongoing process under H.263++ (H.263 Ver. 3), and
every now and then a new extension called annex was added to the family.
55
H.263, H.263+ and H.263++ Standard

− The codec for long-term standardisation was called H.26L.
− The H.26L project had the mandate from ITU-T to develop a very low bit rate (less than 64 kbit/s with emphasis on
less than 24 kbit/s) video coding recommendation achieving
• Better Video Quality
• Lower Delay
• Lower Complexity
• Better Error Resilience
− In 2001, MPEG-4 committee joined the project in investigating new video coding techniques and technologies as
candidates for recommendation.
− The joint team eventually recommended the Joint Video Team (JVT) Codec which is informally known as
Advanced Video Coding (AVC).
− The standard is formally known as H.264 by the ITU-T and MPEG-4 part 10 by ISO/IEC.
56
H.26L Standard

− H.263 is a combination of H.261 and MPEG
− H.261 only accepts QCIF and CIF format → Various picture formats such as sub-QCIF,4CIF, etc.
− No 1/2 pel motion estimation in H.261, instead it uses a spatial loop filter → Half-pel motion compensation
− H.261 does not use median predictors for motion vectors but simply uses the motion vector in the MB to
the left as predictor.
− In H.263 there are four negotiable options
− H.261 does not use a 3-D VLC for transform coefficient coding → 3D VLC for transform coefficients
− GOB headers are mandatory in H.261.
− A quantizer change at MB granularity requires 5 bits in H.261 and only 2 bits in H.263.
− No loop filter in H.263
− No macroblock addressing in H.263 (include in MB header)
57
H.263 Improvements over H.261

Unrestricted Motion Vector Mode (Annex D)
–MVs are allowed to point outside (outside pixels obtained from boundary repetition extension)
–Larger ranges: [-31.5, 31.5] instead of [-16, 15.5]
Syntax-Based Arithmetic Coding Mode (Annex E)
–Provides about 5% bit rate reduction but is rarely used
Advanced Prediction Mode (Annex F)
–Allow 4 motion vectors per MB, one for each 8x8 block
–Overlapped block motion compensation for luminance
–Allows MVs to point outside the picture.
–Reduce blocking artifacts and increase subjective picture quality.
PB-Frames Mode (Annex G)
–Double the frame rate without significant increase in bit rate
Usage:
– The decoder signals the encoder which of the options it has the capability to decode.
– If the encoder supports some of these options, it may enable them.
58
Negotiable Options in H.263

H.261 H.263
Demo: QCIF, 8 fps @ 28 Kb/s
59

Composed of a baseline plus four negotiable options
60
ITU-T Recommendation H.263
Baseline Codec
Unrestricted/Extended Motion Vector Mode
Advanced Prediction Mode
PB Frames Mode
Syntax-based Arithmetic
Coding Mode

Always 12:11 pixel aspect ratio.
61
Frame Formats
Format Y U,V
SQCIF 128x96 64x48
QCIF 176x144 88x72
CIF 352x288 176x144
4CIF 704x576 352x288
16CIF 1408x1152 704x576
[Figure: a 352 x 288 (CIF) picture with 12:11 pixels has a 4:3 picture aspect ratio]
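The numbers in the table can be cross-checked with a few lines of arithmetic: chroma planes are half the luma size in each direction, the macroblock count follows from 16x16 luma blocks, and a CIF picture with 12:11 pixels comes out at exactly 4:3. The function names below are illustrative.

```python
from fractions import Fraction

# Luma (Y) dimensions of the five H.263 picture formats
FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}
PIXEL_ASPECT = Fraction(12, 11)


def chroma_size(name: str) -> tuple[int, int]:
    """U and V planes are subsampled by two horizontally and vertically."""
    w, h = FORMATS[name]
    return (w // 2, h // 2)


def macroblocks(name: str) -> int:
    """Number of 16x16 macroblocks in one picture."""
    w, h = FORMATS[name]
    return (w // 16) * (h // 16)


def picture_aspect(name: str) -> Fraction:
    """Displayed width:height, taking the 12:11 pixel shape into account."""
    w, h = FORMATS[name]
    return Fraction(w, h) * PIXEL_ASPECT
```

For CIF this gives 396 macroblocks, 176x144 chroma planes and a picture aspect ratio of 4/3.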

Picture & Macroblock Types
− Two picture types:
• Intra (I-frame) implies no temporal prediction is performed.
• Inter (P-frame) may employ temporal prediction.
− Macroblock (MB) types:
• Intra & Inter MB types (even in P-frames).
– Inter MBs have shorter symbols in P frames
– Intra MBs have shorter symbols in I frames
• Not coded MB type: MB data is copied from the previous decoded frame.
62
H.263 Baseline

Motion Vectors
− Motion vectors have 1/2 pixel granularity.
− Reference frames must be interpolated by two.
− MV’s are not coded directly, but rather a median predictor is used.
− The predictor residual is then coded using a VLC table.
63
H.263 Baseline
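Predictor for the current macroblock X from its neighbours A (left), B (above) and C (above right):
$\widehat{MV}_X = \operatorname{median}(MV_A, MV_B, MV_C)$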

Motion Vector Delta (MVD) Symbol Lengths
64
H.263 Baseline
[Figure: MVD code length in bits (vertical axis, 0 to 14) versus MVD absolute value (horizontal axis bins: 0, 0.5, 1, 1.5, 2, 2.5-3.5, 4.0-5.0, 5.5-12.0, 12.5-15.5)]

Transform Coefficient Coding
− Assign a variable length code according to three parameters (3-D VLC):
1) Length of the run of zeros preceding the current nonzero coefficient.
2) Amplitude of the current coefficient.
3) Indication of whether current coefficient is the last one in the block.
− The most common are variable length coded (3-13 bits), the rest are coded with escape sequences (22
bits)
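The three parameters of the 3-D VLC can be derived from a zigzag-scanned block in one pass. The sketch below produces (LAST, RUN, LEVEL) events only; mapping each event to its actual code word requires the VLC table of the Recommendation, which is not reproduced here. The function name is illustrative, and LEVEL is kept signed for simplicity.

```python
def run_level_last(scanned):
    """Convert zigzag-scanned coefficients into (last, run, level) events.

    run   = number of zeros preceding the nonzero coefficient
    level = value of the nonzero coefficient
    last  = 1 for the final nonzero coefficient of the block, else 0
    """
    events = []
    run = 0
    for coeff in scanned:
        if coeff == 0:
            run += 1
        else:
            events.append([0, run, coeff])
            run = 0
    if events:
        events[-1][0] = 1   # mark the last nonzero coefficient
    return [tuple(e) for e in events]
```

Because LAST is part of each event, no separate end-of-block symbol is needed, and trailing zeros cost no bits.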
65
H.263 Baseline

Quantization
− H.263 uses a scalar quantizer with center clipping.
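− The quantizer step size varies from 2 to 62 in steps of 2 (quantizer index 1 to 31).
− The index can be changed by ±1 or ±2 at macroblock boundaries (2 bits), i.e. the step size changes by ±2 or ±4.
− The step size can be set to any value from 2 to 62 at row and picture boundaries (5 bits).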
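A scalar quantizer with center clipping (a dead zone around zero) can be sketched as below. Several points are assumptions: the quantizer is represented by an index `qp` from 1 to 31 with step size 2*qp, which matches the range 2 to 62; the forward rule for inter coefficients is one common encoder choice, not something the bit stream syntax mandates; and the reconstruction rule is written in the form usually quoted for H.263 and should be checked against the Recommendation before use.

```python
def quantize_inter(coef: int, qp: int) -> int:
    """Forward quantization with a dead zone (encoder-side choice)."""
    magnitude = max(abs(coef) - qp // 2, 0) // (2 * qp)
    return magnitude if coef >= 0 else -magnitude


def dequantize(level: int, qp: int) -> int:
    """Reconstruct a coefficient from its level."""
    if level == 0:
        return 0
    magnitude = qp * (2 * abs(level) + 1)
    if qp % 2 == 0:
        magnitude -= 1
    return magnitude if level > 0 else -magnitude


def apply_dquant(qp: int, dquant: int) -> int:
    """Macroblock-level change of the quantizer index (2-bit field)."""
    if dquant not in (-2, -1, 1, 2):
        raise ValueError("DQUANT must be -2, -1, +1 or +2")
    return min(max(qp + dquant, 1), 31)   # keep the index in 1..31
```

With `qp = 5`, a coefficient of 4 falls in the dead zone and becomes 0, while 27 is coded as level 2 and reconstructed as 25.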
66
H.263 Baseline
[Figure: quantizer input/output characteristic (IN versus OUT) with axis marks at ±Q and ±2Q]

Bit Stream Syntax
67
H.263 Baseline
Hierarchy of three layers.
Picture Layer
GOB* Layer
MB Layer
*A GOB is usually a row of macroblocks, except
for frame sizes greater than CIF.
Picture Hdr GOB Hdr MB MB ... GOB Hdr ...

Picture Layer Concepts
− PSC - sequence of bits that can not be emulated anywhere else in the bit stream.
− TR - 29.97 Hz counter indicating time reference for a picture.
− PType - Denotes Intra, Inter-coded, etc.
− P-Quant - Indicates which quantizer (2…62) is used initially for the picture.
68
H.263 Baseline
Picture Start
Code
Temporal
Reference
Picture
Type
Picture
Quant

GOB Layer Concepts, GOB Headers are Optional
− GSC - Another unique start code (17 bits).
− GOB Number - Indicates which GOB, counting vertically from the top (5 bits).
− GOB Quant - Indicates which quantizer (2…62) is used for this GOB (5 bits).
GOB can be decoded independently from the rest of the frame
69
H.263 Baseline
GOB Start
Code
GOB
Number
GOB
Quant

Macroblock Layer Concepts
− COD - if set, indicates empty Inter MB.
− MB Type - indicates Inter, Intra, whether MV is present, etc.
− CBP - indicates which blocks, if any, are empty.
− DQuant - indicates a quantizer change by +/- 2, 4.
− MV Deltas - are the MV prediction residuals.
− Transform coefficients - are the 3-D VLC’s for the coefficients.
70
H.263 Baseline
Coded
Flag
MB
Type
Code Block
Pattern
MV
Deltas
Transform
Coefficients
DQuant
8x8 pixel blocks
macroblock
Y
Cb Cr

Deblocking Filter
71
H.263 Options
No Filter Deblocking Loop Filter

Unrestricted/Extended Motion Vector Mode (UMV Mode)
1. Motion Vectors Over Picture Boundaries
− UMV markedly improves motion estimation when moving objects are entering or exiting the frame, or moving around the frame border.
− Motion vectors are permitted to point outside the picture boundaries
– Non-existent pixels are created by replicating the edge pixels: when a motion vector refers to a pixel outside the coded area, the last full pixel inside the coded picture area is used.
– Motion vector restricted such that no pixel of 16x16 (or 8x8) block shall
have horizontal or vertical distance more than 15 pixels outside of
picture.
– Improves compression when there is movement across the edge of a
picture boundary or when there is camera panning.
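Edge-pixel replication amounts to clamping the reference coordinates to the picture area. The sketch below shows this for integer-pel vectors only; the function names are illustrative, the frame is assumed to be a list of rows, and half-pel interpolation and the 15-pixel limit on the vector are left out.

```python
def ref_pixel(frame, x, y):
    """Return frame[y][x], replicating edge pixels for coordinates outside the picture."""
    height, width = len(frame), len(frame[0])
    xc = min(max(x, 0), width - 1)
    yc = min(max(y, 0), height - 1)
    return frame[yc][xc]


def predict_block(frame, x0, y0, mvx, mvy, size=8):
    """Motion-compensated prediction of a size x size block at (x0, y0), integer-pel MV."""
    return [[ref_pixel(frame, x0 + i + mvx, y0 + j + mvy) for i in range(size)]
            for j in range(size)]
```

With this clamping, a vector pointing past the left border simply reads the first column repeatedly.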
72
H.263 Options
[Figure: Target Frame N and Reference Frame N-1; edge pixels are repeated.]

Unrestricted/Extended Motion Vector Mode
2. Extended MV Range
− To extend the range of the motion
vectors from [-16,15.5] to [-31.5,31.5] with
some restrictions.
− This better addresses high motion scenes.
73
H.263 Options
[Figure: the base motion vector range, [-16, 15.5] in each direction, and the extended motion vector range reaching (31.5, 31.5), which is a [-16, 15.5] window around the MV predictor]

− The motion compensation in the core H.263 is based on one motion vector per macroblock of 16×16 pixels,
with half-pixel precision.
− The macroblock motion vector is then differentially coded with predictions taken from three surrounding
macroblocks, as indicated in Figure.
74
H.263 Options
MV: Current Motion Vector
MV1: Previous Motion Vector
MV2: Above Motion Vector
MV3: Above Right Motion Vector
MV2 MV3
MV1 MV

− The predictors are calculated separately for the horizontal and vertical components of the motion vectors MV1, MV2 and MV3.
− For each component, the predictor is the median value of the three candidate predictors for this component:
$$P_x = \mathrm{median}(MV1_x, MV2_x, MV3_x), \qquad P_y = \mathrm{median}(MV1_y, MV2_y, MV3_y)$$
− The difference between the components of the current motion vector and their predictions is variable length coded. The vector differences are defined by
$$MVD_x = MV_x - P_x, \qquad MVD_y = MV_y - P_y$$
75
H.263 Options

− In the special cases, at the borders of the current group of blocks (GOB) or picture, the following decision
rules are applied in order:
• The candidate predictor MV1 is set to zero if the corresponding macroblock is outside the picture at the left side .
• The candidate predictors MV2 and MV3 are set to MV1 if the corresponding macroblocks are outside the picture at
the top, or if the GOB header of the current GOB is nonempty.
• The candidate predictor MV3 is set to zero if the corresponding macroblock is outside the picture at the right side.
• When the corresponding macroblock is intra coded or was not coded, the candidate predictor is set to zero.
− Like unrestricted motion vector mode, motion vectors can refer to the area outside the picture
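The median prediction and the border rules can be sketched as below. This is a literal reading of the rules listed above, applied in order; the function and flag names are illustrative, vectors are (x, y) tuples in half-pel integer units, and `None` stands for a candidate macroblock that is intra coded or not coded.

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]


def predict_mv(mv1, mv2, mv3, left_outside=False,
               top_outside_or_gob_header=False, right_outside=False):
    """Return the (x, y) predictor from candidates MV1 (left), MV2 (above), MV3 (above right)."""
    zero = (0, 0)
    c1, c2, c3 = mv1, mv2, mv3
    if left_outside:                 # rule 1
        c1 = zero
    if top_outside_or_gob_header:    # rule 2
        c2 = c3 = c1
    if right_outside:                # rule 3
        c3 = zero
    # rule 4: intra-coded or not-coded candidates count as zero
    c1, c2, c3 = (zero if c is None else c for c in (c1, c2, c3))
    return (median3(c1[0], c2[0], c3[0]), median3(c1[1], c2[1], c3[1]))


def mv_difference(mv, pred):
    """Vector difference that would be variable length coded."""
    return (mv[0] - pred[0], mv[1] - pred[1])
```

Taking the median per component means the predictor need not equal any single neighbouring vector.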
76
H.263 Options
[Figure: candidate predictors at a picture or GOB border. Left border: MV1 is replaced by (0,0). Top border or GOB header present: MV2 and MV3 are replaced by MV1. Right border: MV3 is replaced by (0,0).]

− Includes motion vectors across picture boundaries from the previous mode.
− Option of using four motion vectors for 8x8 blocks instead of one motion vector for 16x16 blocks as in
baseline.
• In H.263, one motion vector per macroblock is used except in the advanced prediction mode, where
either one (four vectors with the same value) or four motion vectors per macroblock are employed.
• When there are four motion vectors, the information for the first motion vector is transmitted as the
code word motion vector data (MVD), and the information for the three additional vectors in the
macroblock is transmitted as the code word MVD2–4.
− Overlapped motion compensation to reduce blocking artifacts.
77
H.263 Options

Four motion vectors for 8x8 blocks instead of one motion vector for 16x16 blocks.
− The vectors are obtained by adding predictors to the vector differences indicated by MVD and MVD2–4,
as was the case when only one motion vector per macroblock was present.
− The predictors are calculated separately for the horizontal and vertical components.
− However, the candidate predictors MV1, MV2 and MV3 are redefined as indicated in Figure.
− The neighbouring 8×8 blocks that form the candidates for the prediction of the motion vector MV take
different forms depending on the position of the block in the macroblock.
78
H.263 Options
• Redefinition of the candidate predictors MV1, MV2 and MV3 for each luminance block in a macroblock.
• Motion vector prediction for 8x8 blocks uses three surrounding block motion vectors.
[Figure: positions of MV1, MV2 and MV3 relative to MV for each of the four 8x8 luminance blocks of a macroblock]

Overlapped Motion Compensation (OBMC)
− In normal motion compensation, the current block is composed of
• The predicted block from the previous frame (referenced
by the motion vectors)
• The residual data transmitted in the bit stream for the
current block.
− Overlapped motion compensation is only used for the 8×8
luminance blocks.
− Each pixel in an 8×8 luminance prediction block is the weighted
sum of three prediction values, divided by 8 (with rounding).
79
H.263 Options
Reference frame
Current MB

− To obtain the prediction values, three motion vectors are
used. They are the motion vector of the current luminance
block and two out of four remote vectors, as follows:
• the motion vector of the block at the left or right
side of the current luminance block;
• the motion vector of the block above or below the
current luminance block.
80
H.263 Options

− Let (m, n) be the column and row indices of an 8×8 pixel block in a frame.
− Let (i, j) be the column and row indices of a pixel within an 8×8 block.
− Let (x, y) be the column and row indices of a pixel within the entire frame:
$$(x, y) = (8m + i,\; 8n + j)$$
81
H.263 Options
[Figure: an 8×8 pixel block B within the frame, showing the block indices (m, n), the frame pixel indices (x, y) and the in-block pixel indices (i, j)]

• Let $(MV^0_x, MV^0_y)$ denote the motion vector of the current block.
• Let $(MV^1_x, MV^1_y)$ denote the motion vector of the block above (below) if the current pixel is in the top (bottom) half of the current block.
• Let $(MV^2_x, MV^2_y)$ denote the motion vector of the block to the left (right) if the current pixel is in the left (right) half of the current block.
82
H.263 Options
[Figure: the current block with vector MV0, the blocks above and below supplying MV1, and the blocks to the left and right supplying MV2]

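• The creation of each interpolated (overlapped) pixel, P(x, y), in an 8×8 luminance prediction block is governed by
$$P(x, y) = \big(q(x, y)\,H_0(i, j) + r(x, y)\,H_1(i, j) + s(x, y)\,H_2(i, j) + 4\big) / 8$$
− where q, r and s are the reference-picture pixels at the positions displaced by the three motion vectors. Writing $p_{ref}$ for the reference picture:
$$q(x, y) = p_{ref}(x + MV^0_x,\; y + MV^0_y)$$
$$r(x, y) = p_{ref}(x + MV^1_x,\; y + MV^1_y)$$
$$s(x, y) = p_{ref}(x + MV^2_x,\; y + MV^2_y)$$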
83
H.263 Options
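$$H_0(i, j) = \begin{pmatrix}
4 & 5 & 5 & 5 & 5 & 5 & 5 & 4 \\
5 & 5 & 5 & 5 & 5 & 5 & 5 & 5 \\
5 & 5 & 6 & 6 & 6 & 6 & 5 & 5 \\
5 & 5 & 6 & 6 & 6 & 6 & 5 & 5 \\
5 & 5 & 6 & 6 & 6 & 6 & 5 & 5 \\
5 & 5 & 6 & 6 & 6 & 6 & 5 & 5 \\
5 & 5 & 5 & 5 & 5 & 5 & 5 & 5 \\
4 & 5 & 5 & 5 & 5 & 5 & 5 & 4
\end{pmatrix}$$
$$H_1(i, j) = \begin{pmatrix}
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
1 & 1 & 2 & 2 & 2 & 2 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 2 & 2 & 2 & 2 & 1 & 1 \\
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2
\end{pmatrix}$$
$$H_2(i, j) = \begin{pmatrix}
2 & 1 & 1 & 1 & 1 & 1 & 1 & 2 \\
2 & 2 & 1 & 1 & 1 & 1 & 2 & 2 \\
2 & 2 & 1 & 1 & 1 & 1 & 2 & 2 \\
2 & 2 & 1 & 1 & 1 & 1 & 2 & 2 \\
2 & 2 & 1 & 1 & 1 & 1 & 2 & 2 \\
2 & 2 & 1 & 1 & 1 & 1 & 2 & 2 \\
2 & 2 & 1 & 1 & 1 & 1 & 2 & 2 \\
2 & 1 & 1 & 1 & 1 & 1 & 1 & 2
\end{pmatrix}$$
$H_2$ is the left/right counterpart of $H_1$ (close to, but not exactly, its transpose). At every pixel position $H_0 + H_1 + H_2 = 8$, which is why the weighted sum is divided by 8.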

84
H0(i, j)
Weighting values for prediction with
motion vector of current block
H2(i, j)
Weighting values for prediction with motion vectors of
luminance blocks to the left or right of current luminance block
Right of the current block Left of the current block
H1(i, j)
Weighting values for prediction with motion vectors of the
luminance blocks on top or bottom of the current luminance block
Bottom of the current block
Top of the current block
The neighbouring pixels closer to the pixels in the current block take greater weights.
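The weighted sum can be tried out with a short sketch. The weight tables below are the values as understood from H.263 Annex F (an assumption to check against the Recommendation); the function name, the integer-pel vectors and the edge clamping are simplifications made for illustration. Tables are indexed as `H[row][column]`.

```python
H0 = [[4, 5, 5, 5, 5, 5, 5, 4],
      [5, 5, 5, 5, 5, 5, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 5, 5, 5, 5, 5, 5],
      [4, 5, 5, 5, 5, 5, 5, 4]]
H1 = [[2, 2, 2, 2, 2, 2, 2, 2],   # top/bottom neighbour weights
      [1, 1, 2, 2, 2, 2, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 2, 2, 2, 2, 1, 1],
      [2, 2, 2, 2, 2, 2, 2, 2]]
H2 = [[2, 1, 1, 1, 1, 1, 1, 2]] + [[2, 2, 1, 1, 1, 1, 2, 2]] * 6 + [[2, 1, 1, 1, 1, 1, 1, 2]]


def obmc_block(ref, bx, by, mv_cur, mv_above, mv_below, mv_left, mv_right):
    """Overlapped prediction of the 8x8 block with top-left pixel (bx, by).

    ref is a list of rows; every mv is an integer-pel (dx, dy) tuple.
    """
    height, width = len(ref), len(ref[0])

    def pix(x, y, mv):
        xc = max(0, min(width - 1, x + mv[0]))
        yc = max(0, min(height - 1, y + mv[1]))
        return ref[yc][xc]

    out = []
    for j in range(8):
        row = []
        for i in range(8):
            x, y = bx + i, by + j
            mv1 = mv_above if j < 4 else mv_below   # top half uses the block above
            mv2 = mv_left if i < 4 else mv_right    # left half uses the block to the left
            q, r, s = pix(x, y, mv_cur), pix(x, y, mv1), pix(x, y, mv2)
            row.append((q * H0[j][i] + r * H1[j][i] + s * H2[j][i] + 4) // 8)
        out.append(row)
    return out
```

When all five vectors are equal, the three predictions coincide and the result reduces to ordinary motion compensation, which is a convenient sanity check.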
H.263 Options

PB Frames Mode
− A PB frame consists of two pictures coded together as one unit: a P-picture, as in baseline, and a B-picture.
− The P-picture is predicted from the last decoded P-picture, and the B-picture is predicted from both
the last decoded P-picture and the P-picture currently being decoded (The prediction process is
illustrated in Figure).
− Can increase frame rate 2X with only about 30% increase in bit rate (because of B-frame).
− Since in the PB frames mode a unit of coding is a combined macroblock from P- and B-pictures, the
composite macroblock comprises 12 blocks.
− First the data for the six P-blocks are transmitted as the default H.263 mode, and then the data for the
six B-blocks.
− The composite macroblock may have various combinations of coding status for the P- and B-blocks,
which are dictated by the MCBPC.
85
H.263 Options

86
Forward Motion Vector and Backward Motion Vector, Recall
[Figure: the macroblock to be coded in the current B-picture, with its best match in the previous reference picture (forward motion vector, forward prediction) and its best match in the future reference picture (backward motion vector, backward prediction)]

PB Frames Mode
87
H.263 Options
Restriction: the backward predictor cannot extend outside the current MB position of the future frame.
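[Figure: Picture 1 is the previously decoded P-picture, Picture 2 the B-picture and Picture 3 the current P-picture; Pictures 2 and 3 form the PB frame. The B-picture is predicted forward from Picture 1 and backward from Picture 3 (vectors labelled V 1/2 and -V 1/2); the P-picture is predicted forward from Picture 1.]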

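[Figure: pictures P, B, P; the B-picture and the second P-picture form the PB frame. TRB is the temporal distance from the previous P-picture to the B-picture, and TRD is the temporal distance between the two P-pictures.]
$$MV_F = \frac{TR_B \times MV}{TR_D} + MV_D$$
$$MV_B = \frac{(TR_B - TR_D) \times MV}{TR_D} \quad \text{if } MV_D = 0$$
$$MV_B = MV_F - MV \quad \text{if } MV_D \neq 0$$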
H.263 Options
PB Frames Mode
The P-picture is predicted from the previous decoded P-picture.
The B-picture is predicted both from the previous decoded P-picture and from the P-picture currently being decoded.
[Figure: the macroblock vector MV of the P-picture and the derived vectors MV_F and MV_B of the B-picture]
Assume: MV_D is the delta vector component given by the motion vector data of a B-picture (MVDB) and corresponds to the vector component MV.
Forward and Bi-directional Prediction in a B-block
[Figure: a B-block overlaid on the P-macroblock. BID: the part of the block that is predicted bidirectionally. FWD: the part that uses only forward prediction.]
88
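The derivation of the B-picture vectors from the P-macroblock vector can be sketched as follows, for one vector component. The function names are illustrative, the values are assumed to be integers in half-pel units, and rounding the division toward zero is an assumption of this sketch.

```python
def trunc_div(a, b):
    """Integer division rounding toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def pb_vectors(mv, mvd, trb, trd):
    """Return (MV_F, MV_B) for one component, given MV, the delta MV_D, TRB and TRD."""
    mv_f = trunc_div(trb * mv, trd) + mvd
    if mvd == 0:
        mv_b = trunc_div((trb - trd) * mv, trd)
    else:
        mv_b = mv_f - mv
    return mv_f, mv_b
```

For a B-picture halfway between the two P-pictures (TRB = 1, TRD = 2) and no delta, the forward and backward vectors come out as plus and minus half of MV.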

Improved PB frames (BPB)
− This mode is an improved version of the optional PB frames mode of H.263 [22-M].
− Most parts of this mode are similar to the PB frames mode, the main difference being that in the
improved PB frames mode, the B part of the composite PB-macroblock, known as BPB-macroblock,
may have a separate motion vector for forward and backward prediction.
− This is in addition to the bidirectional prediction mode that is also used in the normal PB frames mode.
− Hence, there are three different ways of coding a BPB-macroblock, and the coding type is signalled
by the MVDB parameter.
• Bidirectional prediction
• Forward prediction
• Backward prediction
89
H.263 Options
[Figure: Picture 1 (decoded P-picture), Picture 2 (B-picture) and Picture 3 (current P-picture), with the forward and backward motion vectors and predictions for the B part of the PB frame]

Syntax-based Arithmetic Coding Mode
− In encoding, a symbol is encoded using a specific array of integers (a model) selected by the syntax, by calling encode_a_symbol(index, cumul_freq).
− A FIFO buffers the bits from the arithmetic encoder.
− In decoding, a symbol is decoded using the model selected by the syntax, by calling decode_a_symbol(cumul_freq).
− Syntax of top 3 layers: Picture, Group-of-Blocks and Macroblock remains the same, but that of block is
modified.
90
H.263 Options

Syntax-based Arithmetic Coding Mode
− In this mode, all the variable length coding and decoding of baseline H.263 is replaced with arithmetic
coding/decoding.
− This removes the restriction that each symbol must be represented by an integer number of bits, thus improving compression efficiency.
− Experiments indicate that compression can be improved by up to 10% over variable length
coding/decoding.
− Complexity of arithmetic coding is higher than variable length coding, however.
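The effect of the integer-bit restriction can be seen with a small calculation. The probabilities and code lengths below are made-up values for illustration, not taken from the H.263 tables, and the ideal cost of -log2(p) bits per symbol is what an arithmetic coder approaches rather than what it exactly achieves.

```python
import math


def ideal_bits(probs):
    """Average bits per symbol when each symbol costs -log2(p) bits."""
    return sum(p * -math.log2(p) for p in probs)


def vlc_bits(probs, lengths):
    """Average bits per symbol for a variable length code with integer code lengths."""
    return sum(p * n for p, n in zip(probs, lengths))


probs = [0.9, 0.05, 0.05]      # hypothetical symbol probabilities
lengths = [1, 2, 2]            # a prefix code: 0, 10, 11
```

Here the very likely symbol still costs a whole bit under variable length coding, while its ideal cost is well below one bit. The gap in this artificial example is much larger than the gain quoted for real H.263 sequences.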
91
H.263 Options


Enhance H.263 with additional options (Draft 20, Sept. ’97)
Coding efficiency
• Advanced intra coding mode
• Deblocking filter mode
• Improved PB-frames mode
• Reference picture resampling mode
• Alternative inter VLC mode
• Modified quantization mode
Error robustness
• Slice structured mode
• Reference picture selection mode
• Independently segmented decoding mode
Enhanced Communication
• Temporal, SNR, and spatial scalability mode
• Reduced-resolution update mode
93
H.263 Ver. 2 (H.263+)

− H.263+ was standardized in January, 1998.
− The expected enhancements of H.263+ over H.263 fall into two basic categories:
• enhancing quality within existing applications;
• broadening the current range of applications.
− Adds negotiable options and features while still retaining a backwards compatibility mode.
− A few examples of the enhancements are as follows:
• improving perceptual compression efficiency;
• reducing video coding delay;
• providing greater resilience to bit errors and data losses.
94
H.263 Ver. 2 (H.263+)

Annex I: Advanced Intra Coding mode
Annex J: Deblocking Filter mode
Annex K: Slice Structured mode
Annex L: Supplemental Enhancement Information Specification
Annex M: Improved PB-frames mode
Annex N: Reference Picture Selection mode
Annex O: Temporal, SNR, and Spatial Scalability mode
Annex P: Reference Picture Resampling mode
Annex Q: Reduced-Resolution Update mode
Annex R: Independent Segment Decoding mode
Annex S: Alternative Inter VLC mode
Annex T: Modified Quantization mode
96
H.263+ (v2) Optional Tools

− In addition to the multiples of CIF, H.263+ permits
• any frame size from 4x4 to 2048x1152 pixels in increments of 4.
− Besides the 12:11 pixel aspect ratio (PAR), H.263+ supports
• Square (1:1)
• 525-line 4:3 picture (10:11)
• CIF for 16:9 picture (16:11)
• 525-line for 16:9 picture (40:33)
• and other arbitrary ratios
− In addition to picture clock frequencies of 29.97 Hz (NTSC), H.263+ supports
• 25 Hz (PAL)
• 30 Hz
• and other arbitrary frequencies
97
Arbitrary Frame Size, Pixel Aspect Ratio, Clock Frequency

98
Recommended mode | Level 1 | Level 2 | Level 3
Advanced INTRA Coding | Yes | Yes | Yes
Deblocking Filter | Yes | Yes | Yes
Supplemental Enhancement Information (Full-Frame Freeze Only) | Yes | Yes | Yes
Modified Quantization | Yes | Yes | Yes
Unrestricted Motion Vectors | No | Yes | Yes
Slice Structured Mode | No | Yes | Yes
Reference Picture Resampling (Implicit Factor-of-4 Mode Only) | No | Yes | Yes
Advanced Prediction | No | No | Yes
Improved PB-frames | No | No | Yes
Independent Segment Decoding | No | No | Yes
Alternate INTER VLC | No | No | Yes
H.263v2 specified a set of recommended modes in an informative appendix (Appendix II, since deprecated)
The prior informative Appendix II (recommended optional enhancement) was obsoleted by the creation of the normative Annex X.
H.263 Ver. 2 (H.263+)

− In this mode, either the DC coefficient, the first column, or the first row of coefficients is predicted from neighboring blocks (DC only, Vertical DC & AC, Horizontal DC & AC).
− Prediction is determined on a MB-by-MB basis.
− Essentially DPCM of Intra DCT coefficients.
− Can save up to 40% of the bits on Intra frames.
− A separate VLC table for intra DCT
− Modified quantization for intra coefficients
− Spatial prediction of DCT coefficients
99
Advanced Intra Coding Mode
[Figure: three neighboring blocks in the DCT domain: Block A with RecA(u, v), Block B with RecB(u, v) and Block C with RecC(u, v); u and v run from 0 to 7]
Index | Prediction mode | Code
0 | 0 (DC Only) | 0
1 | 1 (Vertical DC & AC) | 10
2 | 2 (Horizontal DC & AC) | 11

− At very low bit rates, the block of pixels is mainly made of low-frequency DCT coefficients.
− In these areas, when there is a significant difference between the DC levels of the adjacent blocks, they
appear as block borders.
− The overlapped block matching motion compensation to some extent reduces these blocking artefacts.
− For further reduction in the blockiness, the H.263 specification recommends deblocking of the picture
through the block edge filter.
− The Deblocking Filter mode improves subjective quality by removing blocking and mosquito artifacts
common to block-based video coding at low bit rates.
100
Deblocking Filter Mode

− Deblocking Filter Mode introduces a deblocking filter inside the coding loop.
− Unlike in post-filtering, predicted pictures are computed based on filtered versions of the previous ones.
− Like the Advanced Prediction mode of H.263, the Deblocking Filter mode involves using four motion
vectors per macroblock.
− The filtering is performed on 8×8 block edges and assumes that 8×8 DCT is used and the motion vectors
may have either 8×8 or 16×16 resolution.
− Filtering is equally applied to both luminance and chrominance data.
− No filtering is permitted on the frame and slice edges.
101

− Consider four pixels A, B, C and D on a line (horizontal or
vertical) of the reconstructed picture, where A and B
belong to block 1 and C and D belong to a
neighbouring block 2, which is either to the right of or
below block 1.
− It filters pixels along block boundaries while preserving edges in the image content.
− The filter is in the coding loop, which means that it filters the decoded reference frame used for motion compensation.
− It can be used in conjunction with a post-filter to further
reduce coding artifacts.
102
[Figure: pixels A, B, C and D across the block boundary between block 1 and block 2, shown once on a horizontal line (vertical block edge) and once on a vertical line (horizontal block edge)]

Deblocking Filter
− To turn the filter on for a particular edge, either block 1 or block 2 should be an intra or a coded
macroblock with the code COD =0.
− A, B, C and D are replaced by new values, A1, B1, C1, and D1 based on a set of non-linear equations.
− The strength of the filter is proportional to the quantization strength.
− The sign of d1 is the same as the sign of d.
103
H.263 Options

Deblocking Filter
− Figure shows how the value of d1 changes with d and the quantiser parameter QP, to make sure that only
block edges which may suffer from blocking artefacts are filtered and not the natural edges.
− As a result of this modification, only the pixels on the edge are filtered so that their luminance changes are
less than the quantisation parameter, QP.
104
H.263 Options
d1 as a function of d

− To turn the filter on for a particular edge, either block 1 or block 2 should be an intra or a coded
macroblock with the code COD =0.
− A, B, C and D are replaced by new values, A1, B1, C1, and D1 based on a set of non-linear equations.
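− The strength of the filter is proportional to the quantization strength.
$$B_1 = \mathrm{clip}(B + d_1), \qquad C_1 = \mathrm{clip}(C - d_1)$$
$$A_1 = A - d_2, \qquad D_1 = D + d_2$$
$$d = \frac{A - 4B + 4C - D}{8}, \qquad d_1 = \mathrm{Filter}\big(d, \mathrm{Strength}(QUANT)\big)$$
$$d_2 = \mathrm{clipd1}\left(\frac{A - D}{4}, \frac{d_1}{2}\right)$$
$$\mathrm{Filter}(x, \mathrm{Strength}) = \mathrm{SIGN}(x) \cdot \mathrm{MAX}\big(0,\; |x| - \mathrm{MAX}(0,\; 2\,(|x| - \mathrm{Strength}))\big)$$
− Here clip limits the result to the valid pixel range, and clipd1(x, lim) limits x to the range ±|lim|.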
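The edge filter can be sketched for one line of four pixels as below. This is an illustration under stated assumptions: `strength` is passed in directly (the mapping from QUANT to strength is not reproduced here), divisions round toward zero, pixels are 8-bit, and the limit for d2 is taken as half of |d1|. Function names are illustrative.

```python
def trunc_div(a, b):
    """Integer division rounding toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def ramp(x, strength):
    """Filter(x, Strength): rises with |x| up to Strength, then falls back to zero."""
    sign = (x > 0) - (x < 0)
    return sign * max(0, abs(x) - max(0, 2 * (abs(x) - strength)))


def clip255(v):
    return max(0, min(255, v))


def deblock(pa, pb, pc, pd, strength):
    """Filter pixels A, B | C, D across a block boundary; returns (A1, B1, C1, D1)."""
    d = trunc_div(pa - 4 * pb + 4 * pc - pd, 8)
    d1 = ramp(d, strength)
    lim = abs(trunc_div(d1, 2))
    d2 = max(-lim, min(lim, trunc_div(pa - pd, 4)))
    return pa - d2, clip255(pb + d1), clip255(pc - d1), pd + d2
```

A small step across the boundary is smoothed, while a large step, which is more likely a natural edge, makes the ramp return zero and leaves the pixels unchanged.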
105

− The Deblocking Filter mode improves subjective quality by removing blocking and mosquito artifacts common to
block-based video coding at low bit rates.
− Many applications make use of a post filter to reduce these artifacts.
− The post-filtering is useful in error-free and error-prone environments.
− This post filter is usually present at the decoder and is outside the coding loop. Therefore, prediction is not based
on the post filtered version of the picture.
− The one-dimensional version of the filter will be described.
− To obtain a two-dimensional effect, the filter is first used in the horizontal direction and then in the vertical
direction.
− The post filter is applied to all pixels within the picture.
− Edge pixels should be repeated when the filter is applied at picture boundaries.
106
Post-Filter

− The pixels A, B, C, D, E, F, G, (H) are aligned horizontally or
vertically.
− The post-filter strength is proportional to the quantization:
Strength(QUANT)
− The Strength1 and Strength2 may be different to better adapt
the total filter strength to QUANT.
− The Strength1, 2 may be related to QUANT for the macroblock
where D belongs or to some average value of QUANT over
parts of the picture or over the whole picture.
107
Post-Filter
$$D_1 = D + \mathrm{Filter}\left(\frac{A + B + C + E + F + G - 6D}{8},\; \mathrm{Strength1}\right)$$
when filtering in the first direction
$$D_1 = D + \mathrm{Filter}\left(\frac{A + B + C + E + F + G - 6D}{8},\; \mathrm{Strength2}\right)$$
when filtering in the second direction
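A one-dimensional pass of the post filter can be sketched as follows. The sketch assumes that A..G are seven consecutive pixels with D in the centre, that edge pixels are repeated at the ends of the line, and that divisions round toward zero; the strength value is supplied by the caller, and the function names are illustrative.

```python
def trunc_div(a, b):
    """Integer division rounding toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def ramp(x, strength):
    """Filter(x, Strength) as used by the deblocking and post filters."""
    sign = (x > 0) - (x < 0)
    return sign * max(0, abs(x) - max(0, 2 * (abs(x) - strength)))


def post_filter_1d(line, strength):
    """Apply the post filter along one line of pixels, reading only unfiltered values."""
    n = len(line)

    def at(k):
        return line[max(0, min(n - 1, k))]   # repeat edge pixels

    out = []
    for k in range(n):
        d = line[k]
        s = sum(at(k + o) for o in (-3, -2, -1, 1, 2, 3)) - 6 * d
        out.append(d + ramp(trunc_div(s, 8), strength))
    return out
```

For a two-dimensional effect, the same function would be run over every row with Strength1 and then over every column with Strength2.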
The relation between Strength1, 2 and QUANT

108
Deblocking Loop Filter Demo
No Filter Deblocking Loop Filter

109
Deblocking Loop Filter and Post Filter Demo
No Filter | Deblocking Loop Filter and Post Filter

110
No Filter
Loop Filter Only
Deblocking Loop Filter and Post Filter

111
No Filter | Deblocking Loop Filter Only | TMN-8 Post Filter Only | Deblocking Loop Filter and TMN-8 Post Filter
Sequence: Foreman, 24 kbps, 10 fps
TMN-8: Video Codec Test Model, Near-Term, Version 8 (TMN8)
− The deblocking filter alone reduces
blocking artifacts significantly, mainly
due to the use of four motion vectors
per macroblock.
− The filtering process provides
smoothing, further improving
subjective quality.
− The effects of the post filter are less
noticeable, and adding the post filter
may actually result in blurriness.
− Therefore, the use of the deblocking
filter alone is usually sufficient.

− Allows insertion of resynchronization markers at macroblock boundaries to improve network packetization and reduce overhead.
• Allows more flexible tiling of video frames into independently decodable areas to support “view
ports”, a.k.a. “local decode.”
• Improves error resiliency by reducing intra-frame dependence.
• Permits out-of-order transmission to reduce latency.
112
Slice Structured Mode

113
Slice Structured Mode
[Figure: two slice layouts with their slice boundaries. In both, there is no INTRA or MV prediction across slice boundaries. First layout: slices start and end on macroblock boundaries. Second layout: slice sizes remain fixed between INTRA frames.]

Backwards compatible with H.263 but permits indication of supplemental information for features such as:
• Partial and full picture freeze requests
• Partial and full picture snapshot tags
• Video segment start and end tags for off-line storage
• Progressive refinement segment start and end tags
• Chroma keying info for transparency
• The Chroma Keying Information Function (CKIF) indicates that the "chroma keying" technique is used to represent
"transparent" and "semi-transparent" pixels in the decoded video pictures.
• When being presented on the display, "transparent" pixels are not displayed.
• Instead, a background picture which is either a prior reference picture or is an externally controlled picture is revealed.
• Semitransparent pixels are displayed by blending the pixel value in the current picture with the corresponding value in
the background picture.
114
Supplemental Enhancement Information

− Resampling of a temporally previous reference picture prior to its use as a reference for encoding,
enabling global motion compensation, predictive dynamic resolution conversion, predictive picture area
alteration and registration, and special-effect warping;
− Allows frame size changes of a compressed video sequence without inserting an Intra frame (No Intra
frame required when changing video frame sizes).
− Permits the warping of the reference frame via affine transformations to address special effects such as
zoom, rotation, translation.
− Can be used for emergency rate control by reducing the frame size adaptively when the bit rate gets too high.
115
Reference Picture Resampling

− Specifies generalized method applied to previous reference picture to generate warped picture for use in
predicting current picture
− Special case of factor of 4 resampling, which converts horizontal and vertical size by factor of 2
(upsampling) or ½(downsampling) in each direction.
116
Reference Picture Resampling
Downsampling
[Figure: pixel positions of the reference picture (A, B, C, D) and of the downsampled predicted picture (a)]
$$a = (A + B + C + D + 1 + RCRPR) / 4$$
Upsampling
[Figure: pixel positions of the reference picture (A, B, C, D) and of the upsampled predicted picture (a, b, c, d)]
$$a = (9A + 3B + 3C + D + 7 + RCRPR) / 16$$
$$b = (3A + 9B + C + 3D + 7 + RCRPR) / 16$$
$$c = (3A + B + 9C + 3D + 7 + RCRPR) / 16$$
$$d = (A + 3B + 3C + 9D + 7 + RCRPR) / 16$$
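The factor-of-4 resampling formulas can be written directly as integer arithmetic. In this sketch RCRPR is treated as a rounding-control bit with value 0 or 1, which is an assumption, as are the function names; picture-boundary handling is omitted.

```python
def downsample_pixel(A, B, C, D, rcrpr=0):
    """One downsampled pixel from a 2x2 group of reference pixels."""
    return (A + B + C + D + 1 + rcrpr) // 4


def upsample_pixels(A, B, C, D, rcrpr=0):
    """Four upsampled pixels (a, b, c, d) lying between reference pixels A, B, C, D."""
    a = (9 * A + 3 * B + 3 * C + D + 7 + rcrpr) // 16
    b = (3 * A + 9 * B + C + 3 * D + 7 + rcrpr) // 16
    c = (3 * A + B + 9 * C + 3 * D + 7 + rcrpr) // 16
    d = (A + 3 * B + 3 * C + 9 * D + 7 + rcrpr) // 16
    return a, b, c, d
```

The upsampling weights 9, 3, 3, 1 sum to 16, so each new pixel is a bilinear blend dominated by its nearest reference pixel.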

− Specify arbitrary warping parameters via displacement vectors from corners.
− For source format changes
− Global motion compensation
− Special-effect warping
117
Reference Picture Resampling with Warping
MV00
MV10 MV11
MV01

No Intra frame required when changing video frame sizes
118
Reference Picture Resampling Factor of 4 Size Change
P P P P P

− Allows more flexibility in adapting quantizers on a macroblock by macroblock basis, by
enabling large quantizer changes through the use of escape codes.
− A mode which improves the control of the bit rate by changing the method for controlling the
quantizer step size on a macroblock basis.
− Reduces the quantizer step size for chrominance blocks, compared to luminance blocks, to reduce the prevalence of chrominance artifacts.
− Modifies the allowable DCT coefficient range to avoid clipping, yet disallows illegal
coefficient/quantizer combinations.
− Increases the range of representable DCT coefficient values for use with small quantizer step
sizes, and increases error detection performance and reduces decoding complexity by
prohibiting certain unreasonable coefficient representations.
119
Modified Quantization (MQ)

− Allows modification of the quantizer at the macroblock layer to any value, not limited to changes of +1, -1, +2 and -2.
• DQUANT uses 2 bits (starting with “1”) to specify small changes.
• DQUANT uses 6 bits (starting with “0”) to specify other changes: the codeword is 0xxxxx, where the last 5 bits specify the new QUANT value.
120
Change of QUANT
Prior QUANT | DQUANT = 10 | DQUANT = 11
1 | +2 | +1
2-10 | -1 | +1
11-20 | -2 | +2
21-28 | -3 | +3
29 | -3 | +2
30 | -3 | +1
31 | -3 | -5
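The DQUANT rule of the Modified Quantization mode can be sketched as a table lookup. The function name and the use of bit strings for the codewords are illustrative choices; the table values are the ones listed above.

```python
# (lowest prior QUANT, highest prior QUANT, change for "10", change for "11")
_DQUANT_TABLE = [
    (1, 1, +2, +1),
    (2, 10, -1, +1),
    (11, 20, -2, +2),
    (21, 28, -3, +3),
    (29, 29, -3, +2),
    (30, 30, -3, +1),
    (31, 31, -3, -5),
]


def update_quant(prior, codeword):
    """Return the new QUANT given the prior QUANT (1..31) and a DQUANT bit string."""
    if codeword.startswith("0"):
        if len(codeword) != 6:
            raise ValueError("escape codeword must be 0xxxxx")
        return int(codeword[1:], 2)          # last 5 bits give the new QUANT
    if codeword not in ("10", "11"):
        raise ValueError("unknown DQUANT codeword")
    for low, high, change_10, change_11 in _DQUANT_TABLE:
        if low <= prior <= high:
            return prior + (change_10 if codeword == "10" else change_11)
    raise ValueError("prior QUANT must be in 1..31")
```

The size of the 2-bit change grows with the prior QUANT, so coarse quantizers can be adjusted in larger steps without paying for the 6-bit escape.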

− Enhance chrominance quality by a finer quantizer.
− Improve picture quality by extending the range of representable quantized DCT coefficients, not limited
by [-127, +127].
121
Range of QUANT | Value of QUANT_C
1-6 | QUANT_C = QUANT
7-9 | QUANT_C = QUANT - 1
10-11 | 9
12-13 | 10
14-15 | 11
16-18 | 12
19-21 | 13
22-26 | 14
27-31 | 15
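The mapping from the luminance quantizer to the finer chrominance quantizer can be sketched as a small lookup. The function name is illustrative; the ranges are the ones in the table above.

```python
def chroma_quant(quant):
    """Return QUANT_C for a luminance QUANT in 1..31."""
    if not 1 <= quant <= 31:
        raise ValueError("QUANT must be in 1..31")
    if quant <= 6:
        return quant
    if quant <= 9:
        return quant - 1
    # (highest QUANT of the range, QUANT_C)
    for high, quant_c in [(11, 9), (13, 10), (15, 11), (18, 12),
                          (21, 13), (26, 14), (31, 15)]:
        if quant <= high:
            return quant_c
```

QUANT_C never exceeds 15, so chrominance is quantized at most half as coarsely as the coarsest luminance setting.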

− Used for bit rate control by reducing the size of the residual frame adaptively when bit rate gets too high.
− A mode which allows an encoder to maintain a high frame rate during heavy motion by encoding a low-
resolution update to a higher-resolution picture while maintaining high resolution in stationary areas
122
Reduced-Resolution Update (RRU)
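[Figure: RRU decoding of a macroblock. Bitstream → macroblock layer decoding → block layer decoding → coefficients decoding → 8*8 coefficients block → result of inverse transform → up-sampling → 16*16 reconstructed prediction error block. The decoded pseudo-vector is scaled up to the reconstructed vector used for motion compensation, which gives the 16*16 prediction block. Prediction and prediction error together give the 16*16 reconstructed block.]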

− A scalable bit stream consists of layers representing different levels of video quality.
− Everything except the base layer can be discarded, and the remaining bit stream still gives reasonable video.
− If bandwidth permits, one or more enhancement layers can also be decoded, which refine the base layer in one of three ways: temporal, SNR, or spatial.
123
Scalability Mode
[Figure: Layered Video Bitstreams. An H.263+ encoder produces a base layer and enhancement layers 1 to 4; the rates shown are 20, 40, 90, 200 and 320 kb/s.]

− Scalability is typically used when one bit stream must support several different transmission bandwidths
simultaneously, or some process downstream needs to change the data rate unbeknownst to the
encoder.
124
Scalability Mode
Example: Conferencing Multipoint Control Unit

125
384 kb/s
384 kb/s
128 kb/s
28.8 kb/s
Scalability Mode
Layered Video Bit Streams in Multipoint Conferencing

126
Scalability Mode
[Figure: three kinds of enhancement. Temporal enhancement: base layer + B frames gives a higher frame rate. SNR enhancement: base layer + SNR layer gives better spatial quality. Spatial enhancement: base layer + spatial layer gives more spatial resolution.]

[Figure: picture structures for the three scalability types. SNR scalability: base layer I P P with enhancement layer EI EP EP. Spatial scalability: base layer I P P with enhancement layer EI EP EP. Temporal scalability: I1 P3 P5 at low temporal resolution, with B2 and B4 added for high temporal resolution.]
Scalability Mode
127

Two or more frame rates can be supported by the same bit stream.
− It is achieved using bidirectionally predicted pictures or B-pictures.
− The B-frames can be discarded (to lower the frame rate) and the bit
stream remains usable.
− These B-pictures differ from the B-picture part of PB-frames in that they are
separate entities in the bitstream.
− These B-pictures are not syntactically intermixed with a subsequent P or its
enhancement part EP.
− B-pictures and the B part of PB-frames are not used as reference pictures
for the prediction of any other pictures. This property allows for B-pictures to
be discarded if necessary without adversely affecting any subsequent
pictures, thus providing temporal scalability.
− Since H.263 is normally used for low frame rate applications (low bit rates, e.g. mobile), the base layer I- and P-pictures are already widely separated in time, so there is normally only one B-picture between them.
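Because no other picture is predicted from a B-picture, lowering the frame rate is just a matter of dropping them. A minimal sketch, assuming pictures are represented by labels such as "I1" or "B2" (an illustrative representation, not bitstream syntax):

```python
def drop_b_pictures(pictures):
    """Keep only the base layer (I- and P-pictures); B-pictures are disposable."""
    return [p for p in pictures if not p.startswith("B")]


full_rate = ["I1", "B2", "P3", "B4", "P5"]
base_layer = drop_b_pictures(full_rate)
```

With one B-picture between each pair of base layer pictures, dropping them halves the frame rate while the remaining pictures decode exactly as before.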
128
Temporal Scalability
I
or
P
B B P ......
• I and P frames form the base layer
• B-frames form the temporal enhancement layer
• B-frames can be discarded
Temporal Scalability Demonstration
• layer 0, 3.25 fps, P-frames
• layer 1, 15 fps, B-frames

The difference between the input picture and lower quality base layer
picture is coded.
− The picture in the base layer which is used for the prediction of the
enhancement layer pictures may be an I-picture, a P-picture, or the P
part of PB frames, but should not be a B-picture or the B part of a PB
frame.
− In the enhancement layer two types of picture are identified, EI
(enhancement I-picture) and EP (enhancement P-picture).
− If prediction is only formed from the base layer, then the
enhancement layer picture is referred to as EI-picture.
− In this case, the base layer picture can be an I- or a P-picture (or the P
part of PB frames).
− For both EI- and EP-pictures, prediction from the reference layer uses no motion vectors → no inter prediction from the base layer.
− However, an EP-picture may also be predictively coded with respect to the previous reconstructed picture in the same layer, which is called forward prediction.
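The idea of coding the difference between the input and the lower quality base layer can be shown with scalar quantizers. This is a deliberately simplified sketch: it works on a list of sample values rather than on DCT-coded pictures, it ignores motion compensation, and the function names and step sizes are illustrative assumptions.

```python
import math


def quantize(x, step):
    """Uniform quantizer index, rounding half up."""
    return int(math.floor(x / step + 0.5))


def encode_two_layers(samples, base_step, enh_step):
    """Base layer: coarse quantization. Enhancement layer: finer coding of the residual."""
    base = [quantize(s, base_step) for s in samples]
    base_rec = [level * base_step for level in base]
    enh = [quantize(s - r, enh_step) for s, r in zip(samples, base_rec)]
    return base, enh


def decode(base, enh, base_step, enh_step, use_enhancement=True):
    """Reconstruct from the base layer alone, or from base plus enhancement."""
    rec = [level * base_step for level in base]
    if use_enhancement:
        rec = [r + e * enh_step for r, e in zip(rec, enh)]
    return rec
```

A decoder that receives only the base layer still reconstructs every sample; adding the enhancement layer reduces the reconstruction error from at most half the coarse step to at most half the fine step.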
129
SNR Scalability
[Figure: base layer (15 kbit/s) with pictures I P P, and enhancement layer (40 kbit/s) with pictures EI EP EP]
I - Intracoded or Key Frame
P - Predicted Frame
EI - Enhancement layer key frame (enhancement I-picture)
EP - Enhancement layer predicted frame (enhancement P-picture)
SNR Scalability Demonstration
• layer 0, 10 fps, 40 kbps
• layer 1, 10 fps, 400 kbps

− The arrangement of the enhancement layer pictures in spatial scalability is similar to that of SNR scalability.
− The only difference is that before the picture in the reference layer is used to predict the picture in the spatial enhancement layer, it is upsampled (interpolated) by a factor of 2 either horizontally or vertically (one-dimensional spatial scalability), or both horizontally and vertically (two-dimensional spatial scalability).
− For example, if the enhancement layer is 2X the size of the base layer in each dimension, the base layer is interpolated by 2X in each dimension before it is used to predict the spatial enhancement layer.
130
Spatial Scalability
[Figure: base layer with pictures I P P, and a larger enhancement layer with pictures EI EP EP]
Spatial Scalability Demonstration
• layer 0, QCIF, 10 fps, 60 kbps
• layer 1, CIF, 10 fps, 300 kbps

− It will increase the robustness of H.263 against the channel errors.
− It is possible for B-pictures to be temporally inserted not only between base layer pictures of types I, P and PB, but also between enhancement pictures of types EI and EP, whether these are SNR or spatial enhancement pictures.
− It is also possible to have more than one SNR or spatial enhancement
layer in conjunction with the base layer. Thus, a multilayer scalable
bitstream can be a combination of SNR layers, spatial layers and B-
pictures.
− As with the two-layer case, B-pictures may occur in any layer.
− However, any picture in an enhancement layer which is temporally
simultaneous with a B-picture in its reference layer must be a B-picture or
the B-picture part of PB frames. This is to preserve the disposable nature
of B-pictures.
− Note, however, that B-pictures may occur in any layers that have no
corresponding picture in the lower layers. This allows an encoder to send
enhancement video with a higher picture rate than the lower layers.
131
Hybrid or Multilayer Scalability
Figure: pictures of types I, P, B, EI and EP arranged across the base layer and enhancement layers 1 and 2
P - Predicted Frame
Scalability Demonstration
• SNR/Spatial Scalability, 10 fps
• layer 0, 88x72, ~5 kbit/s
• layer 1, 176x144, ~15 kbit/s
• layer 2, 176x144, ~40 kbit/s
• layer 3, 352x288, ~80 kbit/s
• layer 4, 352x288, ~200 kbit/s

Pictures, which are dependent on other pictures, are located in the bitstream
after the pictures on which they depend.
− The bitstream syntax order is specified such that for reference pictures (i.e.
pictures having types I, P, EI, EP or the P part of PB) the following two rules
shall be obeyed:
1. All reference pictures with the same temporal reference appear in
the bitstream in increasing enhancement layer order. This is because
each lower layer reference picture is needed to decode the next
higher layer reference picture.
2. All temporally simultaneous reference pictures as discussed in item 1
appear in the bitstream prior to any B-pictures for which any of these
reference pictures is the first temporally subsequent reference picture
in the reference layer of the B-picture. This is done to reduce the
delay of decoding all reference pictures, which may be needed as
references for B-pictures.
132
Transmission Order of Pictures
Two Allowable Picture Transmission Orders
Figure: two orderings of the same eight pictures (I, P, EI, EP and four B-pictures spread over the base layer and enhancement layers 1 and 2), each picture labelled 1 to 8 by its position in the bitstream.

− Then, the B-pictures with earlier temporal references follow (temporally
ordered within each enhancement layer).
− The bitstream location of each B-picture complies with the following rules:
• Be after that of its first temporally subsequent reference pictures in the reference layer.
This is because the decoding of the B-pictures generally depends on the prior decoding
of that reference picture.
• Be after that of all reference pictures that are temporally simultaneous with the first
temporally subsequent reference picture in the reference layer. This is to reduce the
delay of decoding all reference pictures, which may be needed as references for B-
pictures.
• Precede the location of any additional temporally subsequent pictures other than B-
pictures in its reference layer. Otherwise, it would increase picture storage memory
requirement for the reference layer pictures.
• Be after that of all EI- and EP-pictures that are temporally simultaneous with the first
temporally subsequent reference picture.
• Precede the location of all temporally subsequent pictures within the same
enhancement layer. Otherwise, it would introduce needless delay and increase picture
storage memory requirements for the enhancement layer.
133
Transmission Order of Pictures

Hybrid or Multilayer Scalability Example
Figure: base layer (I, B, P); enhancement layer 1 (EI, EP, EP; SNR scalability); enhancement layer 2 (EI, EP, EI; spatial scalability); enhancement layer 3 (B, B; temporal scalability).
134

Multilayer Transmission Order Example
Figure: base layer (I, B, P), enhancement layer 1 (SNR scalability: EI, EP, EP), enhancement layer 2 (spatial scalability: EI, EP, EI) and enhancement layer 3 (temporal scalability: B, B), shown in transmission order.
135

Method for interpolating pixels for 2-D scalability
136
Interpolation for Spatial Scalability
a b
c d
A B
C D
Original pixel positions
Interpolated pixel positions
a =(9A+3B+3C+D+8)/16
b =(3A+9B+C+3D+8)/16
c =(3A+B+9C+3D+8)/16
d =(A+3B+3C+9D+8)/16
Interpolation Formulation (Filtering)
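
The four filter equations above can be checked numerically with a minimal sketch. It assumes that "/" denotes integer division and that pixel values are non-negative; the function name `interpolate_interior` is hypothetical and not part of the standard.

```python
def interpolate_interior(A, B, C, D):
    """Interpolate the four pixels a, b, c, d that lie between the
    original pixels A (top-left), B (top-right), C (bottom-left)
    and D (bottom-right). The +8 term rounds to nearest before the
    integer division by 16."""
    a = (9 * A + 3 * B + 3 * C + D + 8) // 16
    b = (3 * A + 9 * B + C + 3 * D + 8) // 16
    c = (3 * A + B + 9 * C + 3 * D + 8) // 16
    d = (A + 3 * B + 3 * C + 9 * D + 8) // 16
    return a, b, c, d


flat = interpolate_interior(100, 100, 100, 100)   # a flat area stays flat
impulse = interpolate_interior(16, 0, 0, 0)       # shows the 9/3/3/1 weights
print(flat, impulse)
```

Each interpolated pixel weights its nearest original pixel by 9/16, the two adjacent ones by 3/16 and the diagonal one by 1/16, so the weights always sum to 1.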

Method for 2-D interpolation at boundaries
137
Interpolation for Spatial Scalability
Original Pixel Positions
Interpolated Pixel Positions
a = A
b = (3*A +B + 2) / 4
c = (A + 3*B + 2) / 4
d = (3*A + C + 2) / 4
e = (A +3*C + 2) / 4
Picture
Boundary
a b c
d
e
A B
C D
Picture
Boundary
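
The boundary rules can be sketched the same way. This is an illustrative sketch only, again assuming integer division and non-negative pixel values; `interpolate_boundary` is a hypothetical name.

```python
def interpolate_boundary(A, B, C):
    """Interpolate pixels next to the picture boundary.
    A is the corner pixel, B its horizontal neighbour and
    C its vertical neighbour, as labelled on the slide."""
    a = A                          # corner pixel is copied
    b = (3 * A + B + 2) // 4       # along the top edge, nearer to A
    c = (A + 3 * B + 2) // 4       # along the top edge, nearer to B
    d = (3 * A + C + 2) // 4       # along the left edge, nearer to A
    e = (A + 3 * C + 2) // 4       # along the left edge, nearer to C
    return a, b, c, d, e


edge = interpolate_boundary(8, 0, 4)
print(edge)
```

At the boundary only a one-dimensional 3:1 filter is available, because the second row or column of original pixels does not exist.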

− Improved PB-frames
• Improves upon the previous PB-frame mode by permitting forward prediction of “B” frame with a new
vector.
− Reference Picture Selection (RPS)
• A lower latency method for dealing with error prone environments by using some type of back-
channel to indicate to an encoder when a frame has been received and can be used for motion
estimation.
• In RPS Mode, a frame is not used for prediction in the encoder until it’s been acknowledged to be
error free.
138
Other Miscellaneous Features

− Independently Decodable Segments
• When signaled, it restricts the use of data outside of a current Group-of-Block segment or slice
segment. Useful for error resiliency.
− Alternative INTER VLC (AIV):
• Permits use of an alternative VLC table that is better suited for Intra coded blocks, or blocks with low
quantization.
• A mode which reduces the number of bits needed for encoding predictively-coded blocks when
there are many large coefficients in the block.
139
Other Miscellaneous Features


− Phone lines are “circuit-switched”.
− A (virtual) circuit is established at call initiation and remains for the duration of the call.
141
Internet Basics
Figure: source and destination connected through a chain of switches

− Computer networks are “packet-switched”.
− Data is fragmented into packets, and each packet may take a different route to the destination.
− Lots of implications...
142
Internet Basics
Figure: source and destination connected through several switches, one of them marked X

143
The Internet Is Heterogeneous
Figure: corporate LANs, dial-up hosts and services such as AOL, HyperStream, TYMNET and MCI Mail connected to the global public Internet through routers and mail gateways, over IP, X.25, Frame Relay, SMDS, ATM and dial-up links carrying e-mail and other traffic.
FR: Frame Relay
SMDS: Switched Multimegabit Data Service
ATM: Asynchronous Transfer Mode
SLIP: Serial Line Internet Protocol
PPP: Point-to-Point Protocol
SMTP: Simple Mail Transfer Protocol

− MCI Mail was one of the first ever commercial email services in the United States and one of the largest telecommunication services in the
world.
− AOL Mail is a free web-based email service provided by AOL, a division of Verizon Communications.
− X.25 is an ITU-T standard protocol suite for packet-switched data communication in wide area networks (WAN).
− Frame Relay (FR) is a standardized wide area network technology that specifies the physical and data link layers of digital
telecommunications channels using a packet switching methodology.
− Asynchronous Transfer Mode (ATM) is a telecommunications standard defined by ANSI and ITU standards for carriage of user traffic, including
telephony, data, and video signals.
− Switched Multimegabit Data Service (SMDS) is a wide area networking (WAN) connection service designed for LAN interconnection through
the public telephone network. SMDS is designed for moderate bandwidth connections, between 1 and 34 Mbps, although it has been, and is still
being, extended to support both lower and higher bandwidth connections.
− Tymnet was an international data communications network headquartered in Cupertino, California that used virtual call packet switched
technology and X.25, SNA/SDLC, ASCII and BSC interfaces to connect host computers at thousands of large companies, educational
institutions, and government agencies.
144
The Internet Is Heterogeneous

OSI (Open System Interconnection) Model
145

Comparison Between OSI and TCP/IP Model
146

147
Layers in the Internet Protocol Architecture
1. Network Access Layer: consists of routines for accessing physical networks.
2. Internet Layer: defines the datagram and handles the routing of data.
3. Host-to-Host Transport Layer: provides end-to-end data delivery services.
4. Application Layer: consists of applications and processes that use the network.
Figure: each layer adds its own header in front of the data passed down from the layer above.

148
Internet Protocol Architecture
Utility/Application: TELNET, FTP, SMTP, SNMP, DNS, MIME, RTP, MBone, VIC/VAT, ...
Host-Host Transport: TCP, UDP
Internet: IP
Network Access Layer: FDDI, Ethernet, Token Ring, HDLC, SMDS, X.25, ATM, FR, ...

149
Specific Protocols for Multimedia
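Figure: protocol stack with RTP over UDP (alongside TCP), over IP, over the physical network. The data is carried as an RTP payload with a payload header, wrapped in turn in RTP, UDP and IP headers.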

− IP implements two basic functions
• Addressing
• Fragmentation
− IP treats each packet as an independent entity.
− Internet routers choose the best path to send each packet based on its
address. Each packet may take a different route.
− Routers may fragment packets when necessary for transmission on networks with a
smaller maximum packet size; the fragments are reassembled at the destination.
− No guarantee a packet will reach its destination, and no guarantee of when
it will get there.
• IP packets have a Time-to-Live, after which they are deleted by a router.
• IP does not ensure secure transmission.
• IP only error-checks headers, not payload.
150
The Internet Protocol (IP)
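
The header-only error check mentioned above is a 16-bit one's-complement checksum. A minimal sketch follows; the function name `internet_checksum` and the sample 20-byte header are illustrative assumptions, not taken from the slides.

```python
def internet_checksum(header: bytes) -> int:
    """16-bit one's-complement checksum over an IPv4 header whose
    checksum field has been set to zero."""
    if len(header) % 2:
        header += b"\x00"                 # pad odd-length input
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                    # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical IPv4/UDP header, checksum field (bytes 10-11) zeroed.
example = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(example)
print(hex(checksum))
```

A receiver can run the same sum over the header with the checksum filled in; a result of zero means the header passed the check. The payload is not covered, which is why transport protocols carry their own checksums.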

− TCP is a connection-oriented, end-to-end reliable, in-order protocol.
− TCP does not make any reliability assumptions about the underlying networks.
− An acknowledgment is sent for each packet.
− A transmitter places a copy of each packet sent in a timed buffer.
− If no “ack” is received before the timer expires, the packet is re-transmitted.
− TCP has inherently large latency → not well suited for streaming multimedia.
151
Transmission Control Protocol (TCP)

− UDP is a simple protocol for transmitting packets over IP.
− Smaller header than TCP, hence lower overhead.
− Does not re-transmit packets.
− This is OK for multimedia since a late packet usually must be discarded anyway.
− Performs check-sum of data.
152
User Datagram Protocol (UDP)

153
Transmission Control Protocol (TCP) and User Datagram Protocol (UDP)

− RTP carries data that has real time properties
− Typically runs on UDP/IP
− Does not ensure timely delivery or QoS.
− Does not prevent out-of-order delivery.
− Profiles and payload formats must be defined.
• Profiles define extensions to the RTP header for a particular class of applications
such as audio/video conferencing (IETF RFC 1890).
• Payload formats define how a particular kind of payload, such as H.261 video,
should be carried in RTP.
− Used by Netscape LiveMedia, Microsoft NetMeeting®, Intel VideoPhone,
ProShare® Video Conferencing applications and public domain
conferencing tools such as VIC and VAT.
154
Real time Transport Protocol (RTP)
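
As a rough illustration of what RTP adds on top of UDP, the sketch below packs and parses the 12-byte fixed RTP header (version 2, no padding, no extension, no CSRC entries). The function names are hypothetical, and payload type 31 (H.261 in the audio/video profile) is used only as an example value.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header in network byte order."""
    first = RTP_VERSION << 6                        # V=2, P=0, X=0, CC=0
    second = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", first, second, sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(packet):
    """Parse the fixed header from the start of an RTP packet."""
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,
        "marker": bool(second >> 7),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


header = pack_rtp_header(payload_type=31, sequence=1, timestamp=90000,
                         ssrc=0x1234ABCD, marker=True)
fields = unpack_rtp_header(header + b"payload bytes")
print(fields)
```

The sequence number lets a receiver detect loss and reordering, and the timestamp lets it schedule playout; RTP itself neither retransmits nor reorders.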

− RTCP is a companion protocol to RTP which
• monitors the quality of service
• conveys information about the participants in an on-going session
− It allows participants to send transmission and reception statistics to other participants.
− It also sends information that allows participants to associate media types such as audio/video for lip-sync.
− Sender reports allow senders to derive round trip propagation times.
− Receiver reports include count of lost packets and inter-arrival jitter.
− Scales to a large number of users, since each participant must reduce its rate of reports as the number of
participants increases.
155
Real-time Transport Control Protocol (RTCP)
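
The inter-arrival jitter carried in receiver reports can be estimated with a running filter. The sketch below follows the estimator described in the RTP specification (gain 1/16); the function name and the sample packet times are assumptions made for illustration, with arrival times and RTP timestamps expressed in the same clock units.

```python
def interarrival_jitter(packets):
    """Running jitter estimate over (arrival_time, rtp_timestamp) pairs."""
    jitter = 0.0
    prev_transit = None
    for arrival, rtp_timestamp in packets:
        transit = arrival - rtp_timestamp          # relative transit time
        if prev_transit is not None:
            d = abs(transit - prev_transit)        # change in transit time
            jitter += (d - jitter) / 16.0          # smoothed with gain 1/16
        prev_transit = transit
    return jitter


steady = interarrival_jitter([(1000, 0), (1160, 160), (1320, 320)])
late = interarrival_jitter([(1000, 0), (1160, 160), (1336, 320)])
print(steady, late)
```

Only differences in transit time matter, so the sender and receiver clocks do not need to be synchronized.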

− Most IP-based communication is unicast. A packet is intended for a single destination.
− In unicasting, the router forwards the received packet through only one of its interfaces.
− The relationship between the source and the destination is one-to-one.
156
Unicasting

− For multi-participant applications, streaming multimedia to each destination individually can waste network resources.
− A multicast address is designed to enable the delivery of packets to a set of hosts that have been configured as
members of a multicast group across various subnetworks.
− In multicasting, the router may forward the received packet through several of its interfaces.
− The source address is a unicast address, but destination address is a group address.
157
Multicast
Figure: one source and a group of destinations; packets are duplicated in routers
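
On a host, subscribing to a multicast group is a socket option. The following is a minimal sketch using the standard `socket` module; the helper names `membership_request` and `join_group`, and the group address 224.1.2.3, are hypothetical.

```python
import ipaddress
import socket


def membership_request(group, interface="0.0.0.0"):
    """Build the 8-byte request (group address + local interface address)
    passed to IP_ADD_MEMBERSHIP."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast group address")
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_group(sock, group, interface="0.0.0.0"):
    """Ask the local IP stack to deliver packets sent to `group` on `sock`."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, interface))


request = membership_request("224.1.2.3")
print(request.hex())
```

The sender needs no such step: it simply sends to the group address, and routers replicate the packets towards the subscribed receivers.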

158
Unicast Example, Streaming Media to Multi-participants
Figure: sources S1 and S2 and receivers D1 and D2 connected through routers (R).
• S1 sends duplicate packets because there are two participants: D1 and D2.
• D2 sees excess traffic on this subnet.

159
Multicast Example, Streaming Media to Multi-participants
Figure: sources S1 and S2 and receivers D1 and D2 connected through routers (R).
• S1 sends a single set of packets to a multicast group.
• Both D1 receivers subscribe to the same multicast group.
• D2 doesn’t see any excess traffic on this subnet.

− A multicast router may not find another multicast router in the neighborhood to forward the multicast packet.
− We make a multicast backbone (Mbone) out of these isolated routers using the concept of tunneling.
− The multicast backbone (Mbone) was an experimental backbone and virtual network built on top of the Internet for
carrying IP multicast traffic. It required specialized hardware and software (early 1990s).
160
Multicast Backbone (Mbone)
Figure: isolated islands of multicast routers joined by virtual point-to-point links (tunnels) across nonmulticast routers

− Easy to deploy (no explicit router support).
− Manual tunnel creation/maintenance.
− No routing policy – single tree.
161
MBONE

162
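Mbone IP in IP Tunneling
Figure: a multicast packet (IP header with group address G=224.x.x.x, plus data) is encapsulated at the router at the entry point of the tunnel (encapsulator), crosses the nonmulticast routers, and is unwrapped at the router at the exit point of the tunnel (decapsulator).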

Real-time applications
• Interactive applications are sensitive to packet delays (telephone)
• Non-interactive applications can adapt to a wider range of packet delays (audio, video broadcasts)
• Guarantee of maximum delay is useful
163
Quality of Service Requirements (1)
Figure: arrival offset graph and playout point for sampled audio. The playout buffer must be small for interactive applications.

Elastic applications
− Interactive data transfer (e.g. HTTP, FTP)
• Sensitive to the average delay, not to the distribution tail
− Bulk data transfer (e.g. mail and news delivery)
• Delay insensitive
− Best effort works well
164
Quality of Service Requirements (2)
Document
Document is only useful when it is
completely received. This means
average packet delay is important,
not maximum packet delay.
Document

Used by hosts to obtain a certain QoS from underlying networks for a multimedia stream (it operates over IPv4 or IPv6).
− It provides receiver-initiated setup of resource reservations for multicast or unicast data flows.
− At each node, RSVP daemon attempts to make a resource reservation for the stream.
− It communicates with two local modules:
• Admission Control: It determines whether the node has sufficient resources available. “The Internet Busy Signal”
• Policy Control: It determines whether the user has administrative permission to make the reservation.
165
ReSerVation Protocol (RSVP)
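RSVP Functional Diagram
Figure: on the host, the application and the RSVP daemon (RSVPD) work with policy control, admission control, a packet classifier and a packet scheduler; on the router, the RSVP daemon works with the routing process, policy control, admission control, a packet classifier and a packet scheduler. Data passes through the classifier and scheduler on each node.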

166
Figure: Host A (24.1.70.210) and Host B (128.32.32.69) connected through routers R1 to R5; PATH messages travel from Host A towards Host B.
1. An application on Host A creates a session, 128.32.32.69/4078, by communicating with the RSVP
daemon on Host A.
2. The Host A RSVP daemon generates a PATH message that is sent to the next hop RSVP router, R1, in the
direction of the session address, 128.32.32.69.
3. The PATH message follows the next hop path through R5 and R4 until it gets to Host B. Each router on
the path creates soft session state with the reservation parameters.

167
Figure: Host A (24.1.70.210) and Host B (128.32.32.69) connected through routers R1 to R5; PATH messages travel from Host A towards Host B and RESV messages travel back from Host B towards Host A.
4. An application on Host B communicates with the local RSVP daemon and asks for a reservation in session
128.32.32.69/4078. The daemon checks for and finds existing session state.
5. The Host B RSVP daemon generates a RESV message that is sent to the next hop RSVP router, R4, in the
direction of the source address, 24.1.70.210.
6. The RESV message continues to follow the next hop path through R5 and R1 until it gets to Host A. Each
router on the path makes a resource reservation.

− HTTP generally runs on TCP/IP and is the protocol upon which World-Wide-Web data is transmitted.
− Defines a “stateless” connection between receiver and sender.
− Sends and receives MIME-like messages and handles caching, etc.
− No provisions for latency or QoS guarantees.
168
Hypertext Transfer Protocol (HTTP)

169
Real-time Streaming Protocol (RTSP)
Figure labels: RTSP, meta files, media file download
A “network remote control” for multimedia servers.
− Establishes and controls either a single or several time-synchronized streams of continuous media such as
audio and video.
− Supports the following operations:
• Request a presentation from a media server.
• Invite a media server to join a conference and play back or record media.
• Notify clients that additional media is available for an existing presentation.

170
RTSP
Media file download
Meta Files

171
RTSP - Example

− How do we handle the special cases of
• unicasting?
• Multicasting?
− What about
• packet-loss?
• Quality of service?
• Congestion?
We’ll look at some solutions...
172
How Do We Stream Video Over the Internet?

− HTTP was not designed for streaming multimedia, nevertheless because of its widespread deployment via
Web browsers, many applications stream via HTTP.
− It uses a custom browser plug-in which can start decoding video as it arrives, rather than waiting for the
whole file to download.
− Operates on TCP so it doesn’t have to deal with errors, but the side effect is high latency and large inter-
arrival jitter.
− Usually a receive buffer is employed which can buffer enough data (usually several seconds) to
compensate for latency and jitter.
− Not applicable to two-way communication!
− Firewalls are not a problem with HTTP.
173
HTTP Streaming

− RTP was designed for streaming multimedia.
− Does not resend lost packets since this would add latency and a late packet might as well be lost in
streaming video.
− Used by Intel Videophone, Microsoft NetMeeting, Netscape LiveMedia, RealNetworks, etc.
− Forms the basis for network video conferencing systems (ITU-T H.323)
− Subject to packet loss, and has no quality of service guarantees.
− Can deal with network congestion via RTCP reports under some conditions:
• Should be encoding real time so video rate can be changed dynamically.
• Needs a payload defined for each media it carries.
174
RTP Streaming

− Payloads must be defined in the IETF (Internet Engineering Task Force) for all media carried by RTP.
− A payload has been defined for H.263 and H.263+.
− An RTP packet typically consists of...
− The H.263 payload header contains redundant information about the H.263 bit stream which can assist a payload handler
and decoder in the event that related packets are lost.
− Slice mode of H.263+ aids RTP packetization by allowing fragmentation on MB boundaries (instead of MB rows) and
restricting data dependencies between slices.
− But what do we do when packets are lost or arrive too late to use?
175
H.263 Payload for RTP
RTP Header
H.263 Payload Header
H.263 Payload (bit stream)


− Depends on network topology.
− On the Mbone
• 2-5% packet loss
• single packet loss most common
− For end-to-end transmission, loss rates of 10% not uncommon.
− For ISPs, loss rates may be even higher during periods of high congestion.
177
Internet Packet Loss

178
Internet Packet Loss
Packet Loss Burst Lengths
Figure: distribution of the length of loss bursts observed at a receiver (x-axis: length of loss bursts, b, from 0 to 50; y-axis: probability of bursts of length b, from 0.0001 to 1).
Figure: conditional loss probability (x-axis: number of consecutive packets lost, n, from 0 to 12; y-axis: probability of losing packet n+1, from 0 to 0.8).

Error resiliency and compression have conflicting requirements.
− Video compression attempts to remove as much redundancy out of a video sequence as possible.
− Error resiliency techniques at some point must reconstruct data that has been lost and must rely on
extrapolations from redundant data.
179
Error Resiliency
Figure: redundancy scale from - to +, with compression at the low-redundancy end and resiliency at the high-redundancy end

− Errors tend to propagate in video compression
because of its predictive nature.
180
Error Resiliency
I or P frame P frame
One block is lost.
Error propagates to two
blocks in the next frame.

There are essentially two approaches to dealing with errors from packet loss:
− Error Redundancy Methods
• They are preventative measures that add extra information at the encoder to make it easier to
recover when data is lost.
• The extra overhead decreases compression efficiency but should improve overall quality in the
presence of packet loss.
− Error Concealment Techniques
• They are the methods that are used to hide errors that occur once packets are lost.
− Usually both methods are employed.
181
Error Resiliency

182
Intra Coding Resiliency
Figure: average PSNR (axis from 20 to 45) versus data rate (20 to 180 kbps), with six curves: resil 0, resil 5 and resil 10, each at loss 0 and at loss 10-20.

− Increasing the number of Intra coded blocks that the encoder produces will reduce error propagation
since Intra blocks are not predicted.
− Blocks that are lost at the decoder are simply treated as empty Inter coded blocks (Skipped Blocks).
− The block is simply copied from the previous frame.
− Very simple to implement.
183
Simple Intra Coding & Skipped Blocks
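
Treating a lost block as a skipped block amounts to a copy from the previous decoded frame. A minimal sketch, assuming frames are stored as lists of pixel rows and lost blocks are identified by (block row, block column); the function name and the default 16x16 block size are assumptions.

```python
def conceal_lost_blocks(current, previous, lost, block=16):
    """Replace every lost block of `current` with the co-located block
    of `previous`, exactly as a decoder would do for a skipped block."""
    for block_row, block_col in lost:
        x0 = block_col * block
        for y in range(block_row * block, (block_row + 1) * block):
            current[y][x0:x0 + block] = previous[y][x0:x0 + block]
    return current


# Tiny 4x4 example with 2x2 blocks: the top-right block was lost.
previous_frame = [[7] * 4 for _ in range(4)]
current_frame = [[0] * 4 for _ in range(4)]
repaired = conceal_lost_blocks(current_frame, previous_frame, [(0, 1)], block=2)
print(repaired)
```

The copy is exact only where the scene is static; where there is motion the copied block is stale, which is why extra Intra blocks are still needed to stop the error from propagating.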

184
Reference Picture Selection (RPS) Mode of H.263+
− Select one of several picture memories/prediction structures to reduce error propagation.
− In RPS Mode, a frame is not used for prediction in the encoder until it’s been acknowledged to be error free.
Figure: an I or P frame followed by two P frames. The first is the last acknowledged error-free frame and is used as the reference; for the next frame (marked as a bad picture) no acknowledgment has been received yet, so it is not used for prediction.

• Back channel message types
– Neither: no back-channel message is returned from decoder to encoder
– ACK: decoder returns only acknowledgement messages
– NACK: decoder returns only non-acknowledgement messages
– ACK+NACK: decoder returns both types of messages
• Channel for back channel messages
– Separate Logical Channel: uses a separate logical channel in the multiplex layer of the system
– VideoMux: sends back-channel data within the forward video data of a coded video stream
ACK-based: a picture is assumed to contain errors, and thus is not used for prediction unless an ACK is
received.
NACK-based: a picture will be used for prediction unless a NACK is received, in which case the previous
picture that didn’t receive a NACK will be used.
185
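
The ACK-based and NACK-based rules can be sketched as a small reference-selection function. This is an illustration only: the function name, the status dictionary and the `None` return value (meaning no usable reference, so the caller would have to fall back to Intra coding) are assumptions.

```python
def select_reference(frames, status, mode):
    """Pick the most recent frame usable as a prediction reference.

    frames: frame numbers in coding order, oldest first.
    status: dict mapping frame number to "ACK" or "NACK"; frames with
            no back-channel message yet are simply absent.
    mode:   "ACK"  -> use only frames that were acknowledged.
            "NACK" -> use any frame that was not reported as bad.
    """
    for frame in reversed(frames):
        if mode == "ACK" and status.get(frame) == "ACK":
            return frame
        if mode == "NACK" and status.get(frame) != "NACK":
            return frame
    return None  # assumption: caller codes the next picture as Intra


history = [1, 2, 3]
ack_choice = select_reference(history, {1: "ACK", 2: "ACK"}, "ACK")
nack_choice = select_reference(history, {1: "ACK", 2: "ACK"}, "NACK")
after_nack = select_reference(history, {2: "NACK", 3: "NACK"}, "NACK")
print(ack_choice, nack_choice, after_nack)
```

ACK mode is the more conservative of the two: it always predicts from an older frame by at least one round trip, costing compression efficiency even when nothing is lost, while NACK mode pays only after a loss is reported.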

186
Coding Control (CC)
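Figure: encoder block diagram with video in, coding control (CC), transform (T), quantizer (Q), inverse quantizer and inverse transform, and picture memory (P) with additional picture memories AP1, AP2, ..., APn. The signals p, t, qz, q and v go to the video multiplex coder.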

Reference pictures are interleaved to create two or more independently decodable threads.
− If a frame is lost, the frame rate drops to 1/2 rate until a sync frame is reached.
− Same syntax as Reference Picture Selection, but without ACK/NACK.
− Adds some overhead since prediction is not based on most recent frame.
187
Multi-threaded Video
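Figure: frames 1 to 10. Frames 1 and 10 are I (sync) frames; the P frames in between form two interleaved threads, 3, 5, 7, 9 and 2, 4, 6, 8.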
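
A minimal sketch of the interleaving: with T threads, each frame predicts from the frame T positions earlier, so a loss only breaks the thread it belongs to. The numbering (frame 0 is the sync frame) and the function names are assumptions for illustration.

```python
def reference_frame(n, threads=2):
    """Frame used as prediction reference for frame n.
    Frame 0 is the sync (Intra) frame and has no reference."""
    if n == 0:
        return None
    return max(n - threads, 0)


def decodable_frames(n_frames, lost, threads=2):
    """Frames that can still be decoded when frame `lost` is missing:
    those whose prediction chain back to frame 0 avoids the lost frame."""
    ok = []
    for n in range(n_frames):
        m = n
        while m is not None and m != lost:
            m = reference_frame(m, threads)
        if m is None:          # reached the sync frame without hitting the loss
            ok.append(n)
    return ok


survivors = decodable_frames(9, lost=3)
print(survivors)
```

With two threads, losing frame 3 removes frames 3, 5 and 7 but leaves the other thread intact, so playback continues at half the frame rate until the next sync frame.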

− A video encoder contains a decoder (called the loop decoder) to create decoded previous frames
which are then used for motion estimation and compensation.
− The loop decoder must stay in sync with the real decoder, otherwise errors propagate.
188
Conditional Replenishment
Figure: encoder containing ME/MC, DCT, etc. and a loop decoder, connected to the real decoder

− One solution is to discard the loop decoder.
− Can do this if we restrict ourselves to just two macroblock types:
• Intra coded
• Empty (just copy the same block from the previous frame)
− The technique is to check if the current block has changed substantially since the previous frame and
then code it as Intra if it has changed. Otherwise mark it as empty.
− A periodic refresh of Intra coded blocks ensures all errors eventually disappear.
189
Conditional Replenishment
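
The block decision described above can be sketched as follows. The change measure (sum of absolute differences), the threshold, the refresh schedule and all names are assumptions chosen for illustration; the slides only state that a block is coded as Intra when it has changed substantially and that Intra blocks are refreshed periodically.

```python
def classify_blocks(current, previous, frame_index,
                    block=16, threshold=512, refresh_period=30):
    """Label each block of `current` as "INTRA" or "EMPTY".

    A block is INTRA if it differs enough from the same block in
    `previous`, or if the periodic refresh schedule selects it."""
    rows = len(current) // block
    cols = len(current[0]) // block
    modes = {}
    for br in range(rows):
        for bc in range(cols):
            sad = sum(abs(current[y][x] - previous[y][x])
                      for y in range(br * block, (br + 1) * block)
                      for x in range(bc * block, (bc + 1) * block))
            k = br * cols + bc                       # block index, row-major
            refresh = (frame_index + k) % refresh_period == 0
            modes[(br, bc)] = "INTRA" if sad > threshold or refresh else "EMPTY"
    return modes


old = [[0] * 4 for _ in range(4)]
new = [[0, 0, 50, 50], [0, 0, 50, 50], [0, 0, 0, 0], [0, 0, 0, 0]]
changed = classify_blocks(new, old, frame_index=1, block=2,
                          threshold=10, refresh_period=100)
refreshed = classify_blocks(old, old, frame_index=0, block=2,
                            threshold=10, refresh_period=4)
print(changed, refreshed)
```

Because no block depends on motion-compensated prediction, the encoder never needs a decoded copy of what the receiver actually has, at the price of much lower compression efficiency.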

− Lost macroblocks are reported back to the encoder using a reliable back-channel.
− The encoder catalogs spatial propagation of each macroblock over the last M frames.
− When a macroblock is reported missing, the encoder calculates the accumulated error in each MB of the
current frame.
− If an error threshold is exceeded, the block is coded as Intra.
− Additionally, the erroneous macroblocks are not used as prediction for future frames in order to contain
the error.
190
Error Tracking
Appendix II, H.263

− Some parts of a bit stream contribute more to image artifacts than others if lost.
− The bit stream can be prioritized and more protection can be added for higher priority portions.
191
Prioritized Encoding
Figure: bit stream elements in order of increasing error protection: AC coefficients, DC coefficients, MB information, motion vectors, picture header.
Prioritized Encoding Demo: unprotected encoding versus prioritized encoding (23% overhead)
Videos used with permission of ICSI, UC Berkeley

− Error concealment aims to hide the image degradation from the viewer.
− The main idea behind error concealment is to replace the damaged pixels with pixels from some parts of
the video that have maximum resemblance.
− In general, pixel substitution may come from the same frame or from the previous frame.
− These are called intraframe and interframe error concealment, respectively.
192
Error Concealment by Interpolation
d1
d2
Lost block
Take the weighted average of
4 neighboring pixels.
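
One common way to realize this weighted average is to interpolate each lost pixel from the nearest pixel in each of the four neighbouring blocks, weighting each neighbour by the distance to the opposite side so that closer pixels count more. The exact weights are an assumption (the slide only names the distances d1 and d2), as are the function name and the requirement that the lost block does not touch the picture border.

```python
def conceal_block(frame, top, left, size):
    """Fill the size x size block at (top, left) by distance-weighted
    averaging of the nearest pixels to the left, right, above and below."""
    for y in range(size):
        for x in range(size):
            d_left, d_right = x + 1, size - x
            d_top, d_bottom = y + 1, size - y
            left_px = frame[top + y][left - 1]
            right_px = frame[top + y][left + size]
            top_px = frame[top - 1][left + x]
            bottom_px = frame[top + size][left + x]
            total = d_left + d_right + d_top + d_bottom
            value = (d_right * left_px + d_left * right_px +
                     d_bottom * top_px + d_top * bottom_px) / total
            frame[top + y][left + x] = int(value + 0.5)   # round to nearest
    return frame


# Horizontal ramp with the central 2x2 block lost (set to 0).
ramp = [[0, 30, 60, 90] for _ in range(4)]
ramp[1][1] = ramp[1][2] = ramp[2][1] = ramp[2][2] = 0
restored = conceal_block(ramp, top=1, left=1, size=2)
print(restored)
```

Smooth areas and gentle gradients are recovered well by this kind of interpolation; sharp edges crossing the lost block are blurred, which motivates the edge-based methods.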

Error Concealment with
• Least Square Constraints
• Bayesian Estimators
• Polynomial Interpolation
• Edge-Based Interpolation
• Multi-directional Recursive Nonlinear Filter (MRNF)
193
Other Error Concealment Techniques
MPQT@0.5 bpp, block loss:10% MRNF-GMLOS, PSNR=34.94dB
Example: MRNF Filtering

− Most multimedia applications place the burden of rate adaptivity on the source.
− For multicasting over heterogeneous networks and receivers, it’s impossible to meet the conflicting
requirements, which forces the source to encode at a lowest-common-denominator level.
− The smallest network pipe dictates the quality for all the other participants of the multicast session.
− If congestion occurs, the quality of service degrades as more packets are lost.
194
Network Congestion
