EE 5359

                Project Report

                 Spring 2008




Study and Comparison of MPEG 2 and H.264 main
   profiles and available transcoding methods




               Priyanka Ankolekar
                  1000 51 4497
List of Acronyms


AVC: Advanced Video Coding
CABAC: Context-based Adaptive Binary Arithmetic Coding
CAVLC: Context-based Adaptive Variable Length Coding
DCT: Discrete Cosine Transform
GOP: Group of Pictures
HDTV: High Definition Television
IDCT: Inverse DCT
IQ: Inverse Quantization
ISO: International Organization for Standardization
ITU: International Telecommunication Union
JVT: Joint Video Team
ME: Motion Estimation
MC: Motion Compensation
MB: Macroblock
MV: Motion Vector
NAL: Network Abstraction Layer
QP: Quantization Parameter
VLC: Variable Length Coding
VLD: Variable Length Decoding
VCL: Video Coding Layer
VCEG: Video Coding Experts Group




Abstract


There is a high demand for multimedia applications like digital video recording
and teleconferencing. This has led to the development of various video coding
standards like MPEG-2 and H.264. The video coding layer of H.264 is
superficially similar to that of MPEG-2; however, there are several differences in
the details. In this project the MPEG-2 and H.264 video coding standards are
compared, with a concentration on the main profiles. H.264 gives better
compression performance than MPEG-2. However, MPEG-2 is already widely
used in digital broadcasting, HDTV and DVD applications. The incompatibility
between H.264 video sources and existing MPEG-2 decoders can be solved
using transcoders. This project also discusses the criteria for efficient
transcoding and a few transcoding architectures.




1 Introduction
Development of the international video coding standards such as MPEG-2 [7]
[11][17][16] boosted a diverse range of multimedia applications, including digital
video recording and teleconferencing. As a result of the growing demand for
better compression performance, advanced standards such as H.264 [1][2][6][9]
[18] were developed by the ITU-T-ISO/IEC Joint Video Team (JVT) in 2003. The
overall scheme of the video coding layer (VCL) of H.264 is superficially similar to
the encoding scheme of MPEG-2. However, there are significant differences in
the details. In this project the MPEG-2 and H.264 video coding standards are
compared, i.e. the similarities and differences are studied, with a concentration
on the main profiles.

H.264 can support various applications such as video broadcasting, video
streaming and video conferencing over fixed and wireless networks and over
different transport protocols. However, MPEG-2 has already been widely used in
the field of digital broadcasting, HDTV and DVD applications. The incompatibility
between H.264 video sources and existing MPEG-2 decoders can be
solved by using transcoders. In this project, the criteria for transcoding and a
few transcoding architectures are discussed.

The report has been structured in the following manner: Chapter 1 is an
introduction to the topic and explains the scope of the project. Chapter 2 explains
the various aspects of the MPEG-2 video coding standard while Chapter 3
covers the same for H.264 video coding standard. Chapter 4 shows a
comparison between the two standards. In Chapter 5, the topic of MPEG-2 to
H.264 transcoding is covered in greater detail.




2 MPEG-2
MPEG-2 is widely used as the format of digital television signals that are
broadcast by terrestrial (over-the-air), cable, and direct broadcast satellite TV
systems. It also specifies the format of movies and other programs that are
distributed on DVD and similar disks. As such, TV stations, TV receivers, DVD
players, and other equipment are often designed to this standard. MPEG-2 was
the second of several standards developed by the Moving Pictures Expert Group
(MPEG) and is an international standard (ISO/IEC 13818). [16]

The video section, part 2 of MPEG-2, is similar to the previous MPEG-1
standard but also provides support for interlaced video, the format used by
analog broadcast TV systems. MPEG-2 video is not optimized for low bit-rates,
especially less than 1 Mbit/s at standard definition resolutions. However, it
outperforms MPEG-1 at 3 Mbit/s and above. MPEG-2 is directed at broadcast
formats at higher data rates of 4 Mbps (DVD) and 19 Mbps (HDTV). All
standards-compliant MPEG-2 video decoders are fully capable of playing back
MPEG-1 video streams. MPEG-2/video is formally known as ISO/IEC 13818-2
and as ITU-T Rec. H.262 [21].


2.1 MPEG-2 Profiles and Levels

MPEG-2 video supports a wide range of applications, from mobile to high-quality
HD editing. For many applications it is unrealistic and too expensive to support
the entire standard. To allow applications to support only subsets of it, the
standard defines profiles and levels. [21]

Description

MPEG-2 video is a family of systems, each with an agreed degree of
commonality and compatibility. It allows four source formats, or 'Levels', to be
coded, ranging from limited definition (about VCR quality) to full
HDTV, each with a range of bit rates [22]. The level defines a subset of
quantitative capabilities such as maximum bit rate, maximum frame size, etc. [16]

In addition to this flexibility in source formats, MPEG-2 allows different ‘Profiles’.
Each profile offers a collection of compression tools that together make up the
coding system. A different profile means that a different set of compression tools
is available. [22]




MPEG-2 Profiles

2.1.1.1 Simple Profile
This profile has the fewest tools. The Simple Profile offers the basic toolkit for
MPEG-2 encoding: intra- and predicted-frame encoding and decoding with
YUV 4:2:0 color subsampling.

2.1.1.2 Main Profile
This profile has all the tools of the Simple Profile plus one more, termed
bi-directional prediction. It gives better quality than the Simple Profile for the
same bit rate. A Main Profile decoder decodes both Main and Simple Profile
encoded pictures; this backward-compatibility pattern applies to the succession
of profiles. A refinement of the Main Profile, sometimes unofficially known as
Main Profile Professional Level or MPEG 422, allows line-sequential color-difference
signals (4:2:2) to be used, but not the scalable tools of the higher
profiles.

2.1.1.3 SNR Scalable Profile and Spatially Scalable Profile
The two Profiles after the Main Profile are, successively, the SNR Scalable
Profile and the Spatially Scalable Profile. These add tools which allow the
coded video data to be partitioned into a base layer and one or more 'top-up'
signals. The top-up signals can improve either the noise performance (SNR
scalability) or the resolution (spatial scalability). These scalable systems may
have interesting uses: the lowest layer can be coded in a more robust way, and
thus provide a means to broadcast to a wider area, or provide a service for
more difficult reception conditions. Nevertheless, there is a premium to be paid
in receiver complexity. Owing to the added complexity, none of the scalable
profiles is supported by digital video broadcasting (DVB). The inputs to the
system are YUV component video. However, the first four profiles code the
color-difference signals line-sequentially.

2.1.1.4 High Profile
It includes all the previous tools plus the ability to code line-simultaneous
color-difference signals. In effect, the High Profile is a 'super system', designed
for the most sophisticated applications, where there is no constraint on bit rate.

Table 1 is a tabulated form of the properties of the various MPEG-2 profiles.




Table 1. MPEG-2 Profiles [16]

Abbr.    Name                        Picture       Chroma     Aspect Ratios    Scalable Modes
                                     Coding Types  Format
SP       Simple profile              I, P          4:2:0      square pixels,   none
                                                              4:3, or 16:9
MP       Main profile                I, P, B       4:2:0      square pixels,   none
                                                              4:3, or 16:9
SNR      SNR Scalable profile        I, P, B       4:2:0      square pixels,   SNR (signal-to-noise
                                                              4:3, or 16:9     ratio) scalable
Spatial  Spatially Scalable profile  I, P, B       4:2:0      square pixels,   SNR- or spatial-
                                                              4:3, or 16:9     scalable
HP       High profile                I, P, B       4:2:2 or   square pixels,   SNR- or spatial-
                                                   4:2:0      4:3, or 16:9     scalable

MPEG-2 Levels




2.1.1.5 Description of Levels

A level defines, within the MPEG standard, physical parameters such as
bit rates, picture sizes and resolutions. There are four levels specified by
MPEG-2: High level, High 1440, Main level, and Low level. MPEG-2 Video Main
Profile at Main Level has sampling limits at ITU-R 601 parameters (PAL and
NTSC). Profiles limit syntax (i.e., algorithms) whereas levels limit encoding
parameters (sample rates, frame dimensions, coded bit rates, buffer sizes, etc.).
Together, Video Main Profile at Main Level (abbreviated MP@ML) keeps
complexity within current technical limits, yet still meets the needs of the majority
of applications. MP@ML is the most widely accepted combination for most cable
and satellite systems; however, different combinations are possible to suit other
applications. [4]

Table 2 shows a comparison between the four MPEG-2 levels on the basis of the
frame size (PAL/NTSC) and the maximum bit rate for each.


                           Table 2. MPEG-2 Levels [22]




2.2 MPEG-2 Encoder




Figure 1. MPEG 2 encoder [10]

The various blocks of the MPEG-2 encoder are explained below:

DCT
The MPEG-2 encoder uses an 8x8 2-D DCT. For intra frames it is applied
to 8x8 blocks of pels, and for inter frames it is applied to 8x8 blocks of
the residual (motion-compensated prediction errors). Since the DCT is more
efficient at compressing correlated sources, intra pictures compress more
efficiently than inter pictures.

2.2.1 Quantizer
The DCT coefficients obtained above are then quantized by using a default or
modified matrix. User defined matrices may be downloaded and can occur in the
sequence header or in the quant matrix extension header. The quantizer step
sizes for DC coefficients of the luminance and chrominance components are 8, 4,
2 and 1 according to the intra DC precision of 8, 9, 10 and 11 bits respectively.
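The DC step-size rule above (each extra bit of intra DC precision halves the step size) can be sketched as follows; the function name is illustrative:

```python
def intra_dc_step(precision_bits):
    """MPEG-2 intra DC quantizer step size: 8, 4, 2, 1 for an
    intra_dc_precision of 8, 9, 10 and 11 bits respectively."""
    assert 8 <= precision_bits <= 11
    return 2 ** (11 - precision_bits)
```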

2.2.2 Motion estimation and compensation
In the motion estimation process, motion vectors for predicted and interpolated
pictures are coded differentially between macroblocks. The two motion vector
components are coded independently, the horizontal component first and then
the vertical. The motion compensation process forms predictions from
previously decoded pictures using motion vectors of integer and half-pel
resolution.

2.2.3 Coding decisions
There are four different coding modes in MPEG-2. These modes are chosen
based on whether the encoder encodes a frame picture as a frame or as two
fields; in the case of interlaced pictures it can choose to encode it as two fields
or use 16x8 motion compensation.

2.2.4 Scanning and VLC
The quantized transform coefficients are scanned and converted to a one
dimensional array. Two scanning methods are available:
a. Zigzag scan (Figure 2(a)): For progressive (non-interlaced) mode processing
b. Alternate scan (Figure 2(b)): For interlaced format video.
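A minimal sketch of generating the zigzag scan order for an 8x8 block: positions are visited along anti-diagonals (constant row+column sum), alternating direction on each diagonal. The helper name is illustrative:

```python
def zigzag_order(n=8):
    """Return the zigzag scan order as a list of (row, col) positions.
    Even diagonals run bottom-left to top-right, odd diagonals the
    opposite way, starting from the DC position (0, 0)."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
```

Applying this order to the quantized 8x8 coefficient block yields the one-dimensional array that is subsequently variable-length coded.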




Figure 2. (a) Zigzag scan pattern (4x4) [4]; (b) Alternate scan pattern (4x4)




Figure 3. Scan matrices in MPEG-2 [20] (8x8): (a) Zigzag scan; (b) Alternate scan


The list of values produced by scanning is then entropy coded using a variable
length code (VLC).




2.3 MPEG-2 Decoder




Figure 4. MPEG 2 Decoder [7]


At the decoder side, the quantized DCT coefficients are reconstructed and
inverse transformed to produce the prediction error. This prediction error is then
added to the motion-compensated prediction generated from previously decoded
pictures to produce the reconstructed output.

The various parts of the MPEG-2 decoder are:

2.3.1 Variable length decoding
This process involves the use of one table defined for decoding intra DC
coefficients and three further tables, one each for non-intra DC coefficients,
intra AC coefficients and non-intra AC coefficients. The decoded values imply
one of three courses of action: end of block, normal coefficient or escape coding.

2.3.2 Inverse scan
The output of the variable-length decoding stage is one-dimensional, of
length 64. The inverse scan process converts this one-dimensional data into a
two-dimensional array of coefficients according to a predefined scan matrix.

2.3.3 Inverse quantization
At this stage the two-dimensional array of quantized coefficients is inverse
quantized to produce the reconstructed DCT coefficients. This process involves
rescaling the coefficients, essentially multiplying them by the quantizer step
size, which can be modified using either a weighting matrix or a scale factor.
After inverse quantization, saturation and mismatch control operations are
performed.
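A simplified sketch of the non-intra rescaling rule with saturation, mismatch control omitted (the standard's integer arithmetic differs in rounding details, so this is illustrative only):

```python
def inverse_quantize_nonintra(level, weight, quantizer_scale):
    """Simplified MPEG-2-style non-intra inverse quantization:
    reconstructed = ((2*level + sign(level)) * weight * quantizer_scale) / 32,
    saturated to the legal coefficient range [-2048, 2047]."""
    if level == 0:
        return 0
    sign = 1 if level > 0 else -1
    val = int((2 * level + sign) * weight * quantizer_scale / 32)
    return max(-2048, min(2047, val))
```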

2.3.4 Inverse DCT
Once the reconstructed DCT coefficients are obtained, a 2-D 8x8 inverse DCT is
applied to obtain the inverse transformed values. These values are then
saturated to keep them in the range [-256, +255].

2.3.5 Motion Compensation
During this stage, predictions from previously decoded pictures are combined
with the inverse DCT transformed coefficient data to get the final decoded output.




3 H.264



H.264/AVC [1][9] was developed by the JVT (Joint Video Team) to achieve
MPEG-2 [7] quality compression at almost half the bit rate. H.264/AVC provides
significant coding efficiency, simple syntax specifications, and seamless
integration of video coding into all current protocols and multiplex architectures.
H.264 supports various applications such as video broadcasting, video
streaming, and video conferencing over fixed and wireless networks and over
different transport protocols. [4]

H.264 video coding standard has the same basic functional elements as previous
standards (MPEG-1, MPEG-2, MPEG-4 part 2, H.261, H.263) [23], i.e., transform
for reduction of spatial correlation, quantization for bitrate control, motion
compensated prediction for reduction of temporal correlation, entropy encoding
for reduction of statistical correlation. However, to achieve better coding
performance, H.264 introduces important changes in the details of each
functional element: intra-picture prediction, a new 4x4 integer
transform, multiple reference pictures, variable block sizes and quarter-pel
precision for motion compensation, a deblocking filter, and improved entropy
coding. [1]


3.1 H.264 Profiles

Each profile specifies a subset of the bitstream syntax and limits that shall
be supported by all decoders conforming to that profile. There are three profiles
in the first version: Baseline, Main, and Extended. The Baseline Profile is
intended for real-time conversational services such as video conferencing and
videophone. The Main Profile is designed for digital storage media and television
broadcasting. The Extended Profile is aimed at multimedia services over the
Internet. In addition, four High Profiles are defined in the fidelity range
extensions [19] for applications such as content contribution, content
distribution, and studio editing and post-processing: High, High 10, High 4:2:2,
and High 4:4:4. The High Profile supports 8-bit video with 4:2:0 sampling for
high-resolution applications. The High 10 Profile supports 4:2:0 sampling with
up to 10 bits of representation accuracy per sample. The High 4:2:2 Profile
supports up to 4:2:2 chroma sampling and up to 10 bits per sample. The High
4:4:4 Profile supports up to 4:4:4 chroma sampling, up to 12 bits per sample,
and an integer residual color transform for coding RGB signals. The profiles
have common coding parts as well as specific coding parts, as shown in
Figure 5. [1]

 3.1.1 Common Parts of All Profiles




3.1.1.1 I slice (Intra-coded slice)

        This slice is coded by using prediction only from decoded samples within
        the same slice.

3.1.1.2 P slice (Predictive-coded slice)

This slice (Figure 6) is coded by using inter prediction from previously-decoded
reference pictures, using at most one motion vector and reference index to
predict the sample values of each block.

3.1.1.3 CAVLC (Context-based Adaptive Variable Length Coding)

This is used for entropy coding. After transform and quantization, the probability
that a coefficient level is zero or +/-1 is very high. CAVLC therefore handles
zero and +/-1 coefficients differently from the other coefficient levels: the total
numbers of zeros and +/-1s are coded, while for the remaining coefficients the
levels themselves are coded.
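The counting step described above can be sketched as follows. This is a simplified illustration of only the first stage of CAVLC (counting, not the actual code tables), and the function name is illustrative:

```python
def cavlc_counts(coeffs):
    """Given a zigzag-ordered list of quantized coefficients, count the
    total number of nonzero coefficients and the trailing +/-1s
    (capped at 3), scanning back from the end as CAVLC does."""
    nonzero = [c for c in coeffs if c != 0]
    total = len(nonzero)
    trailing_ones = 0
    for c in reversed(nonzero):
        if abs(c) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break
    return total, trailing_ones
```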


 3.1.2 Baseline Profile

3.1.2.1 Flexible macroblock order

Macroblocks need not be sent in raster scan order; a macroblock-to-slice-group
map assigns each macroblock to a slice group.

3.1.2.2 Arbitrary slice order

The macroblock address of the first macroblock of a slice of a picture may be
smaller than the macroblock address of the first macroblock of some other
preceding slice of the same coded picture.

3.1.2.3 Redundant slice

A redundant slice carries redundantly coded data, obtained at the same or a
different coding rate, for a previously coded slice of the same picture.




Figure 5. The specific coding parts of the Profiles in H.264 [1].



3.1.3 Main Profile

3.1.3.1 B slice (Bi-directionally predictive-coded slice)

This slice (Figure 6) is coded by using inter prediction from previously-decoded
reference pictures, using at most two motion vectors and reference indices to
predict the sample values of each block.

3.1.3.2 Weighted prediction

This is a scaling operation performed by applying a weighting factor to the
samples of motion-compensated prediction data in P or B slice. A prediction
signal p for B slice is obtained using different weights from two reference signals,
r1 and r2.


Equation 1 [1]:                           p = w1 × r1 + w2 × r2

where w1 and w2 are weights.
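Equation 1 can be illustrated per sample as follows. Clipping to the 8-bit sample range is assumed here; the standard's fixed-point weighting arithmetic is more involved:

```python
def weighted_bipred(r1, r2, w1, w2):
    """Weighted bi-prediction for a B slice: p = w1*r1 + w2*r2 per
    sample, clipped to the 8-bit range [0, 255]."""
    return [max(0, min(255, round(w1 * a + w2 * b)))
            for a, b in zip(r1, r2)]
```

With w1 = w2 = 0.5 this reduces to the equal-weight averaging used by earlier standards; unequal weights model fades between scenes.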


3.1.3.3 CABAC (Context-based Adaptive Binary Arithmetic Coding)

This is used for entropy coding. It utilizes arithmetic coding in order to achieve
good compression.




            Figure 6. Illustration of temporal prediction (B and P slices)



3.1.4 Extended Profile

This profile includes all parts of Baseline Profile: flexible macroblock order,
arbitrary slice order, and redundant slice. The other features of this profile are:

3.1.4.1 SP slice

A specially coded slice that enables efficient switching between video streams;
its coding is similar to that of a P slice.

3.1.4.2 SI slice

A switching slice whose coding is similar to that of an I slice.

3.1.4.3 Data partition

The coded data is placed in separate data partitions; each partition can be
placed in a different layer unit.




3.1.4.4 B slice

H.264 generalizes the concept of bidirectional prediction and supports not only
forward/backward    prediction   pairs     but   also    forward/forward   and
backward/backward pairs.

3.1.4.5 Weighted prediction

Previous standards use equal weights for reference pictures, i.e. a prediction
signal is obtained by averaging the reference signals with equal weights.
However, gradual transitions from scene to scene call for different weights.
H.264 uses weighted prediction for macroblocks of P slices or B slices.

3.1.5 High Profiles

The High Profiles include all parts of the Main Profile: B slices, weighted
prediction and CABAC. The salient additional features of these profiles are:

3.1.5.1 Adaptive transform block size

H.264 uses an adaptive transform block size, 4 x 4 and 8 x 8 (High Profiles only),
whereas previous video coding standards used the 8 x 8 DCT. The smaller block
size leads to a significant reduction in ringing artifacts. Also, the 4 x 4 transform
has the additional benefit of removing the need for multiplications. [1]

3.1.5.2 Quantization scaling matrices

Scaling factors that differ according to the frequency associated with each
transform coefficient are applied in the quantization process to optimize
subjective quality. The High Profiles support perceptual quantization scaling
matrices similar to those used in MPEG-2: the encoder can specify a matrix of
scaling factors, indexed by the frequency associated with each transform
coefficient, for use in inverse quantization scaling by the decoder. This allows
the subjective quality to be optimized according to the sensitivity of the human
visual system, which is less sensitive to coding errors in high-frequency
transform coefficients.

Table 3 shows a comparison between the baseline, extended, main and high
profiles of H.264.




Table 3. Comparison chart for the various profiles of H.264 [18]


                                       Baseline Extended Main High

             I and P Slices              Yes      Yes    Yes Yes
               B Slices                  No       Yes    Yes Yes
           SI and SP Slices              No       Yes     No   No
      Multiple Reference Frames          Yes      Yes    Yes Yes
       In-Loop Deblocking Filter         Yes      Yes    Yes Yes
        CAVLC Entropy Coding             Yes      Yes    Yes Yes
        CABAC Entropy Coding             No       No     Yes Yes
  Flexible Macroblock Ordering (FMO)     Yes      Yes     No   No
     Arbitrary Slice Ordering (ASO)      Yes      Yes     No   No
        Redundant Slices (RS)            Yes      Yes     No   No




3.2 H.264 Encoder




                           Figure 7. H.264 encoder [9]

The encoder blocks are explained below:

3.2.1.1 4x4 Integer transform

H.264 employs a 4x4 integer DCT, as compared to the 8x8 DCT adopted by
previous standards. The smaller block size leads to a significant reduction in
ringing artifacts, and the 4x4 transform has the additional benefit of removing
the need for multiplications.

3.2.1.2 Quantization and scan

The H.264 standard specifies the mathematical formulae of the quantization
process. The scale factor for each element in each sub block varies as a function
of the quantization parameter associated with the macroblock and as a function


of the position of the element within the sub block. The rate control algorithm
controls the value of the quantization parameter. Two types of scan pattern are
used for 4x4 blocks – one for frame coded macroblocks and one for field coded
macroblocks.
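The QP-to-step-size relationship in H.264 (the step size doubles for every increase of 6 in QP) can be sketched as follows, using the commonly quoted base step values from the standard's table; the helper name is illustrative:

```python
# Quantizer step sizes for QP 0..5; each further increase of 6 in QP
# doubles the step size, covering QP 0..51
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantizer step size for a given H.264 quantization parameter."""
    assert 0 <= qp <= 51
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))
```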

3.2.1.3 Context-based adaptive variable length coding (CAVLC) and Context-
        based adaptive binary arithmetic coding (CABAC) entropy coding

H.264 uses different variable length coding methods to match a symbol
to a code based on context characteristics: context-based adaptive variable
length coding (CAVLC) and context-based adaptive binary arithmetic coding
(CABAC). All syntax elements except the residual data are encoded with
Exp-Golomb codes. To read the residual data (quantized transform coefficients),
a zig-zag scan (for frame-coded, progressive material) or an alternate scan (for
field-coded, interlaced material) is used. For coding the residual data, a more
sophisticated method, CAVLC, is employed. CABAC is employed in the Main
and High profiles; it has higher coding efficiency but also higher complexity
compared to CAVLC.

3.2.1.4 Deblocking filter

H.264 employs a deblocking filter to reduce blocking artifacts at block
boundaries and to stop the propagation of accumulated coding noise. The filter
is applied in the encoder after the inverse transform (before the macroblock is
reconstructed and stored for future predictions) and in the decoder (before the
macroblocks are reconstructed and displayed). The deblocking filter is applied
across the edges of macroblocks and sub-blocks. The filtered image is used in
motion-compensated prediction of future frames and helps achieve more
compression.




Figure 8. Diagram depicting how the loop filter works on the edges of the blocks
                              and sub-blocks [4]


3.2.1.5 Intra prediction

During intra prediction, the encoder derives a predicted block from previously
decoded samples in the same picture. The predicted block is then subtracted
from the current block and the residual is encoded. There are a total of nine
prediction modes (Figure 9) for each 4x4 luma block, four prediction modes for
each 16x16 luma block and four modes for each chroma block.




                           Figure 9. Intra prediction 4x4 [31]
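Three of the nine 4x4 modes can be sketched as follows (vertical, horizontal and DC); the function and mode names are illustrative, and the remaining six directional modes are omitted for brevity:

```python
def intra4x4_predict(mode, top, left):
    """Sketch of three of the nine 4x4 luma intra modes, using the row
    of 4 decoded samples above (top) and the column of 4 to the left
    (left) of the current block. Returns a 4x4 predicted block."""
    if mode == "vertical":      # mode 0: copy the row above downward
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # mode 1: copy the left column rightward
        return [[v] * 4 for v in left]
    if mode == "dc":            # mode 2: rounded mean of all 8 neighbors
        dc = (sum(top) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only vertical/horizontal/dc are sketched here")
```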

3.2.1.6 Inter prediction

Inter prediction exploits temporal correlation and consists of motion estimation
and motion compensation. Compared to previous standards, H.264 supports a
larger range of block sizes, from 16x16 down to 4x4. Moreover, H.264 supports
motion vector accuracy of one quarter of the luma sample spacing.
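Quarter-pel accuracy rests on half-pel luma samples produced with a 6-tap filter; a sketch of the standard (1, -5, 20, 20, -5, 1)/32 interpolation with rounding:

```python
def halfpel(samples, i):
    """H.264 6-tap half-pel luma interpolation between samples[i] and
    samples[i+1]: (E - 5F + 20G + 20H - 5I + J + 16) >> 5, clipped to
    the 8-bit range. Requires two samples of margin on each side."""
    E, F, G, H, I, J = samples[i - 2:i + 4]
    val = (E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5
    return max(0, min(255, val))
```

Quarter-pel positions are then obtained by averaging the nearest integer- and half-pel samples.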

3.2.1.7 Reference pictures

Unlike the previous standards that just use the immediate previous I or P picture
for inter prediction, H.264 has the ability to use more than one previous reference
picture for inter prediction thus enabling the encoder to search for the best match
for the current picture from a wider set of reference pictures than just the
previously encoded one.




3.3 H.264 Decoder

Figure 10 shows the block diagram of a general H.264/MPEG-4 AVC decoder.




                           Figure 10. H.264 decoder [7]

The input bitstream includes all the control information, such as picture or slice
type, macroblock types and subtypes, reference frame indices, motion vectors,
loop filter control and quantizer step size, as well as coded data comprising the
quantized transform coefficients. The decoder of Figure 10 works like the local
decoder at the encoder; a simplified description follows. After entropy (CABAC
or CAVLC) decoding, the transform coefficients are inverse scanned and inverse
quantized prior to being inverse transformed. To the resulting 4x4 blocks of
residual signal, an appropriate prediction signal (intra, or motion-compensated
inter) is added, depending on the macroblock type and sub-macroblock type,
the reference frame, the motion vector(s) and the decoded picture store. The
reconstructed video frames undergo deblocking filtering prior to being stored for
future use in prediction, and the frames at the output of the deblocking filter
may need to be reordered prior to display. [2]




4 Comparison between MPEG-2 and H.264

4.1 Key features of MPEG-2 video
The MPEG-2 coding standard has been designed to efficiently support both
interlaced and progressive video coding and produce high quality standard
definition video at about 4 Mbps. The MPEG-2 video standard uses a block-
based hybrid transform coding algorithm that employs transform coding of the
motion-compensated prediction error. While motion compensation exploits
temporal redundancies, the DCT transform exploits the spatial redundancies.
The asymmetric encoder-decoder complexity allows for a simpler decoder while
maintaining high quality and efficiency through a more complex encoder. [3]

4.2 Key features of H.264 video

The H.264 video coding standard has been developed recently through the joint
work of the ITU’s video coding experts group (VCEG) and ISO moving pictures
experts group (MPEG). The H.264 video coding standard is flexible and offers a
number of tools to support a range of applications with very low as well as very
high bitrate requirements.

4.3 Comparison: Similarities and Differences between MPEG-2 video
    and H.264 video

In this section, the MPEG-2 and H.264 video coding standards are compared
with respect to various aspects such as bit rate, block size, macroblock size,
intra prediction, motion estimation block sizes, quantization and motion vector
prediction, among others. Table 4 tabulates these comparisons systematically;
they are further elaborated in the subsections below.

4.3.1 Increased efficiency

Compared with MPEG-2 video, the H.264 video format gives perceptually
equivalent video at 1/3 to 1/2 of the MPEG-2 bit rates. Some extensions, known
as "Fidelity Range Extensions", facilitate higher-fidelity video coding by
supporting higher bit depths, including 10-bit and 12-bit encoding, and higher
color resolution using the YUV 4:2:2 and YUV 4:4:4 sampling structures. This
naturally makes H.264 attractive to video distributors, because it permits them to
maximize the number of services carried in a given amount of bandwidth [30].




The bit rate gains are not a result of any single feature but of a combination of
encoding tools, and they come with a significant increase in encoding and
decoding complexity [27]. In spite of the increased complexity, the dramatic
bandwidth savings encourage TV broadcasters to adopt the new technology, as
the saved bandwidth can be used to provide new channels or new data and
interactive services. With the coding gains of H.264, full-length HDTV-resolution
movies can be stored on DVDs. Furthermore, the same video coding format can
be used for TV broadcast as well as for internet streaming.

4.3.2 Coding flexibility

ISO/IEC 14496-10 (H.264), like previous MPEG standards, does not define a
specific encoder and decoder. Instead, it defines the syntax of an encoded
bitstream and describes the method of decoding that bitstream; the
implementation is left to the developer [31].

H.264 uses the same hybrid coding approach as the other MPEG video standards:
motion-compensated transform coding. It differs significantly from MPEG-2,
however, in the actual coding tools used. The main differences are: an integer
transform with energy compaction properties similar to those of the DCT, used
in place of the DCT itself; an in-loop deblocking filter (DF) to reduce block
artifacts; and intra-frame prediction (IFP). The coder control operation is
responsible for functions such as reference frame management, coding mode
selection, and managing the encoding parameter set. In addition, the H.264
standard introduces several other new coding tools that improve coding
efficiency.

Multiple reference picture motion compensation uses previously encoded
pictures more flexibly than does MPEG-2. In MPEG-2, a P-frame can use only a
single previously coded frame to predict the motion compensation values for an
incoming picture, while a B-frame can use only the immediately previous P- or I-
frame and the immediately subsequent P- or I-frame.

H.264 permits the use of up to 32 previously coded pictures, and it supports
more flexibility in the selection of motion compensation block sizes and shapes,
down to the use of a luma compensation block as small as 4-by-4 pixels. H.264
also supports quarter-sample motion compensation vector accuracy, as opposed
to MPEG-2's half-sample accuracy.
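H.264's quarter-sample accuracy is obtained in two stages: half-sample positions are interpolated with a 6-tap filter (taps 1, -5, 20, 20, -5, 1), and quarter-sample positions are bilinear averages of adjacent integer- and half-sample values. A minimal one-dimensional sketch (the sample values are illustrative):

```python
# 1-D sketch of H.264 luma sub-pel interpolation (quarter-pel accuracy).
# Half-pel: 6-tap filter (1, -5, 20, 20, -5, 1) / 32 with rounding, clipped to [0, 255].
# Quarter-pel: bilinear average of the two nearest integer/half-pel samples.

def clip255(x):
    return max(0, min(255, x))

def half_pel(s, i):
    """Half-sample value between s[i] and s[i+1]; needs samples s[i-2]..s[i+3]."""
    acc = s[i-2] - 5*s[i-1] + 20*s[i] + 20*s[i+1] - 5*s[i+2] + s[i+3]
    return clip255((acc + 16) >> 5)

def quarter_pel(s, i):
    """Quarter-sample value at position i + 0.25: average of s[i] and the half-pel sample."""
    return (s[i] + half_pel(s, i) + 1) >> 1

samples = [10, 20, 40, 80, 120, 160, 200, 210]
h = half_pel(samples, 3)      # interpolated value between 80 and 120
q = quarter_pel(samples, 3)   # value a quarter of the way from 80 toward h
```

MPEG-2's half-sample positions, by contrast, use simple bilinear averaging, which acts as a low-pass filter and blurs the prediction; the longer H.264 filter preserves more detail at sub-pel positions.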




These refinements permit more precise segmentation of moving areas within the
image, and more precise description of movement. Further, in H.264, the motion-
compensated prediction signal may be weighted and offset by the encoder,
facilitating significantly improved performance in fades (fades can be problematic
for MPEG-2).
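The weighted prediction above can be sketched as a scale-and-offset of the reference samples. The weight, offset, and denominator below are illustrative values, not parameters from any real bitstream:

```python
# Simplified sketch of H.264-style explicit weighted prediction for a fade.
# A fade-to-black scales sample values frame by frame; plain MC predicts poorly,
# but pred = ((w * ref + round) >> logWD) + offset can model the global
# brightness change. logWD, w and offset here are illustrative only.

def weighted_pred(ref_block, w, offset, logWD):
    round_ = 1 << (logWD - 1)
    return [max(0, min(255, ((w * r + round_) >> logWD) + offset))
            for r in ref_block]

ref = [200, 180, 160, 140]   # block from the (brighter) reference frame
cur = [100, 90, 80, 70]      # same block one fade step later (50% brightness)

pred = weighted_pred(ref, w=32, offset=0, logWD=6)    # w / 2**logWD = 0.5
residual = [c - p for c, p in zip(cur, pred)]          # near zero: cheap to code
```

With plain (unweighted) prediction the residual would be the full brightness difference; with the weight applied, the residual collapses to almost nothing, which is why weighted prediction handles fades so much better.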

4.3.3 Deblocking filter

Block-based coding can generate blocking artifacts in the decoded pictures. In
H.264, a deblocking filter is placed inside the motion-compensated prediction
loop, so that the filtered pictures are used as references when predicting
subsequent pictures (Figure 8).

H.264 also introduces switching slices, which permit a decoder to jump between
bitstreams, for example to change bit rate smoothly or to support stunt modes,
without requiring every stream to send an I-frame at the switch point; this
makes the decoder's job at switch points easier.




Table 4. Comparison between MPEG-2 and H.264

Algorithm Characteristic     MPEG-2                   H.264
General                      Motion-compensated       Same basic hybrid
                             prediction, transformed  structure as MPEG-2
                             residual, entropy coded
Block size                   8x8                      16x16, 16x8, 8x16, 8x8,
                                                      8x4, 4x8, 4x4
Macroblock size              16x16 (frame mode),      16x16
                             16x8 (field mode)
Intra prediction             None                     Multi-direction,
                                                      multi-pattern
Quantization                 Scalar quantization      Scalar quantization with
                             with constant step-size  step sizes increasing at
                             increments               a rate of 12.5%
Entropy coding               VLC (multiple fixed      CAVLC and CABAC
                             tables)                  (context-adaptive)
Weighted prediction          No                       Yes
Reference pictures           One picture              Multiple pictures
Motion estimation blocks     16x16                    16x16, 16x8, 8x16, 8x8,
                                                      8x4, 4x8, 4x4
Motion vector prediction     Simple                   Median-based and
                                                      segmented
Frame distance for           +/- 1                    Unlimited
prediction                                            forward/backward
Fractional motion            1/2 pixel                1/4 pixel
estimation
Deblocking filter            None                     In-loop dynamic edge
                                                      filters
Scalable coding support [2]  Yes; layered spatial,    Some support for temporal
                             SNR and temporal         and SNR scalability
                             scalability
Bit rate for same-quality    12 - 20 Mbps             7 - 8 Mbps
HD video (1920 x 1080)
Transmission rate            2 - 15 Mbps              64 kbps - 150 Mbps




4.3.4 Performance comparison between MPEG-2 and H.264 using standard
      test streams – Simulation results

Test streams (foreman, news and carphone [26]) were encoded using the open-
source MPEG-2 codec [25] and the H.264 codec [24]. The results were
compared against each other for parameters like the signal to noise ratio (SNR),
GOP and compression ratio. CIF files were used for the “Foreman” and the
“News” clips whereas QCIF was used for the “Carphone” clip. The bit rate for
H.264 encoding was taken as the default used by the codec, and the bit rate for
MPEG-2 encoding was then adjusted to match it, so that the two standards could
be compared on a common basis. While the aim of this project is to compare the
Main profiles of MPEG-2
and H.264, simulations were run for the Simple/Baseline profiles too. This was
done in order to prove quantitatively that encoding using the Main profile for both
MPEG-2 and H.264 gives a better compression ratio and better quality video
than the Simple profile. Tables 5, 6 and 7 tabulate the results obtained after
running the simulations. Figures 11, 12 and 13 show screen shots of the
encoded videos (only for the Main profiles). Section 4.3.5 explains the
conclusions drawn on the basis of the results obtained from simulations.


4.3.5 Conclusion

From the tables below, the following is concluded:
   • For the same bit rate and video resolution, the PSNR (dB) values are
      greater for H.264 encoded videos than for the MPEG-2 encoded videos
      indicating better video quality. This can be verified from the screen shots.




   •   The compression ratio for H.264 encoded video is also better than that for
       MPEG-2 encoded video, in spite of the better video quality.

            Compression ratio = original file size/compressed file size

   •   The video quality for H.264 video is better than for MPEG-2 video for the
       Simple/Baseline profiles as well. Therefore, it can be concluded that H.264
       video coding standard gives better compression and better video quality
       as compared to MPEG-2.
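As a sketch, the compression ratios in the tables can be estimated from the raw 4:2:0 video size and the bitstream size. Using the Foreman Main-profile parameters from Table 5 (CIF, 90 frames at 30 fps, 481 kbit/s), this back-of-the-envelope calculation lands close to the measured 74:1 and 78:1:

```python
# Compression ratio = original (raw YUV) size / compressed bitstream size.
# For a 4:2:0 clip, each frame is width*height luma samples plus two
# quarter-size chroma planes, i.e. 1.5 bytes per pixel at 8 bits per sample.

def compression_ratio(width, height, n_frames, fps, bitrate_kbps):
    raw_bytes = width * height * 1.5 * n_frames
    duration_s = n_frames / fps
    compressed_bytes = bitrate_kbps * 1000 * duration_s / 8
    return raw_bytes / compressed_bytes

# Foreman, CIF, 90 frames at 30 fps, 481 kbit/s (Table 5):
ratio = compression_ratio(352, 288, 90, 30, 481)   # roughly 76:1
```

The small gap between this estimate and the measured values comes from container/header overhead and from the encoders not hitting the target bit rate exactly.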




Table 5. Performance comparison between MPEG-2 and H.264 main/simple
                          profiles - Foreman


   Parameter                 Main Profile        Simple profile
                       MPEG-2      H.264   MPEG-2     H.264
   Input video         352 x 288 352 x 288 352 x 288 352 x 288
   resolution          (CIF)       (CIF)   (CIF)      (CIF)
   fps                 30          30      30         30
   # frames            90          90      90         90
   encoded
   GOP                 I-P-B-B-P-   I-B-B-P-B-   I-P-P-P         I-P-P-P
                       B-B          B-P
   PSNR (Y) (dB)       30.42        37.03        32.1            37.4
   PSNR (U) (dB)       39.1         41.08        39.01           41
   PSNR (V) (dB)       39.6         43.81        40.2            43.6
   Bit rate
                       481.00       481.06       561.0           561.01
   (kbits/second)
   Compression
                       74:1         78:1         65:1            65:1
   ratio




                 (a)                                       (b)

        Figure 11. (a) Foreman – MPEG-2 (main profile) encoding
                  (b) Foreman – H.264 (main profile) encoding




Table 6. Performance comparison between MPEG-2 and H.264 main profiles –
                                  News


    Parameter           Main Profile            Simple Profile
                        MPEG-2      H.264       MPEG-2     H.264
    Input video         352 x 288   352 x 288   352 x 288  352 x 288
    resolution          (CIF)       (CIF)       (CIF)      (CIF)
    fps                 30          30          30         30
    # frames            90          90          90         90
    encoded
    GOP                 I-P-B-B-P-  I-B-B-P-    I-P-P-P    I-P-P-P
                        B-B         B-B-P
    PSNR (Y) (dB)       37.02       39.1        34.01      39
    PSNR (U) (dB)       37.02       41.0        39.1       41
    PSNR (V) (dB)       39.02       42.0        39.7       42
    Bit rate
                        376.00      376.00      380.8      380.8
    (kbits/second)
    Compression
                        94:1        99.7:1      95.5:1     95.5:1
    ratio




                     (a)                                  (b)

           Figure 12.(a) News – MPEG-2 (main profile) encoding
                    (b) News – H.264 (main profile) encoding




Table 7. Performance comparison between MPEG-2 and H.264 main profiles –
                                Carphone

    Parameter           Main Profile            Simple Profile
                        MPEG-2      H.264       MPEG-2     H.264
    Input video         176 x 144   176 x 144   176 x 144  176 x 144
    resolution          (QCIF)      (QCIF)      (QCIF)     (QCIF)
    fps                 30          30          30         30
    # frames            90          90          90         90
    encoded
    GOP                 I-P-B-B-P-  I-B-B-P-    I-P-P-P    I-P-P-P
                        B-B         B-B-P
    PSNR (Y) (dB)       30.46       37.6        31.6       38
    PSNR (U) (dB)       36.36       40.9        39         40.8
    PSNR (V) (dB)       36.5        41.5        39         41.3
    Bit rate
                        128         127.6       147.3      147.3
    (kbits/second)
    Compression
                        69.6:1      72.6:1      60.8:1     61.9:1
    ratio




                  (a)                                     (b)

         Figure 13. (a) Carphone – MPEG-2 (main profile) encoding
                    (b) Carphone – H.264 (main profile) encoding




5 Transcoding methods

5.1 Introduction to transcoding
In the fast-growing world of multimedia and telecommunications there is great
demand for efficient usage of the available bandwidth. As technology grows, so
do the number of networks, types of devices, and content representation
formats, and interoperability between different systems and networks is
therefore gaining importance. Transcoding of video content is one effort in
this direction. A transcoder can also be used to insert new information, such
as company logos, watermarks, and error resilience features, into a compressed
video stream. Transcoding techniques are also useful for supporting VCR trick
modes, such as fast-forward and reverse play, in on-demand applications [4].

Technically, transcoding is the coding and recoding of digital content from one
compressed format to another to enable transmission over different media and
playback over various devices [29].

This raises the question: why is H.264/AVC to MPEG-2 transcoding needed [14]
[15]? H.264/AVC was recently developed by the JVT (Joint Video Team) to
provide better video compression than previous standards. The new standard
offers significant coding efficiency, a simple syntax specification, and
seamless integration of video coding into all current protocols and multiplex
architectures. It represents a significant advance in video coding technology,
providing MPEG-2-comparable video quality at, on average, half the required
bandwidth. Although widespread use of H.264 is anticipated, many legacy
systems, including digital TVs and home receivers, still use MPEG-2. This
leads to the need for an efficient architecture that exploits the lower
bandwidth cost of H.264 video without requiring a significant investment in
additional video coding hardware.




            Figure 14. H.264 to MPEG-2 transcoder applications [12]


5.2 How is transcoding done – the basic process

The simplest approach to transcoding is to completely decode the MPEG-2 bit
stream and then re-encode it with an H.264 encoder. The decode operation can
be performed either externally or as a part of the H.264 encoder. System issues,
such as handling SCTE-35 digital program insertion (DPI) messages, will require
that the decode and the encode operations be tightly coupled. The quality of
transcoding with this simple approach will not be high.

Figure 15 shows a comparison between direct encoding and transcoding. The
figure shows the PSNR (a measure of mean square error between the input and
decoded output) values computed at different bit rates. The PSNR numbers are
obtained by averaging the results over 18 different sequences of varying content
type and complexities. The top plot shows the performance of direct encoding
using an H.264 encoder. The bottom plot shows the performance of transcoding
where the video is originally coded with MPEG-2 at 4 Mb/s, decoded, and then
re-encoded with the same encoder used for direct encoding. Transcoding can
result in up to a 20 percent loss in compression efficiency.
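The PSNR used in Figure 15 is computed from the mean square error as PSNR = 10 log10(255^2 / MSE) for 8-bit video. A minimal sketch over two flattened frames (the sample values are illustrative):

```python
# PSNR between an original and a decoded frame, the quality measure of
# Figure 15: PSNR = 10 * log10(MAX^2 / MSE), with MAX = 255 for 8-bit video.
import math

def psnr(original, decoded):
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return float("inf")   # identical frames: distortion-free
    return 10 * math.log10(255 ** 2 / mse)

orig = [52, 60, 75, 200, 180, 33, 90, 110]   # illustrative original samples
dec  = [50, 62, 74, 198, 183, 30, 91, 108]   # illustrative decoded samples
print(round(psnr(orig, dec), 2))             # about 41.6 dB
```

In practice the Y, U, and V planes are measured separately, as in Tables 5 to 7, and a curve like Figure 15 is produced by averaging the per-sequence PSNR at each bit rate.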

A second, better approach likewise decodes the incoming MPEG-2 stream and
re-encodes it with an H.264 encoder, but here the relevant information
available from the MPEG-2 bit stream is reused.




 Figure 15. Performance comparison between direct encoding and transcoding
                                   [32]



5.3 Criteria for transcoding

Transcoding can be of various types [14]: bit rate transcoding, to facilitate
more efficient transport of video; spatial and temporal resolution reduction
transcoding, for use in mobile devices with limited display and processing
power; and error-resilience transcoding, to achieve higher resilience of the
original bit stream to transmission errors.

To achieve optimum results by transcoding, the following criteria have to be
fulfilled:
(i) The quality of the transcoded bitstream should be comparable to the one
obtained by direct decoding and re-encoding of the output stream.
(ii) The information contained in the input stream should be used as much as
possible to avoid multigenerational deterioration.
(iii) The process should be cost efficient, low in complexity and achieve the
highest quality possible.

5.4 Transcoding of H.264 to MPEG-2

In order to provide better compression of video as compared to previous
standards, H.264/AVC video coding standard was recently developed by the JVT
(Joint Video Team) consisting of experts from VCEG (Video Coding Experts
Group) and MPEG. The new standard offers significant coding efficiency, a
simple syntax specification, and seamless integration of video coding into all current
protocols and multiplex architectures. Thus H.264 can support various
applications such as video broadcasting, video streaming and video conferencing
over fixed and wireless networks and over different transport protocols. However
MPEG-2 has already been widely used in the field of digital broadcasting, HDTV
and DVD applications. Hence transcoding is a feasible method to solve the
incompatibility problem between H.264 video source and the existing MPEG-2
decoders.

An H.264/AVC to MPEG-2 transcoder is designed to transcode the H.264 video
stream to MPEG-2 format so as to be used by the MPEG-2 end equipment. It is
better to transmit H.264 bitstreams on public networks to save on the much
needed bandwidth and then transcode them into MPEG-2 bitstreams for local
MPEG-2 equipment like a set-top box.




5.5 Transcoding architectures

This section describes the various transcoding architectures [15]:

5.5.1 Open loop transcoding:
Open loop transcoders include requantization and selective transmission, in
which the high-frequency DCT coefficients are discarded. They are
computationally efficient, since they operate directly on the DCT
coefficients, but they suffer from the drift problem. Drift error arises from
rounding, quantization loss, and clipping operations.
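Both open-loop operations can be sketched on a single coefficient block. Uniform scalar quantizers are assumed here for illustration only; the real MPEG-2 and H.264 quantizers use weighting matrices and dead zones:

```python
# Open-loop transcoding on DCT coefficients: requantization plus optional
# high-frequency truncation (selective transmission). Uniform scalar
# quantizers are assumed for illustration.

def requantize(levels, q_in, q_out, keep=None):
    coeffs = [lvl * q_in for lvl in levels]               # inverse quantization
    if keep is not None:                                   # selective transmission:
        coeffs = coeffs[:keep] + [0] * (len(coeffs) - keep)  # drop high-freq terms
    return [round(c / q_out) for c in coeffs]              # coarser requantization

levels_in = [22, -9, 5, 3, -2, 1, 1, 0]   # zig-zag ordered quantized coefficients
levels_out = requantize(levels_in, q_in=8, q_out=16, keep=6)
# Requantizing the already-quantized signal, rather than the original, is
# exactly what accumulates into the drift error described above.
```

Because the decoder's prediction loop never sees the original reference frames, each such approximation propagates into subsequent predicted frames, which is why open-loop transcoders drift.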




                Figure 16. Open loop transcoding architecture [15]

5.5.2 Cascaded pixel domain transcoding architecture:
This is a drift-free architecture: a concatenation of a simplified decoder and
encoder, as shown in Figure 17. Instead of performing full motion estimation,
the encoder reuses the motion vectors, along with other information extracted
from the input video bitstream, thus reducing complexity.




        Figure 17. Cascaded pixel domain transcoding architecture [15].


Simplified DCT domain transcoding (SDDT):
This architecture is based on the assumption that the DCT, the IDCT, and
motion compensation are all linear operations. Because motion compensation is
performed in the DCT domain here, it is a computationally intensive operation.
For instance, as shown in Figure 19, the goal is to compute the target block B
from the four overlapping blocks B1, B2, B3 and B4.




        Figure 18. Simplified DCT domain transcoding architecture [15].




                  Figure 19. DCT- Motion compensation [15].
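The decomposition behind Figure 19 can be sketched in the pixel domain: B is a sum of shifted and windowed versions of B1-B4, and since the 2-D DCT is linear and orthogonal (DCT(X) = T X T^T), the same matrix combination can be applied directly to DCT coefficients, which is what makes DCT-domain MC possible at the cost of extra matrix products:

```python
# Sketch of the shift-matrix decomposition behind DCT-domain MC (Figure 19):
# a target 8x8 block B at offset (dy, dx) inside the 16x16 area covered by
# B1 B2 / B3 B4 equals  H1@B1@W1 + H1@B2@W2 + H2@B3@W1 + H2@B4@W2.
# Since DCT(H @ B @ W) = DCT(H) @ DCT(B) @ DCT(W), the identical combination
# can be evaluated directly on the DCT coefficients of B1..B4.
import numpy as np

def shift_matrices(d):
    """Selection matrices for an offset d in [0, 8): rows (or, transposed,
    columns) taken from the first block versus the second block."""
    m1 = np.zeros((8, 8))
    m2 = np.zeros((8, 8))
    for r in range(8 - d):
        m1[r, d + r] = 1          # samples taken from the upper/left block
    for r in range(8 - d, 8):
        m2[r, d + r - 8] = 1      # samples taken from the lower/right block
    return m1, m2

rng = np.random.default_rng(0)
B1, B2, B3, B4 = (rng.integers(0, 256, (8, 8)).astype(float) for _ in range(4))
dy, dx = 3, 5                     # illustrative motion offset
H1, H2 = shift_matrices(dy)       # row selection (left multiplication)
W1t, W2t = shift_matrices(dx)
W1, W2 = W1t.T, W2t.T             # column selection (right multiplication)

B = H1 @ B1 @ W1 + H1 @ B2 @ W2 + H2 @ B3 @ W1 + H2 @ B4 @ W2

big = np.block([[B1, B2], [B3, B4]])
assert np.array_equal(B, big[dy:dy + 8, dx:dx + 8])   # matches direct extraction
```

The assertion confirms the decomposition; in a real SDDT the four triple products would be carried out on DCT coefficients, which is why DCT-domain MC is the costly step of this architecture.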


SDDT eliminates the DCT/IDCT stages and halves the frame memory requirement,
so it needs less computation and memory than CPDT. However, the linearity
assumptions do not hold strictly, since clipping functions are applied in the
video encoder/decoder and rounding is performed in the interpolation for
fractional-pixel MC. These violated assumptions may cause drift in the
transcoded video.




5.5.3 Cascaded DCT domain transcoding (CDDT)

The cascaded DCT-domain transcoder can be used for spatial and temporal
resolution downscaling and for other coding parameter changes. Compared to
SDDT, greater flexibility is achieved through an additional DCT-MC stage and
frame memory, at the price of higher cost and complexity. This architecture is
well suited to downscaling operations, where the added encoder-side DCT-MC and
memory cost relatively little.




         Figure 20. Cascaded DCT domain transcoding architecture [15]



5.6 Conclusions

The selection of an appropriate transcoding architecture depends on the
intended application. There is generally a tradeoff between accuracy on one
hand and the complexity and cost of the architecture on the other. For
example, the open loop architecture is the simplest to implement but suffers
from drift, whereas the cascaded DCT domain transcoding architecture overcomes
this problem at the price of being very complex and expensive to implement.




6 References
[1] Soon-kak Kwon, A. Tamhankar and K.R. Rao, “Overview of H.264 / MPEG-4
Part 10 (pp.186-216)”, Special issue on “ Emerging H.264/AVC video coding
standard”, J. Visual Communication and Image Representation, vol. 17,
pp.183-552, Apr. 2006.
[2] A. Puri, H. Chen and A. Luthra, “Video Coding using the H.264/MPEG-4 AVC
compression standard”, Signal Processing: Image Communication, vol.19, pp
793-849, Oct. 2004.
[3] H. Kalva, “Issues in H.264/MPEG-2 Video Transcoding”, Computer Science
and Engineering, Florida Atlantic University, Boca Raton, FL.
[4] S. Sharma, “Transcoding of H.264 bitstream to MPEG 2 bitstream”, Master’s
Thesis, May 2006, EE Department, University of Texas at Arlington.
[5] S. Sharma and K. R. Rao, “Transcoding of H.264 bitstream to MPEG-2
bitstream”, Proceedings of Asia-Pacific Conference on Communications 2007.
[6] “Emerging H.264/AVC Video Coding Standard”, J. Visual Communication and
Image Representation, vol.17, pp. 183-552, Apr. 2006.
[7] P.N.Tudor, “Tutorial on MPEG-2 Video Compression”, IEE J Langham
Thomson Prize, Electronics and Communication Engineering Journal, Dec. 1995.
[8] “The MPEG-2 International Standard”, ISO/IEC, Reference number ISO/IEC
13818-2, 1996.
[9] T. Wiegand et al., “Overview of the H.264/AVC Video Coding Standard”,
IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, Issue
7, pp. 560-576, July 2003.
[10] J. McVeigh et al., “A software based real time MPEG-2 video encoder”, IEEE
Trans. CSVT, Vol 10, pp 1178-1184, Oct. 2000.
[11] O.J. Morris, “MPEG-2: Where did it come from and what is it?”, IEE
Colloquium, pp. 1/1-1/5, 24 Jan. 1995.
[12] P. Kunzelmann and H. Kalva, “Reduced Complexity H.264 to MPEG-2
Transcoder”, ICCE 2007, pp. 1-2, Jan. 2007.
[13] N. Kamaci and Y. Altunbasak, “Performance Comparison of the Emerging
H.264 Video Coding Standard with the existing standards”, ICME, Vol.1, pp.
345-348, July 2003.
[14] J. Xin, C. Lin and M. Sun, “Digital Video Transcoding”, Proceedings of the
IEEE, Vol. 93, Issue 1,pp 84-97, Jan. 2005.
[15] A. Vetro, C. Christopoulos and H. Sun, “Video transcoding architectures
and techniques: an overview”, IEEE Signal Processing Magazine, Vol. 20, Issue
2, pp 18-29, Mar. 2003.
[16] “MPEG-2”, Wikipedia, Feb. 14, 2008.
     Available at <http://en.wikipedia.org/wiki/Mpeg_2>
[17] “Introduction to MPEG 2 Video Compression”
     Available at <http://www.bretl.com/mpeghtml/codecdia1.HTM>


[18] “H.264/MPEG-4 AVC”, Wikipedia, Feb. 18, 2008.
    Available at < http://en.wikipedia.org/wiki/H.264>
[19] “H.264 A new Technology for Video Compression” – Available at <
http://www.nuntius.com/technology3.html>
[20] R. Periera, “Efficient transcoding of MPEG-2 to H.264”, Master’s thesis,
Dec. 2005, EE Department, University of Texas at Arlington.
[21] “H.262 : Information technology - Generic coding of moving pictures and
associated audio information: Video”, International Telecommunication Union,
2000-02.
Available at < http://www.itu.int/rec/T-REC-H.262>
[22] “MPEG-2 White paper”, Pinnacle Technical Documentation, Version 0.5,
Pinnacle Systems, Feb. 29, 2000.
[23] M. Ghanbari, “Standard Codecs: Image Compression to Advanced Video
Coding”, Herts, UK: IEE, 2003.
[24] H.264 software (version 13.2) obtained from:
<http://iphome.hhi.de/suehring/tml/>
[25] MPEG-2 software (version 12) obtained from:
<http://www.mpeg.org/MPEG/video/mssg-free-mpeg-software.html>
[26] Test streams (Foreman, News, Carphone) obtained from:
<http://www-ee.uta.edu/dip/Courses/EE5356/ee_5356.htm>
[27] Implementation Studies Group, “Main Results of the AVC Complexity
analysis”, MPEG document N4964, ISO/IEC JTC1/SC29/WG11, July 2002.
[28] A. Joch et al., “Performance comparison of video coding standards using
Lagrangian coder control”, IEEE Int. Conf. on Image Processing, Vol. 2, pp.
II-501 to II-504, Sept. 2002.
[29] I. Sylvester, “Transcoding: The future of the video market depends on it”,
IDC Executive Brief, Nov. 2006.
Available at <http://www.ed-china.com/ARTICLES/2006NOV/2/2006NOV10_HA_AVC_HN_12.PDF>
[30] R. Hoffner, “MPEG-4 Advanced Video Coding emerges”,
Available at <http://www.tvtechnology.com/features/Tech-Corner/F_Hoffner-03.09.05.shtml>
[31] S. Wagston and A. Susin, “IP core for an H.264 Decoder SoC”, 2007,
Available at< www.us.design-reuse.com/news/?id=15746&print=yes>
[32] S. Krishnamachari and K. Yang, “MPEG-2 to H.264 Transcoding: Why and
How?”, Dec. 1, 2006,
Available at <http://broadcastengineering.com/infrastructure/broadcasting_mpeg_transcoding_why/index1.html>





Instant video streamingInstant video streaming
Instant video streamingVideoguy
 
Video Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideo Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideoguy
 
Video Streaming
Video StreamingVideo Streaming
Video StreamingVideoguy
 
Reaching a Broader Audience
Reaching a Broader AudienceReaching a Broader Audience
Reaching a Broader AudienceVideoguy
 
Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Videoguy
 
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGVideoguy
 
Impact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingImpact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingVideoguy
 
Application Brief
Application BriefApplication Brief
Application BriefVideoguy
 
Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Videoguy
 
Streaming Video into Second Life
Streaming Video into Second LifeStreaming Video into Second Life
Streaming Video into Second LifeVideoguy
 
Flash Live Video Streaming Software
Flash Live Video Streaming SoftwareFlash Live Video Streaming Software
Flash Live Video Streaming SoftwareVideoguy
 
Videoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoguy
 
Streaming Video Formaten
Streaming Video FormatenStreaming Video Formaten
Streaming Video FormatenVideoguy
 
iPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareiPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareVideoguy
 
Glow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxGlow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxVideoguy
 

More from Videoguy (20)

Energy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingEnergy-Aware Wireless Video Streaming
Energy-Aware Wireless Video Streaming
 
Microsoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresMicrosoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_Pres
 
Proxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingProxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video Streaming
 
Adobe
AdobeAdobe
Adobe
 
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksFree-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
 
Instant video streaming
Instant video streamingInstant video streaming
Instant video streaming
 
Video Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideo Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A Survey
 
Video Streaming
Video StreamingVideo Streaming
Video Streaming
 
Reaching a Broader Audience
Reaching a Broader AudienceReaching a Broader Audience
Reaching a Broader Audience
 
Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...
 
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
 
Impact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingImpact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video Streaming
 
Application Brief
Application BriefApplication Brief
Application Brief
 
Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Video Streaming Services – Stage 1
Video Streaming Services – Stage 1
 
Streaming Video into Second Life
Streaming Video into Second LifeStreaming Video into Second Life
Streaming Video into Second Life
 
Flash Live Video Streaming Software
Flash Live Video Streaming SoftwareFlash Live Video Streaming Software
Flash Live Video Streaming Software
 
Videoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions Cookbook
 
Streaming Video Formaten
Streaming Video FormatenStreaming Video Formaten
Streaming Video Formaten
 
iPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareiPhone Live Video Streaming Software
iPhone Live Video Streaming Software
 
Glow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxGlow: Video streaming training guide - Firefox
Glow: Video streaming training guide - Firefox
 

Comparing MPEG-2 and H.264 Video Standards

  • 4. 1 Introduction
Development of international video coding standards such as MPEG-2 [7][11][16][17] has boosted a diverse range of multimedia applications, including digital video recording and teleconferencing. As a result of the growing demand for better compression performance, advanced standards such as H.264 [1][2][6][9][18] were developed by the ITU-T/ISO-IEC Joint Video Team (JVT) in 2003. The overall scheme of the video coding layer (VCL) of H.264 is superficially similar to the encoding scheme of MPEG-2; however, there are significant differences in the details. In this project the MPEG-2 and H.264 video coding standards are compared, i.e. their similarities and differences are studied, with a concentration on the main profiles. H.264 can support various applications such as video broadcasting, video streaming and video conferencing over fixed and wireless networks and over different transport protocols. However, MPEG-2 is already widely used in digital broadcasting, HDTV and DVD applications. The incompatibility problem between an H.264 video source and the existing MPEG-2 decoders can be solved by using transcoders. In this project, the criteria for transcoding and a few transcoding architectures are discussed.
The report is structured as follows: Chapter 1 introduces the topic and explains the scope of the project. Chapter 2 explains the various aspects of the MPEG-2 video coding standard, while Chapter 3 covers the same for the H.264 video coding standard. Chapter 4 compares the two standards. In Chapter 5, the topic of MPEG-2 to H.264 transcoding is covered in greater detail.
  • 5. 2 MPEG-2
MPEG-2 is widely used as the format of digital television signals that are broadcast by terrestrial (over-the-air), cable, and direct-broadcast satellite TV systems. It also specifies the format of movies and other programs that are distributed on DVD and similar discs. As such, TV stations, TV receivers, DVD players, and other equipment are often designed to this standard. MPEG-2 was the second of several standards developed by the Moving Picture Experts Group (MPEG) and is an international standard (ISO/IEC 13818). [16] The video section, part 2 of MPEG-2, is similar to the previous MPEG-1 standard but also provides support for interlaced video, the format used by analog broadcast TV systems. MPEG-2 video is not optimized for low bit rates, especially below 1 Mbit/s at standard-definition resolutions; however, it outperforms MPEG-1 at 3 Mbit/s and above. MPEG-2 is directed at broadcast formats at higher data rates of 4 Mbit/s (DVD) and 19 Mbit/s (HDTV). All standards-compliant MPEG-2 video decoders are fully capable of playing back MPEG-1 video streams. MPEG-2 video is formally known as ISO/IEC 13818-2 and as ITU-T Rec. H.262 [21].
2.1 MPEG-2 Profiles and Levels
MPEG-2 video supports a wide range of applications, from mobile to high-quality HD editing. For many applications it is unrealistic and too expensive to support the entire standard, so the standard defines profiles and levels that allow applications to support only subsets of it. [21]
Description
MPEG-2 video is a family of systems, each having an arranged degree of commonality and compatibility. It allows four source formats, or ‘Levels’, to be coded, ranging from limited definition (about VCR quality) to full HDTV, each with a range of bit rates [22]. The level defines a subset of quantitative capabilities such as maximum bit rate, maximum frame size, etc. [16]. In addition to this flexibility in source formats, MPEG-2 allows different ‘Profiles’. Each profile offers a collection of compression tools that together make up the coding system; a different profile means that a different set of compression tools is available. [22]
  • 6. MPEG-2 Profiles
2.1.1.1 Simple Profile
This profile has the fewest tools. The Simple Profile offers the basic toolkit for MPEG-2 encoding: intra and predicted frame encoding and decoding with a color subsampling of YUV 4:2:0.
2.1.1.2 Main Profile
This profile has all the tools of the Simple Profile plus one more (termed bi-directional prediction). It gives better (maximum) quality for the same bit rate than the Simple Profile. A Main Profile decoder decodes both Main and Simple Profile-encoded pictures. This backward-compatibility pattern applies to the succession of profiles. A refinement of the Main Profile, sometimes unofficially known as Main Profile Professional Level or MPEG 422, allows line-sequential color-difference signals (4:2:2) to be used, but not the scalable tools of the higher profiles.
2.1.1.3 SNR Scalable Profile and Spatially Scalable Profile
The two profiles after the Main Profile are, successively, the SNR Scalable Profile and the Spatially Scalable Profile. These add tools which allow the coded video data to be partitioned into a base layer and one or more ‘top-up’ signals. The top-up signals can either improve the noise (SNR scalability) or the resolution (spatial scalability). These scalable systems may have interesting uses: the lowest layer can be coded in a more robust way, and thus provide a means to broadcast to a wider area, or provide a service for more difficult reception conditions. Nevertheless, there is a premium to be paid for their use in receiver complexity. Owing to the added complexity, none of the scalable profiles is supported by digital video broadcasting (DVB). The inputs to the system are YUV component video; however, the first four profiles code the color-difference signals line-sequentially.
2.1.1.4 High Profile
This profile includes all the previous tools plus the ability to code line-simultaneous color-difference signals. In effect, the High Profile is a ‘super system’, designed for the most sophisticated applications, where there is no constraint on bit rate. Table 1 tabulates the properties of the various MPEG-2 profiles.
  • 7. Table 1. MPEG-2 Profiles [16]

  Abbr.    Name                        Picture Coding Types  Chroma Format   Aspect Ratios                 Scalable Modes
  SP       Simple profile              I, P                  4:2:0           square pixels, 4:3, or 16:9   none
  MP       Main profile                I, P, B               4:2:0           square pixels, 4:3, or 16:9   none
  SNR      SNR Scalable profile        I, P, B               4:2:0           square pixels, 4:3, or 16:9   SNR (signal-to-noise ratio) scalable
  Spatial  Spatially Scalable profile  I, P, B               4:2:0           square pixels, 4:3, or 16:9   SNR- or spatial-scalable
  HP       High profile                I, P, B               4:2:2 or 4:2:0  square pixels, 4:3, or 16:9   SNR- or spatial-scalable

MPEG-2 Levels
  • 8. 2.1.1.5 Description of Levels
A level defines, for the MPEG standard, physical parameters such as bit rates, picture sizes and resolutions. There are four levels specified by MPEG-2: High Level, High 1440, Main Level, and Low Level. MPEG-2 Video Main Profile at Main Level has sampling limits at ITU-R 601 parameters (PAL and NTSC). Profiles limit syntax (i.e. algorithms), whereas levels limit encoding parameters (sample rates, frame dimensions, coded bit rates, buffer size, etc.). Together, Video Main Profile and Main Level (abbreviated as MP@ML) keep complexity within current technical limits, yet still meet the needs of the majority of applications. MP@ML is the most widely accepted combination for most cable and satellite systems; however, different combinations are possible to suit other applications. [4] Table 2 compares the four MPEG-2 levels on the basis of frame size (PAL/NTSC) and the maximum bit rate for each.
Table 2. MPEG-2 Levels [22]
2.2 MPEG-2 Encoder
  • 9. Figure 1. MPEG 2 encoder [10]
The various blocks of the MPEG-2 encoder are explained below:
DCT
The MPEG-2 encoder uses an 8x8 2-D DCT. For intra frames it is applied to 8x8 blocks of pels, and for inter frames it is applied to 8x8 blocks of the residual (motion-compensated prediction errors). Since the DCT is more efficient at compressing correlated sources, intra pictures compress more efficiently than inter pictures.
2.2.1 Quantizer
The DCT coefficients obtained above are then quantized using a default or modified matrix. User-defined matrices may be downloaded and can occur in the sequence header or in the quant matrix extension header. The quantizer step sizes for DC coefficients of the luminance and chrominance components are 8, 4, 2 and 1 according to the intra DC precision of 8, 9, 10 and 11 bits respectively.
2.2.2 Motion estimation and compensation
In the motion estimation process, motion vectors for predicted and interpolated pictures are coded differentially between macroblocks. The two motion vector components are coded independently, the horizontal component first
  • 10. and then the vertical component. The motion compensation process forms a prediction from previously decoded pictures using the motion vectors, which are of integer and half-pel resolution.
2.2.3 Coding decisions
There are four different coding modes in MPEG-2. These modes are chosen based on whether the encoder encodes a frame picture as a frame or as two fields; for interlaced pictures it can choose to encode them as two fields or use 16x8 motion compensation.
2.2.4 Scanning and VLC
The quantized transform coefficients are scanned and converted to a one-dimensional array. Two scanning methods are available:
a. Zigzag scan (Figure 2(a)): for progressive (non-interlaced) mode processing
b. Alternate scan (Figure 2(b)): for interlaced-format video.
(a) (b) Figure 2 (a) Zigzag scan pattern (4x4) [4] (b) Alternate scan pattern (4x4)
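The zig-zag scan described above can be generated programmatically. The following sketch (plain Python; not the normative scan tables of the standard, and the alternate scan, which is table-driven, is omitted) builds the classic zig-zag visiting order for an NxN block and flattens a block with it:

```python
def zigzag_order(n):
    """Return the (row, col) visiting order of the classic zig-zag scan.

    Coefficients are visited along anti-diagonals of increasing s = i + j;
    odd diagonals are walked downward (i ascending), even ones upward.
    """
    coords = [(i, j) for i in range(n) for j in range(n)]
    return sorted(coords,
                  key=lambda ij: (ij[0] + ij[1],
                                  ij[0] if (ij[0] + ij[1]) % 2 else -ij[0]))

def scan(block):
    """Flatten a square block of quantized coefficients in zig-zag order."""
    n = len(block)
    return [block[i][j] for i, j in zigzag_order(n)]
```

For a 4x4 block this reproduces the order of Figure 2(a): (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), and so on, front-loading the low-frequency coefficients so that the trailing run of zeros can be coded compactly.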
  • 11. (a) (b) Figure 3. Scan matrices in MPEG-2 [20] (8x8) (a) Zigzag scan (b) Alternate scan
The list of values produced by scanning is then entropy coded using a variable length code (VLC).
2.3 MPEG-2 Decoder
  • 12. Figure 4. MPEG 2 Decoder [7]
At the decoder side, the quantized DCT coefficients are reconstructed and inverse transformed to produce the prediction error. This prediction error is then added to the motion-compensated prediction generated from the previously decoded picture to produce the reconstructed output. The various parts of the MPEG-2 decoder are:
2.3.1 Variable length decoding
This process uses a table defined for decoding intra DC coefficients and three further tables, one each for non-intra DC coefficients, intra AC coefficients and non-intra AC coefficients. The decoded values imply one of three courses of action: end of block, normal coefficient, or escape coding.
2.3.2 Inverse scan
The output of the variable length decoding stage is one-dimensional and of length 64. The inverse scan process converts this one-dimensional data into a two-dimensional array of coefficients according to a predefined scan matrix.
2.3.3 Inverse quantization
At this stage the two-dimensional DCT coefficients are inverse quantized to produce the reconstructed DCT coefficients. This process rescales the coefficients, essentially multiplying them by the quantizer step size. The quantizer step size can be modified by using either a weighting matrix or a scale
factor. After performing inverse quantization, saturation and mismatch control operations are performed.
2.3.4 Inverse DCT
Once the reconstructed DCT coefficients are obtained, a 2-D 8x8 inverse DCT is applied to obtain the inverse transformed values. These values are then saturated to keep them in the range [-256, +255].
2.3.5 Motion Compensation
During this stage, predictions from previously decoded pictures are combined with the inverse DCT transformed coefficient data to get the final decoded output.
3 H.264
  • 14. H.264/AVC [1][9] was developed by the JVT (Joint Video Team) to achieve MPEG-2 [7] quality compression at almost half the bit rate. H.264/AVC provides significant coding efficiency, simple syntax specifications, and seamless integration of video coding into all current protocols and multiplex architectures. H.264 supports various applications such as video broadcasting, video streaming, and video conferencing over fixed and wireless networks and over different transport protocols. [4]
The H.264 video coding standard has the same basic functional elements as previous standards (MPEG-1, MPEG-2, MPEG-4 part 2, H.261, H.263) [23]: a transform for reduction of spatial correlation, quantization for bit-rate control, motion-compensated prediction for reduction of temporal correlation, and entropy coding for reduction of statistical correlation. However, to achieve better coding performance, the important changes in H.264 occur in the details of each functional element, including intra-picture prediction, a new 4x4 integer transform, multiple reference pictures, variable block sizes and quarter-pel precision for motion compensation, a deblocking filter, and improved entropy coding. [1]
3.1 H.264 Profiles
Each profile specifies a subset of the entire bitstream syntax and limits that shall be supported by all decoders conforming to that profile. There are three profiles in the first version: Baseline, Main, and Extended. The Baseline Profile is applicable to real-time conversational services such as video conferencing and videophone. The Main Profile is designed for digital storage media and television broadcasting. The Extended Profile is aimed at multimedia services over the Internet. In addition, four High Profiles are defined in the fidelity range extensions [19] for applications such as content contribution, content distribution, and studio editing and post-processing: High, High 10, High 4:2:2, and High 4:4:4.
The High Profile supports 8-bit video with 4:2:0 sampling for applications using high resolution. The High 10 Profile supports 4:2:0 sampling with up to 10 bits of representation accuracy per sample. The High 4:2:2 Profile supports up to 4:2:2 chroma sampling and up to 10 bits per sample. The High 4:4:4 Profile supports up to 4:4:4 chroma sampling, up to 12 bits per sample, and an integer residual color transform for coding RGB signals. The profiles have both common coding parts and specific coding parts, as shown in Figure 5. [1]
3.1.1 Common Parts of All Profiles
  • 15. 3.1.1.1 I slice (Intra-coded slice)
This slice is coded using prediction only from decoded samples within the same slice.
3.1.1.2 P slice (Predictive-coded slice)
This slice (Figure 6) is coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block.
3.1.1.3 CAVLC (Context-based Adaptive Variable Length Coding)
This is used for entropy coding. After transform and quantization, the probability that the level of a coefficient is zero or +/-1 is very high. CAVLC therefore handles zero and +/-1 coefficients differently from the other coefficient levels: the total numbers of zeros and +/-1s are coded, while for the remaining coefficients their levels are coded.
3.1.2 Baseline Profile
3.1.2.1 Flexible macroblock order
Macroblocks need not be in raster scan order; a map assigns macroblocks to slice groups.
3.1.2.2 Arbitrary slice order
The macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture.
3.1.2.3 Redundant slice
This slice belongs to redundant coded data obtained at the same or a different coding rate, in comparison with previously coded data of the same slice.
  • 16. Figure 5. The specific coding parts of the profiles in H.264 [1].
3.1.3 Main Profile
3.1.3.1 B slice (Bi-directionally predictive-coded slice)
This slice (Figure 6) is coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block.
3.1.3.2 Weighted prediction
This is a scaling operation performed by applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice. A prediction signal p for a B slice is obtained using different weights on two reference signals, r1 and r2 (Equation 1 [1]):
p = w1 × r1 + w2 × r2
where w1 and w2 are weights.
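Equation 1 can be sketched directly in code. The version below is an illustrative floating-point rendering with rounding and clipping to the 8-bit sample range; the standard actually specifies an equivalent fixed-point computation with a log2 weight denominator and an additive offset per reference:

```python
def weighted_bipred(r1, r2, w1=0.5, w2=0.5, offset=0):
    """Weighted bi-prediction of a block from two reference blocks r1, r2
    (Equation 1: p = w1*r1 + w2*r2), rounded and clipped to [0, 255].

    Illustrative floating-point sketch, not the standard's fixed-point form.
    """
    return [[min(255, max(0, round(w1 * a + w2 * b + offset)))
             for a, b in zip(row1, row2)]
            for row1, row2 in zip(r1, r2)]
```

With the default equal weights this reduces to plain bi-directional averaging; for a fade-out, an encoder might instead choose unequal weights such as w1 = 0.75, w2 = 0.25 to favor the nearer (brighter or darker) reference.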
  • 17. 3.1.3.3 CABAC (Context-based Adaptive Binary Arithmetic Coding)
This is used for entropy coding. It utilizes arithmetic coding in order to achieve good compression.
Figure 6. Illustration of temporal prediction (B and P slices)
3.1.4 Extended Profile
This profile includes all parts of the Baseline Profile: flexible macroblock order, arbitrary slice order, and redundant slices. The other features of this profile are:
3.1.4.1 SP slice
A specially coded slice for efficient switching between video streams, similar in coding to a P slice.
3.1.4.2 SI slice
A switching slice, similar in coding to an I slice.
3.1.4.3 Data partition
The coded data is placed in separate data partitions; each partition can be placed in a different layer unit.
  • 18. 3.1.4.4 B slice
H.264 generalizes the concept of bidirectional prediction and supports not only forward/backward prediction pairs but also forward/forward and backward/backward pairs.
3.1.4.5 Weighted prediction
All earlier standards use equal weights for reference pictures, i.e. a prediction signal is obtained by averaging reference signals with equal weights. But gradual transitions from scene to scene need different weights. H.264 uses the weighted prediction method for a macroblock of a P slice or B slice.
3.1.5 High Profiles
The High Profiles include all parts of the Main Profile: B slices, weighted prediction, and CABAC. The salient features of these profiles are:
3.1.5.1 Adaptive transform block size
H.264 uses an adaptive transform block size, 4x4 and 8x8 (High Profiles only), whereas previous video coding standards used the 8x8 DCT. The smaller block size leads to a significant reduction in ringing artifacts, and the 4x4 transform has the additional benefit of removing the need for multiplications. [1]
3.1.5.2 Quantization scaling matrices
Different scaling is applied according to the specific frequency associated with the transform coefficients in the quantization process, to optimize the subjective quality. The High Profiles support perceptual-based quantization scaling matrices similar to those used in MPEG-2. The encoder can specify a scaling-factor matrix, according to the specific frequency associated with each transform coefficient, for use in inverse quantization scaling by the decoder. This allows optimization of the subjective quality according to the sensitivity of the human visual system, which is less sensitive to coding errors in high-frequency transform coefficients. Table 3 shows a comparison between the Baseline, Extended, Main and High profiles of H.264.
  • 19. Table 3. Comparison chart for the various profiles of H.264 [18]

  Feature                              Baseline  Extended  Main  High
  I and P Slices                       Yes       Yes       Yes   Yes
  B Slices                             No        Yes       Yes   Yes
  SI and SP Slices                     No        Yes       No    No
  Multiple Reference Frames            Yes       Yes       Yes   Yes
  In-Loop Deblocking Filter            Yes       Yes       Yes   Yes
  CAVLC Entropy Coding                 Yes       Yes       Yes   Yes
  CABAC Entropy Coding                 No        No        Yes   Yes
  Flexible Macroblock Ordering (FMO)   Yes       Yes       No    No
  Arbitrary Slice Ordering (ASO)       Yes       Yes       No    No
  Redundant Slices (RS)                Yes       Yes       No    No
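Table 3 can equally be expressed as a small lookup structure, e.g. for a hypothetical stream-validation utility. The dictionary below is an illustrative transcription of the table (tools that all four profiles share, such as I/P slices, multiple reference frames, the in-loop deblocking filter and CAVLC, are omitted); it is not an API from any real codec library:

```python
# Profile-specific coding tools, transcribed from Table 3.
# Tools common to all profiles (I/P slices, multiple reference frames,
# in-loop deblocking, CAVLC) are implied and not listed.
H264_PROFILE_TOOLS = {
    "Baseline": {"B_slices": False, "SI_SP_slices": False, "CABAC": False,
                 "FMO": True,  "ASO": True,  "redundant_slices": True},
    "Extended": {"B_slices": True,  "SI_SP_slices": True,  "CABAC": False,
                 "FMO": True,  "ASO": True,  "redundant_slices": True},
    "Main":     {"B_slices": True,  "SI_SP_slices": False, "CABAC": True,
                 "FMO": False, "ASO": False, "redundant_slices": False},
    "High":     {"B_slices": True,  "SI_SP_slices": False, "CABAC": True,
                 "FMO": False, "ASO": False, "redundant_slices": False},
}

def supports(profile, tool):
    """True if the given H.264 profile includes the given coding tool."""
    return H264_PROFILE_TOOLS[profile].get(tool, False)
```

A quick query such as `supports("Baseline", "CABAC")` then answers, for example, why a Baseline decoder cannot consume a Main Profile stream that uses CABAC.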
  • 20. 3.2 H.264 Encoder
Figure 7. H.264 encoder [9]
The encoder blocks are explained below:
3.2.1.1 4x4 Integer transform
H.264 employs a 4x4 integer DCT, as compared to the 8x8 DCT adopted by previous standards. The smaller block size leads to a significant reduction in ringing artifacts, and the 4x4 transform has the additional benefit of removing the need for multiplications.
3.2.1.2 Quantization and scan
The H.264 standard specifies the mathematical formulae of the quantization process. The scale factor for each element in each sub-block varies as a function of the quantization parameter associated with the macroblock and as a function
  • 21. of the position of the element within the sub-block. The rate control algorithm controls the value of the quantization parameter. Two types of scan pattern are used for 4x4 blocks: one for frame-coded macroblocks and one for field-coded macroblocks.
3.2.1.3 Context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC) entropy coding
H.264 uses different variable length coding methods in order to match a symbol to a code based on its context characteristics: context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC). All syntax elements except the residual data are encoded with Exp-Golomb codes. To read the residual data (quantized transform coefficients), a zig-zag scan (frame) or alternate scan (field) is used. For coding the residual data, a more sophisticated method, CAVLC, is employed. CABAC is employed in the Main and High profiles; CABAC has higher coding efficiency but also higher complexity compared to CAVLC.
3.2.1.4 Deblocking filter
H.264 employs a deblocking filter to reduce blocking artifacts at block boundaries and to stop the propagation of accumulated coding noise. The filter is applied in the encoder after the inverse transform (before reconstructing and storing the macroblock for future predictions) and in the decoder (before reconstructing and displaying the macroblock). The deblocking filter is applied across the edges of the macroblocks and the sub-blocks. The filtered image is used in motion-compensated prediction of future frames and helps achieve more compression.
Figure 8. Diagram depicting how the loop filter works on the edges of the blocks and sub-blocks [4]
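The multiplication-free 4x4 transform of section 3.2.1.1 can be illustrated as a matrix product W = Cf · X · Cf^T, where Cf is the well-known H.264 core transform matrix whose entries are only ±1 and ±2 (so hardware needs only additions and shifts). This is a minimal sketch: the post-scaling that the standard folds into quantization is omitted here.

```python
# H.264 4x4 forward core transform matrix (entries are +/-1 and +/-2).
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    """4x4 integer matrix multiply."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_core_transform(x):
    """W = Cf * X * Cf^T -- the H.264 4x4 core transform on a residual
    block X (the normalizing scale factors are folded into quantization
    and omitted here)."""
    return matmul(matmul(CF, x), transpose(CF))
```

Because every intermediate value is an integer, encoder and decoder compute bit-exact results, which eliminates the inverse-transform mismatch problem of the floating-point 8x8 DCT in earlier standards.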
  • 22. 3.2.1.5 Intra prediction
In intra prediction, the encoder derives a predicted block from previously decoded samples. The predicted block is then subtracted from the current block and the residual is encoded. There are a total of nine prediction modes (Figure 9) for each 4x4 luma block, four prediction modes for each 16x16 luma block and four modes for each chroma block.
Figure 9. Intra prediction 4x4 [31]
3.2.1.6 Inter prediction
Inter prediction is performed on the basis of temporal correlation and consists of motion estimation and motion compensation. Compared to the previous standards, H.264 supports a large number of block sizes, from 16x16 down to 4x4. Moreover, H.264 supports motion vector accuracy of one quarter of the luma sample.
3.2.1.7 Reference pictures
Unlike the previous standards, which use only the immediately previous I or P picture for inter prediction, H.264 can use more than one previous reference picture for inter prediction, enabling the encoder to search for the best match for the current picture within a wider set of reference pictures than just the previously encoded one.
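Two of the nine 4x4 luma prediction modes are simple enough to sketch directly: vertical (mode 0) repeats the row of samples above the block, and DC (mode 2) fills the block with the rounded mean of the neighbours above and to the left. The sketch below assumes both neighbour rows are available; the boundary-availability fallbacks that the standard defines are omitted for brevity.

```python
def intra_4x4_vertical(above):
    """Mode 0 (vertical): each column repeats the decoded sample above it."""
    return [list(above) for _ in range(4)]

def intra_4x4_dc(above, left):
    """Mode 2 (DC): every sample is predicted as the rounded mean of the
    four neighbours above and the four neighbours to the left
    (assumes both neighbour sets are available)."""
    dc = (sum(above) + sum(left) + 4) >> 3  # +4 rounds the divide-by-8
    return [[dc] * 4 for _ in range(4)]
```

The encoder evaluates all candidate modes, keeps the one whose predicted block is closest to the current block, and transmits only the mode index plus the (typically small) residual.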
  • 23. 3.3 H.264 Decoder
Figure 10 shows the block diagram of a general H.264/MPEG-4 AVC decoder.
Figure 10. H.264 decoder [7]
The bitstream includes all the control information, such as picture or slice type, macroblock types and subtypes, reference frame index, motion vectors, loop filter control and quantizer step size, as well as coded data comprising the quantized transform coefficients. The decoder of Figure 10 works like the local decoder at the encoder; a simplified description is as follows. After entropy (CABAC or CAVLC) decoding, the transform coefficients are inverse scanned and inverse quantized prior to being inverse transformed. To the resulting 4x4 blocks of residual signal, an appropriate prediction signal (intra or motion-compensated inter) is added, depending on the macroblock type (and sub-macroblock type), the reference frame, the motion vector(s) and the decoded picture store. The reconstructed video frames undergo deblock filtering prior to being stored for future use in prediction. The frames at the output of the deblocking filter may need to undergo reordering prior to display. [2]
• 24. 4 Comparison between MPEG-2 and H.264 4.1 Key features of MPEG-2 video The MPEG-2 coding standard has been designed to efficiently support both interlaced and progressive video coding and to produce high quality standard definition video at about 4 Mbps. The MPEG-2 video standard uses a block-based hybrid transform coding algorithm that employs transform coding of the motion-compensated prediction error. While motion compensation exploits temporal redundancies, the DCT exploits the spatial redundancies. The asymmetric encoder-decoder complexity allows for a simpler decoder while maintaining high quality and efficiency through a more complex encoder. [3] 4.2 Key features of H.264 video The H.264 video coding standard has been developed recently through the joint work of the ITU's video coding experts group (VCEG) and the ISO moving pictures experts group (MPEG). The H.264 video coding standard is flexible and offers a number of tools to support a range of applications with very low as well as very high bitrate requirements. 4.3 Comparison: Similarities and Differences between MPEG-2 video and H.264 video In this section, the MPEG-2 and H.264 video coding standards are compared with respect to various aspects such as bit rate, block size, macroblock size, intra prediction, motion estimation block sizes, quantization, motion vector prediction and entropy coding, among others. Table 4 tabulates these comparisons systematically. They are further elaborated in the sub-sections below. 4.3.1 Increased efficiency Compared with MPEG-2 video, the H.264 video format gives perceptually equivalent video at 1/3 to 1/2 of the MPEG-2 bit rates. Some extensions, known as the Fidelity Range Extensions, facilitate higher-fidelity video coding by supporting greater bit depths, including 10-bit and 12-bit encoding, and higher color resolution using the YUV 4:2:2 and YUV 4:4:4 sampling structures. 
This naturally makes it attractive to video distributors, because it permits them to maximize the number of services that may be contained in a given amount of bandwidth [30]. 24
• 25. The bit rate gains are not a result of any single feature but of a combination of encoding tools. These gains come with a significant increase in encoding and decoding complexity [27]. In spite of the increased complexity, the dramatic bandwidth savings encourage TV broadcasters to adopt the new technology, as they can use the bandwidth savings to provide new channels or new data and interactive services. With the coding gains of H.264, full length HDTV resolution movies can be stored on DVDs. Furthermore, the same video coding format can be used for TV broadcasting as well as for internet streaming. 4.3.2 Coding flexibility ISO/IEC 14496-10/H.264, like previous MPEG standards, does not define a specific encoder and decoder. Instead, it defines the syntax of an encoded bitstream and describes the method of decoding that bitstream; the implementation is left to the developer. [31] H.264 employs the same hybrid coding approach used in the other MPEG video standards, motion-compensated transform coding, but differs significantly from MPEG-2 in terms of the actual coding tools used. The main differences are: the use of an integer transform with energy compaction properties similar to those of the DCT in place of the DCT itself, an in-loop deblocking filter (DF) to reduce block artifacts, and intra frame prediction (IFP). The coder control operation is responsible for functions such as reference frame management, coding mode selection, and managing the encoding parameter set. Besides these, the H.264 standard introduces several other new coding tools that improve coding efficiency. Multiple reference picture motion compensation uses previously encoded pictures more flexibly than MPEG-2 does. 
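The integer transform mentioned above can be made concrete. The 4x4 matrix Cf below is the forward core transform used by H.264; the post-scaling that the standard folds into quantization is omitted here, so this is only a sketch of the energy-compaction behavior, with helper names of our own choosing:

```python
# H.264's 4x4 forward core transform matrix: integer entries, so it is
# exactly invertible (no encoder/decoder IDCT mismatch), yet it compacts
# energy much like the 4x4 DCT it approximates.
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_core_transform(x):
    """W = Cf * X * Cf^T (normalization/scaling step omitted)."""
    return matmul(matmul(CF, x), transpose(CF))
```

For a flat 4x4 block, all of the energy lands in the single DC coefficient, mirroring the DCT's compaction property while using only integer arithmetic.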
In MPEG-2, a P-frame can use only a single previously coded frame to predict the motion compensation values for an incoming picture, while a B-frame can use only the immediately previous P- or I-frame and the immediately subsequent P- or I-frame. H.264 permits the use of up to 32 previously coded pictures, and it supports more flexibility in the selection of motion compensation block sizes and shapes, down to a luma compensation block as small as 4x4 pixels. H.264 also supports quarter-sample motion vector accuracy, as opposed to MPEG-2's half-sample accuracy. 25
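The quarter-sample accuracy mentioned above rests on a two-stage interpolation: half-pel values come from a 6-tap filter and quarter-pel values from averaging. A one-dimensional sketch follows; the (1, -5, 20, 20, -5, 1) filter, the (+16) >> 5 rounding and the round-up averaging match the standard's luma interpolation, while the helper names are ours:

```python
def half_pel(samples, i):
    """Six-tap (1,-5,20,20,-5,1)/32 interpolation between samples i and i+1,
    rounded and clipped to the 8-bit range as in H.264 luma filtering."""
    s = samples
    val = (s[i - 2] - 5 * s[i - 1] + 20 * s[i] + 20 * s[i + 1]
           - 5 * s[i + 2] + s[i + 3] + 16) >> 5
    return max(0, min(255, val))

def quarter_pel(samples, i):
    """Quarter position between integer sample i and the half-pel to its
    right: average of the two nearest values, rounding up."""
    return (samples[i] + half_pel(samples, i) + 1) >> 1
```

On a flat region the filter reproduces the constant value exactly, and on a linear ramp it lands midway between neighboring samples, which is the behavior a good interpolator should have.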
• 26. These refinements permit more precise segmentation of moving areas within the image, and a more precise description of the movement. Further, in H.264 the motion-compensated prediction signal may be weighted and offset by the encoder, giving significantly improved performance in fades (fades can be problematic for MPEG-2). 4.3.3 Deblocking filter Block-based coding can generate blocking artifacts in the decoded pictures. In H.264, a deblocking filter is placed within the motion-compensated prediction loop, so that the filtered pictures are used as references when predicting subsequent pictures (Figure 8). Switching slices, which permit a decoder to jump between bitstreams in order to smoothly change bit rates or perform trick modes without requiring all streams to send an I-frame at the switch point (making the decoder's job easier at switch points), have also been incorporated. 26
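The deblocking decision can be illustrated on a single line of samples crossing a block edge. In this hedged sketch the alpha/beta/tc thresholds are made-up constants; in H.264 they are looked up from QP-indexed tables, and the filter additionally depends on the edge's boundary strength:

```python
def clip(x, lo, hi):
    return max(lo, min(hi, x))

def filter_edge(p1, p0, q0, q1, alpha=10, beta=3, tc=2):
    """Filter one line of samples across a block edge (.. p1 p0 | q0 q1 ..).

    alpha/beta/tc are illustrative constants only; the real standard
    derives them from QP-dependent tables and the boundary strength.
    Returns the (possibly) modified pair (p0, q0).
    """
    # Only filter when the step looks like a coding artifact rather than
    # a genuine image edge: small jump across the boundary, smooth sides.
    if abs(p0 - q0) >= alpha or abs(p1 - p0) >= beta or abs(q1 - q0) >= beta:
        return p0, q0
    # Standard-style delta: move the two boundary samples toward each other.
    delta = clip((((q0 - p0) << 2) + (p1 - q1) + 4) >> 3, -tc, tc)
    return clip(p0 + delta, 0, 255), clip(q0 - delta, 0, 255)
```

A small blocky step is smoothed, while a large step, presumed to be a real edge, is left untouched; that conditional behavior is what distinguishes a deblocking filter from a plain low-pass filter.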
• 27. Table 4. Comparison between MPEG-2 and H.264

Characteristic                | MPEG-2                                              | H.264
------------------------------|-----------------------------------------------------|-----------------------------------------------------
General                       | Motion-compensated prediction, transformed residual, entropy coding | Same basic structure as MPEG-2
Block size                    | 8x8                                                 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Macroblock size               | 16x16 (frame mode), 16x8 (field mode)               | 16x16
Intra prediction              | None                                                | Multi-direction, multi-pattern
Quantization                  | Scalar quantization with constant step-size increment | Scalar quantization with step size increasing at a rate of 12.5%
Entropy coding                | VLC (multiple VLC tables)                           | CAVLC (adaptive VLC tables), CABAC (arithmetic coding)
Weighted prediction           | No                                                  | Yes
Reference pictures            | One picture                                         | Multiple pictures
Motion estimation blocks      | 16x16                                               | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Motion vector prediction      | Simple                                              | Median-based and segmented
Frame distance for prediction | +/- 1 forward/backward                              | Unlimited
Fractional motion estimation  | 1/2 pixel                                           | 1/4 pixel
Deblocking filter             | None                                                | Dynamic edge filters (in-loop)
Scalable coding support [2]   | Yes: layered spatial, SNR and temporal scalability  | Some support for temporal and SNR scalability
Bit rate with same quality, HD video (1920x1080) | 12-20 Mbps                       | 7-8 Mbps
Transmission rate             | 2-15 Mbps                                           | 64 kbps - 150 Mbps

27
• 28. 4.3.4 Performance comparison between MPEG-2 and H.264 using standard test streams – Simulation results Test streams (Foreman, News and Carphone [26]) were encoded using the open-source MPEG-2 codec [25] and the H.264 reference codec [24]. The results were compared against each other for parameters such as the peak signal-to-noise ratio (PSNR), GOP structure and compression ratio. CIF files were used for the "Foreman" and "News" clips, whereas QCIF was used for the "Carphone" clip. The bit rate for H.264 encoding was taken as the default used by the codec. The bit rate for MPEG-2 encoding was then set to match the bit rate of the H.264 encoding, so that the two standards could be compared on a common basis. While the aim of this project is to compare the Main profiles of MPEG-2 and H.264, simulations were also run for the Simple/Baseline profiles. This was done in order to show quantitatively that encoding with the Main profile gives a better compression ratio and better quality video than the Simple profile for both MPEG-2 and H.264. Tables 5, 6 and 7 tabulate the results obtained from the simulations. Figures 11, 12 and 13 show screen shots of the encoded videos (Main profiles only). Section 4.3.5 presents the conclusions drawn from these results. 4.3.5 Conclusion From the tables below, the following is concluded: • For the same bit rate and video resolution, the PSNR (dB) values are greater for H.264-encoded video than for MPEG-2-encoded video, indicating better video quality. This can be verified from the screen shots. • The compression ratio for H.264-encoded video is also better than that for MPEG-2-encoded video, in spite of the better video quality. Compression ratio = original file size / compressed file size • The video quality for H.264 video is better than for MPEG-2 video for the Simple/Baseline profiles as well. 
Therefore, it can be concluded that H.264 video coding standard gives better compression and better video quality as compared to MPEG-2. 28
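The compression ratios reported in the tables can be sanity-checked from first principles: a raw 4:2:0 clip occupies 1.5 bytes per pixel per frame, and the compressed size follows from the coded bit rate and the clip duration. A small Python sketch (function names ours):

```python
def raw_yuv420_bytes(width, height, frames):
    """Uncompressed 4:2:0 size: 1.5 bytes per pixel per frame
    (one luma byte plus two quarter-resolution chroma planes)."""
    return int(width * height * 1.5 * frames)

def compression_ratio(width, height, frames, fps, bitrate_kbps):
    """Original file size / compressed file size, with the compressed
    size derived from the coded bit rate and the clip duration."""
    compressed_bytes = bitrate_kbps * 1000 / 8 * (frames / fps)
    return raw_yuv420_bytes(width, height, frames) / compressed_bytes
```

For Foreman (CIF, 90 frames at 30 fps, about 481 kbit/s) this yields roughly 76:1, consistent with the 74:1 and 78:1 entries in Table 5.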
• 29. Table 5. Performance comparison between MPEG-2 and H.264 main/simple profiles – Foreman

Parameter              | Main profile: MPEG-2 | Main profile: H.264 | Simple profile: MPEG-2 | Simple profile: H.264
-----------------------|----------------------|---------------------|------------------------|----------------------
Input video resolution | 352 x 288 (CIF)      | 352 x 288 (CIF)     | 352 x 288 (CIF)        | 352 x 288 (CIF)
fps                    | 30                   | 30                  | 30                     | 30
# frames encoded       | 90                   | 90                  | 90                     | 90
GOP                    | I-P-B-B-P-B-B        | I-B-B-P-B-B-P       | I-P-P-P                | I-P-P-P
PSNR (Y) (dB)          | 30.42                | 37.03               | 32.1                   | 37.4
PSNR (U) (dB)          | 39.1                 | 41.08               | 39.01                  | 41
PSNR (V) (dB)          | 39.6                 | 43.81               | 40.2                   | 43.6
Bit rate (kbits/s)     | 481.00               | 481.06              | 561.0                  | 561.01
Compression ratio      | 74:1                 | 78:1                | 65:1                   | 65:1

Figure 11. (a) Foreman – MPEG-2 (main profile) encoding (b) Foreman – H.264 (main profile) encoding 29
• 30. Table 6. Performance comparison between MPEG-2 and H.264 main/simple profiles – News

Parameter              | Main profile: MPEG-2 | Main profile: H.264 | Simple profile: MPEG-2 | Simple profile: H.264
-----------------------|----------------------|---------------------|------------------------|----------------------
Input video resolution | 352 x 288 (CIF)      | 352 x 288 (CIF)     | 352 x 288 (CIF)        | 352 x 288 (CIF)
fps                    | 30                   | 30                  | 30                     | 30
# frames encoded       | 90                   | 90                  | 90                     | 90
GOP                    | I-P-B-B-P-B-B        | I-B-B-P-B-B-P       | I-P-P-P                | I-P-P-P
PSNR (Y) (dB)          | 37.02                | 39.1                | 34.01                  | 39
PSNR (U) (dB)          | 37.02                | 41.0                | 39.1                   | 41
PSNR (V) (dB)          | 39.02                | 42.0                | 39.7                   | 42
Bit rate (kbits/s)     | 376.00               | 376.00              | 380.8                  | 380.8
Compression ratio      | 94:1                 | 99.7:1              | 95.5:1                 | 95.5:1

Figure 12. (a) News – MPEG-2 (main profile) encoding (b) News – H.264 (main profile) encoding 30
• 31. Table 7. Performance comparison between MPEG-2 and H.264 main/simple profiles – Carphone

Parameter              | Main profile: MPEG-2 | Main profile: H.264 | Simple profile: MPEG-2 | Simple profile: H.264
-----------------------|----------------------|---------------------|------------------------|----------------------
Input video resolution | 176 x 144 (QCIF)     | 176 x 144 (QCIF)    | 176 x 144 (QCIF)       | 176 x 144 (QCIF)
fps                    | 30                   | 30                  | 30                     | 30
# frames encoded       | 90                   | 90                  | 90                     | 90
GOP                    | I-P-B-B-P-B-B        | I-B-B-P-B-B-P       | I-P-P-P                | I-P-P-P
PSNR (Y) (dB)          | 30.46                | 37.6                | 31.6                   | 38
PSNR (U) (dB)          | 36.36                | 40.9                | 39                     | 40.8
PSNR (V) (dB)          | 36.5                 | 41.5                | 39                     | 41.3
Bit rate (kbits/s)     | 128                  | 127.6               | 147.3                  | 147.3
Compression ratio      | 69.6:1               | 72.6:1              | 60.8:1                 | 61.9:1

Figure 13. (a) Carphone – MPEG-2 (main profile) encoding (b) Carphone – H.264 (main profile) encoding 31
• 32. 5 Transcoding methods 5.1 Introduction to transcoding In this fast growing world of multimedia and telecommunications there is a great demand for efficient use of the available bandwidth. With the growth of technology there is an increase in the number of networks, types of devices and content representation formats, as a result of which interoperability between different systems and networks is gaining in importance. Transcoding of video content is one effort in this direction. Besides this, a transcoder can also be used to insert new information, for example company logos, watermarks and error resilience features, into a compressed video stream. Transcoding techniques are also useful in supporting VCR trick modes such as fast-forward and reverse play for on-demand applications. [4] Technically, transcoding is the coding and recoding of digital content from one compressed format to another to enable transmission over different media and playback on various devices [29]. This raises the question: why is H.264/AVC to MPEG-2 transcoding needed [14] [15]? In order to provide better compression of video than previous standards, H.264/AVC was recently developed by the JVT (Joint Video Team). The new standard achieves significant coding efficiency, has a simple syntax specification, and integrates seamlessly into current protocols and multiplex architectures. The H.264 specification represents a significant advancement in the field of video coding technology, providing MPEG-2-comparable video quality at, on average, half the required bandwidth. Although widespread use of H.264 is anticipated, many legacy systems, including digital TVs and home receivers, use MPEG-2. This leads to the need for an efficient architecture that exploits the lower bandwidth cost of H.264 video and does not require a significant investment in additional video coding hardware. Figure 14. 
H.264 to MPEG-2 transcoder applications [12] 32
• 33. 5.2 How is transcoding done – the basic process The simplest approach to transcoding is to completely decode the MPEG-2 bit stream and then re-encode it with an H.264 encoder. The decode operation can be performed either externally or as part of the H.264 encoder. System issues, such as handling SCTE-35 digital program insertion (DPI) messages, require that the decode and encode operations be tightly coupled. The quality achieved with this simple approach will not be high. Figure 15 shows a comparison between direct encoding and transcoding. The figure shows the PSNR (a measure of the mean square error between the input and the decoded output) values computed at different bit rates. The PSNR numbers are obtained by averaging the results over 18 different sequences of varying content type and complexity. The top plot shows the performance of direct encoding with an H.264 encoder. The bottom plot shows the performance of transcoding, where the video is originally coded with MPEG-2 at 4 Mb/s, decoded, and then re-encoded with the same encoder used for direct encoding. Transcoding can result in up to a 20 percent loss in compression efficiency. An improved approach is similar: the incoming MPEG-2 stream is decoded and then re-encoded using an H.264 encoder, but the relevant information available from the MPEG-2 bit stream is reused. Figure 15. Performance comparison between direct encoding and transcoding [32] 33
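The PSNR metric used in Figure 15 is defined directly from the mean square error between the reference and the decoded samples; a minimal sketch:

```python
import math

def psnr(ref, dist, peak=255):
    """Peak signal-to-noise ratio in dB between a reference and a
    distorted sequence of 8-bit samples (e.g. luma values)."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)
    if mse == 0:
        return float("inf")          # identical signals
    return 10 * math.log10(peak * peak / mse)
```

In practice PSNR is computed per plane (Y, U, V) and per frame, then averaged over the sequence, which is how the per-component figures in Tables 5-7 are obtained.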
• 34. 5.3 Criteria for transcoding Transcoding can be of various types [14]. Some of them are bit rate transcoding, to facilitate more efficient transport of video; spatial and temporal resolution reduction transcoding, for use in mobile devices with limited display and processing power; and error-resilience transcoding, to achieve higher resilience of the original bit stream to transmission errors. To achieve optimum results, the following criteria have to be fulfilled: (i) the quality of the transcoded bitstream should be comparable to that obtained by fully decoding and re-encoding the input stream; (ii) the information contained in the input stream should be reused as much as possible to avoid multigenerational deterioration; (iii) the process should be cost efficient and low in complexity while achieving the highest quality possible. 5.4 Transcoding of H.264 to MPEG-2 In order to provide better compression of video than previous standards, the H.264/AVC video coding standard was recently developed by the JVT (Joint Video Team), consisting of experts from VCEG (Video Coding Experts Group) and MPEG. The new standard achieves significant coding efficiency, has a simple syntax specification, and integrates seamlessly into current protocols and multiplex architectures. Thus H.264 can support applications such as video broadcasting, video streaming and video conferencing over fixed and wireless networks and over different transport protocols. However, MPEG-2 has already been widely used in digital broadcasting, HDTV and DVD applications. Hence transcoding is a feasible method to solve the incompatibility between an H.264 video source and existing MPEG-2 decoders. An H.264/AVC to MPEG-2 transcoder converts the H.264 video stream to MPEG-2 format so that it can be used by MPEG-2 end equipment. 
It is better to transmit H.264 bitstreams on public networks to save on the much needed bandwidth and then transcode them into MPEG-2 bitstreams for local MPEG-2 equipment like a set-top box. 34
• 35. 5.5 Transcoding architectures This section describes the various transcoding architectures [15]: 5.5.1 Open loop transcoding: Open loop transcoders include selective transmission, where the high-frequency DCT coefficients are discarded, and requantization. They are computationally efficient, since they operate directly on the DCT coefficients. However, they suffer from the drift problem: drift error accumulates due to rounding, quantization loss and clipping functions. Figure 16. Open loop transcoding architecture [15] 5.5.2 Cascaded pixel domain transcoding architecture: This is a drift-free architecture. It is a concatenation of a simplified decoder and encoder, as shown in Figure 17. In this architecture, instead of performing a full motion estimation, the encoder reuses the motion vectors, along with other information extracted from the input video bitstream, thus reducing the complexity. Figure 17. Cascaded pixel domain transcoding architecture [15]. 35
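Requantization, the second open-loop technique named above, can be sketched in a few lines. Working directly on transform-domain coefficient levels is what makes it cheap, and the absence of any decoding loop is what allows drift to accumulate; the function name and the scalar-step simplification are ours:

```python
def requantize(levels, qstep_in, qstep_out):
    """Open-loop bit-rate reduction: dequantize each coefficient level
    with the incoming quantizer step and requantize it with a coarser
    outgoing step, entirely in the transform domain.

    No inverse transform, motion compensation or reconstruction loop is
    involved, so the requantization error is never fed back -- this is
    the source of the drift in predicted pictures.
    """
    out = []
    for lvl in levels:
        coeff = lvl * qstep_in                     # inverse quantization
        out.append(int(round(coeff / qstep_out)))  # coarser quantization
    return out
```

Smaller levels shrink toward zero under the coarser step, which is exactly where the bit-rate saving comes from: fewer and smaller levels cost fewer bits in the entropy coder.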
• 36. 5.5.3 Simplified DCT domain transcoding (SDDT): This architecture is based on the assumption that the DCT, IDCT and motion compensation are all linear operations. Since the motion compensation is performed in the DCT domain in this architecture, it is a computationally intensive operation. For instance, as shown in Figure 19, the goal is to compute the target block B from the four overlapping blocks B1, B2, B3 and B4. Figure 18. Simplified DCT domain transcoding architecture [15]. Figure 19. DCT motion compensation [15]. SDDT eliminates the DCT/IDCT pair and halves the number of frame memories, as a result of which it requires less computation and memory than the cascaded pixel domain transcoder (CPDT). However, the linearity assumptions are not strictly true, since clipping functions are performed in the video encoder/decoder and rounding operations are performed in the interpolation for fractional-pixel MC. These failed assumptions may cause drift in the transcoded video. 36
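The assembly of B from B1..B4 in Figure 19 can be written as matrix products with constant shift matrices: B = (D B1 + U B3) D^T + (D B2 + U B4) U^T for a motion offset (dy, dx). Because the DCT is linear, exactly these products can be applied to DCT coefficients, which is the core of DCT-domain motion compensation. A pixel-domain Python sketch (all names ours), which can be verified against direct extraction from the 2n x 2n area:

```python
def zeros(n):
    return [[0] * n for _ in range(n)]

def d_mat(n, k):
    """Shift matrix selecting rows k..n-1 of a block into rows 0..n-1-k."""
    m = zeros(n)
    for i in range(n - k):
        m[i][i + k] = 1
    return m

def u_mat(n, k):
    """Shift matrix selecting rows 0..k-1 of a block into rows n-k..n-1."""
    m = zeros(n)
    for i in range(k):
        m[n - k + i][i] = 1
    return m

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def madd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mc_from_four_blocks(b1, b2, b3, b4, dy, dx):
    """Assemble the n x n target block B located at offset (dy, dx) inside
    the 2n x 2n area [[B1, B2], [B3, B4]] purely by matrix products.
    The shift matrices are constants, so in SDDT the same products are
    carried out directly on DCT coefficients (the DCT is linear)."""
    n = len(b1)
    dv, uv = d_mat(n, dy), u_mat(n, dy)    # vertical (row) selection
    dh, uh = d_mat(n, dx), u_mat(n, dx)    # horizontal (column) selection
    left = madd(matmul(dv, b1), matmul(uv, b3))
    right = madd(matmul(dv, b2), matmul(uv, b4))
    return madd(matmul(left, transpose(dh)), matmul(right, transpose(uh)))
```

In a real transcoder the DCT-domain counterparts of the shift matrices are precomputed once, so each target block still costs several matrix multiplications, which is why the text calls DCT-domain MC computationally intensive.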
• 37. 5.5.4 Cascaded DCT domain transcoding (CDDT) The cascaded DCT-domain transcoder can be used for spatial and temporal resolution downscaling and for other coding parameter changes. Compared to SDDT, greater flexibility is achieved using additional DCT-domain motion compensation and frame memory, resulting in higher cost and complexity. This architecture is suited to downscaling operations, where the encoder-side DCT-MC and memory do not cost much. Figure 20. Cascaded DCT domain transcoding architecture [15] 5.6 Conclusions The selection of an appropriate transcoding architecture depends on the application for which it is intended. There is generally a tradeoff between accuracy on the one hand and the complexity and cost of the architecture on the other. For example, the open loop architecture is the simplest and easiest to implement but suffers from drift, whereas the cascaded DCT domain transcoding architecture overcomes this problem but is very complex and expensive to implement. 37
• 38. 6 References
[1] Soon-kak Kwon, A. Tamhankar and K. R. Rao, "Overview of H.264/MPEG-4 Part 10", J. Visual Communication and Image Representation, vol. 17, pp. 186-216, Apr. 2006.
[2] A. Puri, H. Chen and A. Luthra, "Video coding using the H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
[3] H. Kalva, "Issues in H.264/MPEG-2 video transcoding", Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL.
[4] S. Sharma, "Transcoding of H.264 bitstream to MPEG-2 bitstream", Master's thesis, EE Department, University of Texas at Arlington, May 2006.
[5] S. Sharma and K. R. Rao, "Transcoding of H.264 bitstream to MPEG-2 bitstream", Proc. Asia-Pacific Conference on Communications, 2007.
[6] Special issue on "Emerging H.264/AVC video coding standard", J. Visual Communication and Image Representation, vol. 17, pp. 183-552, Apr. 2006.
[7] P. N. Tudor, "Tutorial on MPEG-2 video compression", Electronics and Communication Engineering Journal, Dec. 1995 (IEE J. Langham Thomson Prize paper).
[8] "The MPEG-2 international standard", ISO/IEC 13818-2, 1996.
[9] T. Wiegand et al., "Overview of the H.264/AVC video coding standard", IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.
[10] J. McVeigh et al., "A software-based real-time MPEG-2 video encoder", IEEE Trans. CSVT, vol. 10, pp. 1178-1184, Oct. 2000.
[11] O. J. Morris, "MPEG-2: where did it come from and what is it?", IEE Colloquium, pp. 1/1-1/5, 24 Jan. 1995.
[12] P. Kunzelmann and H. Kalva, "Reduced complexity H.264 to MPEG-2 transcoder", ICCE 2007, pp. 1-2, Jan. 2007.
[13] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards", ICME, vol. 1, pp. 345-348, July 2003.
[14] J. Xin, C.-W. Lin and M.-T. Sun, "Digital video transcoding", Proceedings of the IEEE, vol. 93, no. 1, pp. 84-97, Jan. 2005.
[15] A. Vetro, C. Christopoulos and H. Sun, "Video transcoding architectures and techniques: an overview", IEEE Signal Processing Magazine, vol. 20, no. 2, pp. 18-29, Mar. 2003.
[16] "MPEG-2", Wikipedia, Feb. 14, 2008. Available at <http://en.wikipedia.org/wiki/Mpeg_2>
[17] "Introduction to MPEG-2 video compression". Available at <http://www.bretl.com/mpeghtml/codecdia1.HTM> 38
• 39. [18] "H.264/MPEG-4 AVC", Wikipedia, Feb. 18, 2008. Available at <http://en.wikipedia.org/wiki/H.264>
[19] "H.264: a new technology for video compression". Available at <http://www.nuntius.com/technology3.html>
[20] R. Periera, "Efficient transcoding of MPEG-2 to H.264", Master's thesis, EE Department, University of Texas at Arlington, Dec. 2005.
[21] "H.262: Information technology - Generic coding of moving pictures and associated audio information: Video", International Telecommunication Union, Feb. 2000. Available at <http://www.itu.int/rec/T-REC-H.262>
[22] "MPEG-2 white paper", Pinnacle Technical Documentation, version 0.5, Pinnacle Systems, Feb. 29, 2000.
[23] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding", Herts, UK: IEE, 2003.
[24] H.264 software (version 13.2) obtained from <http://iphome.hhi.de/suehring/tml/>
[25] MPEG-2 software (version 12) obtained from <http://www.mpeg.org/MPEG/video/mssg-free-mpeg-software.html>
[26] Test streams (Foreman, News, Carphone) obtained from <http://www-ee.uta.edu/dip/Courses/EE5356/ee_5356.htm>
[27] Implementation Studies Group, "Main results of the AVC complexity analysis", MPEG document N4964, ISO/IEC JTC1/SC29/WG11, July 2002.
[28] A. Joch et al., "Performance comparison of video coding standards using Lagrangian coder control", IEEE Int. Conf. on Image Processing, vol. 2, pp. II-501 to II-504, Sept. 2002.
[29] I. Sylvester, "Transcoding: the future of the video market depends on it", IDC Executive Brief, Nov. 2006. Available at <http://www.ed-china.com/ARTICLES/2006NOV/2/2006NOV10_HA_AVC_HN_12.PDF>
[30] R. Hoffner, "MPEG-4 Advanced Video Coding emerges". Available at <http://www.tvtechnology.com/features/Tech-Corner/F_Hoffner-03.09.05.shtml>
[31] S. Wagston and A. Susin, "IP core for an H.264 decoder SoC", 2007. Available at <www.us.design-reuse.com/news/?id=15746&print=yes>
[32] S. Krishnamachari and K. Yang, "MPEG-2 to H.264 transcoding: why and how?", Dec. 1, 2006. Available at <http://broadcastengineering.com/infrastructure/broadcasting_mpeg_transcoding_why/index1.html> 39