Emerging H.264 Standard:
Overview and TMS320C64x Digital Media Platform Implementation


White Paper




UB Video Inc.

Suite 400, 1788 west 5th Avenue

Vancouver, British Columbia, Canada V6J 1P2

Tel: 604-737-2426; Fax: 604-737-1514

www.ubvideo.com




Copyright © 2002 UB Video Inc.                www.ubvideo.com   2-2002
H.264: Introduction
Digital video is being adopted in an increasingly proliferating array of applications ranging from video
telephony and videoconferencing to DVD and digital TV. The adoption of digital video in many applications
has been fuelled by the development of many video coding standards, which have emerged targeting
different application areas. These standards provide the means needed to achieve interoperability between
systems designed by different manufacturers for any given application, hence facilitating the growth of the
video market. The ITU-T¹ is now one of two formal organizations that develop video coding standards, the
other being ISO/IEC JTC1². The ITU-T video coding standards are called recommendations, and they are
denoted H.26x (e.g., H.261, H.262, H.263 and H.264). The ISO/IEC standards are denoted MPEG-x (e.g.,
MPEG-1, MPEG-2 and MPEG-4).

The ITU-T recommendations have been designed mostly for real-time video communication applications,
such as video conferencing and video telephony. On the other hand, the MPEG standards have been designed
mostly to address the needs of video storage (DVD), broadcast video (broadcast TV), and video streaming
(e.g., video over the Internet, video over DSL, video over wireless) applications. For the most part, the two
standardization committees have worked independently on the different standards. The only exception has
been the H.262/MPEG-2 standard, which was developed jointly by the two committees. Recently, the ITU-T
and the ISO/IEC JTC1 have agreed to join their efforts in the development of the emerging H.264 standard,
which was initiated by the ITU-T committee. H.264 (a.k.a. MPEG-4 Part 10 or MPEG-4 AVC) is being
adopted by the two committees because it represents a departure in terms of performance from all existing
video coding standards. Figure 1 summarizes the evolution of the ITU-T recommendations and the ISO/IEC
MPEG standards. Please see [1] and [2] for more information on the H.263 and MPEG-4 video coding
standards.

[Figure 1 is a timeline from 1984 to 2004. The ITU-T standards row shows H.261, followed by H.263, H.263+ and H.263++; the joint ITU-T/MPEG standards row shows H.262/MPEG-2 and H.264; the MPEG standards row shows MPEG-1 and MPEG-4.]

Figure 1. Progression of the ITU-T Recommendations and MPEG standards.
UB Video has been developing a complete video processing solution that is based on the H.264 video coding
standard and that is optimized for the Texas Instruments TMS320C64x digital media platform family of
DSPs. UBLive-264-C64, UB Video’s efficient H.264-based video processing solution, and the high-
performance TMS320C64x family of DSPs, represent a compelling integrated software/hardware solution to

¹ The International Telecommunication Union, Telecommunication Standardization Sector.
² The International Organization for Standardization and the International Electrotechnical Commission, Joint Technical
Committee 1.


most high-performance video applications. The purpose of this paper is to present an overview of the
emerging H.264 standard and the TMS320C64x digital media platform, as well as a discussion of UBLive-
264-C64. First, an overview of the H.264 standard and its benefits are presented, followed by a more detailed
description of the main H.264 video coding features. The Texas Instruments TMS320C64x family of DSPs is
then presented. UBLive-264-C64’s main features are then described, and its benefits for real-time video
applications are finally discussed.

H.264: Overview
The main objective behind the H.264 project was to develop a high-performance video coding standard by
adopting a “back to basics” approach where simple and straightforward design using well-known building
blocks is used. The ITU-T Video Coding Experts Group (VCEG) initiated the work on the H.264 standard in
1997. Towards the end of 2001, having witnessed the superior video quality offered by H.264-based software
over that achieved by the most highly optimized existing MPEG-4-based software, ISO/IEC MPEG joined
ITU-T VCEG by forming a Joint Video Team (JVT) that took over the H.264 project of the ITU-T.
The JVT objective was to create a single video coding standard that would simultaneously result in a new
part (i.e., Part-10) of the MPEG-4 family of standards and a new ITU-T (i.e., H.264) Recommendation. The
H.264 development work is an on-going activity, with the first version of the standard expected to be
finalized technically before the end of 2002 and officially before the end of 2003.

The emerging H.264 standard has a number of advantages that distinguish it from existing standards, while at
the same time, sharing common features with other existing standards. The following are some of the key
advantages of H.264:

1. Up to 50% in bit rate savings: Compared to H.263v2 (H.263+) or MPEG-4 Simple Profile, H.264
   permits a reduction in bit rate by up to 50% for a similar degree of encoder optimization at most bit rates.

2. High quality video: H.264 offers consistently good video quality at high and low bit rates.

3. Error resilience: H.264 provides the tools necessary to deal with packet loss in packet networks and bit
   errors in error-prone wireless networks.

4. Network friendliness: Through the Network Adaptation Layer, H.264 bit streams can be easily
   transported over different networks.

The above advantages make H.264 an ideal standard for several applications such as video conferencing and
broadcast video. This is discussed in detail in two related white papers. In this paper, we provide a detailed
(but high-level) description of the H.264 video coding standard, as well as a discussion of UBLive-264-C64,
UB Video’s efficient H.264-based video processing solution on the TMS320C64x digital media processor
platform.




[Figure 2 is a block diagram of the H.264 encoder. The video source, minus the predicted macroblock, enters a fixed 4x4 integer transform followed by quantization (with step sizes increased at a compounding rate of approximately 12.5%), producing quantized transform coefficients. These are entropy coded into the output bit stream using either a single universal VLC combined with a context-adaptive VLC, or context-based adaptive binary arithmetic coding. The reconstruction loop contains inverse quantization, an inverse transform (with no mismatch), a frame store and a loop filter. Prediction is either intra (9 4x4 and 4 16x16 luma modes plus 4 chroma modes, 17 modes in total) or inter motion compensation driven by motion estimation (seven block sizes and shapes, ¼-pel motion estimation accuracy, five prediction modes with B-pictures, and multiple reference picture selection).]

Figure 2. Block Diagram of the H.264 encoder.




H.264: Technical Description
Overview

The main objective of the emerging H.264 standard is to provide a means to achieve substantially higher
video quality as compared to what could be achieved using any of the existing video coding standards.
Nonetheless, the underlying approach of H.264 is similar to that adopted in previous standards such as H.263
and MPEG-4, and consists of the following four main stages:

   1. Dividing each video frame into blocks of pixels so that processing of the video frame can be
      conducted at the block level.

   2. Exploiting the spatial redundancies that exist within the video frame by coding some of the original
      blocks through transform, quantization and entropy coding (or variable-length coding).

   3. Exploiting the temporal dependencies that exist between blocks in successive frames, so that only
      changes between successive frames need to be encoded. This is accomplished by using motion
      estimation and compensation. For any given block, a search is performed in one or more previously
      coded frames to determine the motion vectors, which are then used by the encoder and the decoder to
      predict the subject block.

   4. Exploiting any remaining spatial redundancies that exist within the video frame by coding the
      residual blocks, i.e., the difference between the original blocks and the corresponding predicted
      blocks, again through transform, quantization and entropy coding.

From the coding point of view, the main differences between H.264 and the other standards are summarized
in Figure 2 through an encoder block diagram. On the motion estimation/compensation side, H.264 employs
blocks of different sizes and shapes, higher resolution 1/4-pel motion estimation, multiple reference frame
selection and multiple bi-directional mode selection. On the transform side, H.264 uses an integer based
transform that approximates roughly the DCT transform used in previous standards, but does not have the
mismatch problem in the inverse transform. In H.264, entropy coding can be performed using either a
combination of a single Universal Variable Length Codes (UVLC) table with a Context Adaptive Variable
Length Codes (CAVLC) for the transform coefficients or using Context-based Adaptive Binary Arithmetic
Coding (CABAC). These and other features are discussed in more detail in the following sections.




[Figure 3 shows a QCIF picture divided into a grid of 16x16 macroblocks, 11 across and 9 down.]

Figure 3. Subdivision of a QCIF picture into 16x16 macroblocks.


Organization of the Bit Stream
As mentioned above, a given video picture is divided into a number of small blocks referred to as
macroblocks. For example, a picture with QCIF resolution (176x144) is divided into 99 16x16 macroblocks
as indicated in Figure 3. A similar macroblock segmentation is used for other frame sizes. The luminance
component of the picture is sampled at these frame resolutions, while the chrominance components, Cb and
Cr, are down-sampled by two in the horizontal and vertical directions. In addition, a picture may be divided
into an integer number of “slices”, which are valuable for resynchronization should some data be lost.
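The macroblock count quoted above for QCIF follows directly from the picture dimensions, as this small Python sketch illustrates:

```python
# A QCIF picture is 176x144 luminance samples; both dimensions are
# multiples of 16, so the picture divides evenly into macroblocks.
width, height = 176, 144
mbs_per_row = width // 16   # 11 macroblocks per row
mb_rows = height // 16      # 9 rows of macroblocks
total_macroblocks = mbs_per_row * mb_rows
print(total_macroblocks)    # 99, as in Figure 3
```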

Intra Prediction and Coding

Intra coding refers to the case where only spatial redundancies within a video picture are exploited. The
resulting frame is referred to as an I-picture. I-pictures are typically encoded by directly applying the
transform to the different macroblocks in the frame. As a consequence, encoded I-pictures are large in size
since a large amount of information is usually present in the frame, and no temporal information is used as
part of the encoding process. In order to increase the efficiency of the intra coding process in H.264, spatial
correlation between adjacent macroblocks in a given frame is exploited. The idea is based on the observation
that adjacent macroblocks tend to have similar properties. Therefore, as a first step in the encoding process
for a given macroblock, one may predict the macroblock of interest from the surrounding macroblocks
(typically the ones located on top and to the left of the macroblock of interest, since those macroblocks would
have already been encoded). The difference between the actual macroblock and its prediction is then coded,
which results in fewer bits to represent the macroblock of interest as compared to when applying the
transform directly to the macroblock itself.



[Figure 4 illustrates the nine 4x4 intra prediction directions, with pixels A to M taken from previously encoded neighbouring blocks.]

Figure 4. Intra prediction modes for 4x4 luminance blocks.


In order to perform the intra prediction mentioned above, H.264 offers nine modes for the prediction of 4x4
luminance blocks: DC prediction (Mode 2) and eight directional modes, together labelled 0 through 8 in
Figure 4. This process is illustrated in Figure 4, in which pixels A to M from neighbouring blocks have already been




encoded and may be used for prediction. For example, if Mode 0 (Vertical prediction) is selected, then the
values of the pixels a to p are assigned as follows:

       a, e, i and m are equal to A,
       b, f, j and n are equal to B,
       c, g, k and o are equal to C, and
       d, h, l and p are equal to D.

In the case where Mode 3 (Diagonal_Down_Left prediction) is chosen, the values of a to p are given as
follows:

       a is equal to (A+2B+C+2)/4,
       b, e are equal to (B+2C+D+2)/4,
       c, f, i are equal to (C+2D+E+2)/4,
       d, g, j, m are equal to (D+2E+F+2)/4,
       h, k, n are equal to (E+2F+G+2)/4,
       l, o are equal to (F+2G+H+2)/4, and
       p is equal to (G+3H+2)/4.
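The two modes described above can be written directly from these pixel assignments. The following Python sketch is purely illustrative; `above` holds the pixels A to D above the block, and `above8` holds A to H (the four pixels above plus the four above and to the right):

```python
def intra4x4_vertical(above):
    # Mode 0: every row of the 4x4 block copies the pixels A..D that
    # sit directly above the block.
    return [list(above) for _ in range(4)]

def intra4x4_diag_down_left(above8):
    # Mode 3: each predicted pixel is a rounded weighted average of
    # three pixels along a diagonal, per the formulas in the text.
    A, B, C, D, E, F, G, H = above8
    avg = lambda x, y, z: (x + 2 * y + z + 2) // 4
    return [
        [avg(A, B, C), avg(B, C, D), avg(C, D, E), avg(D, E, F)],
        [avg(B, C, D), avg(C, D, E), avg(D, E, F), avg(E, F, G)],
        [avg(C, D, E), avg(D, E, F), avg(E, F, G), avg(F, G, H)],
        [avg(D, E, F), avg(E, F, G), avg(F, G, H), (G + 3 * H + 2) // 4],
    ]
```

Note how pixels on the same diagonal (b and e, or c, f and i) receive the same predicted value, matching the assignments listed above.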

For regions with less spatial detail (i.e., flat regions), H.264 supports 16x16 intra coding, in which one of
four prediction modes (DC, Vertical, Horizontal and Planar) is chosen for the prediction of the entire
luminance component of the macroblock. In addition, H.264 supports intra prediction for the 8x8
chrominance blocks also using four prediction modes (DC, Vertical, Horizontal and Planar). Finally, the
prediction mode for each block is efficiently coded by assigning shorter symbols to more likely modes,
where the probability of each mode is determined based on the modes used for coding the surrounding
blocks.

Inter Prediction and Coding

H.264 derives most of its coding efficiency advantage from motion estimation.

Inter prediction and coding is based on using motion estimation and compensation to take advantage of the
temporal redundancies that exist between successive frames, hence providing very efficient coding of video
sequences. When a selected reference frame for
motion estimation is a previously encoded frame, the frame to be encoded is referred to as a P-picture. When
both a previously encoded frame and a future frame are chosen as reference frames, then the frame to be
encoded is referred to as a B-picture. Motion estimation in H.264 supports most of the key features adopted
in earlier video standards, but its efficiency is improved through added flexibility and functionality. In
addition to supporting P-pictures (with single and multiple reference frames) and B-pictures, H.264 supports
a new inter-stream transitional picture called an SP-picture. The inclusion of SP-pictures in a bit stream
enables efficient switching between bit streams with similar content encoded at different bit rates, as well as
random access and fast playback modes. The following four sections describe in more detail the four main
motion estimation features used in H.264, namely: (1) the use of various block sizes and shapes, (2) the use of
high-precision sub-pel motion vectors, (3) the use of multiple reference frames, and (4) the use of de-blocking
filters in the prediction loop.




[Figure 5 shows the seven partition modes: Mode 1, one 16x16 block with one motion vector; Mode 2, two 8x16 blocks with two motion vectors; Mode 3, two 16x8 blocks with two motion vectors; Mode 4, four 8x8 blocks with four motion vectors; Mode 5, eight 4x8 blocks with eight motion vectors; Mode 6, eight 8x4 blocks with eight motion vectors; and Mode 7, sixteen 4x4 blocks with sixteen motion vectors.]

Figure 5. Different modes of dividing a macroblock for motion estimation in H.264.



Block sizes

Using seven different block sizes and shapes can translate into bit rate savings of more than 15% as compared to using only a 16x16 block size.

Motion compensation for each 16x16 macroblock can be performed using a number of different block sizes
and shapes, as illustrated in Figure 5. Individual motion vectors can be transmitted for blocks as small as 4x4,
so up to 16 motion vectors may be transmitted for a single macroblock. Block sizes of 16x8, 8x16, 8x8, 8x4,
and 4x8 are also supported as shown. The availability of smaller motion compensation blocks improves
prediction in general; in particular, the small blocks improve the ability of the model to handle fine motion
detail and result in better subjective viewing quality because they do not produce large blocking artefacts.

Moreover, through the recently adopted tree structure segmentation method, it is possible to have a
combination of 4x8, 8x4, or 4x4 sub-blocks within an 8x8 sub-block. Figure 6 shows an example of such a
configuration for a 16x16 macroblock.

[Figure 6 shows a 16x16 macroblock whose four 8x8 sub-blocks are split differently: one left as a single 8x8 block, one split into four 4x4 blocks, one into two 4x8 blocks, and one into two 8x4 blocks.]

Figure 6. A configuration of sub-blocks within a macroblock based on the tree structure segmentation method in H.264.
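The relationship between partition mode and motion-vector count in Figure 5 can be expressed compactly. This Python sketch simply enumerates the modes as the figure lists them:

```python
# The seven macroblock partition modes of Figure 5: each mode splits
# the 16x16 macroblock into equal blocks of the given width x height,
# with one motion vector transmitted per block.
MODES = {1: (16, 16), 2: (8, 16), 3: (16, 8), 4: (8, 8),
         5: (4, 8), 6: (8, 4), 7: (4, 4)}

def motion_vectors_per_macroblock(mode):
    w, h = MODES[mode]
    return (16 // w) * (16 // h)

for m in sorted(MODES):
    print(m, motion_vectors_per_macroblock(m))
```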




Motion estimation accuracy

Using ¼-pel spatial accuracy can yield as much as 20% in bit rate savings as compared to using integer-pel spatial accuracy.

The prediction capability of the motion compensation algorithm in H.264 is further improved by allowing
motion vectors to be determined with higher levels of spatial accuracy than in existing standards. Quarter-
pixel-accurate motion compensation is the lowest-accuracy form of motion compensation in H.264 (in
contrast with prior standards based primarily on half-pel accuracy, with quarter-pel accuracy only available in
the newest version of MPEG-4).
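As an illustrative sketch of sub-pel interpolation, the fragment below assumes the six-tap half-pel filter (1, -5, 20, 20, -5, 1)/32 described for H.264, with quarter-pel samples formed by averaging neighbouring values; edge handling and the full two-dimensional interpolation are omitted:

```python
def half_pel(p):
    # Six-tap filter (1, -5, 20, 20, -5, 1) / 32 over six consecutive
    # integer-position samples; the result is the half-pel sample
    # between p[2] and p[3], rounded and clipped to 8 bits.
    val = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5]
    return min(255, max(0, (val + 16) >> 5))

def quarter_pel(a, b):
    # Quarter-pel samples: rounded average of two neighbouring
    # integer- or half-pel samples.
    return (a + b + 1) >> 1
```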

Multiple reference picture selection

Using 5 reference frames for prediction can yield 5-10% in bit rate savings as compared to using only one reference frame.

The H.264 standard offers the option of having multiple reference frames in inter picture coding, resulting in
better subjective video quality and more efficient coding of the video frame under consideration. Moreover,
using multiple reference frames can help make the H.264 bit stream error resilient. However, from an
implementation point of view, there are additional processing delays and higher memory requirements at
both the encoder and decoder.

De-blocking (Loop) Filter

The de-blocking filter yields a substantial improvement in subjective quality.

H.264 specifies the use of an adaptive de-blocking filter that operates on the horizontal and vertical block
edges within the prediction loop in order to remove artefacts caused by block prediction errors. The filtering
is generally based on 4x4 block boundaries, in which two pixels on either side of the boundary may be
updated using a different filter. The rules for applying the de-blocking filter are intricate and quite complex;
however, its use is optional for each slice (loosely defined as an integer number of macroblocks).
Nonetheless, the improvement in subjective quality often more than justifies the increase in complexity.

Integer Transform

The information contained in a prediction error block resulting from either intra prediction or inter prediction
is then re-expressed in the form of transform coefficients. H.264 is unique in that it employs a purely integer
spatial transform (a rough approximation of the DCT) which is primarily 4x4 in shape, as opposed to the
usual floating-point 8x8 DCT specified with rounding-error tolerances as used in earlier standards. The small
shape helps reduce blocking and ringing artifacts, while the precise integer specification eliminates any
mismatch issues between the encoder and decoder in the inverse transform.
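The core 4x4 forward transform can be sketched with the integer matrix commonly given for H.264. This illustration omits the scaling factors that the standard folds into the quantization stage:

```python
# Integer approximation of the 4x4 DCT used in H.264 (scaling factors,
# which the standard folds into quantization, are omitted here).
Cf = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_transform(X):
    # Y = Cf . X . Cf^T: exact integer arithmetic throughout, which is
    # why encoder and decoder can agree bit-for-bit on the inverse.
    CfT = [list(row) for row in zip(*Cf)]
    return matmul(matmul(Cf, X), CfT)
```

Because every operation is an integer add, subtract or shift-friendly multiply, the transform is both fast on fixed-point DSPs and free of rounding mismatch.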

Quantization and Transform Coefficient Scanning

The quantization step is where a significant portion of data compression takes place. In H.264, the transform
coefficients are quantized using scalar quantization with no widened dead-zone. Fifty-two different
quantization step sizes can be chosen on a macroblock basis – this being different from prior standards
(H.263 supports thirty-one, for example). Moreover, in H.264 the step sizes are increased at a compounding


rate of approximately 12.5%, rather than by a constant increment. The fidelity of chrominance
components is improved by using finer quantization step sizes as compared to those used for the luminance
coefficients, particularly when the luminance coefficients are coarsely quantized.
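A compounding rate of roughly 12.5% per step is equivalent to the step size doubling every six steps. The sketch below illustrates this relationship; the base step size of 0.625 is an assumption for illustration only:

```python
# Quantization step size as a function of the quantization parameter:
# each increment multiplies the step by 2**(1/6) (about +12.2%, in line
# with the "approximately 12.5%" compounding rate), so the step size
# doubles every 6 steps. The base value 0.625 is an assumed example.
def q_step(qp, base=0.625):
    return base * 2 ** (qp / 6)

growth = q_step(1) / q_step(0) - 1   # roughly 0.12
doubling = q_step(6) / q_step(0)     # exactly 2.0
```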


 0    1    5    6
 2    4    7   12
 3    8   11   13
 9   10   14   15

Figure 7. Scan pattern for frame coding in H.264.


The quantized transform coefficients correspond to different frequencies, with the coefficient at the top left
hand corner in Figure 7 representing the DC value, and the rest of the coefficients corresponding to different
nonzero frequency values. The next step in the encoding process is to arrange the quantized coefficients in
an array, starting with the DC coefficient. A single coefficient-scanning pattern is available in H.264 (Figure
7) for frame coding, and another one is being added for field coding. The zigzag scan illustrated in Figure 7
is used in all frame-coding cases, and it is identical to the conventional scan used in earlier video coding
standards. The zigzag scan arranges the coefficients in ascending order of the corresponding frequencies.
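The scan of Figure 7 can be expressed as a lookup table; this small Python sketch reorders a 4x4 coefficient block accordingly:

```python
# Zigzag scan order for frame coding (Figure 7): ZIGZAG[n] gives the
# (row, column) position of the n-th coefficient in the scan, DC first.
ZIGZAG = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
          (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]

def scan(block):
    # Reorder a 4x4 block of quantized coefficients into a 1-D array
    # in ascending order of spatial frequency.
    return [block[i][j] for i, j in ZIGZAG]
```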

Entropy Coding

The last step in the video coding process is entropy coding. Entropy coding is based on assigning shorter
codewords to symbols with higher probabilities of occurrence, and longer codewords to symbols with less
frequent occurrences. Some of the parameters to be entropy coded include transform coefficients for the
residual data, motion vectors and other encoder information. Two types of entropy coding have been
adopted: Variable-Length coding (VLC) and Context-Based Adaptive Binary Arithmetic Coding (CABAC).
We next briefly discuss these two approaches:

Variable length coding

In some video coding standards, symbols and the associated codewords are organized in look-up tables,
referred to as VLC tables, which are stored at both the encoder and decoder. In H.263, a number of VLC
tables are used, depending on the type of data under consideration (e.g., transform coefficients, motion
vectors). H.264 offers a single Universal VLC (UVLC) table that is to be used in entropy coding of all
symbols in the encoder except for the transform coefficients. Although the use of a single UVLC table is
simple, it has a major disadvantage: the table is derived using a static probability distribution model, which
ignores the correlations between the encoder symbols.

In H.264, the transform coefficients are coded using Context Adaptive Variable Length Coding (CAVLC).
CAVLC is designed to take advantage of several characteristics of quantized 4x4 blocks. First, non-zero
coefficients at the end of the zigzag scan are often equal to +/- 1. CAVLC encodes the number of these
coefficients (“trailing 1s”) in a compact way. Second, CAVLC employs run-level coding efficiently to
represent the string of zeros in a quantized 4x4 block. Moreover, the numbers of non-zero coefficients in


neighboring blocks are usually correlated. Thus, the number of non-zero coefficients is encoded using a look-up
table that depends on the numbers of non-zero coefficients in neighboring blocks. Finally, the magnitude
(level) of non-zero coefficients tends to be larger near the DC coefficient and smaller for the high-frequency
coefficients. CAVLC takes advantage of this by making the choice of the VLC look-up table for the level
adaptive, with the choice depending on recently coded levels.
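A simplified sketch of the trailing-ones count described above follows; the actual CAVLC syntax and tables are considerably more involved, and the cap of three reflects common descriptions of CAVLC:

```python
def trailing_ones(scanned):
    # Count the run of +/-1 coefficients at the tail of the zigzag scan,
    # after discarding trailing zeros; CAVLC signals up to three such
    # "trailing 1s" compactly, coding only their signs.
    coeffs = list(scanned)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    count = 0
    while coeffs and abs(coeffs[-1]) == 1 and count < 3:
        coeffs.pop()
        count += 1
    return count
```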

Context-Based Adaptive Binary Arithmetic Coding (CABAC)

The use of Context-based Adaptive Binary Arithmetic Coding in H.264 yields an improvement of approximately 10% in bit rate savings.

Arithmetic coding makes use of a probability model at both the encoder and decoder for all the syntax
elements (transform coefficients, motion vectors). To increase the coding efficiency of arithmetic coding, the
underlying probability model is adapted to the changing statistics within a video frame, through a process
called context modeling.

Context modeling provides estimates of conditional probabilities of the coding symbols. Utilizing suitable
context models, inter-symbol redundancy can be exploited by switching between different probability
models according to already coded symbols in the neighborhood of the current symbol to encode.

Different models are often maintained for each syntax element (e.g., motion vectors and transform
coefficients have different models). If a given symbol is non-binary valued, it will be mapped onto a
sequence of binary decisions, so-called bins. The actual binarization is done according to a given binary tree
– and in this case the UVLC binary tree is often used. Each binary decision is then encoded with the
arithmetic encoder using the new probability estimates, which have been updated during the previous context
modeling stage. After encoding of each bin, we adjust upward the probability estimate for the binary symbol
that was just encoded. Hence, the model keeps track of the actual statistics.
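To illustrate the idea of context adaptation, consider a toy model whose probability estimate drifts toward each bin it observes. This is not the actual CABAC engine, which uses finite-state probability transition tables rather than floating point, and the adaptation rate below is an assumed value:

```python
class BinaryContextModel:
    """Toy context model: the probability estimate for a bin drifts
    toward each symbol observed, so the model tracks local statistics.
    (The real CABAC engine uses finite-state transition tables.)"""

    def __init__(self, p_one=0.5, rate=0.05):
        self.p_one = p_one   # current estimate of P(bin == 1)
        self.rate = rate     # adaptation speed (assumed value)

    def update(self, bin_value):
        # Move the estimate toward the symbol just coded, mirroring the
        # upward adjustment described in the text.
        target = 1.0 if bin_value else 0.0
        self.p_one += self.rate * (target - self.p_one)
```

In a full coder, one such model would be maintained per context (per syntax element and neighbourhood configuration), and its estimate would drive a binary arithmetic coder.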



UBLive-264-C64x: TMS320C64x Digital Media Platform Implementation
Designed to support both fixed and portable video/imaging applications, TI’s digital media platform of DSPs
combines the silicon, software, system-level expertise and support necessary to bring OEMs to market
quickly with differentiated products. Together with its third party network members such as UB Video, TI
has invested heavily in the video/imaging end equipment market. Products and support tools within the
digital media platform are optimized to meet customers’ unique performance needs.

The TMS320C64x DSP Platform
The H.264 standard derives a substantial part of its performance gain from improved motion estimation. The TI TMS320C64x family of DSPs has been well optimized for the type of operations performed in motion estimation. Consequently, the TMS320C64x family represents an ideal and powerful platform for H.264-based video coding systems, particularly where programmability is highly desired.

The TMS320C64x™ DSPs are the highest-performance fixed-point DSP generation in the
TMS320C6000™ family of DSPs from Texas Instruments (TI). The TMS320C64x DSP core delivers the
highest performance at the lowest power consumption of any DSP available in the market to date. The
TMS320C64x DSPs have broad capabilities and will enable multimedia communications and the

full convergence of video, voice and data on all types of broadband networks. With 600 MHz of processing
power, TI's TMS320C64x DSPs are well suited to overcome the complexity and computational requirements
of H.264 and to deliver high-quality full-screen video for real-time video conferencing and broadcast video
applications.

The TMS320C64x core is based on the second-generation VelociTI™ very-long-instruction-word (VLIW)
architecture (VelociTI.2™) developed by TI, making the corresponding DSPs an ideal choice for multi-
channel and multi-function applications such as image and video processing. The C64x™ is a code-
compatible member of the C6000™ DSP platform.

A C64x™ DSP is capable of a performance of up to 4800 million instructions per second (MIPS) at a clock
rate of 600 MHz. The C64x™ DSP core processor has 64 general-purpose 32-bit registers and eight highly
independent functional units - two multipliers and six arithmetic logic units (ALUs) with VelociTI.2
extensions. The VelociTI.2™ extensions in the eight functional units include new instructions to accelerate
the performance in key applications and extend the parallelism of the VelociTI architecture. The C64x can
produce four 16-bit multiply-accumulates (MACs) per cycle for a total of 2400 million MACs per second, or
eight 8-bit MACs per cycle for a total of 4800 million MACs per second. The C64x also has application-specific
hardware logic, on-chip memory, and additional on-chip peripherals similar to the other C6000™ DSP
platform devices.
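The MAC throughput quoted above translates directly into motion-estimation performance: the sum-of-absolute-differences (SAD) cost that dominates motion search operates on 8-bit pixels, exactly the data type the packed instructions accelerate. The following portable C sketch of a 16x16 SAD is illustrative only; a production C64x kernel would use the packed-data intrinsics (such as SUBABS4) rather than scalar code:

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between a 16x16 candidate block in the
 * reference frame and the current macroblock. Motion estimation evaluates
 * this cost for many candidate positions; on the C64x the inner loop maps
 * onto packed 8-bit instructions that process several pixels per cycle. */
uint32_t sad_16x16(const uint8_t *cur, int cur_stride,
                   const uint8_t *ref, int ref_stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            int d = cur[y * cur_stride + x] - ref[y * ref_stride + x];
            sad += (uint32_t)abs(d);
        }
    }
    return sad;
}
```

The encoder calls this cost function once per candidate motion vector, which is why per-cycle 8-bit throughput is the figure of merit for motion estimation.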

The C64x uses a two-level cache-based architecture and has a powerful and diverse set of peripherals. The
Level 1 program cache (L1P) is a 128-Kbit direct-mapped cache and the Level 1 data cache (L1D) is a 128-
Kbit 2-way set-associative cache. The Level 2 memory/cache (L2) consists of an 8-Mbit-memory space that
is shared between program and data space. The L2 memory can be configured as mapped memory, cache, or
combinations of the two.
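For illustration, these sizes convert to conventional cache parameters: the 128-Kbit L1 caches are 16 KB each and the 8-Mbit L2 is 1 MB. The line sizes used below (64 bytes for L1D, 32 bytes for L1P) are assumptions made for the sake of the example, since the text does not specify them:

```c
/* Number of sets in a cache: total size divided by (ways x line size).
 * With the assumed line sizes, the 16-KB 2-way L1D has 128 sets and the
 * 16-KB direct-mapped (1-way) L1P has 512 sets. */
unsigned cache_sets(unsigned size_bytes, unsigned ways, unsigned line_bytes)
{
    return size_bytes / (ways * line_bytes);
}
```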

Example Application: Video
Conferencing

UBLive-26L-C64 is an H.264-based solution on the C64x platform for real-time video applications. The main features of UBLive-26L-C64 are: (1) efficient algorithms that substantially reduce the H.264 complexity, (2) optimizations that take full advantage of the C64x architecture, and (3) intelligent pre-processing that improves the video quality. These features make UBLive-26L-C64 an ideal solution for real-time video communication applications. We next illustrate the benefits of UBLive-26L-C64 in the context of a video conferencing application.

[Figure: a video conferencing application over a network, annotated with the benefits of UBLive-26L-C64: much better visual quality than that obtained using H.263-based video coding products; up to 50% bit rate saving relative to H.263; lower processing delay using one-pass encoding; and higher error resilience.]



The major requirements for a typical video conferencing session are consistently good video quality over limited bandwidth, low delay, and robustness to packet loss. In a video conferencing call involving audio, video and data, the video component typically consumes most of the available bandwidth for the call. Therefore, video conferencing systems would gain much wider acceptance over low-bandwidth networks if the bandwidth required for the video component were significantly reduced. This is precisely where UBLive-26L-C64 brings most of its value, namely by reducing the required bandwidth by as much as 50% compared to H.263-based video solutions. For example, many video conferencing calls currently take place at around 384 kbps, of which video consumes about 320 kbps. For essentially the same video quality that would be obtained with H.263 at 320 kbps, UB Video's UBLive-26L-C64 requires as little as 160 kbps, enabling the conference call to take place over a lower-bandwidth network.
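The arithmetic behind this example can be made explicit. In the sketch below, the 64 kbps of audio/data overhead is inferred from the 384 kbps call rate and 320 kbps video rate quoted above; it is not a figure stated directly in this paper:

```c
/* Percentage bit-rate saving of one video rate relative to another. */
int saving_percent(int ref_kbps, int new_kbps)
{
    return 100 * (ref_kbps - new_kbps) / ref_kbps;
}

/* Total call rate once the fixed non-video overhead (audio/data) is
 * added back to the video rate. */
int total_call_kbps(int video_kbps, int overhead_kbps)
{
    return video_kbps + overhead_kbps;
}
```

Here saving_percent(320, 160) gives the 50% figure, and total_call_kbps(160, 384 - 320) shows the whole call fitting in roughly 224 kbps instead of 384 kbps.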

UBLive-26L-C64 also offers low processing latency, a key requirement in video conferencing applications. Many existing video conferencing solutions still perform two-pass encoding in order to guarantee satisfactory video quality; however, two-pass encoding can introduce an objectionable delay during a conference call. UBLive-26L-C64 guarantees excellent video quality even with single-pass encoding, thereby reducing processing latency.

Although most current video conferencing calls take place over local private networks, a certain level of packet loss still occurs during packet transmission. UBLive-26L-C64 offers error resilience features at the encoder side that effectively combat packet loss even when the loss rate is high.

The UBLive-26L-C64 Demonstration Software: Description

The UBLive-26L-C64 demonstration shows real-time H.264 encoding and decoding with a feature set optimized for low-delay coding. The demonstration runs on the TMS320C64x Network Video Developer's Kit (NVDK) from Texas Instruments, which includes the 6415 DSP. A block diagram of the demonstration is shown in Figure 8. Video is captured from a camera and (optionally) scaled to SIF resolution. The video is then encoded in real time at bit rates of 128 and 384 kbps for SIF content and 1 Mbps for full-screen content, and the bit stream is fed back to a real-time decoder. The DSP applies format (4:2:2 to 4:2:0) and color conversion to the decoded video before display. The encoder and decoder modules have a documented API that makes it easy to integrate them into end-equipment products.
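The 4:2:2-to-4:2:0 format conversion mentioned above halves the vertical chroma resolution. The following is a minimal sketch of one common approach, averaging each vertical pair of chroma samples; the actual filter used in the demonstration is not specified in this paper:

```c
#include <stdint.h>

/* Convert one 4:2:2 chroma plane (width x height samples) to 4:2:0 by
 * averaging each vertical pair of samples with rounding. The output plane
 * has the same width and half the height. A production converter would
 * typically use a longer filter; this two-tap average is the simplest
 * correct sketch. */
void chroma_422_to_420(const uint8_t *src, int width, int height,
                       uint8_t *dst)
{
    for (int y = 0; y < height / 2; y++) {
        for (int x = 0; x < width; x++) {
            int a = src[(2 * y) * width + x];
            int b = src[(2 * y + 1) * width + x];
            dst[y * width + x] = (uint8_t)((a + b + 1) / 2);
        }
    }
}
```

The luminance plane is untouched by this step; only the two chroma planes are downsampled before the color conversion for display.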




                              Figure 8. Block diagram of the UBLive-26L-C64 demo.
For more information on UBLive-26L-C64, please contact UB Video at www.ubvideo.com.

References
[1] H.263 Standard - Overview and TMS320C6000 Implementation. UB Video Inc. www.ubvideo.com.
[2] MPEG-4 Standard. UB Video Inc. www.ubvideo.com.



Mais conteúdo relacionado

Mais procurados

Applied technology
Applied technologyApplied technology
Applied technologyErica Fressa
 
HEVC VIDEO CODEC By Vinayagam Mariappan
HEVC VIDEO CODEC By Vinayagam MariappanHEVC VIDEO CODEC By Vinayagam Mariappan
HEVC VIDEO CODEC By Vinayagam MariappanVinayagam Mariappan
 
Video Coding Standard
Video Coding StandardVideo Coding Standard
Video Coding StandardVideoguy
 
Subjective quality evaluation of the upcoming HEVC video compression standard
Subjective quality evaluation of the upcoming HEVC video compression standard Subjective quality evaluation of the upcoming HEVC video compression standard
Subjective quality evaluation of the upcoming HEVC video compression standard Touradj Ebrahimi
 
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-based
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-basedDesigning an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-based
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-basedDr. Mohieddin Moradi
 
High Efficiency Video Codec
High Efficiency Video CodecHigh Efficiency Video Codec
High Efficiency Video CodecTejus Adiga M
 
Compressed Video Quality
Compressed Video QualityCompressed Video Quality
Compressed Video QualityIain Richardson
 
ICME 2016 - High Efficiency Video Coding - Coding Tools and Specification: HE...
ICME 2016 - High Efficiency Video Coding - Coding Tools and Specification: HE...ICME 2016 - High Efficiency Video Coding - Coding Tools and Specification: HE...
ICME 2016 - High Efficiency Video Coding - Coding Tools and Specification: HE...Mathias Wien
 
Presentazione Broadcast H.265 & H.264 Sematron Italia - Maggio 2016
Presentazione Broadcast H.265 & H.264 Sematron Italia  - Maggio 2016Presentazione Broadcast H.265 & H.264 Sematron Italia  - Maggio 2016
Presentazione Broadcast H.265 & H.264 Sematron Italia - Maggio 2016Sematron Italia S.r.l.
 
h.264 video compression standard.
h.264 video compression standard.h.264 video compression standard.
h.264 video compression standard.Videoguy
 

Mais procurados (16)

Applied technology
Applied technologyApplied technology
Applied technology
 
HEVC VIDEO CODEC By Vinayagam Mariappan
HEVC VIDEO CODEC By Vinayagam MariappanHEVC VIDEO CODEC By Vinayagam Mariappan
HEVC VIDEO CODEC By Vinayagam Mariappan
 
H.263 Video Codec
H.263 Video CodecH.263 Video Codec
H.263 Video Codec
 
Video Coding Standard
Video Coding StandardVideo Coding Standard
Video Coding Standard
 
Subjective quality evaluation of the upcoming HEVC video compression standard
Subjective quality evaluation of the upcoming HEVC video compression standard Subjective quality evaluation of the upcoming HEVC video compression standard
Subjective quality evaluation of the upcoming HEVC video compression standard
 
[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...
[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...
[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...
 
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-based
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-basedDesigning an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-based
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-based
 
High Efficiency Video Codec
High Efficiency Video CodecHigh Efficiency Video Codec
High Efficiency Video Codec
 
HEVC overview main
HEVC overview mainHEVC overview main
HEVC overview main
 
Deblocking_Filter_v2
Deblocking_Filter_v2Deblocking_Filter_v2
Deblocking_Filter_v2
 
HEVC intra coding
HEVC intra codingHEVC intra coding
HEVC intra coding
 
Compressed Video Quality
Compressed Video QualityCompressed Video Quality
Compressed Video Quality
 
ICME 2016 - High Efficiency Video Coding - Coding Tools and Specification: HE...
ICME 2016 - High Efficiency Video Coding - Coding Tools and Specification: HE...ICME 2016 - High Efficiency Video Coding - Coding Tools and Specification: HE...
ICME 2016 - High Efficiency Video Coding - Coding Tools and Specification: HE...
 
Presentazione Broadcast H.265 & H.264 Sematron Italia - Maggio 2016
Presentazione Broadcast H.265 & H.264 Sematron Italia  - Maggio 2016Presentazione Broadcast H.265 & H.264 Sematron Italia  - Maggio 2016
Presentazione Broadcast H.265 & H.264 Sematron Italia - Maggio 2016
 
H263.ppt
H263.pptH263.ppt
H263.ppt
 
h.264 video compression standard.
h.264 video compression standard.h.264 video compression standard.
h.264 video compression standard.
 

Semelhante a Emerging H.264 Standard:

/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.pptVideoguy
 
martelli.ppt
martelli.pptmartelli.ppt
martelli.pptVideoguy
 
Spatial Scalable Video Compression Using H.264
Spatial Scalable Video Compression Using H.264Spatial Scalable Video Compression Using H.264
Spatial Scalable Video Compression Using H.264IOSR Journals
 
Motion Vector Recovery for Real-time H.264 Video Streams
Motion Vector Recovery for Real-time H.264 Video StreamsMotion Vector Recovery for Real-time H.264 Video Streams
Motion Vector Recovery for Real-time H.264 Video StreamsIDES Editor
 
H.264 video compression standard.
H.264 video compression standard.H.264 video compression standard.
H.264 video compression standard.Axis Communications
 
H264 video compression explained
H264 video compression explainedH264 video compression explained
H264 video compression explainedcnssources
 
Polycom Video Communications
Polycom Video CommunicationsPolycom Video Communications
Polycom Video CommunicationsVideoguy
 
Introduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag JainIntroduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag JainVideoguy
 
Generic Video Adaptation Framework Towards Content – and Context Awareness in...
Generic Video Adaptation Framework Towards Content – and Context Awareness in...Generic Video Adaptation Framework Towards Content – and Context Awareness in...
Generic Video Adaptation Framework Towards Content – and Context Awareness in...Alpen-Adria-Universität
 
Next generation video compression
Next generation video compressionNext generation video compression
Next generation video compressionEricsson
 
Next generation video compression
Next generation video compressionNext generation video compression
Next generation video compressionEricsson Slides
 
The H.264/AVC Advanced Video Coding Standard: Overview and ...
The H.264/AVC Advanced Video Coding Standard: Overview and ...The H.264/AVC Advanced Video Coding Standard: Overview and ...
The H.264/AVC Advanced Video Coding Standard: Overview and ...Videoguy
 
Paper id 2120148
Paper id 2120148Paper id 2120148
Paper id 2120148IJRAT
 
10.1.1.184.6612
10.1.1.184.661210.1.1.184.6612
10.1.1.184.6612NITC
 

Semelhante a Emerging H.264 Standard: (20)

H.264 vs HEVC
H.264 vs HEVCH.264 vs HEVC
H.264 vs HEVC
 
/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
 
martelli.ppt
martelli.pptmartelli.ppt
martelli.ppt
 
Spatial Scalable Video Compression Using H.264
Spatial Scalable Video Compression Using H.264Spatial Scalable Video Compression Using H.264
Spatial Scalable Video Compression Using H.264
 
E010132529
E010132529E010132529
E010132529
 
Motion Vector Recovery for Real-time H.264 Video Streams
Motion Vector Recovery for Real-time H.264 Video StreamsMotion Vector Recovery for Real-time H.264 Video Streams
Motion Vector Recovery for Real-time H.264 Video Streams
 
H.264 video compression standard.
H.264 video compression standard.H.264 video compression standard.
H.264 video compression standard.
 
H264 video compression explained
H264 video compression explainedH264 video compression explained
H264 video compression explained
 
Polycom Video Communications
Polycom Video CommunicationsPolycom Video Communications
Polycom Video Communications
 
Introduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag JainIntroduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag Jain
 
Generic Video Adaptation Framework Towards Content – and Context Awareness in...
Generic Video Adaptation Framework Towards Content – and Context Awareness in...Generic Video Adaptation Framework Towards Content – and Context Awareness in...
Generic Video Adaptation Framework Towards Content – and Context Awareness in...
 
Next generation video compression
Next generation video compressionNext generation video compression
Next generation video compression
 
Next generation video compression
Next generation video compressionNext generation video compression
Next generation video compression
 
video compression2
video compression2video compression2
video compression2
 
video compression2
video compression2video compression2
video compression2
 
video compression2
video compression2video compression2
video compression2
 
The H.264/AVC Advanced Video Coding Standard: Overview and ...
The H.264/AVC Advanced Video Coding Standard: Overview and ...The H.264/AVC Advanced Video Coding Standard: Overview and ...
The H.264/AVC Advanced Video Coding Standard: Overview and ...
 
Paper id 2120148
Paper id 2120148Paper id 2120148
Paper id 2120148
 
HEVC Main Main10 Profile Encoding
HEVC Main Main10 Profile EncodingHEVC Main Main10 Profile Encoding
HEVC Main Main10 Profile Encoding
 
10.1.1.184.6612
10.1.1.184.661210.1.1.184.6612
10.1.1.184.6612
 

Mais de Videoguy

Energy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingEnergy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingVideoguy
 
Microsoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresMicrosoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresVideoguy
 
Proxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingProxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingVideoguy
 
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksFree-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksVideoguy
 
Instant video streaming
Instant video streamingInstant video streaming
Instant video streamingVideoguy
 
Video Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideo Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideoguy
 
Video Streaming
Video StreamingVideo Streaming
Video StreamingVideoguy
 
Reaching a Broader Audience
Reaching a Broader AudienceReaching a Broader Audience
Reaching a Broader AudienceVideoguy
 
Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Videoguy
 
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGVideoguy
 
Impact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingImpact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingVideoguy
 
Application Brief
Application BriefApplication Brief
Application BriefVideoguy
 
Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Videoguy
 
Streaming Video into Second Life
Streaming Video into Second LifeStreaming Video into Second Life
Streaming Video into Second LifeVideoguy
 
Flash Live Video Streaming Software
Flash Live Video Streaming SoftwareFlash Live Video Streaming Software
Flash Live Video Streaming SoftwareVideoguy
 
Videoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoguy
 
Streaming Video Formaten
Streaming Video FormatenStreaming Video Formaten
Streaming Video FormatenVideoguy
 
iPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareiPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareVideoguy
 
Glow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxGlow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxVideoguy
 

Mais de Videoguy (20)

Energy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingEnergy-Aware Wireless Video Streaming
Energy-Aware Wireless Video Streaming
 
Microsoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresMicrosoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_Pres
 
Proxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingProxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video Streaming
 
Adobe
AdobeAdobe
Adobe
 
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksFree-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
 
Instant video streaming
Instant video streamingInstant video streaming
Instant video streaming
 
Video Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideo Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A Survey
 
Video Streaming
Video StreamingVideo Streaming
Video Streaming
 
Reaching a Broader Audience
Reaching a Broader AudienceReaching a Broader Audience
Reaching a Broader Audience
 
Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...
 
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
 
Impact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingImpact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video Streaming
 
Application Brief
Application BriefApplication Brief
Application Brief
 
Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Video Streaming Services – Stage 1
Video Streaming Services – Stage 1
 
Streaming Video into Second Life
Streaming Video into Second LifeStreaming Video into Second Life
Streaming Video into Second Life
 
Flash Live Video Streaming Software
Flash Live Video Streaming SoftwareFlash Live Video Streaming Software
Flash Live Video Streaming Software
 
Videoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions Cookbook
 
Streaming Video Formaten
Streaming Video FormatenStreaming Video Formaten
Streaming Video Formaten
 
iPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareiPhone Live Video Streaming Software
iPhone Live Video Streaming Software
 
Glow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxGlow: Video streaming training guide - Firefox
Glow: Video streaming training guide - Firefox
 

Emerging H.264 Standard:

  • 1. Emerging H.264 Standard: Overview and TMS320C64x Digital Media Platform Implementation White Paper UB Video Inc. Suite 400, 1788 west 5th Avenue Vancouver, British Columbia, Canada V6J 1P2 Tel: 604-737-2426; Fax: 604-737-1514 www.ubvideo.com Copyright © 2002 UB Video Inc. www.ubvideo.com 2-2002
  • 2. H.264: Introduction Digital video is being adopted in an increasingly proliferating array of applications ranging from video telephony and videoconferencing to DVD and digital TV. The adoption of digital video in many applications has been fuelled by the development of many video coding standards, which have emerged targeting different application areas. These standards provide the means needed to achieve interoperability between systems designed by different manufacturers for any given application, hence facilitating the growth of the video market. The ITU-T1 is now one of two formal organizations that develop video coding standards - the other being ISO/IEC JTC12. The ITU-T video coding standards are called recommendations, and they are denoted with H.26x (e.g., H.261, H.262, H.263 and H.264). The ISO/IEC standards are denoted with MPEG- x (e.g., MPEG-1, MPEG-2 and MPEG-4). The ITU-T recommendations have been designed mostly for real-time video communication applications, such as video conferencing and video telephony. On the other hand, the MPEG standards have been designed mostly to address the needs of video storage (DVD), broadcast video (broadcast TV), and video streaming (e.g., video over the Internet, video over DSL, video over wireless) applications. For the most part, the two standardization committees have worked independently on the different standards. The only exception has been the H.262/MPEG-2 standard, which was developed jointly by the two committees. Recently, the ITU-T and the ISO/IEC JTC1 have agreed to join their efforts in the development of the emerging H.264 standard, which was initiated by the ITU-T committee. H.264 (a.k.a. MPEG-4 Part 10 or MPEG-4 AVC) is being adopted by the two committees because it represents a departure in terms of performance from all existing video coding standards. Figure 1 summarizes the evolution of the ITU-T recommendations and the ISO/IEC MPEG standards. 
Please see [1] and [2] for more information on the H.263 and MPEG-4 video coding standards. ITU-T H.261 H.263 H.263+ H.263++ Standards Joint ITU-T / MPEG H.262/MPEG-2 H.264 Standards MPEG Standards MPEG-1 MPEG-4 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 Figure 1. Progression of the ITU-T Recommendations and MPEG standards. UB Video has been developing a complete video processing solution that is based on the H.264 video coding standard and that is optimized for the Texas Instruments TMS320C64x digital media platform family of DSPs. UBLive-264-C64, UB Video’s efficient H.264-based video processing solution, and the high- performance TMS320C64x family of DSPs, represent a compelling integrated software/hardware solution to 1 The International Telecommunications Union, Telecommunication Standardization Sector. 2 International Standardization Organization and the International Electrotechnical Commission, Joint Technical Committee number 1. Copyright © 2002 UB Video Inc. www.ubvideo.com 2-2002
  • 3. most high-performance video applications. The purpose of this paper is to present an overview of the emerging H.264 standard and the TMS320C64x digital media platform, as well as a discussion of UBLive- 264-C64. First, an overview of the H.264 standard and its benefits are presented, followed by a more detailed description of the main H.264 video coding features. The Texas Instruments TMS320C64x family of DSPs is then presented. UBLive-264-C64’s main features are then described, and its benefits for real-time video applications are finally discussed. H.264: Overview The main objective behind the H.264 project was to develop a high-performance video coding standard by adopting a “back to basics” approach where simple and straightforward design using well-known building blocks is used. The ITU-T Video Coding Experts Group (VCEG) has initiated the work on the H.264 standard in 1997. Towards the end of 2001, and witnessing the superiority of video quality offered by H.264- based software over that achieved by the (existing) most optimized MPEG-4 based software, ISO/IEC MPEG joined ITU-T VCEG by forming a Joint Video Team (JVT) that took over the H.264 project of the ITU-T. The JVT objective was to create a single video coding standard that would simultaneously result in a new part (i.e., Part-10) of the MPEG-4 family of standards and a new ITU-T (i.e., H.264) Recommendation. The H.264 development work is an on-going activity, with the first version of the standard expected to be finalized technically before the end of this year 2002 and officially before the end of the year 2003. The emerging H.264 standard has a number of advantages that distinguish it from existing standards, while at the same time, sharing common features with other existing standards. The following are some of the key advantages of H.264: 1. 
Up to 50% in bit rate savings: Compared to H.263v2 (H.263+) or MPEG-4 Simple Profile, H.264 permits a reduction in bit rate by up to 50% for a similar degree of encoder optimization at most bit rates. 2. High quality video: H.264 offers consistently good video quality at high and low bit rates. 3. Error resilience: H.264 provides the tools necessary to deal with packet loss in packet networks and bit errors in error-prone wireless networks. 4. Network friendliness: Through the Network Adaptation Layer, H.264 bit streams can be easily transported over different networks. The above advantages make H.264 an ideal standard for several applications such as video conferencing and broadcast video. This is discussed in detail in two related white papers. In this paper, we provide a detailed (but high-level) description of the H.264 video coding standard, as well as a discussion of UBLive-264-C64, UB Video’s efficient H.264-based video processing solution on the TMS320C64x digital media processor platform. Copyright © 2002 UB Video Inc. www.ubvideo.com 2-2002
  • 4. Quantization step sizes increased at a compounding Coding Control rate of approximately 12.5% Video Quantized Transform Source + Transform Quantization Coefficients _ 4x4 Integer Transform (fixed) Predicted Inverse Macroblock Quantization Intra Inter Bit Stream Out Entropy Inverse Coding Intra Inter Motion Transform Prediction and Compensation Compensation Single Universal VLC & Context-Adaptive VLC OR + Context-Based Adaptive Binary Arithmetic Coding Frame Store Loop Filter No mismatch Motion Estimation Motion Vectors Intra Prediction Modes Seven block sizes and shapes 9 4x4, 4 16x16 modes (luma) & 4 ¼-pel-motion estimation accuracy modes (chroma) = 17 modes Five Prediction modes with B-pictures Multiple reference picture selection Figure 2. Block Diagram of the H.264 encoder. Copyright © 2002 UB Video Inc. www.ubvideo.com 2-2002
  • 5. H.264: Technical Description Overview The main objective of the emerging H. 264 standard is to provide a means to achieve substantially higher video quality as compared to what could be achieved using any of the existing video coding standards. Nonetheless, the underlying approach of H.264 is similar to that adopted in previous standards such as H.263 and MPEG-4, and consists of the following four main stages: 1. Dividing each video frame into blocks of pixels so that processing of the video frame can be conducted at the block level. 2. Exploiting the spatial redundancies that exist within the video frame by coding some of the original blocks through transform, quantization and entropy coding (or variable-length coding). 3. Exploiting the temporal dependencies that exist between blocks in successive frames, so that only changes between successive frames need to be encoded. This is accomplished by using motion estimation and compensation. For any given block, a search is performed in the previously coded one or more frames to determine the motion vectors that are then used by the encoder and the decoder to predict the subject block. 4. Exploiting any remaining spatial redundancies that exist within the video frame by coding the residual blocks, i.e., the difference between the original blocks and the corresponding predicted blocks, again through transform, quantization and entropy coding. From the coding point of view, the main differences between H.264 and the other standards are summarized in Figure 2 through an encoder block diagram. On the motion estimation/compensation side, H.264 employs blocks of different sizes and shapes, higher resolution 1/4-pel motion estimation, multiple reference frame selection and multiple bi-directional mode selection. On the transform side, H.264 uses an integer based transform that approximates roughly the DCT transform used in previous standards, but does not have the mismatch problem in the inverse transform. 
In H.264, entropy coding can be performed using either a combination of a single Universal Variable Length Codes (UVLC) table with a Context Adaptive Variable Length Codes (CAVLC) for the transform coefficients or using Context-based Adaptive Binary Arithmetic Coding (CABAC). These and other features are discussed in more detail in the following sections. 9 11 Copyright © 2002 UB Video Inc. www.ubvideo.com 2-2002
Organization of the Bit Stream

As mentioned above, a given video picture is divided into a number of small blocks referred to as macroblocks. For example, a picture with QCIF resolution (176x144) is divided into 99 16x16 macroblocks, as indicated in Figure 3. A similar macroblock segmentation is used for other frame sizes. The luminance component of the picture is sampled at these frame resolutions, while the chrominance components, Cb and Cr, are down-sampled by two in the horizontal and vertical directions. In addition, a picture may be divided into an integer number of "slices", which are valuable for resynchronization should some data be lost.

Figure 3. Subdivision of a QCIF picture into 16x16 macroblocks.

Intra Prediction and Coding

Intra coding refers to the case where only spatial redundancies within a video picture are exploited. The resulting frame is referred to as an I-picture. I-pictures are typically encoded by directly applying the transform to the different macroblocks in the frame. As a consequence, encoded I-pictures are large in size, since a large amount of information is usually present in the frame and no temporal information is used as part of the encoding process. In order to increase the efficiency of the intra coding process in H.264, the spatial correlation between adjacent macroblocks in a given frame is exploited. The idea is based on the observation that adjacent macroblocks tend to have similar properties. Therefore, as a first step in the encoding process for a given macroblock, one may predict the macroblock of interest from the surrounding macroblocks (typically the ones located on top and to the left, since those macroblocks would have already been encoded). The difference between the actual macroblock and its prediction is then coded, which requires fewer bits than applying the transform directly to the macroblock itself.

Figure 4. Intra prediction modes for 4x4 luminance blocks.

In order to perform the intra prediction mentioned above, H.264 offers nine modes for prediction of 4x4 luminance blocks, including DC prediction (Mode 2) and eight directional modes, labelled 0 through 8 in Figure 4. This process is illustrated in Figure 4, in which pixels A to M from neighbouring blocks have already been
encoded and may be used for prediction. For example, if Mode 0 (Vertical prediction) is selected, then the values of pixels a to p are assigned as follows: a, e, i and m are set equal to A; b, f, j and n to B; c, g, k and o to C; and d, h, l and p to D. In the case where Mode 3 (Diagonal_Down_Left prediction) is chosen, the values of a to p are given as follows: a is equal to (A+2B+C+2)/4; b and e to (B+2C+D+2)/4; c, f and i to (C+2D+E+2)/4; d, g, j and m to (D+2E+F+2)/4; h, k and n to (E+2F+G+2)/4; l and o to (F+2G+H+2)/4; and p to (G+3H+2)/4.

For regions with less spatial detail (i.e., flat regions), H.264 supports 16x16 intra coding, in which one of four prediction modes (DC, Vertical, Horizontal and Planar) is chosen for the prediction of the entire luminance component of the macroblock. In addition, H.264 supports intra prediction for the 8x8 chrominance blocks, also using four prediction modes (DC, Vertical, Horizontal and Planar). Finally, the prediction mode for each block is efficiently coded by assigning shorter symbols to more likely modes, where the probability of each mode is determined based on the modes used for coding the surrounding blocks.

Inter Prediction and Coding

H.264 derives most of its coding efficiency advantage from motion estimation. Inter prediction and coding use motion estimation and compensation to take advantage of the temporal redundancies that exist between successive frames, providing very efficient coding of video sequences. When the selected reference frame for motion estimation is a previously encoded frame, the frame to be encoded is referred to as a P-picture. When both a previously encoded frame and a future frame are chosen as reference frames, the frame to be encoded is referred to as a B-picture.
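The two 4x4 prediction rules quoted above can be sketched in a few lines of Python. This is an illustrative sketch, not reference code: the function names are ours, and only Modes 0 and 3 are shown; the rounding follows the (x+2)/4 expressions in the text.

```python
def intra_4x4_vertical(top):
    """Mode 0 (Vertical): copy the four pixels above the block (A..D)
    down each column, so a, e, i, m all equal A, and so on."""
    A, B, C, D = top[:4]
    return [[A, B, C, D] for _ in range(4)]

def intra_4x4_diag_down_left(top):
    """Mode 3 (Diagonal_Down_Left): each predicted pixel is a rounded
    weighted average along a 45-degree diagonal of the eight pixels
    A..H above the block, e.g. a = (A + 2B + C + 2) / 4."""
    A, B, C, D, E, F, G, H = top[:8]
    t = [A, B, C, D, E, F, G, H]
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            i = x + y  # diagonal index, 0..6
            if i == 6:  # bottom-right corner p = (G + 3H + 2) / 4
                pred[y][x] = (G + 3 * H + 2) // 4
            else:
                pred[y][x] = (t[i] + 2 * t[i + 1] + t[i + 2] + 2) // 4
    return pred
```

For a flat row of pixels above the block, vertical prediction simply replicates that row four times; the encoder would then code only the (small) difference between the actual block and this prediction.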
Motion estimation in H.264 supports most of the key features adopted in earlier video standards, but its efficiency is improved through added flexibility and functionality. In addition to supporting P-pictures (with single and multiple reference frames) and B-pictures, H.264 supports a new inter-stream transitional picture called an SP-picture. The inclusion of SP-pictures in a bit stream enables efficient switching between bit streams with similar content encoded at different bit rates, as well as random access and fast playback modes. The following four sections describe in more detail the four main motion estimation features used in H.264 namely, (1) the use of various block sizes and shapes, (2) the use of high-precision sub-pel motion vectors, (3) the use of multiple reference frames, and (4) the use of de- blocking filters in the prediction loop. Copyright © 2002 UB Video Inc. www.ubvideo.com 2-2002
Figure 5. Different modes of dividing a macroblock for motion estimation in H.264: Mode 1, one 16x16 block (one motion vector); Mode 2, two 8x16 blocks (two motion vectors); Mode 3, two 16x8 blocks (two motion vectors); Mode 4, four 8x8 blocks (four motion vectors); Mode 5, eight 4x8 blocks (eight motion vectors); Mode 6, eight 8x4 blocks (eight motion vectors); Mode 7, sixteen 4x4 blocks (sixteen motion vectors).

Block sizes

Motion compensation for each 16x16 macroblock can be performed using a number of different block sizes and shapes, as illustrated in Figure 5. Individual motion vectors can be transmitted for blocks as small as 4x4, so up to 16 motion vectors may be transmitted for a single macroblock. Block sizes of 16x8, 8x16, 8x8, 8x4 and 4x8 are also supported as shown. Using the seven different block sizes and shapes can translate into bit rate savings of more than 15% as compared to using only a 16x16 block size. The availability of smaller motion compensation blocks improves prediction in general; in particular, the small blocks improve the ability of the model to handle fine motion detail and result in better subjective viewing quality because they do not produce large blocking artefacts. Moreover, through the recently adopted tree structure segmentation method, it is possible to have a combination of 4x8, 8x4 or 4x4 sub-blocks within an 8x8 sub-block. Figure 6 shows an example of such a configuration for a 16x16 macroblock.

Figure 6. A configuration of sub-blocks within a macroblock based on the tree structure segmentation method in H.264.
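The partition modes of Figure 5 can be captured in a small table; the snippet below (our own illustrative helper, not part of any H.264 API) derives the motion-vector count per macroblock directly from the block dimensions.

```python
# Macroblock partition modes from Figure 5: mode -> (block width, height).
# A 16x16 macroblock is tiled with equal blocks, one motion vector each.
MB_PARTITIONS = {
    1: (16, 16),  # one 16x16 block
    2: (8, 16),   # two 8x16 blocks
    3: (16, 8),   # two 16x8 blocks
    4: (8, 8),    # four 8x8 blocks
    5: (4, 8),    # eight 4x8 blocks
    6: (8, 4),    # eight 8x4 blocks
    7: (4, 4),    # sixteen 4x4 blocks
}

def motion_vectors_per_macroblock(mode):
    """How many motion vectors a macroblock transmits in a given mode."""
    w, h = MB_PARTITIONS[mode]
    return (16 // w) * (16 // h)
```

For example, Mode 7 tiles the macroblock with sixteen 4x4 blocks and therefore transmits sixteen motion vectors, the maximum per macroblock.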
Motion estimation accuracy

The prediction capability of the motion compensation algorithm in H.264 is further improved by allowing motion vectors to be determined with higher levels of spatial accuracy than in existing standards. Quarter-pixel accurate motion compensation is the lowest-accuracy form of motion compensation in H.264 (in contrast with prior standards based primarily on half-pel accuracy, with quarter-pel accuracy only available in the newest version of MPEG-4). Using ¼-pel spatial accuracy can yield as much as 20% in bit rate savings as compared to using integer-pel spatial accuracy.

Multiple reference picture selection

The H.264 standard offers the option of having multiple reference frames in inter picture coding, resulting in better subjective video quality and more efficient coding of the video frame under consideration. Using five reference frames for prediction can yield 5-10% in bit rate savings as compared to using only one reference frame. Moreover, using multiple reference frames might help make the H.264 bit stream more error resilient. However, from an implementation point of view, there would be additional processing delays and higher memory requirements at both the encoder and decoder.

De-blocking (Loop) Filter

H.264 specifies the use of an adaptive de-blocking filter that operates on the horizontal and vertical block edges within the prediction loop in order to remove artefacts caused by block prediction errors. The filtering is generally based on 4x4 block boundaries, in which two pixels on either side of the boundary may be updated using a different filter. The de-blocking filter yields a substantial improvement in subjective quality. The rules for applying the de-blocking filter are intricate and quite complex; however, its use is optional for each slice (loosely defined as an integer number of macroblocks). Nonetheless, the improvement in subjective quality often more than justifies the increase in complexity.
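To make the sub-pel accuracy concrete, the sketch below shows how half-pel and quarter-pel sample values are interpolated. It assumes the six-tap filter (1, -5, 20, 20, -5, 1) used by H.264 for half-pel positions, with quarter-pel positions obtained by averaging neighbouring values; the function names are ours.

```python
def half_pel(samples):
    """Half-pel value between samples[2] and samples[3], using the
    six-tap filter (1, -5, 20, 20, -5, 1): add 16 for rounding,
    divide by 32, and clip to the 8-bit pixel range."""
    a, b, c, d, e, f = samples
    val = (a - 5 * b + 20 * c + 20 * d - 5 * e + f + 16) >> 5
    return max(0, min(255, val))

def quarter_pel(p, q):
    """Quarter-pel value: average of two neighbouring integer/half-pel
    values with upward rounding."""
    return (p + q + 1) >> 1
```

On a flat region the interpolator leaves pixel values unchanged, while near edges the negative outer taps sharpen the interpolated prediction compared to simple bilinear filtering.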
Integer Transform

The information contained in a prediction error block resulting from either intra prediction or inter prediction is then re-expressed in the form of transform coefficients. H.264 is unique in that it employs a purely integer spatial transform (a rough approximation of the DCT), which is primarily 4x4 in shape, as opposed to the usual floating-point 8x8 DCT specified with rounding-error tolerances used in earlier standards. The small shape helps reduce blocking and ringing artifacts, while the precise integer specification eliminates any mismatch issues between the encoder and decoder in the inverse transform.

Quantization and Transform Coefficient Scanning

The quantization step is where a significant portion of the data compression takes place. In H.264, the transform coefficients are quantized using scalar quantization with no widened dead-zone. Fifty-two different quantization step sizes can be chosen on a macroblock basis - this being different from prior standards (H.263 supports thirty-one, for example). Moreover, in H.264 the step sizes are increased at a compounding
rate of approximately 12.5%, rather than by a constant increment. The fidelity of the chrominance components is improved by using finer quantization step sizes than those used for the luminance coefficients, particularly when the luminance coefficients are coarsely quantized.

Figure 7. Scan pattern for frame coding in H.264 (coefficient positions in scan order):

    0   1   5   6
    2   4   7  12
    3   8  11  13
    9  10  14  15

The quantized transform coefficients correspond to different frequencies, with the coefficient at the top left-hand corner in Figure 7 representing the DC value and the rest of the coefficients corresponding to different nonzero frequencies. The next step in the encoding process is to arrange the quantized coefficients in an array, starting with the DC coefficient. A single coefficient-scanning pattern is available in H.264 (Figure 7) for frame coding, and another one is being added for field coding. The zigzag scan illustrated in Figure 7 is used in all frame-coding cases, and it is identical to the conventional scan used in earlier video coding standards. The zigzag scan arranges the coefficients in ascending order of the corresponding frequencies.

Entropy Coding

The last step in the video coding process is entropy coding. Entropy coding is based on assigning shorter codewords to symbols with higher probabilities of occurrence and longer codewords to symbols that occur less frequently. Among the parameters to be entropy coded are the transform coefficients for the residual data, motion vectors and other encoder information. Two types of entropy coding have been adopted: Variable-Length Coding (VLC) and Context-Based Adaptive Binary Arithmetic Coding (CABAC). We next briefly discuss these two approaches.

Variable length coding

In some video coding standards, symbols and the associated codewords are organized in look-up tables, referred to as VLC tables, which are stored at both the encoder and decoder.
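The scan of Figure 7 and the compounding step-size rule can be sketched as follows. The zigzag table transcribes the figure directly; in `step_size`, the base value of 0.625 is an illustrative assumption, and the ~12.5% compounding is modelled as a doubling every six quantizer steps.

```python
# Zigzag scan order for a 4x4 block (Figure 7): entry k gives the
# (row, col) position of the k-th coefficient in scan order, DC first.
ZIGZAG_4x4 = [
    (0, 0), (0, 1), (1, 0), (2, 0),
    (1, 1), (0, 2), (0, 3), (1, 2),
    (2, 1), (3, 0), (3, 1), (2, 2),
    (1, 3), (2, 3), (3, 2), (3, 3),
]

def zigzag_scan(block):
    """Flatten a 4x4 coefficient block into zigzag scan order."""
    return [block[r][c] for r, c in ZIGZAG_4x4]

def step_size(qp, base=0.625):
    """Quantizer step grows ~12.5% per increment of qp, which amounts
    to doubling every 6 steps; 'base' is an illustrative value."""
    return base * 2 ** (qp / 6.0)
```

Scanning this way concentrates the large low-frequency levels at the front of the array and leaves long runs of zeros at the end, which is exactly what the run-level entropy coding described next exploits.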
In H.263, a number of VLC tables are used, depending on the type of data under consideration (e.g., transform coefficients, motion vectors). H.264 offers a single Universal VLC (UVLC) table that is used in entropy coding of all symbols in the encoder except for the transform coefficients. Although the use of a single UVLC table is simple, it has a major disadvantage: the single table is usually derived using a static probability distribution model, which ignores the correlations between the encoder symbols.

In H.264, the transform coefficients are coded using Context-Adaptive Variable Length Coding (CAVLC). CAVLC is designed to take advantage of several characteristics of quantized 4x4 blocks. First, non-zero coefficients at the end of the zigzag scan are often equal to +/-1; CAVLC encodes the number of these coefficients ("trailing 1s") in a compact way. Second, CAVLC employs run-level coding to efficiently represent the strings of zeros in a quantized 4x4 block. Moreover, the numbers of non-zero coefficients in
neighboring blocks are usually correlated; thus, the number of non-zero coefficients is encoded using a look-up table that depends on the numbers of non-zero coefficients in neighboring blocks. Finally, the magnitude (level) of the non-zero coefficients tends to be larger near the DC coefficient and smaller for the high-frequency coefficients. CAVLC takes advantage of this by making the choice of the VLC look-up table for the level adaptive, in a way where the choice depends on recently coded levels.

Context-Based Adaptive Binary Arithmetic Coding (CABAC)

Arithmetic coding makes use of a probability model at both the encoder and decoder for all the syntax elements (transform coefficients, motion vectors). The use of CABAC in H.264 yields an improvement of approximately 10% in bit savings. To increase the coding efficiency of arithmetic coding, the underlying probability model is adapted to the changing statistics within a video frame through a process called context modeling. Context modeling provides estimates of the conditional probabilities of the coding symbols. Utilizing suitable context models, the inter-symbol redundancy can be exploited by switching between different probability models according to already-coded symbols in the neighborhood of the current symbol. Different models are often maintained for each syntax element (e.g., motion vectors and transform coefficients have different models). If a given symbol is non-binary valued, it is mapped onto a sequence of binary decisions, so-called bins. The actual binarization is done according to a given binary tree - in this case, the UVLC binary tree is often used. Each binary decision is then encoded with the arithmetic encoder using the current probability estimates, which have been updated during the previous context modeling stage. After each bin is encoded, the probability estimate for the binary symbol that was just encoded is adjusted upward.
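Two of the entropy-coding ideas above can be illustrated compactly. The first function assumes the UVLC follows the Exp-Golomb construction (binary representation of k+1, prefixed by one zero per trailing bit); the second is our own sketch of CAVLC's trailing-ones count, capped at three as in the standard's tables. Neither is reference code.

```python
def uvlc_encode(k):
    """Exp-Golomb codeword for an unsigned symbol k: write k+1 in
    binary, then prepend one '0' for every bit after the leading one.
    k=0 -> '1', k=1 -> '010', k=2 -> '011', k=3 -> '00100', ..."""
    b = bin(k + 1)[2:]
    return "0" * (len(b) - 1) + b

def trailing_ones(scanned):
    """For a zigzag-scanned coefficient list, return (total non-zero
    coefficients, number of trailing +/-1s, capped at 3) - the two
    quantities CAVLC signals jointly."""
    coeffs = [c for c in scanned if c != 0]
    n = 0
    for c in reversed(coeffs):
        if abs(c) == 1 and n < 3:
            n += 1
        else:
            break
    return len(coeffs), n
```

Note how the Exp-Golomb construction gives every symbol a self-delimiting codeword from a single rule, which is exactly why one universal table suffices for all non-coefficient symbols.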
Hence, the model keeps track of the actual statistics.

UBLive-264-C64x: TMS320C64x Digital Media Platform Implementation

Designed to support both fixed and portable video/imaging applications, TI's digital media platform of DSPs combines the silicon, software, system-level expertise and support necessary to bring OEMs to market quickly with differentiated products. Together with its third-party network members, such as UB Video, TI has invested heavily in the video/imaging end equipment market. Products and support tools within the digital media platform are optimized to meet customers' unique performance needs.

The TMS320C64x DSP Platform

The H.264 standard derives a substantial part of its performance gain from improved motion estimation. The TI TMS320C64x family of DSPs has been well optimized for the type of operations performed in motion estimation. Consequently, the TMS320C64x family of DSPs represents an ideal and powerful platform for H.264-based video coding systems, particularly where programmability is highly desired.

The TMS320C64x™ DSPs are the highest-performance fixed-point DSP generation in the TMS320C6000™ family of DSPs from Texas Instruments (TI). The TMS320C64x DSP core delivers the highest performance at the lowest power consumption of any DSP available in the market to date. The TMS320C64x DSPs have broad capabilities and will enable multimedia communications and the
full convergence of video, voice and data on all types of broadband networks. With 600 MHz of processing power, the TMS320C64x DSPs are well suited to overcome the complexity and computational requirements of H.264 and to deliver high-quality full-screen video for real-time video conferencing and broadcast video applications.

The TMS320C64x core is based on the second-generation VelociTI™ very-long-instruction-word (VLIW) architecture (VelociTI.2™) developed by TI, making the corresponding DSPs an ideal choice for multi-channel and multi-function applications such as image and video processing. The C64x™ is a code-compatible member of the C6000™ DSP platform. A C64x™ DSP is capable of a performance of up to 4800 million instructions per second (MIPS) at a clock rate of 600 MHz. The C64x™ DSP core has 64 general-purpose 32-bit registers and eight highly independent functional units - two multipliers and six arithmetic logic units (ALUs) - with VelociTI.2 extensions. The VelociTI.2™ extensions in the eight functional units include new instructions to accelerate performance in key applications and extend the parallelism of the VelociTI architecture. The C64x can produce four 16-bit multiply-accumulates (MACs) per cycle, for a total of 2400 million MACs per second, or eight 8-bit MACs per cycle, for a total of 4800 million MACs per second. The C64x also has application-specific hardware logic, on-chip memory and additional on-chip peripherals similar to the other C6000™ DSP platform devices.

The C64x uses a two-level cache-based architecture and has a powerful and diverse set of peripherals. The Level 1 program cache (L1P) is a 128-Kbit direct-mapped cache, and the Level 1 data cache (L1D) is a 128-Kbit two-way set-associative cache. The Level 2 memory/cache (L2) consists of an 8-Mbit memory space that is shared between program and data space. The L2 memory can be configured as mapped memory, cache, or combinations of the two.
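The throughput figures quoted above follow directly from the clock rate and the per-cycle issue widths stated in the text; a quick arithmetic check:

```python
# Peak-throughput arithmetic for the C64x figures quoted in the text:
# 600 MHz clock, up to 8 instructions per cycle, and 4 16-bit or
# 8 8-bit MACs per cycle.
CLOCK_HZ = 600e6

def peak_mips(instructions_per_cycle=8):
    """Peak millions of instructions per second."""
    return CLOCK_HZ * instructions_per_cycle / 1e6

def peak_macs(macs_per_cycle):
    """Peak millions of multiply-accumulates per second."""
    return CLOCK_HZ * macs_per_cycle / 1e6
```

This reproduces the 4800 MIPS, 2400 million 16-bit MACs/s and 4800 million 8-bit MACs/s figures cited above.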
Example Application: Video Conferencing

UBLive-26L-C64 is an H.264-based solution on the C64x platform for real-time video applications. The main features of UBLive-26L-C64 are: (1) efficient algorithms that substantially reduce the H.264 complexity, (2) C64x optimizations that take full advantage of the C64x architecture, and (3) intelligent pre-processing that improves the video quality. These features make UBLive-26L-C64 an ideal solution for real-time video communication applications. Its key benefits include:

- Much better visual quality than that obtained using H.263-based video coding products
- Up to 50% in bit rate saving relative to H.263
- Lower processing delay using one-pass encoding
- Higher error resilience

We next illustrate the benefits of UBLive-26L-C64 in the context of a video conferencing application.
The major requirements for a typical video conferencing session are consistently good video quality even over limited bandwidth, low delay, and robustness to packet loss. In a video conferencing call involving audio, video and data, the video component typically consumes most of the available bandwidth for the call. Therefore, video conferencing systems over low-bandwidth networks would find much wider acceptance if the required bandwidth for the video component were significantly reduced. This is precisely where UBLive-26L-C64 brings most of its value, namely reducing the required bandwidth by as much as 50% as compared to H.263-based video solutions. For example, many video conferencing calls currently take place at around 384 kbps, of which video consumes about 320 kbps. For essentially the same video quality that would be obtained with H.263 at 320 kbps, UB Video's UBLive-26L-C64 requires as little as 160 kbps, hence enabling the conference call to take place over a lower-bandwidth network.

UBLive-26L-C64 also offers low processing latency, a key requirement in video conferencing applications. Many existing video conferencing solutions still perform two-pass encoding in order to guarantee satisfactory video quality; however, the two-pass encoding method can introduce an objectionable delay during a conference call. UBLive-26L-C64 guarantees excellent video quality even for single-pass encoding, thereby reducing processing latency. Although most current video conferencing calls take place over local private networks, a certain level of packet loss still occurs during packet transmission. UBLive-26L-C64 offers error resilience features at the encoder side that effectively combat packet loss even when the loss rate is high.
The UBLive-26L-C64 Demonstration Software: Description

The UBLive-26L-C64 demonstration shows real-time H.264 encoding and decoding with a feature set optimized for low-delay coding. The demonstration runs on the TMS320C64x Network Video Developer's Kit (NVDK) from Texas Instruments, which includes the 6415 DSP. A block diagram of the demonstration is shown in Figure 8. Video is captured from a camera and (optionally) scaled to SIF resolution. The video is then encoded in real time at bit rates of 128 and 384 kbps for SIF content and 1 Mbps for full-screen content, and the bit stream is then fed back to a real-time decoder. The DSP applies format (4:2:2 to 4:2:0) and color conversion to the decoded video before display. The encoder and decoder modules have a documented API that makes it easy to integrate the modules in end-equipment products.

Figure 8. Block diagram of the UBLive-26L-C64 demo.

For more information on UBLive-26L-C64, please contact UB Video at www.ubvideo.com.