White paper


SCALABLE MEDIA PERSONALIZATION

Amos Kohn
September 2007

ABSTRACT
User expectations, competition and sheer revenue pressure are driving rapid development, and rapid operator acquisition, of highly complex media processing technologies.
Historically, cable operators provided "one stream for all" service in both the analog and digital domains. At most, they provided two to three streams for East and West Coast delivery. Video on Demand (VOD) represented a first step toward personalization, using personalized delivery, in the form of "pumping" and network QAM routing, in lieu of personalization of the media playout itself. In some cases, personalized advertisement play-lists were also created. This resulted in massive deployments of VOD servers and edge QAMs.
The second step in this evolution is the introduction of switched digital video, which takes linear delivery one step further to deliver a hybrid VOD/linear experience without applying any personal media processing. As with previous personalization approaches, user-based processing is limited to network pumping and routing, with no access to the actual media and no ability to manipulate it for true personalization.
True user personalization requires the generic ability to perform intensive media processing on a per-user basis. Today, an STB-based approach to media personalization appears dominant. It requires the future deployment of more capable, and therefore more expensive, STBs. Although straightforward, this approach conflicts with operators' needs to lower costs, unify the user experience and retain customers. The network approach, in which per-user personalization is completely or partially accomplished BEFORE the video reaches the STB (or any other user device), delivers the same experience but has been explored only in a very limited fashion. Yet this approach has the most potential to benefit operators, as it addresses most of the current and future challenges that operators face.




NETWORK-BASED PROCESSING TOOLKIT

The following defines a set of coding properties that are used as part of the media personalization solution. As indicated below, one of the advantages of this solution is that it is standards-based, as are the tools. The properties defined here are a combination of MPEG-4 (mostly H.264) and MPEG-2; the combination provides a solution for both coding schemes.
MPEG-4 is composed of a collection of "tools" built to support and enhance scalable composition applications. Among the tools discussed here are shape coding, motion estimation and compensation, texture coding, error resilience, sprite coding and scalability.
Unlike MPEG-4, MPEG-2 provides a very limited set of functionality for scalable personalization. The tools defined in this document are nevertheless sufficient to provide personalization in the MPEG-2 domain.


Object-Based Structure and Syntax

Content-based interactivity: the MPEG-4 standard extends traditional frame-based processing towards the composition of several video objects superimposed on a background image. For proper rendering of the scene, without disturbing artifacts at the borders of video objects (VOs), the compressed stream contains the encoded shape of each VO. Representing video as objects rather than as video frames enables content-based applications. This, in turn, provides new levels of content interactivity based on efficient representation of objects, object manipulation, bit stream editing and object-based scalability.

An MPEG-4 visual scene may consist of one or more video objects. Each video object is characterized by temporal and spatial information in the form of shape, motion and texture. The visual bit stream provides a hierarchical description of a visual scene; start codes, which are special code values, provide access to each level of the hierarchy in the bitstream. The ability to process objects, layers and sequences selectively is a significant enabler for scalable personalization. The hierarchical levels, modeled in the data-structure sketch after this list, include:

      Visual Object Sequence (VS): An MPEG-4 scene may include any 2-D or 3-D natural or synthetic objects. These objects and sequences can be addressed individually based on the targeted user.

      Video Object (VO): A video object is linked to a certain 2-D element in the scene. The simplest example is a rectangular frame; alternatively, it can be an arbitrarily shaped object that corresponds to an object or to the background of the scene.

      Video Object Layer (VOL): Video object encoding takes place in one of two modes, scalable or non-scalable, depending on the application; this choice is represented in the video object layer (VOL). The VOL provides the support for scalable coding.

      Group of Video Object Planes (GOV): Optional in nature, GOVs enable random access into the bitstream by providing points where video object planes are independently encoded. MPEG-4 video consists of various video objects, rather than frames, allowing true interactivity and manipulation of separate, arbitrarily shaped objects, with an efficient scheduling scheme to speed up real-time computation.

      Video Object Plane (VOP): VOPs are video objects sampled in time. They can be sampled either independently or dependently by using motion compensation. A rectangular shape can represent a conventional video frame. A motion estimation and compensation technique is provided for interlaced digital video such as video object planes (VOPs). Predictor motion vectors, used to differentially encode a current field-coded macroblock, are obtained from the median of the motion vectors of surrounding blocks or macroblocks, which supports high system scalability.
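
To make the hierarchy above concrete, the following sketch models it as nested data structures. This is purely illustrative: the class and field names are our own simplifications and do not correspond to the standard's syntax element names.

```python
# Illustrative sketch (not the standard's actual syntax): a minimal data model
# of the MPEG-4 visual hierarchy, useful for reasoning about selective access.
from dataclasses import dataclass, field
from typing import List

@dataclass
class VideoObjectPlane:          # VOP: a video object sampled at one time instant
    coding_type: str             # "I", "P" or "B"
    timestamp_ms: int

@dataclass
class GroupOfVOPs:               # GOV: optional random-access point
    vops: List[VideoObjectPlane] = field(default_factory=list)

@dataclass
class VideoObjectLayer:          # VOL: scalable or non-scalable coding of one object
    scalable: bool
    govs: List[GroupOfVOPs] = field(default_factory=list)

@dataclass
class VideoObject:               # VO: one 2-D element of the scene
    shape: str                   # "rectangular" or "arbitrary"
    layers: List[VideoObjectLayer] = field(default_factory=list)

@dataclass
class VisualObjectSequence:      # VS: the complete scene
    objects: List[VideoObject] = field(default_factory=list)
```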

   Figure 1 below illustrates an object-based visual bitstream.

   A visual elementary stream compresses visual data of just one layer of one visual object. There is
   only one elementary stream (ES) per visual bitstream. Visual configuration information includes the
   visual object sequence (VOS), visual object (VO) and visual object layer (VOL). Visual configuration
   information must be associated with each ES.




   Figure 1: The visual bitstream format



Compression Tools

Intra-coded VOPs (I-VOPs): VOPs that are coded using only information within the VOP, removing some of the spatial redundancy. Inter coding exploits temporal redundancy between frames through motion estimation and compensation; two modes of inter coding are provided: prediction based on a previous VOP (P-VOPs) and prediction based on both a previous VOP and a future VOP (B-VOPs). These tools are used in the content preparation stage to increase compression efficiency, improve error resilience and code different types of video objects.

Shape coding tools: MPEG-4 provides tools for encoding arbitrarily shaped objects. Binary shape information defines which portions (pixels) of the scene belong to the video object at a given time, and is encoded by a motion-compensated, block-based technique that allows both lossless and lossy coding. The technique allows accurate representation of objects, which in turn improves the accuracy and quality of the final composition and assists in differentiating between video and non-video objects within the stream.

Sprite coding: A sprite is an image composed of pixels belonging to a video object that is visible throughout a video sequence. It is an efficient and concise method for representing a background video object, which is typically compressed with the object-based coding technique. Sprite coding achieves high compression efficiency when the whole background is visible at least once over a video sequence.
MPEG-4 H.264/AVC Scalable Video Coding (SVC): One method of achieving high video compression efficiency is the scalable extension of H.264/AVC, known as scalable video coding or SVC.
A scalable video bitstream contains a non-scalable base layer and one or more enhancement layers. (The term "layer" in Video Coding Layer (VCL) refers to syntax layers such as block, macroblock and slice.) The basic SVC design can be classified as a layered video codec. In general, the coder structure as well as the coding efficiency depends on the scalability space required by an application. An enhancement layer may enhance the temporal resolution (i.e. the frame rate), the spatial resolution, or the quality of the video content represented by the lower layer or part of it. The scalable layers can be aggregated into a single transport stream or transported independently.
Scalability is provided at the bitstream level, allowing for reduced complexity. Reduced spatial and/or temporal resolution can be obtained by discarding from a global SVC bitstream the NAL units (or network packets) that are not required for decoding the target resolution. NAL units contain motion information and texture data. NAL units of Progressive Refinement (PR) slices can additionally be truncated to further reduce the bit rate, with a corresponding reduction in reconstruction quality.
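
The practical consequence of bitstream-level scalability is that a mid-network element can produce a lower-rate stream without re-encoding. The sketch below illustrates the idea under simplifying assumptions: it presumes the NAL units have already been parsed into their layer identifiers, and the field names are illustrative rather than the exact SVC header syntax.

```python
# Minimal sketch of SVC sub-bitstream extraction by discarding NAL units.
from typing import NamedTuple, List

class NalUnit(NamedTuple):
    temporal_id: int      # temporal layer (frame-rate scalability)
    dependency_id: int    # spatial layer (resolution scalability)
    quality_id: int       # quality / SNR layer
    payload: bytes

def extract_sub_bitstream(nals: List[NalUnit],
                          max_temporal: int,
                          max_dependency: int,
                          max_quality: int) -> List[NalUnit]:
    """Keep only the NAL units needed for the target operating point."""
    return [n for n in nals
            if n.temporal_id <= max_temporal
            and n.dependency_id <= max_dependency
            and n.quality_id <= max_quality]
```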




NETWORK BASED PERSONALIZATION CONCEPT
Network-based personalization represents an evolution of the network infrastructure. The solution includes devices that allow multi-point media processing, enabling the network to target any user, on any device, with any content. In this paper we focus primarily on the cable market and TV services; however, the concept is not confined to these areas.
The existing content flow remains intact regardless of how processing functionality is extended within each of the network components, including the user device. This approach can accommodate the range of available STBs, employ modifications based on user profiles, and support a variety of sources.
The methodology behind the concept anticipates that the ingress and egress points of the system must support a variety of containers, formats, profiles, rates and so forth. Within the system, however, the manipulation flow is unified for simplification and scalability. Network-based personalization can provide service to incoming baseline (low-resolution), Standard Definition (SD) and High Definition (HD) formats and support multiple containers (such as Flash, Windows Media, QuickTime, MPEG Transport Stream and Real).
Network personalization requires an edge processing point and, optionally, an ingest point and the user premises as content manipulation locations. The conceptual flow of the solution is defined in Figure 2 below.




[Figure 2 diagram: Prepare, Integrate, Create and Present blocks in sequence, moving from asset focus to session focus, with user interaction feeding back into the flow]
Figure 2: Virtual flow: network-based personalization


The virtual flow and building blocks defined here are generic and can be placed at different locations in the network, co-located or remote. Specific architecture examples are reviewed later in this paper.


At the "preparation" point, media content is ingested and manipulated in several respects: 1) analysis of the content and creation of relevant information (metadata), which then accompanies it across the flow; 2) processing of the content for integration and creation, which includes manipulation such as changing format, structure, resolution and rate. The outcome of the preparation stage is a single copy of the incoming media, but in a form that includes data allowing the other blocks to create multiple personalized streams from it.


The "integration" point is the transition point from asset focus to session focus. This block connects and synchronizes prepared media streams with instructions and other data to create a complete, session-specific media and data flow, which is later provided to the "create" block.
The "create" and "present" blocks are the final content processing steps: for a given session, each media stream is crafted according to the user, device and medium (in the "create" block), then joined into a visual experience at the "present" block. The "create" and "present" blocks are intentionally defined separately in order to accommodate end-user devices of different types and processing power. Further discussion of this subject appears in the "Power Shifting to the User" section below.
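
As a way of summarizing the virtual flow, the sketch below chains the four blocks as simple functions. The interfaces and data shapes are assumptions made for illustration; the paper defines the blocks conceptually, not as a specific API.

```python
def prepare(asset):
    """Ingest: analyze the content, emit one normalized copy plus metadata."""
    return {"media": asset, "metadata": {"format": "mpeg4", "objects": []}}

def integrate(prepared, session):
    """Transition from asset focus to session focus: attach session data."""
    return {**prepared, "session": session}

def create(integrated, device_profile):
    """Craft each stream for the specific user, device and medium."""
    return {**integrated, "target": device_profile}

def present(created):
    """Join the crafted streams into the final visual experience."""
    return f"render {created['media']} for {created['target']['type']}"

stream = present(
    create(
        integrate(prepare("movie_asset"), session={"user": "u42"}),
        device_profile={"type": "legacy_mpeg2_stb"},
    )
)
print(stream)   # render movie_asset for legacy_mpeg2_stb
```

In practice each block may run on a different network element, so the function boundaries in the sketch correspond to points where media and metadata cross the network.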




PUTTING IT ALL TOGETHER


The proposed implementation of network-based personalization combines the set of tools and the virtual building blocks defined above to create the required end result.
To support high-level, personal, session-based services, we propose to utilize the MPEG-4 toolkit, which enables scene-related information to be transmitted together with video, audio and data to a processor-based network element in which an object-based scene is composed according to the rendering capabilities of the user device. Using MPEG-4 authoring tools and applying BIFS (Binary Format for Scenes) encoding at the content preparation stage, the system supports more efficient processing of personalized streams, specifically at the "create" and "present" stages. Different encoding levels are required to support the same bitstream; for example, varying network computational power will be required to process the foreground, background and other data (such as 2D/3D) carried in the same bitstream. Moreover, some of the video rendering can be passed directly to the user reception device (STB), reducing the network image processing requirements.
The solution described in this paper utilizes a set of tools that allow the content creator to build multimedia applications without any knowledge of the internal representation structure of an MPEG-4 scene. Using the MPEG-4 toolkit, the multimedia content is object-oriented, with spatial and temporal attributes that can be attached to it, including the BIFS encoding scheme. The MPEG-4 encoded objects address video, audio and multimedia presentations such as 3D, as defined by the authoring tools.
The solution is built on four network elements: prepare, integrate, create and present. All four elements work together to ensure the highest processing efficiency and to accommodate different service scenarios: legacy MPEG-2 set-top boxes; H.264 set-top boxes with no object-based rendering capabilities; and finally, STBs with full MPEG-4 object-based processing capabilities. Two-way feedback between the STB, the edge network and the network-based stream processor is established in order to define what will be processed at each stage of the network.


PREPARE
At the prepare stage, the assumption is that incoming content is received in, or converted to, MPEG-4 toolkit encoding, generating media in object-based form. Using authoring tools to upload content and create scene-related object information supports improved media compression for transmission and processing by the network. The object-based scene is created using MPEG-4 authoring tools and applying BIFS (Binary Format for Scenes) encoding, to support the seamless integration and control of different audio/visual and synthetic objects in a scene.
Compression and manipulation of visual content using the MPEG-4 toolkit introduces the novel concepts of the Video Object Plane (VOP) and the sprite. Using video segmentation, each frame of an input video sequence can be segmented into a number of VOPs, each of which may describe a physical object within the scene. A sprite coding technique may be used to support a mosaic layout. It is based on a large image composed of pixels belonging to a video object visible throughout a video segment, and it captures spatio-temporal information in a very compact way.

Other tools might also be applied at the prepare stage to improve network processing and reduce bandwidth. These include I-VOPs (intra-coded Video Object Planes), which allow an object to be encoded and decoded based on its shape, motion and texture, and Bidirectional Video Object Planes (B-VOPs), which may be predicted from a past and a future reference VOP for each object, with shape motion vectors built from neighbouring motion vectors that have already been encoded.
The output of the prepare stage is, per asset, a set of object-based information, coded as elementary streams, packetized elementary streams and metadata. The different object layers and data can in turn be transported as independent IP flows, over UDP or RTP, to the integrate stage.
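
As an illustration of this output stage, the sketch below sends each prepared object layer as an independent UDP flow toward the integrate stage. The host, ports, packet size and the omission of real RTP framing are simplifying assumptions.

```python
import socket

MTU_PAYLOAD = 1400            # keep datagrams below a typical Ethernet MTU
INTEGRATE_HOST = "127.0.0.1"  # stands in for the integrate-stage address

def send_layer(layer_bytes, port):
    """Send one object layer as its own UDP flow, fragment by fragment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for offset in range(0, len(layer_bytes), MTU_PAYLOAD):
            sock.sendto(layer_bytes[offset:offset + MTU_PAYLOAD],
                        (INTEGRATE_HOST, port))
    finally:
        sock.close()

# One independent flow per prepared layer (illustrative payloads).
layers = {"background_vol": b"\x00" * 4000,
          "foreground_vol": b"\x01" * 2500,
          "object_metadata": b"\x02" * 300}
for i, (name, payload) in enumerate(layers.items()):
    send_layer(payload, 5000 + i)
```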
INTEGRATE
The session with the preparation stage is an "object-based" session, embodied mainly in its visualization of several visual object types. The scalable core profile is required mostly because it supports arbitrarily shaped coding, temporal/spatial scalability and so on. At the same time, the scalable core profile needs to support computer graphics, such as 2-D meshes and synthetic objects, as part of the range of scalable objects in the integration stage.

MPEG-4 object-based coding allows separate encoding of foreground figures and background scenes. Arbitrarily shaped coding needs to be supported to maintain the quality of the input elements; it includes shape information in the compressed stream.
In order to apply stream adaptation to support different delivery environments and available bandwidths, temporal and spatial scalability are included in the system. Spatial scalability allows the addition of one or more enhancement video object layers (VOLs) to the base VOL to achieve different video scenes.

To summarize: at the integrate stage, a user session is composed out of multiple incoming object-based assets to create the final, synchronized video object layers and object planes. The output of the integrate stage includes all the information and media required for the session; at this point, however, the media is still not tuned to the specifics of the network, device and user, but is a superset of them. The streams are then transported to the "create" and "present" stages, where the final manipulation is done, as sketched below.
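
The sketch below illustrates one aspect of the integrate-stage decision under the assumptions stated in the comments: selecting the base VOL plus as many spatial-enhancement VOLs as the session's bandwidth allows. The bitrate figures are placeholders, not measured values.

```python
def select_layers(session_bandwidth_kbps, base_kbps, enhancement_kbps):
    """Pick the base VOL plus as many enhancement VOLs as the budget allows."""
    chosen = ["base_vol"]
    budget = session_bandwidth_kbps - base_kbps
    for i, rate in enumerate(enhancement_kbps):
        if budget >= rate:
            chosen.append(f"enhancement_vol_{i + 1}")
            budget -= rate
        else:
            break
    return chosen

# e.g. a 3 Mbps session: a 1.5 Mbps base plus one 1 Mbps enhancement layer fits.
print(select_layers(3000, 1500, [1000, 2000]))   # ['base_vol', 'enhancement_vol_1']
```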

CREATE
The systems part of MPEG-4 allows creation or viewing of a multimedia sequence with hybrid elementary streams, each of which can be encoded and decoded with the codec best suited to that stream. However, manipulating those streams synchronously and composing them onto a screen in real time is computationally demanding. A temporal cache is therefore used in the "create" stage to store the encoded media streams. The elementary streams (ES) arrive either as a multiplexed stream (using the MPEG-4 defined FlexMux) or as single streams, but all of them have been packetized by the MPEG-4 sync layer (SL). The use of FlexMux and the sync layer allows grouping of the elementary streams with low multiplexing overhead at the "prepare" and "integrate" stages, while the SL is used to synchronize bitstream delivery information from the previous stage to the "create" stage.

In order to generate the relevant session (stream), the "create" stage uses an HTTP submission to request a desired media presentation. The submission contains only the index of the preformatted BIFS (Binary Format for Scenes) for a pre-created and stored presentation, or a text-based description of the user's authored presentation. BIFS coding also allows seamless integration and control of different audio/video objects in a scene. The "integrate" stage receives the request and sends the media to the "create" stage, i.e. the BIFS stream together with the object descriptor in the form of an initial object descriptor stream.

If the client side can satisfy the decoding requirements, it sends a confirmation to the "create" stage to start the presentation; otherwise, the client sends its decoding and resolution capabilities to the "create" stage. The "create" stage then repeatedly downgrades to a lower profile until the client's decoding capabilities are met, or informs the "present" stage to compose a stream that the client decoding device can handle (i.e. H.264 or MPEG-2). A sketch of this negotiation follows.
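
The following sketch illustrates that negotiation loop. The ordered profile ladder and the capability representation are assumptions made for illustration only.

```python
# Ordered from most demanding to least demanding client profile (illustrative).
PROFILE_LADDER = ["mpeg4_core_scalable", "mpeg4_simple", "h264_baseline", "mpeg2_mp_ml"]

def negotiate(client_supported):
    """Walk down the ladder until a profile the client can decode is found."""
    for profile in PROFILE_LADDER:
        if profile in client_supported:
            return profile                   # client decodes this directly
    return "network_composed_stream"         # fall back: 'present' composes for the client

print(negotiate({"h264_baseline"}))          # -> h264_baseline
print(negotiate(set()))                      # -> network_composed_stream
```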

The "create" stage initiates the establishment of the necessary sessions for the SD (scene description) stream (in BIFS format) and the OD (object descriptor) stream referenced by the user device. This allows the user device to retrieve the compressed media stream in real time, using the URL contained in the ES descriptor stream. BIFS is used to lay out the media elementary streams in the presentation, as it provides the spatial and temporal relationships of those objects by referencing their ES_IDs.

If the "create" stage needs to modify the received scene, such as by adding an enhancement layer to the current scene based on user device or network capabilities, it can send a BIFS update command to the "integrate" stage and obtain a reference to the new media elementary stream.

The "create" stage can handle multiple streams and synchronize between different objects and between the different elementary streams of a single object (e.g., base layer and enhancement layer). The synchronization layer is responsible for synchronizing the elementary streams. Each SL-packet consists of an Access Unit (AU) or a fragment of an AU. An AU carries the time stamps needed for synchronization and constitutes the data unit consumed by the decoder at the "create" stage or by the user device decoder. An AU consists of a Video Object Plane (VOP). Each AU is received by the decoder at the time instance specified by its Decoding Time Stamp (DTS), as sketched below.
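
As an illustration of the sync-layer handling described above, the sketch below reassembles SL packet fragments into access units and pairs each AU with its DTS. The packet fields are simplified assumptions, not the exact MPEG-4 Systems syntax.

```python
from collections import defaultdict

def reassemble_aus(sl_packets):
    """Group SL packet fragments that share an AU identifier and keep its DTS."""
    aus = defaultdict(bytearray)
    dts = {}
    for pkt in sl_packets:          # pkt: dict with 'au_id', 'dts_ms', 'fragment'
        aus[pkt["au_id"]] += pkt["fragment"]
        dts[pkt["au_id"]] = pkt["dts_ms"]
    return [(dts[au_id], bytes(data)) for au_id, data in sorted(aus.items())]

packets = [
    {"au_id": 0, "dts_ms": 0,  "fragment": b"vop0-part1"},
    {"au_id": 0, "dts_ms": 0,  "fragment": b"vop0-part2"},
    {"au_id": 1, "dts_ms": 40, "fragment": b"vop1"},
]
for dts_ms, au in reassemble_aus(packets):
    print(f"decode AU of {len(au)} bytes at DTS {dts_ms} ms")
```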

The media is processed by the "present" stage in such a way that MPEG objects are transcoded to either an H.264 or an MPEG-2 transport stream, utilizing stored motion vector information and macroblock mode decisions. The applicable process is determined by the rendering capabilities of the user device. When the target is an advanced user device with MPEG-4 object layer decoding capabilities, the "present" processor acts as a stream adaptor, resizing the streams while composition is performed by the client device (the advanced STB).

PRESENT

The modularity of the coding tools, expressed as well-known MPEG profiles and levels, allows easy customization of the "present" stage for a selected segment: for example, legacy MPEG-2 STB markets, where full stream composition needs to be applied in the network, versus advanced set-top boxes with full MPEG-4 object-based scene capability, where minimal stream preparation needs to be applied by the network "present" stage.
Two extreme service scenarios might be applied as follows:
Network-based "present": The "present" function applies stream adaptation and resizing, composes the network object elements, and applies transcoding functions to convert the MPEG-4 file-based format to either MPEG-2 stream-based format or MPEG-4/AVC stream-based format.

STB-based "present": The "present" function might pass the object elements through the network, after rate adaptation and resizing, to be composed and presented by the advanced user device.
The "present" functionality is based on client/network awareness. In general, media provisioning is based on metadata generated by the client device and the network manager. The metadata, sketched after this list, includes the following information:
      Video format, i.e. MPEG-2, H.264, VC-1, MPEG-4, QT, etc.
      User device rendering capabilities
      User device resolution format, i.e. SQCIF, QCIF, CIF, 4CIF, 16CIF
      Network bandwidth allocation for the session
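
A minimal sketch of such session metadata, assuming a simple dictionary representation (the field names are ours, not a defined schema):

```python
session_metadata = {
    "video_format": "H.264",            # MPEG-2, H.264, VC-1, MPEG-4, QT, ...
    "rendering": {
        "object_based": False,          # can the STB compose MPEG-4 objects itself?
        "max_profile": "h264_baseline",
    },
    "resolution": "CIF",                # SQCIF, QCIF, CIF, 4CIF, 16CIF
    "bandwidth_kbps": 3500,             # network allocation for this session
}
```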


"Present" stage performance
It is essential that the "present" function be composed of object-based elements that use the defined set of tools, which provide a binary coded representation of individual audiovisual objects, text, graphics and synthetic objects. It composes the Visual Object Sequence (VS), Video Object Layer (VOL) or any other defined tool into a valid H.264 or MPEG-2 stream at the resolution and bandwidth defined by the client device and the network metadata feedback.
The elementary streams (scene data, visual data, etc.) are received at the "present" stage from the "create" system element, which allows scalable representations and alternate codings (bitrate, resolution, etc.), enhanced with metadata and protection information. An object described by an ObjectDescriptor is sent from the content originator, i.e. the "prepare" stage, and provides simple metadata related to the object, such as content creation information or chapter time layout. This descriptor also contains all information related to stream setup, including synchronization information and initialization data for decoders.
The BIFS (Binary Format for Scenes) at the "present" stage is used to place each object, with various effects potentially applied to it, in a display that is then transcoded to an MPEG-2 or H.264 stream.
STB-based "present": Object reconstruction
The essence of MPEG-4 lies in its object-oriented structure. Each object forms an independent entity that may or may not be linked to other objects, spatially and temporally. This approach gives the end user at the client side tremendous flexibility to interact with the multimedia presentation and manipulate the different media objects: end users can change the spatial-temporal relationships among media objects and turn media objects on or off. However, it requires a complicated session management and control architecture.
A remote client retrieves information regarding the media objects of interest and composes a presentation based on what is available and desired. The following communication messages occur between the client device and the "present" stage:
      The client requests a service by submitting the description of the presentation to the data controller (DC) on the "present" stage side.
      The DC on the "present" stage side controls the encoder/producer module to generate the corresponding scene descriptor, object descriptors, command descriptors and other media streams, based on the presentation description information submitted by the end user at the client side.
      Session control on the "create" stage side controls session initiation, control and termination.
      Actual stream delivery commences after the client indicates that it is ready to receive, and streams flow from the "create" stage to the "present" client. After the decoding and composition procedures, the MPEG-4 presentation authored by the end user is rendered on his or her display.
The set-top box client is required to support the architectural design of the MPEG-4 System Decoder Model (SDM), which is defined to achieve media synchronization, buffer management and timing when reconstructing the compressed media data.
The session controller at the client side communicates with the session controller at the server ("create" stage) side to exchange session status information and session control data. The session controller translates user actions into the appropriate session control commands.


Network-based MPEG-4 to H.264/AVC baseline profile transcoding
Transcoding from MPEG-4 to H.264/AVC can be done in the spatial domain or in the compressed domain. The most straightforward method is to fully decode each video frame and then completely re-encode it with H.264. This approach is known as spatial-domain video transcoding; it involves full decoding and re-encoding and is therefore very computationally intensive.
Motion vector refinement and an efficient transcoding algorithm are used for transcoding the MPEG-4 object-based scene to an H.264 stream. The algorithm exploits side information from the decoding stage to predict the coding modes and motion vectors for H.264 encoding. Both INTRA macroblock (MB) transcoding and INTER macroblock transcoding are exploited by the transcoding algorithm at the "present" stage.
During the decoding stage, the incoming bitstream is parsed in order to reconstruct the spatial video signal, and the prediction directions of INTRA-coded macroblocks and the motion vectors are stored for use in the coding process.
To obtain the highest transcoding efficiency at the "present" stage, this side information is retained: a lot of side information (such as motion vectors) is obtained during the MPEG-4 decoding process, and the "present" stage reuses it, which reduces the transcoding complexity compared with a full decode/re-encode scenario. In the transcoding process, both motion vector estimation and coding mode decisions are addressed by reusing the side information, reducing complexity and computation power. A sketch of the motion-vector reuse idea follows.
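
The sketch below illustrates the motion-vector reuse idea: the position implied by the decoded MPEG-4 motion vector seeds a small refinement search for re-encoding, instead of a full search. The SAD cost over NumPy blocks and the search window size are illustrative assumptions.

```python
import numpy as np

def sad(block, ref_frame, x, y):
    """Sum of absolute differences between the block and a reference window."""
    h, w = block.shape
    window = ref_frame[y:y + h, x:x + w].astype(int)
    return int(np.abs(block.astype(int) - window).sum())

def refine_mv(block, ref_frame, seed_xy, search=2):
    """Refine the position implied by the decoded MV within +/-search pixels."""
    best_cost, best_xy = None, seed_xy
    h, w = block.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = seed_xy[0] + dx, seed_xy[1] + dy
            if 0 <= x <= ref_frame.shape[1] - w and 0 <= y <= ref_frame.shape[0] - h:
                cost = sad(block, ref_frame, x, y)
                if best_cost is None or cost < best_cost:
                    best_cost, best_xy = cost, (x, y)
    return best_xy

ref = np.random.randint(0, 255, (64, 64), dtype=np.uint8)
blk = ref[10:26, 12:28].copy()               # 16x16 block taken from the reference
print(refine_mv(blk, ref, seed_xy=(11, 9)))  # converges to (12, 10)
```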
Network-based MPEG-4 to MPEG-2 transcoding
To support legacy STBs that have limited local processing capabilities and support only MPEG-2 transport streams, a full decode and re-encode is performed by the "present" stage. However, the "present" stage utilizes the tools used for the MPEG-4 to H.264 conversion in order to reduce complexity. Stored motion vector information and machine-learning-based macroblock mode decision algorithms for inter-frame prediction are used as part of the MPEG-4 to MPEG-2 transcoding process. Since coding mode decisions consume most of the resources in video transcoding, fast macroblock (MB) mode estimation leads to reduced complexity; a sketch of this idea follows.
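
The sketch below illustrates the fast mode-estimation idea with a small decision tree. The features, training data and mode labels are invented for illustration; only the general approach of learning the MPEG-2 macroblock mode from information available after the MPEG-4 decode follows the text above.

```python
from sklearn.tree import DecisionTreeClassifier

# features: [mean |MV|, residual energy, 1 if the source MB was intra-coded else 0]
X_train = [[0.2, 12.0, 0], [5.1, 240.0, 0], [0.0, 900.0, 1], [3.3, 80.0, 0]]
y_train = ["skip", "inter_16x16", "intra", "inter_16x16"]   # target MPEG-2 MB modes

mode_model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

def predict_mb_mode(mv_magnitude, residual_energy, was_intra):
    """Estimate the MPEG-2 MB mode instead of an exhaustive mode search."""
    return mode_model.predict([[mv_magnitude, residual_energy, was_intra]])[0]

print(predict_mb_mode(0.1, 10.0, 0))   # likely "skip" with this toy training set
```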

The implementation presented above can be incorporated into both offline and real-time environments. See Appendix 2 for elaboration on real-time implementation.




BENEFITS OF NETWORK-BASED PERSONALIZATION
Deploying network-based processing, whether complete or hybrid, has significant benefits:
      A unified user experience is delivered across the various STBs in the field.
      It presents a future-proof cost model for low-end to high-end STBs.
      It utilizes the existing VOD environment, servers and infrastructure. Network-based processing accommodates low-end and future high-end systems, all under operators' existing managed on-demand systems. Legacy servers require more back-office preparation, with minimal server processing power overhead, while newer servers can provide additional per-user processing and thus more personalization features.
      Rate utilization is optimized. Instead of consuming the sum of all the streams that make up the user experience, network-optimized processing reduces overhead significantly. In the extreme case, a single stream with no overhead is delivered instead of 4-5 times the available bandwidth; in the common case, the overhead is approximately 20%.
      Best quality of service for connected-home optimization. By performing most or all of the processing before the content reaches the home, the operator optimizes the bandwidth and experience across the user's end devices, delivering the best quality of service.
      Prevention of subscriber churn in favour of direct over-the-top (OTT) services. The operator has control over the edge network; over-the-top providers do not. Media manipulation in the network can and will be done by OTT operators, but unlike cable operators they do not control the edge network, which limits the effectiveness of their actions unless there is a QoS agreement with the operator, in which case control stays in the operator's hands.
      Maintaining the operator's position as the current and future "smart pipe". Being aware of the end-user device and processing for it is critical for the operator to maintain processing capabilities that will allow migration to other areas such as mobile and 3-D streaming.




IMPLEMENTING NETWORK-BASED PERSONALIZATION
As indicated earlier in the document, the solution can be implemented in a variety of ways. In this
section, we present three of the options, all under a generic North America on-demand architecture. The
three options are: Hybrid network edge and back office; Network edge; and Hybrid home network.


Hybrid network edge and back office
As the user device powers up, or when the user starts using personalization features, the user client connects with the session manager, identifies the user, the device type and the personalization requirements, and, once resources are identified, starts a session. In this implementation the "prepare" function is physically separated from the other building blocks, and the user STB is not capable of the relevant video processing/rendering. Each incoming media asset is processed and its data extracted, as part of the standard ingest process, to prepare it for downstream personalization. Once a session is initiated and the edge processing resources are found, sets of media and metadata flows are propagated across the internal CDN to the "integrate" step at the network edge. The set of flows includes the different media flows; related metadata (which covers target STB-based processing, source media characteristics, target content insertion information, interactivity support and so forth, and which must be available for the edge to start processing the session); objects; data from the content provider/advertiser; and so on.
After arrival at the edge, the "integrate" function aligns the flow and passes it to the "create" and "present" functions, which in this case generate a single, personally composed stream, accompanied by relevant metadata, directed at a specific user.


[Figure 3 diagram: back office (Prepare, App Servers, Session Manager, AMS/CDN, UERM) connected over IP to the network edge (Edge QAM; Integrate, Compose, Present), delivering over HFC and media over broadband to legacy STBs, wired and wireless devices]
Figure 3: Hybrid back office and network edge



As can be seen in Figure 3 above, the SMP (Scalable Media Personalization) session manager connects the user device and the network, influencing in real time the "integrate", "create" and "compose" edge functions.
Network edge only
In this application case, all processing is done on demand, in real time. It is similar to the hybrid case; however, instead of the "prepare" function being located at the back office and working offline, all functions are hosted on the same platform. As can be expected, this option has significant processing power requirements for the "prepare" function, since content needs to be "prepared" in real time. In this example, the existing flow is almost seamless, as the resource manager simply identifies the platform as another network resource and manages it accordingly.

[Figure 4 diagram: generic ingest in the back office; Prepare, Integrate, Compose and Present all located at the network edge alongside the Edge QAM, delivering over HFC and media over broadband to legacy STBs, wired and wireless devices]
Figure 4: Network edge




Hybrid Home and Network
In this hybrid implementation, the end-user device (an STB in our case) has been identified as capable of hosting the "present" function. As a result, as can be seen in Figure 5, the "present" function is relocated to the user home, and the system demarcation point is the "create" function. During the session, multiple "prepared" flows of data and media arrive at the STB, consuming significantly less bandwidth than the non-prepared options and requiring less CPU horsepower for the "present" function.

[Figure 5 diagram: Prepare in the back office; Integrate and Compose at the network edge; the Present function hosted on an advanced STB in the home, with delivery over HFC and media over broadband to wired and wireless devices]
Figure 5: Hybrid home and network




POWER SHIFTING TO THE USER
Although legacy STBs are indeed present in many homes, the overall processing horsepower in the home is growing and will continue to grow. This means that the user device will be able to do more processing at home and will theoretically have less need of network-based assistance. At first glance this is indeed the case. However, when the subject is examined further, two main challenges reveal themselves.
   1. The increase in user device capabilities, and in actual user expectations, comes back to the network as a direct increase in bandwidth utilization, which in turn affects users' experience and their ability to run enhanced applications such as multi-view.
      For example, today's next-generation STBs support 800 to 16,000 MIPS, versus the legacy 20 to 1,000 MIPS, with dedicated dual 400 MHz video graphics processors and dual 250 MHz audio processors (S-A/Cisco's next-generation Zeus silicon platform).
      As Figure 6 below shows, the expected migration of media services into other home devices, such as media centres and game consoles, significantly increases the available home processing power.

[Figure 6 chart: home processing power roadmap in TMIPS, growing over the years 2007 to 2010]
Figure 6: Home processing power roadmap


   2. No matter how "fast and furious" processing power in the home becomes, users will always want more. Having home devices perform ALL the video processing increases CPU and memory utilization and directly diminishes the performance of other applications.
In addition, as discussed earlier in the document, the increase in open, standards-based home capabilities substantially strengthens the threat of customer churn for cable operators.
Network-based personalization is targeted at providing solutions to the above challenges. The approach is to use network processing to assist the user device, improving the user experience.



By performing the "prepare", "integrate" and "create" functions in the network, and leaving only the "present" function to the user home, several key benefits are delivered that effectively address the above challenges.
Network bandwidth utilization: The "create" function drives down network bandwidth consumption. The streams delivered to the user are no longer the complete, original media as before, but only what is needed. For example, when looking at 1 HD and 2 SD windows on the same multi-view screen, each of the three streams has exactly the resolution and frame rate required at each given moment, resulting in significant bandwidth savings, as can be seen in Figure 7.

[Figure 7 chart: bandwidth to the home for a 1 HD + 2 SD multi-view example, comparing STB-only, hybrid and network-only delivery in Mbps for MPEG-2 and H.264]
Figure 7: 2 SD, 1 HD bandwidth to the home
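
To make the arithmetic behind Figure 7 concrete, the back-of-the-envelope sketch below uses assumed per-stream bitrates (the paper does not give exact figures) and the roughly 20% overhead quoted earlier for the common hybrid case.

```python
# Assumed typical bitrates; real values depend on encoder and content.
RATES_MBPS = {"mpeg2": {"hd": 15.0, "sd": 3.75}, "h264": {"hd": 8.0, "sd": 2.0}}

def stb_only(codec):
    """All three full-rate streams (1 HD + 2 SD) are sent to the home."""
    r = RATES_MBPS[codec]
    return r["hd"] + 2 * r["sd"]

def network_only(codec):
    """A single composed multi-view stream at roughly the HD rate."""
    return RATES_MBPS[codec]["hd"]

def hybrid(codec, overhead=0.20):
    """Composed stream plus the ~20% overhead quoted for the common case."""
    return network_only(codec) * (1 + overhead)

for codec in RATES_MBPS:
    print(f"{codec}: STB-only {stb_only(codec):.1f} Mbps, "
          f"hybrid {hybrid(codec):.1f} Mbps, network-only {network_only(codec):.1f} Mbps")
```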
CPU processing power: As indicated in the "Putting It All Together" section, our solution allows for selective composition of object layers. Also, the actual multi-view is created out of multiple resolutions, so there is no need for render-resize-compose functions at the user device, which in turn reduces overall CPU utilization.
Finally, the fact that the network can deliver the above benefits inherently shifts power back into the hands of the operator, who can deliver the best user experience.




SUMMARY
Exceeding user expectations while maintaining a viable business case is becoming more challenging than ever for the cable operator. As the weight shifts to the home and to broadband streaming, the operator is forced to find new solutions to maintain leadership in the era of personalization and interactivity.
Network-based personalization provides a balanced solution. The ability to maintain an open, standards-based solution, while dynamically shifting the processing balance based on user, device, network and time, can provide the user and the operator with a "golden" solution.







ABOUT THE AUTHOR
Amos Kohn is Vice President of Business Development at Scopus Video Networks. He has more than 20 years of multinational executive management experience in convergence technology development, marketing, business strategy and solutions engineering at telecom and emerging multimedia organizations. Prior to joining Scopus, Amos Kohn held senior positions at ICTV, Liberate Technologies and Golden Channels.




APPENDIX 1: STB BASED ADDRESSABLE ADVERTISING


In the home addressable advertising model, multiple user profiles in the same household are offered to advertisers within the same ad slot. For example, within the same slot, multiple targeted ads may replace the same program feed, targeted at different youth age groups, while other advertisements target the adults in the house (male, female) based on specific profiles. During the slot, youth will see one ad while adults see another. Addressable advertising requires more bandwidth to the home than traditional zone-based advertising. The granularity might step one level up, where the targeted advertisement addresses the household rather than an individual user within it. In that case, less bandwidth is required in a given serving area than with user-based targeted advertising. The impact of home addressability on the infrastructure of channels that are already in the digital tier and enabled for local ad insertion will be similar to unicast VOD bandwidth requirements.
In a four-demographic scenario, for each ad zone, four times the bandwidth that has been allocated for a linear ad will need to be added.


APPENDIX 2: REAL-TIME IMPLEMENTATION
Processing in real time is determined by stream provisioning (fast motion estimation), stream complexity and the size of the buffer at each stage.
Scenes built as compositions of audiovisual objects (AVOs), support for hybrid coding of natural video and 2-D/3-D graphics, and the provision of advanced system and interoperability capabilities all support real-time processing.
MPEG-4 real-time software encoding of arbitrarily shaped video objects (VOs) is a key element of the solution. The MPEG-4 toolkit unites the advantages of block-based and pixel-recursive motion estimation methods in one common scheme, leading to a fast hybrid recursive motion estimation that supports MPEG-4 processing.





[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...
 
Efficient video indexing for fast motion video
Efficient video indexing for fast motion videoEfficient video indexing for fast motion video
Efficient video indexing for fast motion video
 
Partial encryption of compresed video
Partial encryption of compresed videoPartial encryption of compresed video
Partial encryption of compresed video
 
Partial encryption of compressed video
Partial encryption of compressed videoPartial encryption of compressed video
Partial encryption of compressed video
 
IRJET - Information Hiding in H.264/AVC using Digital Watermarking
IRJET -  	  Information Hiding in H.264/AVC using Digital WatermarkingIRJET -  	  Information Hiding in H.264/AVC using Digital Watermarking
IRJET - Information Hiding in H.264/AVC using Digital Watermarking
 
Paper id 2120148
Paper id 2120148Paper id 2120148
Paper id 2120148
 
A FRAMEWORK FOR MOBILE VIDEO STREAMING AND VIDEO SHARING IN CLOUD
A FRAMEWORK FOR MOBILE VIDEO STREAMING AND VIDEO SHARING IN CLOUDA FRAMEWORK FOR MOBILE VIDEO STREAMING AND VIDEO SHARING IN CLOUD
A FRAMEWORK FOR MOBILE VIDEO STREAMING AND VIDEO SHARING IN CLOUD
 
Radvision scalable video coding whitepaper by face to face live
Radvision scalable video coding whitepaper by face to face liveRadvision scalable video coding whitepaper by face to face live
Radvision scalable video coding whitepaper by face to face live
 
Publications
PublicationsPublications
Publications
 
Cg25492495
Cg25492495Cg25492495
Cg25492495
 

compressed stream contains the encoded shape of the VO. Representing video as objects rather than as frames enables content-based applications and, in turn, new levels of content interactivity based on efficient object representation, object manipulation, bitstream editing and object-based scalability.

An MPEG-4 visual scene may consist of one or more video objects. Each video object is characterized by temporal and spatial information in the form of shape, motion and texture. The visual bitstream provides a hierarchical description of the scene, and each level of the hierarchy can be accessed through start codes, which are special code values embedded in the bitstream. The ability to process objects, layers and sequences selectively is a significant enabler for scalable personalization. The hierarchical levels include:

• Visual Object Sequence (VS): An MPEG-4 scene may include any number of 2-D or 3-D natural or synthetic objects. These objects and sequences can be addressed individually, based on the targeted user.
• Video Object (VO): A video object corresponds to a particular 2-D element in the scene. The simplest example is a rectangular frame; alternatively, it can be an arbitrarily shaped region corresponding to an object or to the background of the scene.
• Video Object Layer (VOL): Each video object is encoded in one of two modes, scalable or non-scalable, depending on the application, and this is represented in the video object layer (VOL). The VOL provides support for scalable coding.
• Group of Video Object Planes (GOV): Optional in nature, GOVs provide random access points into the bitstream, i.e. points at which video object planes can be decoded independently.
• Video Object Plane (VOP): VOPs are video objects sampled in time. They can be sampled either independently or dependently, using motion compensation. A rectangular VOP can represent a conventional video frame. A motion estimation and compensation technique is also provided for interlaced digital video: predictor motion vectors used to differentially encode a current field-coded macroblock are obtained from the median of the motion vectors of surrounding blocks or macroblocks, which supports high system scalability.

MPEG-4 video therefore consists of video objects rather than frames, allowing true interactivity and manipulation of separately coded, arbitrarily shaped objects, with an efficient scheduling scheme to speed up real-time computation.

Figure 1 below illustrates an object-based visual bitstream. A visual elementary stream carries the compressed visual data of exactly one layer of one visual object; there is only one elementary stream (ES) per visual bitstream. Visual configuration information, comprising the visual object sequence (VOS), visual object (VO) and visual object layer (VOL), must be associated with each ES.

[Figure 1: The visual bitstream format]

Compression Tools

Intra-coded VOPs (I-VOPs): VOPs coded using only information within the VOP itself, removing some of the spatial redundancy. Inter coding exploits temporal redundancy between frames through motion estimation and compensation; two inter-coding modes are provided: prediction from a previous VOP (P-VOPs) and prediction from both a previous and a future VOP (B-VOPs). These tools are used in the content preparation stage to increase compression efficiency and error resilience and to code different types of video objects.

Shape coding tools: MPEG-4 provides tools for encoding arbitrarily shaped objects. Binary shape information defines which portions (pixels) of the scene belong to the video object at a given time; it is encoded with a motion-compensated, block-based technique that allows both lossless and lossy coding. This allows accurate representation of objects, which in turn improves the quality of the final composition and helps differentiate between video and non-video objects within the stream.
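To make the hierarchy and the role of start codes concrete, the sketch below scans a byte buffer for MPEG-4 Visual start codes and labels the corresponding levels (VS, VO, VOL, GOV, VOP). The start-code values follow the published MPEG-4 Part 2 syntax, but the scanner itself is a minimal illustration of selective, level-by-level access, not a production demultiplexer.

```python
# Minimal MPEG-4 Part 2 (Visual) start-code scanner: indexes the hierarchy
# levels (VS, VO, VOL, GOV, VOP) that selective object-based processing
# relies on. Illustrative only; real streams need full header parsing.

def classify_start_code(code: int) -> str:
    """Map the byte following the 0x000001 prefix to a hierarchy level."""
    if 0x00 <= code <= 0x1F:
        return "video_object_start (VO)"
    if 0x20 <= code <= 0x2F:
        return "video_object_layer_start (VOL)"
    if code == 0xB0:
        return "visual_object_sequence_start (VS)"
    if code == 0xB3:
        return "group_of_vop_start (GOV)"
    if code == 0xB5:
        return "visual_object_start"
    if code == 0xB6:
        return "vop_start (VOP)"
    return f"other (0x{code:02X})"

def scan_start_codes(bitstream: bytes):
    """Yield (byte_offset, level_name) for every 0x000001xx start code."""
    i = 0
    while i + 3 < len(bitstream):
        if bitstream[i] == 0 and bitstream[i + 1] == 0 and bitstream[i + 2] == 1:
            yield i, classify_start_code(bitstream[i + 3])
            i += 4
        else:
            i += 1

if __name__ == "__main__":
    # Synthetic buffer: VS, VO, VOL, GOV, then two VOPs with dummy payloads.
    demo = (b"\x00\x00\x01\xB0" + b"\x00\x00\x01\x05" + b"\x00\x00\x01\x20"
            + b"\x00\x00\x01\xB3" + b"\x00\x00\x01\xB6" + b"\xAA" * 8
            + b"\x00\x00\x01\xB6" + b"\xBB" * 8)
    for offset, level in scan_start_codes(demo):
        print(f"offset {offset:3d}: {level}")
```

Because every level is delimited by its own start code, a network element can locate and process individual objects or layers without decoding the whole stream, which is the property scalable personalization exploits.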
Sprite coding: A sprite is an image composed of the pixels belonging to a video object that is visible throughout a video sequence. It is an efficient, concise representation of a background video object and is typically compressed with the object-based coding technique. Sprite coding achieves high compression efficiency when a video frame contains a background that is visible in its entirety at least once over the course of the sequence.

MPEG-4 H.264/AVC Scalable Video Coding (SVC): One method of achieving high video compression efficiency is the scalable extension of H.264/AVC, known as scalable video coding (SVC). A scalable video bitstream contains a non-scalable base layer and one or more enhancement layers. (The term "layer" in the Video Coding Layer (VCL) refers to syntax layers such as block, macroblock and slice.) The basic SVC design can be classified as a layered video codec. In general, the coder structure, as well as the coding efficiency, depends on the scalability space required by the application. An enhancement layer may enhance the temporal resolution (i.e. the frame rate), the spatial resolution, or the quality of the video content represented by the lower layer or part of it. The scalable layers can be aggregated into a single transport stream or transported independently.

Scalability is provided at the bitstream level, allowing for reduced complexity. Reduced spatial and/or temporal resolution can be obtained by discarding from a global SVC bitstream the NAL units (or network packets) that are not required for decoding the target resolution. NAL units contain motion information and texture data. NAL units of Progressive Refinement (PR) slices can additionally be truncated in order to further reduce the bit rate and the associated reconstruction quality.
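The bitstream-level scalability described above can be illustrated with a toy extractor that drops enhancement-layer NAL units above a target operating point. The record fields mirror the SVC layer identifiers (dependency_id, temporal_id, quality_id), but the stream model here is synthetic; it is not a real H.264/SVC parser.

```python
# Toy SVC sub-stream extractor: keeps only the NAL units needed for a target
# operating point by discarding higher spatial (dependency_id), temporal
# (temporal_id) and quality (quality_id) layers, as SVC allows at the
# bitstream level. Synthetic records stand in for real NAL unit headers.
from dataclasses import dataclass

@dataclass
class NalUnit:
    dependency_id: int   # spatial layer
    temporal_id: int     # frame-rate layer
    quality_id: int      # SNR/quality refinement layer
    size_bytes: int

def extract_operating_point(nals, max_dep, max_temp, max_qual):
    """Return the sub-stream for the requested resolution/rate/quality."""
    return [n for n in nals
            if n.dependency_id <= max_dep
            and n.temporal_id <= max_temp
            and n.quality_id <= max_qual]

if __name__ == "__main__":
    stream = [
        NalUnit(0, 0, 0, 1200), NalUnit(0, 1, 0, 400), NalUnit(0, 1, 1, 250),
        NalUnit(1, 0, 0, 2600), NalUnit(1, 1, 0, 900), NalUnit(1, 1, 1, 600),
    ]
    # Base spatial layer, full frame rate, no quality refinements.
    sub = extract_operating_point(stream, max_dep=0, max_temp=1, max_qual=0)
    saved = 1 - sum(n.size_bytes for n in sub) / sum(n.size_bytes for n in stream)
    print(f"kept {len(sub)} of {len(stream)} NAL units, ~{saved:.0%} bit-rate saved")
```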
NETWORK-BASED PERSONALIZATION CONCEPT

Network-based personalization represents an evolution of the network infrastructure. The solution introduces devices that allow multi-point media processing, enabling the network to target any user, on any device, with any content. This paper focuses primarily on the cable market and TV services; however, the concept is not confined to these areas.

The existing content flow remains intact regardless of how processing functionality is extended within each of the network components, including the user device. This approach can accommodate the range of available STBs, apply modifications based on user profiles, and support a variety of sources. The methodology anticipates that the entry and exit points of the system must support a variety of containers, formats, profiles, rates and so forth; within the system, however, the manipulation flow is unified for simplicity and scalability. Network-based personalization can serve incoming baseline (low-resolution), Standard Definition (SD) and High Definition (HD) formats, and can support multiple containers (such as Flash, Windows Media, QuickTime, MPEG Transport Stream and Real).

Network personalization requires an edge processing point and, optionally, ingest and user-premises points as content manipulation locations. The conceptual flow of the solution is shown in Figure 2 below.

[Figure 2: Virtual flow of network-based personalization: Prepare, Integrate, Create, Present, spanning the asset and session domains]

The virtual flow and building blocks defined here are generic and can be placed at different locations in the network, co-located or remote. Specific examples of architecture are reviewed later in this paper.
At the "preparation" point, media content is ingested and manipulated in several respects:

1) Analysis of the content and creation of relevant information (metadata), which then accompanies it across the flow.
2) Processing of the content for integration and creation, which includes manipulation such as changing format, structure, resolution and rate.

The outcome of the preparation stage is a single copy of the incoming media, but in a form that includes the data that will allow the other blocks to create multiple personalized streams from it.

The "integration" point is the transition point from asset focus to session focus. This block connects and synchronizes prepared media streams with instructions and other data to create a complete session-specific media and data flow, which is then provided to the "create" block.

The "create" and "present" blocks are the final content processing steps: for a given session, each media stream is crafted according to the user, device and medium (in the "create" block), then joined into a visual experience at the "present" block. The "create" and "present" blocks are intentionally defined separately in order to accommodate end-user devices of different types and processing power. Further discussion of this subject appears in the "Power Shifting to the User" section below.
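A minimal sketch of the virtual flow is given below, modelling each block as a function over a session context. The stage names follow Figure 2; the asset, session and device fields are placeholders invented purely for illustration, not a defined interface.

```python
# Sketch of the prepare -> integrate -> create -> present virtual flow.
# Stage names follow Figure 2; the asset/session/device fields are illustrative.

def prepare(asset):
    """Ingest: analyse the asset, attach metadata, normalise its structure."""
    return {"asset": asset, "metadata": {"format": "MPEG-4", "objects": 3},
            "layers": ["background_sprite", "foreground_vop", "overlay_text"]}

def integrate(prepared_assets, session):
    """Asset-to-session transition: synchronise streams, instructions and data."""
    return {"session": session, "flows": [a["layers"] for a in prepared_assets]}

def create(integrated, device):
    """Craft each media stream for the user, device and medium."""
    keep_layers = 2 if device["profile"] == "legacy" else 3
    return {"session": integrated["session"],
            "streams": [f[:keep_layers] for f in integrated["flows"]]}

def present(created, device):
    """Final composition / adaptation step, in the network or on the STB."""
    target = "MPEG-2 TS" if device["profile"] == "legacy" else "object layers"
    return {"deliver_as": target, "streams": created["streams"]}

if __name__ == "__main__":
    session = {"user": "subscriber-42", "service": "multi-view"}
    device = {"profile": "legacy"}   # e.g. an MPEG-2-only STB
    out = present(create(integrate([prepare("movie"), prepare("ad")], session),
                         device), device)
    print(out["deliver_as"], len(out["streams"]), "streams")
```

Because each block only consumes the output of the previous one, the blocks can be co-located or split between the back office, the network edge and the home, as the architecture examples later in the paper show.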
PUTTING IT ALL TOGETHER

The proposed implementation of network-based personalization uses the set of tools and the virtual building blocks defined above to create the required end result.

To support highly personalized, session-based services, we propose to use the MPEG-4 toolkit, which enables scene-related information to be transmitted together with video, audio and data to a processor-based network element, in which an object-based scene is composed according to the rendering capabilities of the user device. Using MPEG-4 authoring tools and applying BIFS (Binary Format for Scenes) encoding at the content preparation stage, the system improves the efficiency of personalization stream processing, specifically at the "create" and "present" stages. Different encoding levels are required to support the same bitstream; for example, varying amounts of network computational power will be required to process the foreground, background and other data (such as 2-D/3-D elements) within the same bitstream. Moreover, some of the video rendering can be passed directly to the user reception device (STB), reducing network image processing requirements.

The solution described in this paper uses a set of tools that allow the content creator to build multimedia applications without any knowledge of the internal representation structure of an MPEG-4 scene. With the MPEG-4 toolkit, the multimedia content is object-oriented, with spatial and temporal attributes that can be attached to it, including the BIFS encoding scheme. The MPEG-4 encoded objects address video, audio and multimedia presentations such as 3-D, as defined by the authoring tools.

The solution is built on four network elements: prepare, integrate, create and present. All four work together to ensure the highest processing efficiency and to accommodate different service scenarios: legacy MPEG-2 set-top boxes; H.264 set-top boxes with no object-based rendering capabilities; and, finally, STBs with full MPEG-4 object-based processing capabilities. Two-way feedback between the STB, the edge network and the network-based stream processor is established in order to define what is processed at each of the network stages.

PREPARE

At the prepare stage, the assumption is that incoming content is received in, or converted to, MPEG-4 toolkit encoding, generating media in object-based form. Using authoring tools to ingest content and create scene-related object information supports improved media compression for transmission and processing by the network. The object-based scene is created with MPEG-4 authoring tools, applying BIFS (Binary Format for Scenes) encoding to support the seamless integration and control of different audio/visual and synthetic objects in a scene.

Compression and manipulation of visual content with the MPEG-4 toolkit introduces the concepts of the Video Object Plane (VOP) and the sprite. Using video segmentation, each frame of an input video sequence can be segmented into a number of VOPs, each of which may describe a physical object within the scene. A sprite coding technique may be used to support a mosaic layout: it is based on a large image composed of the pixels belonging to a video object that is visible throughout a video segment, and it captures spatio-temporal information in a very compact way. Other tools may also be applied at the prepare stage to improve network processing and reduce bandwidth. These include I-VOPs (intra-coded video object planes), which allow an object to be encoded and decoded on the basis of its shape, motion and texture.
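Sprite-based background handling can be illustrated with a short sketch: the background of each frame is reconstructed as a window into one large static sprite, and an arbitrarily shaped foreground object (binary shape mask plus texture) is composited on top. The array sizes, offsets and mask below are invented for the example; real MPEG-4 sprite coding additionally signals warping parameters, which are omitted here.

```python
# Illustrative sprite-based reconstruction: each frame's background is a
# window into one large static sprite image, and an arbitrarily shaped
# foreground object (binary mask + texture) is composited on top.
import numpy as np

def reconstruct_frame(sprite, window_xy, frame_hw, fg_texture, fg_mask, fg_xy):
    """Crop the sprite at window_xy, then paste the masked foreground at fg_xy."""
    x, y = window_xy
    h, w = frame_hw
    frame = sprite[y:y + h, x:x + w].copy()        # background comes from the sprite
    fx, fy = fg_xy
    fh, fw = fg_texture.shape
    region = frame[fy:fy + fh, fx:fx + fw]
    region[fg_mask] = fg_texture[fg_mask]          # binary shape decides which pixels
    return frame

if __name__ == "__main__":
    sprite = (np.arange(200 * 320, dtype=np.uint32).reshape(200, 320) % 255).astype(np.uint8)
    fg_tex = np.full((20, 30), 240, dtype=np.uint8)
    fg_mask = np.zeros((20, 30), dtype=bool)
    fg_mask[5:15, 5:25] = True                     # simple arbitrary shape
    # Camera pans right: only the window offset changes per frame.
    for t, pan in enumerate([0, 4, 8]):
        frame = reconstruct_frame(sprite, (pan, 10), (120, 160),
                                  fg_tex, fg_mask, (30, 40))
        print(f"frame {t}: background window x={pan}, mean={frame.mean():.1f}")
```

Only the sprite, the per-frame window offsets and the foreground object need to be transmitted, which is why sprite coding is so compact for panning backgrounds.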
A Bidirectional Video Object Plane (B-VOP) may be used to predict from a past and a future reference VOP for each object, or a shape motion vector may be built from neighbouring motion vectors that have already been encoded.

The output of the prepare stage is, per asset, a set of object-based information coded as elementary streams, packetized elementary streams and metadata. The different object layers and data can in turn be transported as independent IP flows, over UDP/RTP, to the integrate stage.

INTEGRATE

The session with the preparation stage is an "object-based" session, embodied mainly in its visualization of several visual object types. The scalable core profile is required mostly because it supports arbitrarily shaped coding, temporal/spatial scalability and related features. At the same time, the scalable core profile must support computer graphics, such as 2-D meshes and synthetic objects, as part of the range of scalable objects in the integration stage.

MPEG-4 object-based coding allows separate encoding of foreground figures and background scenes. Arbitrarily shaped coding needs to be supported to maintain the quality of the input elements; it includes shape information in the compressed stream. In order to adapt streams to different delivery environments and available bandwidths, temporal and spatial scalability are included in the system. Spatial scalability allows one or more enhancement video object layers (VOLs) to be added to the base VOL to achieve different video scenes.

To summarize, at the integrate stage a user session is composed out of multiple incoming object-based assets to create the final, synchronized set of video object layers and object planes. The output of the integrate stage includes all the information and media required for the session; at this point, however, the media is still not tuned to the specifics of the network, device and user: it is a superset of them. The streams are then transported to the "create" and "present" stages, where the final manipulation is done.

CREATE

The systems part of MPEG-4 allows creation or viewing of a multimedia sequence with hybrid elementary streams, each of which can be encoded and decoded with the codec best suited to it. However, manipulating those streams synchronously and composing them onto a screen in real time is computationally demanding. A temporal cache is therefore used in the "create" stage to store the encoded media streams. The elementary streams (ES) arrive either as a multiplexed stream (using the MPEG-4-defined FlexMux) or as single streams, but all of them have been packetized by the MPEG-4 sync layer (SL). The use of FlexMux and the sync layer allows the elementary streams to be grouped with a low multiplexing overhead at the "prepare" and "integrate" stages, while the SL is used to synchronize bitstream delivery information from the previous stage to the "create" stage.

In order to generate the relevant session (stream), the "create" stage uses an HTTP submission to request the desired media presentation. The submission contains only the index of the preformatted BIFS (Binary Format for Scenes) for a pre-created and stored presentation, or a text-based description of the user's authored presentation. BIFS coding also allows the seamless integration and control of different audio/video objects in a scene. The "integrate" stage receives the request and sends the media to the "create" stage, i.e. the BIFS stream together with the object descriptor, in the form of an initial object descriptor stream.
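The multiplexing step can be illustrated with a simplified sketch: access units from several elementary streams are wrapped with an ES identifier and a decoding time stamp, then interleaved into one flow ordered by time stamp, which is roughly the grouping role FlexMux and the sync layer play here. The packet fields below are simplified stand-ins for the real MPEG-4 Systems syntax.

```python
# Simplified sketch of SL packetization plus FlexMux-style interleaving:
# access units (AUs) from several elementary streams are wrapped with an
# ES id and a decoding time stamp, then merged into one low-overhead flow.
# Field names are stand-ins for the real MPEG-4 Systems syntax.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class SLPacket:
    dts: int                              # decoding time stamp, ms
    es_id: int = field(compare=False)
    payload: bytes = field(compare=False)

def flexmux(elementary_streams):
    """Merge per-ES packet lists (each already sorted by DTS) into one flow."""
    return list(heapq.merge(*elementary_streams))

if __name__ == "__main__":
    video_base = [SLPacket(dts=t, es_id=1, payload=b"V" * 800) for t in range(0, 120, 40)]
    video_enh  = [SLPacket(dts=t, es_id=2, payload=b"E" * 300) for t in range(0, 120, 40)]
    bifs       = [SLPacket(dts=0, es_id=3, payload=b"scene")]
    for pkt in flexmux([video_base, video_enh, bifs]):
        print(f"dts={pkt.dts:3d}  es={pkt.es_id}  {len(pkt.payload)} bytes")
```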
If the client side can satisfy the decoding requirements, it sends a confirmation to the "create" stage to start the presentation; otherwise, the client sends its decoding and resolution capabilities to the "create" stage. The "create" stage then repeatedly downgrades to a lower profile until it meets the client's decoding capabilities, or it informs the "present" stage to compose a stream that the client device can decode (i.e. H.264 or MPEG-2).

The "create" stage initiates the establishment of the necessary sessions for the scene description (SD) stream, in BIFS format, and the object description (OD) stream referenced by the user device. This allows the user device to retrieve the compressed media stream in real time, using the URL contained in the ES descriptor stream. BIFS is used to lay out the media elementary streams in the presentation, as it provides the spatial and temporal relationships of the objects by referencing their ES_IDs. If the "create" stage needs to modify the received scene, for example by adding an enhancement layer based on user device or network capabilities, it can send a BIFS update command to the "integrate" stage and obtain a reference to the new media elementary stream.

The "create" stage can handle multiple streams and synchronize between different objects, as well as between the different elementary streams of a single object (e.g., base layer and enhancement layer). The synchronization layer is responsible for synchronizing the elementary streams. Each SL packet consists of an access unit (AU) or a fragment of an AU. An AU carries the time stamps needed for synchronization and constitutes the data unit consumed by the decoder at the "create" stage or in the user device; an AU consists of a Video Object Plane (VOP). Each AU is received by the decoder at the time instance specified by its Decoding Time Stamp (DTS).

The media is processed by the "present" stage in such a way that MPEG-4 objects are transcoded to either an H.264 or an MPEG-2 transport stream, utilizing stored motion vector information and macroblock mode decisions. The applicable process is selected on the basis of the user device's rendering capabilities. When the target is an advanced user device with MPEG-4 object-layer decoding capabilities, the "present" processor acts only as a stream adaptor and resizer, and composition is performed by the client device (the advanced STB).
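The downgrade negotiation described above can be sketched as a simple control loop: offer the richest presentation profile first, step down until the client's capabilities are met, and otherwise hand composition over to the network "present" stage. The profile list and capability fields below are hypothetical, chosen only to show the control flow.

```python
# Sketch of the "create"-stage negotiation: try the richest profile first,
# downgrade until the client's decoding/resolution capabilities are met,
# and fall back to full network composition if nothing fits.
# Profiles and capability fields are hypothetical.

PROFILES = [  # ordered from richest to simplest
    {"name": "mpeg4-objects", "needs_object_decode": True,  "min_height": 720},
    {"name": "h264-stream",   "needs_object_decode": False, "min_height": 480},
    {"name": "mpeg2-stream",  "needs_object_decode": False, "min_height": 240},
]

def negotiate(client_caps):
    """Return (profile_name, where_to_compose) for a client capability set."""
    for profile in PROFILES:
        if profile["needs_object_decode"] and not client_caps["object_decode"]:
            continue                       # client cannot render MPEG-4 objects
        if client_caps["max_height"] >= profile["min_height"]:
            where = "client" if profile["needs_object_decode"] else "network"
            return profile["name"], where
    return "mpeg2-stream", "network"       # full network composition as last resort

if __name__ == "__main__":
    legacy_stb   = {"object_decode": False, "max_height": 480}
    advanced_stb = {"object_decode": True,  "max_height": 1080}
    print(negotiate(legacy_stb))    # ('h264-stream', 'network')
    print(negotiate(advanced_stb))  # ('mpeg4-objects', 'client')
```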
PRESENT

The modularity of the coding tools, expressed as well-known MPEG profiles and levels, allows the "present" stage to be customized easily for a selected market segment: for example, the legacy MPEG-2 STB market, where full stream composition must be applied in the network, versus advanced set-top boxes with full MPEG-4 scene and object-based capability, where only minimal stream preparation is needed from the network "present" stage. Two extreme service scenarios can be applied:

Network-based "present": The "present" function applies stream adaptation and resizing, composes the network object elements, and applies transcoding functions to convert the MPEG-4 file-based format to either an MPEG-2 stream-based format or an MPEG-4/AVC (H.264) stream-based format.

STB-based "present": The "present" function may pass the object elements through to the network after rate adaptation and resizing, to be composed and presented by the advanced user device.

The "present" functionality is based on client/network awareness. In general, media provisioning is driven by metadata generated by the client device and the network manager. The metadata includes the following information:

• Video format, i.e. MPEG-2, H.264, VC-1, MPEG-4, QuickTime, etc.
• User device rendering capabilities
• User device resolution format, i.e. SQCIF, QCIF, CIF, 4CIF, 16CIF
• Network bandwidth allocation for the session

"Present" stage performance

It is essential that the "present" function compose object-based elements, using the defined set of tools, that carry a binary coded representation of individual audiovisual objects, text, graphics and synthetic objects. It composes the Visual Object Sequence (VS), Video Object Layer (VOL) or any other defined tool into a valid H.264 or MPEG-2 stream at the resolution and bandwidth defined by the client device and the network metadata feedback. The elementary streams (scene data, visual data, etc.) are received at the "present" stage from the "create" system element, which allows scalable representations and alternate codings (bit rate, resolution, etc.), enhanced with metadata and protection information. An object described by an ObjectDescriptor is sent from the content originator, i.e. the "prepare" stage, and provides simple metadata related to the object, such as content creation information or chapter time layout. This descriptor also contains all information related to stream setup, including synchronization information and initialization data for decoders. At the "present" stage, BIFS (Binary Format for Scenes) is used to place each object, with various effects potentially applied to it, in a display that is then transcoded to an MPEG-2 or H.264 stream.

STB-based "present": Object reconstruction

The essence of MPEG-4 lies in its object-oriented structure. Each object forms an independent entity that may or may not be linked to other objects, spatially and temporally. This approach gives the end user at the client side tremendous flexibility to interact with the multimedia presentation and manipulate the different media objects: end users can change the spatial-temporal relationships among media objects and turn media objects on or off. However, it requires a fairly complex session management and control architecture. A remote client retrieves information about the media objects of interest and composes a presentation based on what is available and desired. The following communication messages occur between the client device and the "present" stage:

• The client requests a service by submitting the description of the presentation to the data controller (DC) on the "present" stage side.
• The DC on the "present" stage side controls the encoder/producer module to generate the corresponding scene descriptor, object descriptors, command descriptors and other media streams, based upon the presentation description information submitted by the end user at the client side.
• Session control on the "create" stage side manages session initiation, control and termination.
• Actual stream delivery commences after the client indicates that it is ready to receive; streams then flow from the "create" stage to the "present" client.

After the decoding and composition procedures, the MPEG-4 presentation authored by the end user is rendered on his or her display. The set-top box client is required to support the architectural design of the MPEG-4 System Decoder Model (SDM), which is defined to achieve media synchronization, buffer management and timing when reconstructing the compressed media data. The session controller at the client side communicates with the session controller at the server ("create" stage) side to exchange session status information and session control data. The session controller translates user actions into the appropriate session control commands.

Network-based MPEG-4 to H.264/AVC baseline profile transcoding

Transcoding from MPEG-4 to H.264/AVC can be done in the spatial domain or in the compressed domain. The most straightforward method is to fully decode each video frame and then completely re-encode it with H.264. This approach is known as spatial-domain video transcoding; because it involves full decoding and re-encoding, it is very computationally intensive. Instead, motion vector refinement and an efficient transcoding algorithm are used to transcode the MPEG-4 object-based scene to an H.264 stream. The algorithm exploits side information from the decoding stage to predict the coding modes and motion vectors for the H.264 encode. Both INTRA macroblock (MB) transcoding and INTER macroblock transcoding are exploited by the transcoding algorithm at the "present" stage. During the decoding stage, the incoming bitstream is parsed in order to reconstruct the spatial video signal, and the prediction directions for INTRA-coded macroblocks and the motion vectors are stored and then reused in the coding process.

To obtain the highest transcoding efficiency at the "present" stage, side information is stored. During the decoding of MPEG-4, a large amount of side information (such as motion vectors) is obtained. The "present" stage reuses this side information, which reduces transcoding complexity compared with a full decode/re-encode scenario. In the process of decoding the MPEG-4 bitstream, the side information is stored and used to facilitate the re-encoding process; both motion vector estimation and coding mode decisions reuse it, reducing complexity and computation power.

Network-based MPEG-4 to MPEG-2 transcoding

To support legacy STBs that have limited local processing capabilities and accept only MPEG-2 transport streams, a full decode/encode is performed by the "present" stage. However, the "present" stage utilizes the same tools used for the MPEG-4 to H.264 conversion in order to remove complexity. Stored motion vector information and macroblock mode decision algorithms for inter-frame prediction, based on machine-learning techniques, are used as part of the MPEG-4 to MPEG-2 transcode process. Since coding mode decisions consume most of the resources in video transcoding, fast macroblock (MB) mode estimation leads to reduced complexity.
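The motion-vector reuse idea can be illustrated with a toy refinement step: the vector decoded from the incoming stream seeds a small local search in the re-encoder instead of a full-range search, which is where the complexity saving comes from. The block size, SAD metric and search window below are simplified; this is not an H.264 encoder.

```python
# Toy motion-vector reuse for transcoding: instead of a full-range search,
# the re-encoding step refines the motion vector decoded from the incoming
# stream within a small window. Illustration of the complexity saving only.
import numpy as np

def sad(block, ref, x, y):
    """Sum of absolute differences between a block and a reference patch."""
    h, w = block.shape
    patch = ref[y:y + h, x:x + w]
    return int(np.abs(block.astype(int) - patch.astype(int)).sum())

def refine_mv(block, ref, x0, y0, decoded_mv, radius=2):
    """Search only +/- radius pixels around the reused (decoded) vector."""
    best = (decoded_mv, sad(block, ref, x0 + decoded_mv[0], y0 + decoded_mv[1]))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            mv = (decoded_mv[0] + dx, decoded_mv[1] + dy)
            cost = sad(block, ref, x0 + mv[0], y0 + mv[1])
            if cost < best[1]:
                best = (mv, cost)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 255, size=(64, 64), dtype=np.uint8)
    true_mv = (3, -2)                                    # simulated real motion
    x0, y0 = 24, 24
    block = ref[y0 + true_mv[1]:y0 + true_mv[1] + 16,
                x0 + true_mv[0]:x0 + true_mv[0] + 16]
    decoded_mv = (2, -1)                                 # vector taken from the incoming stream
    print(refine_mv(block, ref, x0, y0, decoded_mv))     # ((3, -2), 0)
```

A 5 x 5 refinement window tests 25 candidates per block, versus thousands for an exhaustive search over a typical range, which is the order of saving that motivates reusing decoded side information.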
The implementation presented above can operate in both offline and real-time environments. See Appendix 2 for elaboration on the real-time implementation.
BENEFITS OF NETWORK-BASED PERSONALIZATION

Deploying network-based processing, whether complete or hybrid, has significant benefits:

• A unified user experience is delivered across the various STBs in the field.
• It provides a present- and future-proof cost model for low-end to high-end STBs.
• It utilizes the existing VOD environment, servers and infrastructure. Network-based processing accommodates low-end and future high-end systems, all under the operators' existing, managed on-demand systems. Legacy servers require more back-office preparation, with minimal server processing power overhead, while newer servers can provide additional per-user processing and thus more personalization features.
• Rate utilization is optimized. Instead of consuming the sum of all the streams that make up the user experience, network-optimized processing reduces overhead significantly. In the extreme case, a single stream with no overhead is delivered instead of four to five times the available bandwidth; in the common case, the overhead is approximately 20%.
• Best quality of service for connected-home optimization. By performing most or all of the processing before the content reaches the home, the operator optimizes the bandwidth and the experience across the user's end devices, delivering the best quality of service.
• Prevention of subscriber churn in favour of direct over-the-top (OTT) services. The operator has control over the edge network; over-the-top providers do not. Media manipulation in the network can and will be done by OTT operators, but unlike cable operators they do not control the edge network, which limits the effectiveness of their actions unless they have a QoS agreement with the operator, in which case control stays in the operator's hands.
• Maintaining the position of the current and future "smart pipe". Awareness of the end-user device, and processing for it, is critical for the operator to maintain processing capabilities that will allow migration to other areas such as mobile and 3-D streaming.
IMPLEMENTING NETWORK-BASED PERSONALIZATION

As indicated earlier in the document, the solution can be implemented in a variety of ways. In this section we present three of the options, all within a generic North American on-demand architecture: hybrid network edge and back office; network edge; and hybrid home and network.

Hybrid network edge and back office

As the user device powers up, or as the user starts using personalization features, the user client connects to the session manager, identifies the user, the device type and the personalization requirements, and, once resources are identified, starts a session. In this implementation the "prepare" function is physically separated from the other building blocks, and the user STB is not capable of the relevant video processing and rendering. Each incoming media asset is processed and extracted as part of the standard ingest process, readying it for downstream personalization.

Once a session is initiated and the edge processing resources are found, sets of media and metadata flows are propagated across the internal CDN to the "integrate" step at the network edge. The set of flows includes the different media flows; related metadata (target STB-based processing, source media characteristics, target content insertion information, interactivity support and so forth, which must be available for the edge to start processing the session); objects; and data from the content provider or advertiser.

After arrival at the edge, the "integrate" function aligns the flows and passes them to the "create" and "present" functions, which in this case generate a single, personally composed stream, accompanied by relevant metadata and directed at a specific user.

[Figure 3: Hybrid back office and network edge]
As can be seen in Figure 3 above, the SMP (Scalable Media Personalization) session manager connects the user device and the network, influencing the "integrate", "create" and "compose" edge functions in real time.

Network edge only

This application case performs all the processing on demand, in real time. It is similar to the hybrid case; however, instead of the "prepare" function being located at the back office and working offline, all functions reside on the same platform. As can be expected, this option places significant horsepower requirements on the "prepare" function, since content must be "prepared" in real time. In this example the existing flow is almost seamless, as the resource manager simply identifies the platform as another network resource and manages it accordingly.

[Figure 4: Network edge]
Hybrid home and network

In this hybrid implementation, the end-user device (an STB in our case) has been identified as one capable of hosting the "present" function. As a result, as can be seen in Figure 5, the "present" function is moved into the user's home, and the system demarcation point is the "create" function. During the session, multiple "prepared" flows of data and media arrive at the STB, consuming significantly less bandwidth than the non-prepared options and requiring less CPU horsepower for the "present" function.

[Figure 5: Hybrid home and network]
POWER SHIFTING TO THE USER

Although legacy STBs are still present in many homes, the overall processing horsepower in the home is growing and will continue to grow. This means that the user device will be able to do more processing at home and will, in theory, need less network-based assistance. At first glance this is indeed the case. However, on closer examination two main challenges emerge.

1. The increase in user device capabilities, and in actual user expectations, comes back to the network as a direct increase in bandwidth utilization, which in turn affects the user's experience and the ability to run enhanced applications such as multi-view. For example, today's next-generation STBs support 800 to 16,000 MIPS, versus the legacy 20 to 1,000 MIPS, with dedicated dual 400 MHz video graphics processors and dual 250 MHz audio processors (S-A/Cisco's next-generation Zeus silicon platform). As Figure 6 below indicates, the expected migration of media services onto other home devices, such as media centres and game consoles, significantly increases the available home processing power.

[Figure 6: Home processing power roadmap (TMIPS), 2007-2010]

2. No matter how "fast and furious" the processing power in the home becomes, users will always want more. Having home devices perform ALL of the video processing increases CPU and memory utilization and directly diminishes the performance of other applications. In addition, as discussed earlier in the document, the increase in open-standard home capabilities substantially strengthens the threat of customer churn for cable operators.

Network-based personalization is targeted at providing solutions to these challenges. The approach is to use network processing to help the user and improve the user experience.
By performing the "prepare", "integrate" and "create" functions in the network, and leaving only the "present" function to the user's home, several key benefits are delivered that effectively address the above challenges.

Network bandwidth utilization: The "create" function drives down network bandwidth consumption. The streams delivered to the user are no longer the complete, original media, but only what is needed. For example, when viewing one HD and two SD programs in the same multi-view window, each of the three streams is delivered at the resolution and frame rate required at each given moment, resulting in significant bandwidth savings, as can be seen in Figure 7.

[Figure 7: Bandwidth to the home for the 1 HD + 2 SD example (Mbps), comparing STB-only, hybrid and network-only delivery for MPEG-2 and H.264]

CPU processing power: As indicated in the "Putting It All Together" section, our solution allows selective composition of object layers. In addition, the multi-view itself is created from multiple resolutions, so there is no need for render-resize-compose functions at the user device, which further reduces overall CPU utilization.

Finally, the fact that the network can deliver the above benefits inherently puts power back into the hands of the operator, who can deliver the best user experience.
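The saving can be approximated with back-of-the-envelope numbers. The stream bitrates below are typical published figures for MPEG-2 and H.264, and the hybrid tile fraction is an assumption; only the roughly 20% overhead figure comes from this paper. These are not the measurements behind Figure 7.

```python
# Back-of-the-envelope estimate of the 1 HD + 2 SD multi-view example.
# "STB only" ships every stream at full rate; "network only" ships a single
# composed mosaic at roughly the rate of one full-screen stream; "hybrid"
# ships reduced-resolution tiles plus ~20% overhead. Bitrates and the tile
# fraction are assumed values, not the measurements behind Figure 7.

RATES_MBPS = {            # assumed full-resolution stream rates
    "MPEG2": {"HD": 15.0, "SD": 3.75},
    "H.264": {"HD": 8.0,  "SD": 2.0},
}

def stb_only(codec):
    r = RATES_MBPS[codec]
    return r["HD"] + 2 * r["SD"]                      # all streams sent whole

def network_only(codec):
    return RATES_MBPS[codec]["HD"]                    # one composed full-screen stream

def hybrid(codec, tile_fraction=0.35, overhead=0.20):
    r = RATES_MBPS[codec]
    tiles = (r["HD"] + 2 * r["SD"]) * tile_fraction   # downscaled per-window tiles
    return tiles * (1 + overhead)

if __name__ == "__main__":
    for codec in RATES_MBPS:
        print(f"{codec}: STB only {stb_only(codec):.1f} Mbps, "
              f"hybrid {hybrid(codec):.1f} Mbps, "
              f"network only {network_only(codec):.1f} Mbps")
```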
SUMMARY

Exceeding user expectations while maintaining a viable business case is becoming more challenging than ever for the cable operator. As the weight shifts to the home and to broadband streaming, the operator is forced to find new solutions in order to maintain leadership in the era of personalization and interactivity. Network-based personalization provides a balanced solution: the ability to maintain an open, standards-based solution, while dynamically shifting the processing balance based on user, device, network and time, can give both the user and the operator a "golden" solution.

ABOUT THE AUTHOR

Amos Kohn is Vice President of Business Development at Scopus Video Networks. He has more than 20 years of multinational executive management experience in convergence technology development, marketing, business strategy and solutions engineering at telecom and emerging multimedia organizations. Prior to joining Scopus, Amos Kohn held senior positions at ICTV, Liberate Technologies and Golden Channels.
APPENDIX 1: STB-BASED ADDRESSABLE ADVERTISING

In the home addressable advertising model, multiple user profiles in the same household are offered to advertisers within the same ad slot. For example, within the same slot, multiple targeted ads replace the same program feed: some targeted at different youth age groups, while another targets the adults in the house (male, female) based on specific profiles. During the slot, a young viewer sees one ad while an adult sees another. Addressable advertising requires more bandwidth to the home than traditional zone-based advertising. Granularity may also step one level up, where the targeted advertisement targets the household rather than the individual user within it; in this case, less bandwidth is required in a given serving area than with user-based targeted advertising. For channels that are already in the digital tier and enabled for local ad insertion, the impact of home addressability on the infrastructure is similar to the bandwidth requirements of a unicast VOD service. In a four-demographics scenario, for each ad zone, four times the bandwidth allocated for a linear ad needs to be added.

APPENDIX 2: REAL-TIME IMPLEMENTATION

Processing in real time is determined by stream provisioning (fast motion estimation), stream complexity and the size of the buffer at each stage. Scenes composed of audiovisual objects (AVOs), support for hybrid coding of natural video and 2-D/3-D graphics, and the provision of advanced system and interoperability capabilities all support real-time processing. MPEG-4 real-time software encoding of arbitrarily shaped video objects (VOs) is a key element of the solution. The MPEG-4 toolkit unites the advantages of block-based and pixel-recursive motion estimation methods in one common scheme, leading to a fast hybrid recursive motion estimation that supports MPEG-4 processing.
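The "fast hybrid recursive" idea can be pictured as a predictive search: candidate vectors taken from already-estimated neighbouring blocks (plus the zero vector) are tested first, and only the best candidate is refined locally. The sketch below is a generic predictive-search illustration under assumed block sizes and a synthetic frame pair; it is not the toolkit's actual algorithm.

```python
# Generic predictive (recursive-style) motion estimation sketch: test a small
# set of candidate vectors taken from already-estimated neighbours plus the
# zero vector, then refine the best candidate in a +/-1 pixel window.
# Illustrates why predictive search is fast; not the toolkit's algorithm.
import numpy as np

BLOCK = 8

def sad(cur, ref, bx, by, mv):
    """SAD between the current block at (bx, by) and the reference patch at (bx+dx, by+dy)."""
    blk = cur[by:by + BLOCK, bx:bx + BLOCK].astype(int)
    patch = ref[by + mv[1]:by + mv[1] + BLOCK, bx + mv[0]:bx + mv[0] + BLOCK].astype(int)
    return int(np.abs(blk - patch).sum())

def predictive_me(cur, ref, bx, by, candidates):
    """Pick the best candidate MV, then refine it by +/-1 pixel."""
    best_mv = min(candidates, key=lambda mv: sad(cur, ref, bx, by, mv))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            mv = (best_mv[0] + dx, best_mv[1] + dy)
            if sad(cur, ref, bx, by, mv) < sad(cur, ref, bx, by, best_mv):
                best_mv = mv
    return best_mv

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ref = rng.integers(0, 255, (48, 48), dtype=np.uint8)
    # Frame content shifted right by 3 and down by 2, so the reference offset is (-3, -2).
    cur = np.roll(ref, shift=(2, 3), axis=(0, 1))
    left_neighbour_mv, above_neighbour_mv = (-3, -2), (0, 0)   # already-estimated predictors
    candidates = [(0, 0), left_neighbour_mv, above_neighbour_mv]
    print(predictive_me(cur, ref, bx=16, by=16, candidates=candidates))   # -> (-3, -2)
```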