DYNAMIC REGION OF INTEREST TRANSCODING FOR MULTIPOINT VIDEO CONFERENCING

Chia-Wen Lin
Department of Computer Science and Information Engineering,
National Chung Cheng University, Chiayi, Taiwan, R.O.C.
cwlin@cs.ccu.edu.tw

Yung-Chang Chen
Department of Electrical Engineering,
National Tsing Hua University, Hsinchu, Taiwan, R.O.C.
ycchen@benz.ee.nthu.edu.tw

Ming-Ting Sun
Department of Electrical Engineering,
University of Washington, Seattle, Washington, USA
sun@ee.washington.edu

ABSTRACT

This paper presents a region-of-interest transcoding scheme for multipoint video conferencing to enhance the visual quality. In a multipoint videoconference, usually only one or two conferees are active at any one time, and these are the regions of interest to the other conferees involved. We propose a Dynamic Sub-Window Skipping (DSWS) scheme that first identifies the active participants among the multiple incoming encoded video streams by calculating the motion activity of each sub-window, and then reduces the frame-rates of the motion-inactive participants by skipping these less important sub-windows. The bits saved by the skipping operation are reallocated to the active sub-windows to enhance the regions of interest. We also propose a low-complexity scheme to compose and trace, with good accuracy, the unavailable motion vectors in the inactive sub-windows dropped by the DSWS. Simulation results show that the proposed methods not only significantly improve the visual quality of the active sub-windows without introducing serious quality degradation in the inactive ones, but also reduce the computational complexity and avoid whole-frame skipping. Moreover, the proposed algorithm is fully compatible with the H.263 video coding standard.

1. INTRODUCTION

Video telephony is an efficient way for businesspersons, engineers, scientists, etc. to exchange information across remote locations. With the rapid growth of video telephony, the need for multipoint video conferencing is also growing. A multipoint videoconference involves three or more conference participants. In continuous-presence video conferencing, each conferee can see the others in the same window simultaneously [1]. Fig. 1 depicts an application scenario of multiple persons participating in a multipoint videoconference with a centralized server. In this scenario, multiple conferees are connected to the central server, referred to as the Multipoint Control Unit (MCU), which coordinates and distributes video and audio streams among the participants according to the channel bandwidth requirement of each conferee. A video transcoder [2-5] is included in the MCU to combine the multiple incoming encoded video streams from the various conferees into a single coded video stream and to send the re-encoded bit-stream back to each participant over the same channel with the required bit-rate and format for decoding and presentation. In the case of a multipoint videoconference over the PSTN (Public Switched Telephone Network) or ISDN (Integrated Services Digital Network), the channel bandwidth is constant and symmetrical. Assuming each conferee has a channel bandwidth of B kb/s, the MCU receives each conferee's video at B kb/s, decodes and combines the videos, and re-encodes the combined video at B kb/s so as to meet the channel bandwidth requirement for sending the encoded video back to the conferees. It is therefore necessary to perform bit-rate conversion/reduction in the video transcoder. Bit-rate conversion from a high bit-rate to a low bit-rate in video transcoding will, however, introduce video quality degradation. The visual quality, the computational load, and the available bit-rate need to be traded off in video transcoding to achieve a good solution. The problem of how to efficiently redistribute the limited bit-rate to different parts of a video in video transcoding is critical for providing satisfactory visual quality. In a multipoint videoconference, most of the time only one or two conferees are active. The active conferees need higher bit-rates to produce good-quality video, while the inactive conferees require lower bit-rates to produce acceptable-quality video [4]. Simply distributing the bit-rate uniformly among the conferees will result in non-uniform video quality. To make the best use of the available bit-rate, a joint rate-control scheme which takes into account each conferee sub-window's activity
can be used [4-5]. Sun et al. [4] proposed to measure the motion activity of each sub-window by calculating the sum of the magnitudes of its motion vectors, and to allocate the bit-rate to each sub-window according to its activity. More bits are thus allocated to the sub-windows with higher activities, and this strategy produces video of much more uniform quality. Wu et al. [5] extended the work in [4] by allocating the bit-rate to each sub-window according to its spatial-temporal activity, which takes into account the motion, the variance of the residual signal, and the number of encoded macroblocks. Similar work on joint rate control can also be found in the statistical multiplexing (StatMux) of multiple video programs [6-8] and in MPEG-4 joint rate control of multiple video objects [9]. However, the strong correlation among the conferees in a multipoint video conference usually does not exist in the general cases of StatMux and MPEG-4 object rate control. In addition, we will show that dynamic temporal resolution control [10] for each sub-window, which has not been addressed in the previous work, may bring further coding gain and computation reduction in multipoint video transcoding.

[Figure: conferees connected over ISDN to the MCU in the central office, which performs the video combining]
Fig. 1. Application example of multipoint video conferencing.

In this paper, we present a DSWS scheme which provides the flexibility for sub-windows to be encoded at different temporal resolutions according to their motion activities. The proposed DSWS scheme classifies the sub-windows into active and inactive classes by calculating their associated motion activities. The inactive sub-windows can then be dropped, so that the saved bits can be used to significantly enhance the visual quality of the active ones without introducing serious degradation on the inactive ones. In addition to the performance gain on active sub-windows, the DSWS scheme presents two other advantages: it reduces computation and it avoids whole-frame skipping. Furthermore, we present a motion-vector composing scheme which can compose and trace the unavailable motion vectors in the dropped sub-windows with good accuracy at a very low computational and memory cost.

The rest of this paper is organized as follows. Section 2 presents the proposed DSWS scheme. In Section 3, a pre-filtered activity-based motion-vector composing scheme is proposed for composing the unavailable motion vectors in the dropped sub-windows after performing the DSWS scheme. Section 4 reports the experimental results of the proposed algorithms and the comparison with the H.263 [11] TMN8 direct transcoding method. Finally, conclusions are drawn in Section 5.

2. DYNAMIC SUB-WINDOW SKIPPING (DSWS)

Fig. 2 depicts the architecture for multipoint video transcoding discussed in this paper. In this architecture, the input data for the video transcoder consist of multiple H.263-encoded video streams from the client terminals, delivered through a heterogeneous network environment. The video streams could be transmitted over the PSTN, ISDN, or a LAN (Local Area Network) with various bandwidth requirements. For simplicity but without loss of generality, we assume each video stream is encoded in the QCIF (176x144) format and each participant can see four participants in a CIF (352x288) frame in a continuous-presence fashion. In our experiments we assume the video transmission is over ISDN, as shown in the scenario in Fig. 1. As shown in Fig. 2, the input video streams are first buffered at the input to regulate the difference between the input data-rate and the transcoding data-rate for each video frame. Each video stream is decoded through a Variable Length Decoder (VLD) and then transcoded to a lower data-rate. To meet various user bit-rate requirements, more than one transcoder may be required for the same video stream, generating multiple output streams with different bit-rates. The transcoded bit-streams are subsequently combined into CIF frames through a number of multiplexers and video combiners. The multiplexing unit for the video combiner is the GOB (Group of Blocks) as specified in the H.263 standard [11].

[Figure: for each conferee, the incoming stream passes through a buffer (BUF), a Variable Length Decoder (VLD), and a sub-window transcoder driven by the motion information and a joint rate controller; the transcoded streams are buffered and assembled by video combiners into the output stream sent back to each conferee]
Fig. 2. The proposed multipoint video transcoding architecture.
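As a rough, illustrative sketch of the combining step in Fig. 2 (not the TMN8-based implementation used in the paper), the Python fragment below shows how the GOBs of four transcoded QCIF sub-windows could be interleaved into the 18 GOB rows of a CIF picture. The SubWindowStream container and the byte-level concatenation are simplifying assumptions; a real combiner must also rewrite GOB headers and macroblock addressing.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SubWindowStream:
    """One conferee's re-encoded QCIF sub-window (after BUF -> VLD -> transcoder)."""
    conferee_id: int
    gobs: List[bytes]         # 9 GOBs per QCIF picture, one macroblock row each
    motion_activity: float    # passed to the joint rate controller of Fig. 2

def combine_to_cif(quadrants: List[SubWindowStream]) -> bytes:
    """Video combiner of Fig. 2: GOB-level multiplexing of four QCIF
    sub-windows (top-left, top-right, bottom-left, bottom-right) into a
    CIF picture whose 18 GOB rows each span two sub-windows."""
    assert len(quadrants) == 4 and all(len(q.gobs) == 9 for q in quadrants)
    top_left, top_right, bottom_left, bottom_right = quadrants
    cif_gobs = []
    for left, right in ((top_left, top_right), (bottom_left, bottom_right)):
        for gob_left, gob_right in zip(left.gobs, right.gobs):
            # Simplification: a real combiner rewrites GOB numbers and MB
            # addresses rather than concatenating raw bits.
            cif_gobs.append(gob_left + gob_right)
    return b"".join(cif_gobs)
```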
As mentioned above, in a multipoint video conference, usually only one or two persons are motion-active at one time. The active conferees (e.g., the conferees who are speaking) are often the center of focus. Therefore, allocating the active sub-windows relatively higher bit-rates can provide a much better visual experience to the viewers. The active conferees usually have larger motions than the others, so they can be easily detected from the incoming motion information.

In this paper, we observe that in multipoint video conferencing the temporal resolutions of the low-activity sub-windows can be reduced without introducing significant visual degradation from the Human Visual System (HVS) point of view. The main reason is that, since the motions in the inactive sub-windows are relatively slow and the conferees usually concentrate their attention on the active ones, the effect of the temporal resolution reduction caused by skipping inactive sub-windows can often be masked by the high motions in the active sub-windows and is therefore not very noticeable to viewers. To make the best use of this property, we propose to drop motion-inactive sub-windows and to use sub-window repetition to approximate the dropped sub-windows at the end decoder, so that the saved bits can be used to enhance the quality of the remaining non-skipped active sub-windows, which are usually the regions of interest. In addition, if a sub-window is decided to be skipped, much of the computation for transcoding that sub-window can be saved, so a significant computation reduction can be achieved. Sub-window skipping can be implemented within the H.263 syntax [11] by simply setting the COD bit of every Macroblock (MB) belonging to the skipped sub-window to "1", which avoids sending the associated DCT coefficients, motion vector information, and MB overhead bits. Only 99 bits (one COD bit for each of the 99 MBs) are required to represent a skipped QCIF (176x144) sub-window, so the overhead is negligible. In our proposed DSWS scheme, the motion information is used to calculate the motion activity of each sub-window for dynamic sub-window skipping control. The DSWS scheme is summarized as follows:

   if (S_m^{MV} < TH_MV) && (SAAD_m < TH_SAAD) then
      skip the transcoding of the m-th sub-window
   else
      transcode the m-th sub-window

where the sum of the magnitudes of the accumulated motion vectors of the m-th sub-window is defined as

S_m^{MV} = \frac{1}{N} \sum_{n=1}^{N} \left( \left| MV_{m,n}^{x} \right| + \left| MV_{m,n}^{y} \right| \right),    (1)

N is the number of MBs in a sub-window, (MV_{m,n}^{x}, MV_{m,n}^{y}) is the motion vector associated with the n-th macroblock of the m-th sub-window with respect to its corresponding previously encoded sub-window (i.e., f_m^{prev}(\cdot)), and the Sum of Accumulated Absolute Difference (SAAD) of the sub-window is defined as

SAAD_m = \frac{1}{N} \sum_{n=1}^{N} \sum_{x,y \in MB_{m,n}} \left| f_m(x,y) - f_m^{prev}\left( x + MV_{m,n}^{x},\; y + MV_{m,n}^{y} \right) \right|.    (2)

The sum of the magnitudes of the motion vectors of a sub-window can be used as a good indication of its motion activity. A sub-window is classified as active if this sum is larger than a predetermined threshold, TH_MV; otherwise it is classified as inactive. An inactive sub-window will be skipped if its associated SAAD value defined in (2) is below a threshold, TH_SAAD. If an inactive sub-window is skipped, the corresponding latest non-skipped sub-window is repeated to approximate the skipped one. Human visual perception is relatively insensitive to the small differences between the skipped sub-windows and their reconstructions by sub-window repetition, provided the skipped sub-windows are inactive. The two thresholds, TH_MV and TH_SAAD, control the classification: the larger the thresholds, the more sub-windows are skipped and the more of the saved bits can be used for the other sub-windows, but the more serious the jerky motion becomes. The SAAD value of each sub-window is used to constrain the sub-window skipping. If the current sub-window is motion-inactive but its SAAD is larger than the threshold, the proposed method enforces that the sub-window, which would otherwise be skipped, be encoded. This measure prevents the error accumulation caused by slow but steady motions that would occur if only the motion-activity measure were used, as in [10]. It should be noted that, since incoming sub-windows may be dropped in consecutive frames with the DSWS method, the incoming motion vectors may no longer be valid, because they may point to dropped sub-windows that do not exist in the transcoded bit-stream. To compose and trace the required but unavailable motion vectors along the consecutively skipped sub-windows with respect to the corresponding latest encoded sub-windows, the motion-vector composing scheme proposed in Section 3 is used.

The proposed DSWS scheme presents several advantages. First, the quality of the active sub-windows can be effectively enhanced, while the quality loss on the inactive sub-windows is relatively small and visually insensitive to the viewer's perception. Second, skipping a sub-window saves much of the computation for transcoding that sub-window (in our simulations, about 2/3 of the computation for transcoding that sub-window can be saved), thus achieving a significant computation reduction. Finally, by skipping the motion-inactive sub-windows, many whole-frame skips due to insufficient bit allocation can be avoided, so that the temporal resolution of the motion-active sub-windows can be kept as high as possible. Moreover, the proposed method can be combined with the dynamic bit-allocation scheme presented in [4] to further improve the visual quality with almost no extra complexity.
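To make the skipping decision above concrete, the following Python sketch shows how the motion-activity measure (1), the SAAD measure (2), and the two thresholds could drive the per-sub-window skip decision. It is an illustrative sketch, not the TMN8-based implementation used in the paper; the array shapes, integer-pel motion vectors, and the exact scaling of the threshold values are assumptions.

```python
import numpy as np

# Illustrative thresholds; the paper tunes TH_MV and TH_SAAD empirically
# (0.2 and 10 in its DSWS experiments; the scaling here may differ).
TH_MV = 0.2
TH_SAAD = 10.0

def motion_activity(mvs):
    """Eq. (1): mean of |MVx| + |MVy| over the N macroblocks of a sub-window.

    mvs: array of shape (N, 2) holding (MVx, MVy) per macroblock.
    """
    return float(np.mean(np.abs(mvs[:, 0]) + np.abs(mvs[:, 1])))

def saad(cur, prev, mvs, mb_size=16):
    """Eq. (2): motion-compensated absolute difference, averaged over the MBs.

    cur, prev: current and previously encoded sub-window images (H x W).
    mvs: (N, 2) integer motion vectors, one per macroblock, in raster order.
    """
    h, w = cur.shape
    total, n_mbs = 0.0, 0
    for idx, (mvx, mvy) in enumerate(mvs.astype(int)):
        by = (idx // (w // mb_size)) * mb_size
        bx = (idx % (w // mb_size)) * mb_size
        ref_y = int(np.clip(by + mvy, 0, h - mb_size))
        ref_x = int(np.clip(bx + mvx, 0, w - mb_size))
        cur_blk = cur[by:by + mb_size, bx:bx + mb_size].astype(float)
        ref_blk = prev[ref_y:ref_y + mb_size, ref_x:ref_x + mb_size].astype(float)
        total += np.abs(cur_blk - ref_blk).sum()
        n_mbs += 1
    return total / n_mbs

def dsws_skip(cur, prev, mvs):
    """DSWS rule: skip the sub-window only if it is both motion-inactive
    and its accumulated residual (SAAD) stays small."""
    return motion_activity(mvs) < TH_MV and saad(cur, prev, mvs) < TH_SAAD
```

When the decision is to skip, the transcoder only needs to emit the 99 COD bits described above for that sub-window.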
3. COMPOSING MOTION VECTORS IN THE SKIPPED SUB-WINDOWS

After performing the DSWS scheme, sub-windows may be skipped in consecutive frames. Similar to the frame-rate conversion situation described in [10,12], the motion vectors in the dropped sub-windows are usually unavailable in the incoming bit-stream and thus need to be re-computed. For example, Fig. 3 illustrates a situation where one sub-window is dropped in two consecutive frames during transcoding. As shown in Fig. 3, each video stream entering the MCU carries the incoming motion vectors IV_m^n, where n and m denote the frame and MB sequence numbers respectively, for the 16x16 MBs on the segmented grid. In this example, the equivalent outgoing motion vector sent from the MCU to the client terminals for the block MB_1^n should be

OV_1^n = IV_1^n + MV_1^{n-1} + MV_1^{n-2},

instead of the incoming motion vector IV_1^n. However, MV_1^{n-1} and MV_1^{n-2} do not exist in the incoming bit-stream, since MB_1' and MB_1'' are not aligned with the MB grid. Thus the outgoing motion vector needs to be either re-estimated using motion estimation or composed from the available motion information of the MBs on the grid.

[Figure: sub-windows (n-3) to (n), with two dropped sub-windows; the macroblocks MB_1', MB_1'', MB_1''' are traced in backward order, and the incoming motion vectors IV_i are accumulated with MV_1^{n-1} and MV_1^{n-2} into the outgoing vector OV_1^n]
Fig. 3. Motion vector tracing in a MB with two sub-windows skipped.

Motion vector re-estimation is undesirable due to its intensive computation. Instead, motion-vector composing using the available motion information is a better approach [10,12]. A similar problem has been discussed for video downscaling applications [13-15]. When a MB is not aligned with the MB grid, it will in general overlap four grid MBs with the corresponding motion vectors {IV_1^{n-k}, IV_2^{n-k}, IV_3^{n-k}, IV_4^{n-k}}. Composing the motion vector MV^{n-k} from these neighboring motion vectors amounts to finding a mapping function MV^{n-k} = f(IV_1^{n-k}, IV_2^{n-k}, IV_3^{n-k}, IV_4^{n-k}) which approximates MV^{n-k} with good accuracy. One straightforward mapping function for composing the motion vector MV^{n-k} is an interpolation of the form [10,13-14]:

MV^{n-k} = \frac{\sum_{i=1}^{4} IV_i^{n-k}\, A_i\, ACT_i}{\sum_{i=1}^{4} A_i\, ACT_i}    (3)

where A_i and ACT_i represent the overlapping area with, and the activity of, the i-th neighboring grid MB respectively. In (3), the block activities ACT_i can be computed in the pixel domain or the DCT domain. Since the number of DCT coefficients of a MB is usually small, computing the activity in the DCT domain is much more efficient. We propose to use the sum of the magnitudes of the non-zero DCT AC coefficients as the activity measure:

ACT_i = \sum_{j \notin DC} \left| Coef_{i,j} \right|    (4)

where Coef_{i,j} is the j-th non-zero DCT AC coefficient of the i-th anchor block, decoded and de-quantized from the incoming bit-stream. Since the number of non-zero de-quantized DCT coefficients in a residual block is usually small, the computational cost is quite low.
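To make the two measures above concrete, the following Python sketch implements the activity measure of (4) and the weighted interpolation of (3). The per-neighbor record (motion vector, overlap area, AC coefficients) and the zero-weight fallback are illustrative assumptions, not part of the paper.

```python
import numpy as np

def dct_activity(ac_coeffs):
    """Eq. (4): sum of the magnitudes of the non-zero de-quantized DCT AC
    coefficients of an anchor block (the DC coefficient is excluded)."""
    return float(np.abs(np.asarray(ac_coeffs, dtype=float)).sum())

def interpolate_mv(neighbors):
    """Eq. (3): area- and activity-weighted interpolation of the four
    neighboring grid-MB motion vectors.

    neighbors: list of 4 dicts, one per overlapping grid MB, with keys
      'mv'   : incoming motion vector IV_i as an (x, y) pair
      'area' : overlapping area A_i with the non-aligned target MB
      'ac'   : non-zero de-quantized DCT AC coefficients of the residual block
    """
    weights = np.array([n['area'] * dct_activity(n['ac']) for n in neighbors])
    mvs = np.array([n['mv'] for n in neighbors], dtype=float)
    if weights.sum() == 0.0:
        # Guard (not in the paper): all residuals are zero, fall back to the mean.
        return tuple(mvs.mean(axis=0))
    return tuple((weights[:, None] * mvs).sum(axis=0) / weights.sum())
```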
As explained in [12], the above interpolation scheme has several drawbacks. First, for consecutively dropped sub-windows, the interpolation must be processed in backward order, starting from the last dropped sub-window and working back to the first. This backward processing requires all motion vectors of the dropped sub-windows to be stored, which requires considerable extra memory. Another drawback of the interpolation scheme is the inaccuracy of the resulting motion vector. In spite of weighting each neighboring motion vector by its associated overlapping area and activity, unreliable motion vectors can be produced, because the area covered by the four neighboring MBs may be too divergent and too large to be described by a single motion. Interpolating such diverse motion flows may therefore not produce an accurate motion vector. To improve the results, a Forward Dominant Vector Selection (FDVS) scheme was proposed in [12] to compose and trace the unavailable motion vectors in the dropped frames. Instead of interpolating the target motion vector from its four neighboring motion vectors, the FDVS method selects one of the four neighboring motion vectors as the dominant one to approximate the target motion vector, i.e., MV^{n-k} = dominant(IV_1^{n-k}, IV_2^{n-k}, IV_3^{n-k}, IV_4^{n-k}), where the dominant motion vector is defined as the motion vector carried by the neighboring MB which overlaps the target MB the most. Fig. 4 illustrates the FDVS scheme for the two-frame skipping case. The FDVS scheme can compose the motion vectors with good accuracy and low complexity in the forward order [12].

[Figure: frames (n-3) to (n) with two dropped frames, showing how the dominant vectors MV_1^{n-2} = IV_2^{n-2} and MV_1^{n-1} = IV_2^{n-1} are selected and accumulated into the outgoing vector OV_1^n]
Fig. 4. Forward dominant vector selection scheme [12].

The performance of the FDVS method can be further improved with a little extra computational complexity. If there is no strongly dominant MB overlapping the reference block with a significantly large area (e.g., an overlapping area larger than a predefined threshold, say 80% of the block area), selecting a dominant vector which diverges largely from the other neighboring motion vectors may degrade the quality significantly, since the chosen motion vector may be unreliable. To solve this problem, we propose to remove the unreliable motion vectors from the candidate list before selecting the dominant motion vector whenever no strongly dominant MB is found. Furthermore, in the dominant vector selection, the "largest overlapping area" may not be the best criterion when the overlapping areas of some of the other neighboring anchor blocks are similar. In this case, we propose to select the neighboring MB with the largest overlapping energy/activity as the dominant MB, using the activity measure defined in (4).

The proposed Pre-filtered Activity-based FDVS (PA-FDVS) scheme is summarized as follows:

(1) For each MB, calculate the largest overlapping area. If the largest overlapping area is greater than a predetermined threshold (e.g., 80% in our simulation), select the motion vector of the neighboring MB with the largest overlapping area as the dominant vector and process the next MB; otherwise go to step (2).

(2) Perform the following motion-vector pre-filtering procedure. Set the initial candidate list to the four neighboring vectors {IV_1, IV_2, IV_3, IV_4}, and calculate their mean and standard deviation:

IV_{mean} = \frac{1}{4} \sum_{i=1}^{4} IV_i

IV_{std} = \sqrt{ \frac{1}{4} \sum_{i=1}^{4} \left( IV_i - IV_{mean} \right)^2 }

   Then, for i = 1 to 4:
      if |IV_i - IV_{mean}| > k_{std} \cdot IV_{std}, IV_i is unreliable; remove it from the dominant-vector candidate list;
      else IV_i is reliable; keep it in the dominant-vector candidate list,
   where k_{std} is a predefined constant.

(3) Calculate the area-activity products A_i \cdot ACT_i for the MBs whose motion vectors remain in the dominant-vector candidate list, where A_i is the overlapping area with neighboring block i and ACT_i is the activity measure defined in (4). Then select the motion vector of the neighboring MB with the largest area-activity product as the dominant vector.
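The three PA-FDVS steps can be put together as in the Python sketch below. The per-neighbor record matches the one used in the previous sketch, and the 80% area threshold follows the example value quoted in step (1); the value of k_std and the treatment of the deviation test as a vector-magnitude comparison are assumptions made for illustration.

```python
import numpy as np

MB_AREA = 16 * 16            # a 16x16 macroblock
AREA_DOMINANCE = 0.8         # step (1): "strongly dominant" overlap threshold
K_STD = 1.0                  # step (2): pre-filtering constant k_std (illustrative)

def dct_activity(ac_coeffs):
    """Eq. (4): sum of |non-zero de-quantized DCT AC coefficients|."""
    return float(np.abs(np.asarray(ac_coeffs, dtype=float)).sum())

def pa_fdvs(neighbors):
    """Select one dominant motion vector for a non-grid-aligned target MB.

    neighbors: list of 4 dicts, one per overlapping grid MB, with keys
      'mv'   : incoming motion vector IV_i as an (x, y) pair
      'area' : overlapping area A_i with the target MB (in pixels)
      'ac'   : non-zero de-quantized DCT AC coefficients of that MB's residual
    """
    # Step (1): if one neighbor covers most of the target MB, take its vector.
    best = max(neighbors, key=lambda n: n['area'])
    if best['area'] > AREA_DOMINANCE * MB_AREA:
        return best['mv']

    # Step (2): pre-filter vectors whose deviation from the mean exceeds
    # k_std standard deviations (|IV_i - IV_mean| > k_std * IV_std).
    mvs = np.array([n['mv'] for n in neighbors], dtype=float)
    dev = np.linalg.norm(mvs - mvs.mean(axis=0), axis=1)
    iv_std = np.sqrt((dev ** 2).mean())
    candidates = [n for n, d in zip(neighbors, dev) if d <= K_STD * iv_std]
    if not candidates:            # safety net, not part of the paper's steps
        candidates = neighbors

    # Step (3): among the remaining candidates, pick the MV whose MB has the
    # largest area-activity product A_i * ACT_i.
    return max(candidates, key=lambda n: n['area'] * dct_activity(n['ac']))['mv']
```

For consecutively skipped sub-windows, this selection would be applied once per dropped sub-window in forward order, and the selected dominant vectors accumulated into the outgoing vector, as illustrated in Fig. 4.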
31
                                                forem an                            skipped sub-windows at the video decoders. The
                                                                                    thresholds, THMV and THSAD, are empirically set at 0.2
            30
                                                                                    and 10 respectively. Fig. 7 illustrates that the proposed
            29                                                                      DSWS method achieves PSNR gain on the sub-windows
            28                                                                      with relatively high activities, while the low-activity sub-
            27                                                                      windows are degraded. Table 2 shows the comparison of
                                                                                    the average PSNR of all the sub-windows using the two
   P S NR




            26
                                                                                    methods. As shown in Fig. 7 and Table 2, the proposed
            25
                                                                                    DSWS scheme achieves 0.2 and 0.39 dB average PSNR
            24                                                                      improvements on the overall and the non-skipped sub-
            23                                                                      windows respectively. In sub-window 4, the performance
            22
                         FDV S                                                      is degraded by 0.4 dB because of many long intervals
                         P A -FDV S

            21
                                                                                    with relatively low motion-activity. The degradation is
                 0   5        10      15   20      25
                                                 fram e
                                                           30   35   40   45   50   caused by the temporal resolution reduction by repeating
                                                                                    the previously decoded sub-window for the period of
                                                (a)                                 sub-window skipping, the temporal resolution reduction
            35
                                                carphone
                                                                                    is visually insignificant, since the motions of these sub-
            34                                                                      windows are very small. In this simulation case, 418 out
            33
                                                                                    of 1600 sub-windows are skipped, thus achieving about
            32
            31
                                                                                    17% computation reduction since the computation
            30                                                                      required for sub-window skipping decision and motion-
            29                                                                      vector composing is about 1/3 of the computation for
            28
                                                                                    sub-window transcoding. The computation reduction
   P S NR




            27
            26
                                                                                    ratio depends on the two threshold values: THMV and
            25                                                                      THSAD. The higher the threshold values, the lower the
            24                                                                      computation demand, however the lower the video
            23
                                                                                    quality. It is thus possible to achieve better trade-offs
            22
            21
                         FDV S
                         P A -FDV S
                                                                                    between computational cost and video quality by
            20                                                                      adjusting the threshold values adaptively.
                 0   5        10      15   20      25      30   35   40   45   50
                                                 frame


                          (b)                                                                                                                            Sub-window 1


Fig. 5. Performance comparison of the FDVS and the                                                        800
                                                                                      Motion Activity




                                                                                                          600

FA-FDVS schemes. Incoming bit-streams of 128 Kb/s                                                         400
and 30 fps are transcoded with 32 kb/s and 7.5 fps: (a) “foreman” sequence; (b) “carphone” sequence.

To verify the effectiveness of the proposed DSWS scheme, four 400-frame QCIF video sequences captured from a four-point video conference with a frame rate of 15 frames/s are used for experiments. We first encoded the four QCIF video sequences at 128 kb/s and 15 frames/s using the public-domain H.263 TMN8 software [16]. The four input bit-streams are then jointly transcoded into a single CIF (352x288) video with an output bit-rate of 128 kb/s and an output frame-rate of 15 frames/s using the proposed DSWS scheme. The compression ratio performed by the transcoder is thus four in our experiments.
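For readers who want the arithmetic spelled out, the rate reduction follows directly from the figures quoted above; the symbols R_i^in, R_out, and CR are notation introduced here for illustration only:

    CR = \frac{\sum_{i=1}^{4} R_i^{\mathrm{in}}}{R_{\mathrm{out}}}
       = \frac{4 \times 128~\mathrm{kb/s}}{128~\mathrm{kb/s}} = 4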
[Fig. 6 plots: motion activity versus frame number for each of the four sub-windows, panels (a)-(d).]
Fig. 6. Motion activity of each sub-window

Fig. 6 depicts the motion activity of each sub-window. In the simulated video conference session, most of the time only one or two sub-windows are motion active. Figure 7 compares the frame-by-frame PSNR performance of the proposed DSWS method and the direct transcoding using TMN8 [17]. The PSNR of a skipped sub-window is computed from the incoming QCIF sub-window images and the latest previously reconstructed non-skipped one, since sub-window repetitions will occur for the skipped sub-windows at the decoder.
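As an illustration of this measurement, a minimal Python sketch is given below. It assumes 8-bit luminance arrays and that the first frame of each sub-window is always coded; the function and argument names are ours, not those of the TMN8 software.

    import numpy as np

    def psnr(ref, rec):
        # PSNR (dB) between an 8-bit reference image and its reconstruction.
        mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

    def subwindow_psnr_trace(originals, reconstructions, skipped):
        # Frame-by-frame PSNR of one QCIF sub-window under DSWS.
        #   originals       : incoming QCIF sub-window images, one per frame
        #   reconstructions : transcoded reconstructions (None for skipped frames)
        #   skipped         : booleans, True where the sub-window was skipped
        # A skipped frame is displayed as a repetition of the latest non-skipped
        # reconstruction, so its PSNR is measured against that repeated image.
        psnrs, last_rec = [], None
        for orig, rec, skip in zip(originals, reconstructions, skipped):
            if not skip:
                last_rec = rec              # newest non-skipped reconstruction
            psnrs.append(psnr(orig, last_rec))
        return psnrs

The sketch simply mirrors the measurement described above: when a sub-window is skipped, the last non-skipped reconstruction stands in for the missing frame.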
Table 2
Average PSNR Comparison of the proposed DSWS and direct transcoding schemes

                               Average PSNR of all       Average PSNR of non-skipped
                                   frames (dB)                   frames (dB)
                                Ori.      DSWS              Ori.      DSWS
 Sub-window 1 (151 skipped)    28.54   29.14 (+0.60)       28.55   29.31 (+0.76)
 Sub-window 2  (75 skipped)    28.59   29.16 (+0.57)       28.52   29.25 (+0.73)
 Sub-window 3  (54 skipped)    29.54   29.56 (+0.02)       29.48   29.59 (+0.11)
 Sub-window 4 (139 skipped)    28.99   28.59 (-0.40)       28.73   28.68 (-0.05)
 Average                       28.91   29.11 (+0.20)       28.82   29.21 (+0.39)
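The two column groups in Table 2 differ only in which frames enter the average. A small helper along the lines below (names are ours) could compute both from the per-frame PSNR trace and the skip flags produced by the sketch above:

    def table2_averages(psnrs, skipped):
        # Mean PSNR over every frame (skipped frames are scored against the
        # repeated reconstruction) and over the non-skipped frames only.
        avg_all = sum(psnrs) / len(psnrs)
        coded = [p for p, s in zip(psnrs, skipped) if not s]
        return avg_all, sum(coded) / len(coded)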
5. CONCLUSIONS

In this paper, we proposed a dynamic sub-window skipping scheme for multipoint video conferencing. The proposed scheme can enhance the visual quality of the active sub-windows by saving bits from skipping the inactive ones without introducing significant quality degradation. We also presented an efficient motion-vector composing scheme to compose and trace the motion vectors in the skipped sub-windows.
The proposed method is particularly useful in multipoint video conferencing applications since the focus in such applications is mainly on the active conferees. Simulation results verify the effectiveness of the proposed method. Significant computation reduction is also achieved with the proposed method. Furthermore, the proposed algorithm is fully compatible with the H.263 standard and thus can be integrated into current commercial products. The proposed method can also be further extended to enhance the quality of specific regions/objects in region/object-based coding standards such as MPEG-4 [5].
[Fig. 7 plots: frame-by-frame PSNR (dB) versus frame number (0-400) for each of the four sub-windows, panels (a)-(d), comparing the "Direct" and "DSWS" transcoding results.]
Fig. 7. PSNR comparison of the proposed method and TMN8

6. REFERENCES

[1]  M.-T. Sun and I.-M. Pao, “Multipoint video conferencing,” Visual Commun. and Image Proc., Marcel Dekker, 1997.
[2]  G. Keesman et al., “Transcoding of MPEG bitstream,” Signal Proc. Image Commun., pp. 481-500, 1996.
[3]  P. A. A. Assuncao and M. Ghanbari, “A frequency domain video transcoder for dynamic bit rate reduction of MPEG-2 bit streams,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 8, pp. 953-967, Dec. 1998.
[4]  M.-T. Sun, T.-D. Wu, and J.-N. Hwang, “Dynamic bit allocation in video combining for multipoint video conferencing,” IEEE Trans. Circuits Syst., vol. 45, no. 5, pp. 644-648, May 1998.
[5]  G. Keesman, “Multi-program video compression using joint bit-rate control,” Philips Journal of Research, vol. 50, pp. 21-45, 1996.
[6]  L. Wang and A. Vincent, “Bit allocation and constraints for joint coding of multiple video programs,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 6, pp. 949-959, Sep. 1999.
[7]  A. Vetro, H. Sun, and Y. Wang, “MPEG-4 rate control for multiple video objects,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 186-199, Feb. 1999.
[8]  M. E. Lukacs, D. G. Boyer, and M. Mills, “The personal presence system experimental research prototype,” IEEE Int. Conf. Comm., vol. 2, pp. 1112-1116, Jun. 1996.
[9]  M. E. Lukacs and D. G. Boyer, “A universal broadband multipoint teleconferencing service for the 21st century,” IEEE Comm. Magazine, vol. 33, no. 11, pp. 36-43, Nov. 1995.
[10] J.-N. Hwang, T.-D. Wu, and C.-W. Lin, “Dynamic frame skipping in video transcoding,” Proc. IEEE Workshop on Multimedia Signal Proc., Redondo Beach, USA, Dec. 1998.
[11] ITU-T Recommendation H.263, “Video codec for low bit-rate communication,” May 1996.
[12] J. Youn, M.-T. Sun, and C.-W. Lin, “Adaptive motion vector refinement for high performance transcoding,” IEEE Trans. Multimedia, vol. 1, no. 1, pp. 30-40, Mar. 1999.
[13] N. Bjork and C. Christopoulos, “Transcoder architecture for video coding,” IEEE Trans. Consumer Electron., vol. 44, pp. 88-98, Feb. 1998.
[14] B. Shen, I. K. Sethi, and V. Bhaskaran, “Adaptive motion-vector resampling for compressed video downscaling,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 6, pp. 929-936, Sep. 1999.
[15] M. R. Hashemi, L. Winger, and S. Panchanathan, “Compressed domain vector resampling for downscaling of MPEG video,” Proc. IEEE Int. Conf. Image Proc., pp. 276-279, Japan, Oct. 1999.
[16] Image Processing Lab, University of British Columbia, “H.263+ encoder/decoder,” TMN (H.263) codec, Feb. 1998.
[17] ITU-T/SG16, “Video codec test model, TMN8,” Portland, June 1997.

DYNAMIC REGION OF INTEREST TRANSCODING FOR MULTIPOINT VIDEO ...

  • 1. DYNAMIC REGION OF INTEREST TRANSCODING FOR MULTIPOINT VIDEO CONFERENCING Chia-Wen Lin Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan, R.O.C. cwlin@cs.ccu.edu.tw Yung Chang Chen Department of Electrical Engineering National Tsing Hua University, Hsinchu, Taiwan R.O.C. ycchen@benz.ee.nthu.edu.tw Ming-Ting Sun Department of Electrical Engineering University of Washington, Seattle, Washington, USA sun@ee.washington.edu ABSTRACT centralized server. In this scenario, multiple conferees are connected to the central server, referred to as the This paper presents a region of interest transcoding Multipoint Control Unit (MCU), which coordinates and scheme for multipoint video conferencing to enhance the distributes video and audio streams among multiple visual quality. In a multipoint videoconference, usually participants in a video conference according to the there are only one or two active conferees at one time channel bandwidth requirement of each conferee. A which are the regions of interest to the other conferees video transcoder [2-5] is included in the MCU to involved. We propose a Dynamic Sub-Window Skipping combine the multiple incoming encoded video streams (DSWS) scheme to firstly identify the active participants from the various conferees into a single coded video from the multiple incoming encoded video streams by stream and send the re-encoded bit-stream back to each calculating the motion activity of each sub-window, and participant over the same channel with the required bit- secondly reduce the frame-rates of the motion inactive rate and format for decoding and presentation. In the case participants by skipping these less-important sub- of a multipoint videoconference over PSTN (Public windows. The bits saved by the skipping operation are Switch Telephone Network) or ISDN (Integrated Service reallocated to the active sub-windows to enhance the Digital Network), the channel bandwidth is constant and regions of interest. We also propose a low-complexity symmetrical. Assuming each conferee has a channel scheme to compose and trace the unavailable motion bandwidth of B kb/s, then MCU receives each conferee’s vectors with a good accuracy in the dropped inactive video at B kb/s each, decodes and combines the videos, sub-windows after performing the DSWS. Simulation and re-encodes the combined video at B kbps so as to results show that the proposed methods not only meet the channel bandwidth requirements for sending significantly improve the visual quality on the active sub- back the encoded video to the conferees. Therefore it is windows without introducing serious visual quality required to perform bit-rate conversion/reduction at the degradation in the inactive ones, but also reduce the video transcoder. Bit-rate conversion from high bit-rate computational complexity and avoid whole-frame to low bit-rate in video transcoding will, however, skipping. Moreover, the proposed algorithm is fully introduce video quality degradation. The visual quality, compatible with the H.263 video coding standard. the computational load, and the used bit-rates need to be traded off in video transcoding to achieve a good solution. 1. INTRODUCTION The problem of how to efficiently redistribute the limited bit-rates to different parts of a video in video transcoding Video telephony is an efficient way for businesspersons, is critical in providing satisfactory visual quality. In a engineers, scientists, etc. 
to exchange information at multipoint videoconference, most of the time only one or remote locations. With the rapid growth of video two conferees are active at one time. The active telephony, the need of multipoint video conferencing is conferees need higher bit-rates to produce good quality also growing. A multipoint videoconference involves video while the inactive conferees require lower bit-rates three or more conference participants. In continuous to produce acceptable quality video [4]. Simply presence video conferencing, each conferee can see uniformly distributing the bit-rates to the conferees will others in the same window simultaneously [1]. Fig. 1 result in non-uniform video quality. To make best use of depicts an application scenario of multiple persons the available bit-rates, a joint rate-control scheme which participating in a multipoint videoconference with a takes into account each conferee sub-window’s activity
  • 2. can be used [4-5]. Sun et al. [4] proposed to measure the vectors in the dropped sub-windows after performing the motion activity of each sub-window by calculating the DSWS scheme. Section 4 reports the experimental sum of the magnitudes of its corresponding motion results of the proposed algorithms and the comparison vectors, and allocate the bit-rates to each sub-window with the H.263 [11] TMN8 direct transcoding method. according to its activity. Thus more bits will be allocated Finally, conclusions are drawn in Section 5. to those sub-windows with higher activities, and this strategy will produce much more uniform quality video. 2. DNAMIC SUBWINDOW SKIPPING (DSWS) Wu et al. [5] extended the work in [4] by allocating the bit-rates to each sub-window according to its spatial- temporal activity which takes into account the motion, Fig. 2 depicts the architecture for multipoint video the variance of the residual signal, and the number of the transcoding discussed in this paper. In this architecture, encoded macroblocks. Similar work on joint rate-control the input data for the video transcoder consist of multiple can also be found in the statistical multiplexing (StatMux) H.263 encoded video streams from the client-terminals of multiple video programs [6-8] and MPEG-4 joint rate- through a heterogeneous network environment. The control of multiple video objects [9]. However the strong video streams could be transmitted through PSTN, ISDN, correlation among the conferees in a multipoint video or LAN (Local Area Network) with various bandwidth conference usually does not exist in the general cases of requirements. For simplicity but without loss of the StatMux and MPEG-4 object rate control. In addition, generality, we assume each video stream is encoded in we will show that dynamic temporal resolution control the QCIF (176x144) format and each participant can see [10] for each sub-window, which has not been addressed 4 participants in a CIF (352x288) frame in a continuous in the previous work, may bring further coding gain and presence fashion. In our experiments we assume the computation reduction in multipoint video transcoding. video transmission is over ISDN as shown in the scenario in Fig. 1. As shown in Fig. 2, the input video streams first are buffered at the input to regulate the difference between the input data-rate and the transcoding date-rate video combining for each video frame. Each video stream is decoded through a Variable Length Decoder (VLD) and then ISDN transcoded into lower data-rates. To meet various user ISDN bit-rate requirements, more than one transcoder may be required for the same video stream to generate multiple video streams with different bit rates. The transcoded bit- streams are subsequently combined into CIF frames MCU in the through a number of multiplexers and video combiners. Central Office The multiplexing unit for the video combiner is the GOB (Group of Block) as specified in the H.263 standard [11]. Fig. 1. Application example of multipoint video conferencing Motion info In this paper, we present a DSWS scheme which Joint provides the flexibility that sub-windows can be encoded Rate control in different temporal resolutions according to their motion activities. The proposed DSWS scheme classifies from conferee #1 the sub-windows into active and inactive classes by BUF VLD Sub-window transcoder BUF video combiner to conferee #1 calculating the associated motion activities. 
The inactive sub-windows can then be dropped so that the saved bits from conferee #2 to conferee #2 can be used to significantly enhance the visual quality of BUF VLD Sub-window transcoder BUF video combiner the active ones without introducing serious degradation on the inactive ones. In addition to the performance gain from conferee #N on active sub-windows, the DSWS scheme also presents BUF VLD Sub-window BUF video to conferee #N transcoder combiner two other advantages: achieving computation reduction and avoiding the whole-frame skipping. Furthermore, we present a motion-vector composing scheme which can Fig. 2. The proposed multipoint video transcoding compose and trace the unavailable motion vectors in the architecture dropped sub-windows with good accuracy at a very low As mentioned above, in a multipoint video conference, computational and memory cost. usually only one or two persons are motion active at one The rest of this paper is organized as follows. Section 2 time. The active conferees (e.g., the conferees who are presents the proposed DSWS scheme. In Section 3, a pre- speaking) are often the center of focus. Therefore, filtered activity-based motion-vector composing scheme allocating the active sub-windows with relatively higher is proposed for composing the unavailable motion
  • 3. bit-rates can provide a much better visual experience to 1 N the viewers. The active conferees usually have larger SAADm = ∑ ∑ | fm (x, y) − fmprev ( x + MVmx,n , y + MVmy,n ) | N n =1 x, y∈MBm ,n motions than others, thus they can be easily detected by (2) using the incoming motion information. The sum of the magnitude of motion vectors of a sub- In this paper, we observe that in multipoint video window can be used as a good indication of its motion conferencing, the temporal resolutions of the low-active activity. A sub-window is classified as active if the sum is sub-windows may be reduced without introducing larger than a predetermined threshold, THMV, otherwise it significant visual degradation from the Human Visual is classified as inactive. An inactive sub-window will be System (HVS) point of view. The main reason is that, skipped if its associated SAAD value defined in (2) is since the motions presented by the inactive sub-windows below a threshold, THSAAD. If an inactive sub-window is are relatively slow and the conferees usually concentrate skipped, the corresponding latest non-skipped sub- their focuses on the active ones, the effect of the temporal window is repeated to approximate the skipped sub- resolution reduction by skipping inactive sub-windows windows. Human visual perception is relatively can often be masked by the high motions in the active insensitive to the little differences between the skipped sub-windows thus is not sensitive to viewers’ perceptions. sub-windows and their reconstructed ones from sub- To make best use of this property, we propose to drop window repetition if the skipped sub-windows are motion inactive sub-windows by using sub-window inactive. The two thresholds, THMV and THSAAD, are used repetition to approximate those dropped sub-windows at for the classification, the larger the thresholds are set, the the end decoder, so that the saved bits can be used to more the sub-windows will be skipped, and the more the enhance the quality of the remaining non-skipped active saved bits will be used in other sub-windows (but jerky ones which are usually the regions of interest. In addition, motions will become more serious). The SAAD value of if a sub-window is decided to be skipped, much each sub-window is used to constrain the sub-window computation in transcoding this sub-window can be saved, skipping. If the current frame is inactive but the SAAD is thus significant computation reduction can be achieved. larger than a threshold, the proposed method enforces Sub-window skipping can be implemented in the H.263 that the sub-window, which would otherwise be skipped, syntax [11] by simply setting all the COD bits of the be encoded. This measure can prevent the error Macroblocks (MBs) belonging to the skipped sub- accumulation caused by slow but steady motions by using windows to “1” to get rid of sending the associated DCT only the motion activity measure as in [10]. It should be coefficients, motion vector information, and MB noted that, since the incoming sub-windows may be overhead bits. Only 99 bits (for 99 MB COD bits dropped in consecutive frames with the DSWS method, respectively) are required to represent a skipped QCIF the incoming motion vectors may not be valid since they (176x144) sub-window, thus the overhead is relatively may point to the dropped sub-windows that do not exist negligible. In our proposed DSWS scheme, the motion in the transcoded bit-stream. 
To compose and trace the information is used to calculate the motion activity of required but unavailable motion vectors along the each sub-window for dynamic sub-window skipping consecutively skipped sub-windows with respect to the control. The DSWS scheme is summarized as follows: corresponding latest encoded sub-windows, the motion MV if ( Sm < TH MV ) & &(SAADm < TH SAAD ) vector composing scheme proposed in Section 3 is used. The proposed DSWS scheme presents several advantages. then First, the quality of the active sub-windows can be Skip the transcoding of the mth sub-window effectively enhanced. The quality loss on the inactive sub-windows is relatively small and visually insensitive else to the viewer’s perception. Second, skipping a sub- Transcode the mth sub-window window implies saving much computation in transcoding that sub-window (in our simulation, about 2/3 of the where the sum of the magnitude of accumulated motion computation in transcoding that sub-window can be vectors of the mth sub-window is defined as saved), thus achieving significant computation reduction. N Finally, by skipping the motion inactive sub-windows, 1 MV Sm = N ∑ ( MV x m ,n ) + MVmy, n , (1) many whole-frame skippings due to insufficient bit- n =1 allocation can be avoided, so that the temporal resolution of the motion active sub-windows can be kept as high as N is the number of MBs in a sub-window, ( MVmx,n , MVmy,n ) possible. Moreover, the proposed method can be is the motion vector associated with the nth macroblock combined with the dynamic bit-allocation scheme of the mth sub-window with respect to its corresponding presented in [4] to further improve the visual quality with prev previously encoded sub-window (i.e., f m (⋅) ), and the almost no extra complexity. Sum of Accumulated Absolute Difference (SAAD) of the sub-window is defined as
3. COMPOSING MOTION VECTORS IN THE SKIPPED SUB-WINDOWS

After performing the DSWS scheme, sub-windows may be skipped in consecutive frames. Similar to the frame-rate conversion situation described in [10,12], the motion vectors of the dropped sub-windows are then unavailable in the incoming bit-stream and need to be re-computed. For example, Fig. 3 illustrates a situation in which one sub-window is dropped in two consecutive frames during transcoding. As shown in Fig. 3, each video stream entering the MCU carries incoming motion vectors IV_m^n, where n and m denote the frame and MB sequence numbers respectively, for the 16x16 MBs on the segmented grid. In this example, the equivalent outgoing motion vector of block MB_1^n sent from the MCU to the client terminals should be OV_1^n = IV_1^n + MV_1^{n-1} + MV_1^{n-2}, instead of the incoming motion vector IV_1^n. However, MV_1^{n-1} and MV_1^{n-2} do not exist in the incoming bit-stream, since MB_1' and MB_1'' are not aligned with the MB grid. The outgoing motion vector therefore needs to be either re-estimated using a motion-estimation scheme or composed from the available motion information of the MBs on the grid.

[Fig. 3. Motion vector tracing in an MB with two sub-windows skipped: macroblocks are traced in backward order through the dropped sub-windows using the incoming motion vectors on the MB grid.]

Motion-vector re-estimation is undesirable because of its intensive computation. Instead, composing the motion vector from the available motion information is a better approach [10,12]. A similar problem has been studied for video downscaling applications [13-15]. When an MB is not aligned with the MB grid, it in general overlaps four grid MBs with corresponding motion vectors {IV_1^{n-k}, IV_2^{n-k}, IV_3^{n-k}, IV_4^{n-k}}. Composing the motion vector MV^{n-k} from these neighboring motion vectors amounts to finding a mapping function MV^{n-k} = f(IV_1^{n-k}, IV_2^{n-k}, IV_3^{n-k}, IV_4^{n-k}) that approximates MV^{n-k} with good accuracy. One straightforward mapping function is an interpolation of the form [10,13-14]

MV^{n-k} = \frac{\sum_{i=1}^{4} IV_i^{n-k}\, A_i\, ACT_i}{\sum_{i=1}^{4} A_i\, ACT_i}   (3)

where A_i and ACT_i denote the overlapping area with, and the activity of, the i-th neighboring grid MB respectively. In (3), the block activities ACT_i can be computed in the pixel domain or in the DCT domain. Since the number of DCT coefficients of an MB is usually small, computing the activity in the DCT domain is much more efficient. We propose to use the sum of the magnitudes of the nonzero DCT AC coefficients as the activity measure:

ACT_i = \sum_{j \notin DC} \left| Coef_{i,j} \right|   (4)

where Coef_{i,j} is the j-th nonzero DCT AC coefficient of the i-th anchor block, decoded and de-quantized from the incoming bit-stream. Since the number of nonzero de-quantized DCT coefficients in a residual block is usually small, the computational cost is quite low.

As explained in [12], the above interpolation scheme has several drawbacks. First, for consecutively dropped sub-windows, the interpolation has to be processed in backward order, from the last dropped sub-window to the first. This backward processing requires all motion vectors of the dropped sub-windows to be stored, which costs considerable extra memory. Another drawback is the inaccuracy of the resulting motion vector: despite the weighting of each neighboring motion vector by its overlapping area and activity, unreliable motion vectors can be produced because the area covered by the four neighboring MBs may be too divergent and too large to be described by a single motion. Interpolating such diverse motion flows may therefore not yield an accurate motion vector. To improve the results, a Forward Dominant Vector Selection (FDVS) scheme was proposed in [12] to compose and trace the unavailable motion vectors in the dropped frames. Instead of interpolating the target motion vector from its four neighboring motion vectors, the FDVS method selects one of them as a dominant vector to approximate the target motion vector, i.e., MV^{n-k} = dominant(IV_1^{n-k}, IV_2^{n-k}, IV_3^{n-k}, IV_4^{n-k}), where the dominant motion vector is defined as the motion vector carried by the neighboring MB that overlaps the target MB the most. Fig. 4 illustrates the FDVS scheme for the two-frame skipping case. The FDVS scheme composes the motion vectors with good accuracy and low complexity, and it operates in forward order [12].
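As a concrete illustration of (3) and (4), the sketch below composes a motion vector for a misaligned block from its four overlapping grid MBs, weighting each candidate by its overlap area and its DCT-AC activity. It is a hedged example: the candidate data layout (dicts with `mv`, `area`, and `ac_coeffs` fields) is an assumption, not the paper's representation.

```python
import numpy as np

def dct_activity(ac_coeffs):
    """Eq. (4): sum of magnitudes of the nonzero de-quantized DCT AC coefficients."""
    return float(np.sum(np.abs(ac_coeffs)))

def interpolate_mv(candidates):
    """Eq. (3): area- and activity-weighted interpolation of the four
    overlapping grid-MB motion vectors.

    candidates: list of 4 dicts with keys (assumed layout)
        'mv'        -> (MVx, MVy) of the overlapping grid MB
        'area'      -> overlap area A_i in pixels
        'ac_coeffs' -> its nonzero de-quantized DCT AC coefficients
    """
    num = np.zeros(2)
    den = 0.0
    for c in candidates:
        w = c['area'] * dct_activity(c['ac_coeffs'])
        num += w * np.asarray(c['mv'], dtype=float)
        den += w
    return num / den if den > 0 else np.zeros(2)
```

A dominant-vector selection in the style of FDVS would instead return the `mv` of the candidate with the largest overlap area; the refinement proposed next also takes the activity of (4) into account.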
[Fig. 4. Forward dominant vector selection scheme [12]: the dominant motion vectors of the dropped sub-windows (e.g., MV_1^{n-1} = IV_2^{n-1}, MV_1^{n-2} = IV_2^{n-2}) are traced in forward order to form the outgoing vector OV_1^n.]

The performance of the FDVS method can be further improved with little extra computational complexity. If there is no strongly dominant MB that overlaps the reference block with a significantly large area (e.g., an overlapping area larger than a predefined threshold, say 80% of the block area), selecting a dominant vector that diverges strongly from the other neighboring motion vectors may degrade the quality significantly, since the chosen motion vector may be unreliable. To solve this problem, we propose to remove the unreliable motion vectors from the candidate list before selecting the dominant motion vector whenever no strongly dominant MB is found. Furthermore, the "largest overlapping area" may not be the best selection criterion when the overlapping areas of several neighboring anchor blocks are similar. In this case, we propose to select the neighboring MB with the largest overlapping energy/activity as the dominant MB, using the activity measure defined in (4).

The proposed Pre-filtered Activity-based FDVS scheme (PA-FDVS) is summarized as follows (a code sketch of this procedure is given after Table 1):

(1) For each MB, calculate the largest overlapping area. If it is greater than a predetermined threshold (e.g., 80% of the block area in our simulations), select the motion vector of the neighboring MB with the largest overlapping area as the dominant vector and process the next MB; otherwise go to step (2).

(2) Perform the following motion-vector pre-filtering procedure. Set the initial candidate list to the four neighboring vectors {IV_1, IV_2, IV_3, IV_4} and calculate their mean and standard deviation:

    IV_{mean} = \frac{1}{4} \sum_{i=1}^{4} IV_i, \qquad IV_{std} = \sqrt{ \frac{1}{4} \sum_{i=1}^{4} \left( IV_i - IV_{mean} \right)^2 }

    for i = 1 to 4
        if |IV_i - IV_{mean}| > k_{std} * IV_{std}
            IV_i is unreliable; remove it from the dominant-vector candidate list
        else
            IV_i is reliable; keep it in the dominant-vector candidate list

    where k_{std} is a predefined constant.

(3) Calculate the area-activity products A_i * ACT_i for the MBs whose motion vectors remain in the candidate list, where A_i is the overlapping area with the neighboring block i and ACT_i is the activity measure defined in (4). Select the motion vector of the neighboring MB with the largest area-activity product as the dominant vector.

4. EXPERIMENTAL RESULTS

In our experiments, two 200-frame standard QCIF (176x144) test sequences, "foreman" and "carphone", with a frame rate of 30 frames/s are used to verify the performance of the proposed PA-FDVS scheme. The performance of full-search motion estimation and of the motion-vector composition methods (interpolation, FDVS, and PA-FDVS) on the two test sequences is compared in Table 1 and Fig. 5. Table 1 shows the average PSNR for the two test sequences, which were first encoded at 128 kb/s and 30 frames/s and then transcoded to 32 kb/s and 7.5 frames/s. The results in Table 1 indicate that PA-FDVS performs better than FDVS and significantly outperforms the interpolation scheme. The frame-by-frame PSNR comparison of the PA-FDVS and FDVS schemes under the same test conditions is shown in Fig. 5. Although the average PSNR values of PA-FDVS and FDVS in Table 1 are close, Fig. 5 shows that the PA-FDVS scheme achieves significant PSNR improvements over the FDVS scheme (up to 1.6 dB and 2.9 dB for the "foreman" and "carphone" sequences respectively) on several frames with many divergent object motions.

Table 1. Performance comparison of different motion-vector estimation and composition methods. Incoming bit-streams of 128 kb/s and 30 frames/s were transcoded to 32 kb/s and 7.5 frames/s.

    Test sequence | MV composition method | Average PSNR (dB)
    Foreman       | Full-search ME        | 27.39
                  | Interpolation         | 23.72
                  | FDVS                  | 25.51
                  | PA-FDVS               | 25.67
    Carphone      | Full-search ME        | 29.47
                  | Interpolation         | 27.07
                  | FDVS                  | 28.16
                  | PA-FDVS               | 28.27
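The three PA-FDVS steps summarized above can be rendered as the following sketch. It is an illustrative implementation under assumed inputs (the same candidate layout as before) and an assumed value of k_std; it is not the authors' code.

```python
import numpy as np

def pa_fdvs_dominant(candidates, block_area=16 * 16, dominance=0.8, k_std=1.0):
    """Select a dominant motion vector following the PA-FDVS steps (sketch).

    candidates: list of 4 dicts with 'mv', 'area', 'ac_coeffs' (assumed layout).
    """
    mvs = np.array([c['mv'] for c in candidates], dtype=float)
    areas = np.array([c['area'] for c in candidates], dtype=float)

    # Step 1: if one grid MB clearly dominates the overlap (> 80% of the
    # block area), take its motion vector directly.
    best = int(np.argmax(areas))
    if areas[best] > dominance * block_area:
        return mvs[best]

    # Step 2: pre-filter -- drop candidates that deviate too far from the mean.
    mean_mv = mvs.mean(axis=0)
    dists = np.linalg.norm(mvs - mean_mv, axis=1)
    std_mv = np.sqrt(np.mean(dists ** 2))
    keep = dists <= k_std * std_mv
    if not np.any(keep):          # degenerate case: keep all candidates
        keep[:] = True

    # Step 3: among the remaining candidates, pick the one with the largest
    # area-activity product A_i * ACT_i, using the Eq. (4) activity measure.
    acts = np.array([np.sum(np.abs(c['ac_coeffs'])) for c in candidates], dtype=float)
    scores = np.where(keep, areas * acts, -np.inf)
    return mvs[int(np.argmax(scores))]
```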
[Fig. 5. Performance comparison of the FDVS and PA-FDVS schemes. Incoming bit-streams of 128 kb/s and 30 frames/s are transcoded to 32 kb/s and 7.5 frames/s: (a) "foreman" sequence; (b) "carphone" sequence.]

To verify the effectiveness of the proposed DSWS scheme, four 400-frame QCIF video sequences captured from a four-point video conference at a frame rate of 15 frames/s are used. We first encoded the four QCIF sequences at 128 kb/s and 15 frames/s using the public-domain H.263 TMN8 software [16]. The four input bit-streams are then jointly transcoded into a single CIF (352x288) video with an output bit-rate of 128 kb/s and an output frame rate of 15 frames/s using the proposed DSWS scheme. The compression ratio performed by the transcoder is thus four in our experiments.

Fig. 6 depicts the motion activity of each sub-window. In the simulated video-conference session, only one or two sub-windows are motion-active most of the time. Fig. 7 compares the frame-by-frame PSNR of the proposed DSWS method and direct transcoding using TMN8 [17]. The PSNR of a skipped sub-window is computed from the incoming QCIF sub-window image and the latest previously reconstructed non-skipped one, since sub-window repetition takes place for the skipped sub-windows at the video decoders. The thresholds TH_MV and TH_SAAD are empirically set to 0.2 and 10 respectively. Fig. 7 shows that the proposed DSWS method achieves a PSNR gain on the sub-windows with relatively high activity, while the low-activity sub-windows are degraded. Table 2 compares the average PSNR of all the sub-windows for the two methods. As shown in Fig. 7 and Table 2, the proposed DSWS scheme achieves 0.20 dB and 0.39 dB average PSNR improvements on all frames and on the non-skipped frames respectively. In sub-window 4, the performance is degraded by 0.40 dB because of many long intervals with relatively low motion activity; the degradation comes from the temporal-resolution reduction of repeating the previously decoded sub-window over the skipping period, but this reduction is visually insignificant since the motions in these sub-windows are very small. In this simulation, 418 out of 1600 sub-windows are skipped, yielding about a 17% computation reduction, since the computation required for the sub-window skipping decision and motion-vector composition is about 1/3 of that for sub-window transcoding. The computation-reduction ratio depends on the two thresholds TH_MV and TH_SAAD: the higher they are, the lower the computational demand, but also the lower the video quality. Better trade-offs between computational cost and video quality can therefore be achieved by adjusting the threshold values adaptively.

[Fig. 6. Motion activity of each sub-window over the 400-frame session: (a) sub-window 1; (b) sub-window 2; (c) sub-window 3; (d) sub-window 4.]
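For reference, the sketch below shows how the PSNR of a skipped sub-window could be evaluated as described above, i.e., against the latest reconstructed non-skipped sub-window that the decoder repeats. Function and variable names are ours, not taken from the TMN8 software.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Standard PSNR between two equally sized luma images."""
    mse = np.mean((reference.astype(float) - reconstructed.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def skipped_subwindow_psnr(incoming_subwindow, last_nonskipped_recon):
    """PSNR of a skipped sub-window: the decoder repeats the latest non-skipped
    reconstruction, so the incoming QCIF sub-window (assumed 144x176 luma array)
    is compared against that repeated image."""
    return psnr(incoming_subwindow, last_nonskipped_recon)
```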
[Fig. 7. Frame-by-frame PSNR comparison of the proposed DSWS method and direct transcoding with TMN8 for the four sub-windows.]

Table 2. Average PSNR comparison of the proposed DSWS and direct transcoding schemes.

                                  Average PSNR of all frames (dB)    Average PSNR of non-skipped frames (dB)
                                  Direct   DSWS    Gain              Direct   DSWS    Gain
    Sub-window 1 (151 skipped)    28.54    29.14   +0.60             28.55    29.31   +0.76
    Sub-window 2 (75 skipped)     28.59    29.16   +0.57             28.52    29.25   +0.73
    Sub-window 3 (54 skipped)     29.54    29.56   +0.02             29.48    29.59   +0.11
    Sub-window 4 (139 skipped)    28.99    28.59   -0.40             28.73    28.68   -0.05
    Average                       28.91    29.11   +0.20             28.82    29.21   +0.39

5. CONCLUSIONS

In this paper, we proposed a dynamic sub-window skipping scheme for multipoint video conferencing. The proposed scheme enhances the visual quality of the active sub-windows with the bits saved by skipping the inactive ones, without introducing significant quality degradation. We also presented an efficient motion-vector composition scheme to compose and trace the motion vectors of the skipped sub-windows. The proposed method is particularly useful in multipoint video-conferencing applications, since the focus in such applications is mainly on the active conferees. Simulation results verify the effectiveness of the proposed method, and a significant computation reduction is also achieved. Furthermore, the proposed algorithm is fully compatible with the H.263 standard and can therefore be integrated into current commercial products. The proposed method can also be extended to enhance the quality of specific regions/objects in region/object-based coding standards such as MPEG-4 [5].

6. REFERENCES

[1] M.-T. Sun and I.-M. Pao, "Multipoint video conferencing," in Visual Communication and Image Processing, Marcel Dekker, 1997.
[2] G. Keesman et al., "Transcoding of MPEG bitstreams," Signal Processing: Image Communication, pp. 481-500, 1996.
[3] P. A. A. Assuncao and M. Ghanbari, "A frequency-domain video transcoder for dynamic bit-rate reduction of MPEG-2 bit streams," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 8, pp. 953-967, Dec. 1998.
[4] M.-T. Sun, T.-D. Wu, and J.-N. Hwang, "Dynamic bit allocation in video combining for multipoint video conferencing," IEEE Trans. Circuits and Systems, vol. 45, no. 5, pp. 644-648, May 1998.
[5] G. Keesman, "Multi-program video compression using joint bit-rate control," Philips Journal of Research, vol. 50, pp. 21-45, 1996.
[6] L. Wang and A. Vincent, "Bit allocation and constraints for joint coding of multiple video programs," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 6, pp. 949-959, Sep. 1999.
[7] A. Vetro, H. Sun, and Y. Wang, "MPEG-4 rate control for multiple video objects," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 186-199, Feb. 1999.
[8] M. E. Lukacs, D. G. Boyer, and M. Mills, "The personal presence system experimental research prototype," Proc. IEEE Int. Conf. Communications, vol. 2, pp. 1112-1116, Jun. 1996.
[9] M. E. Lukacs and D. G. Boyer, "A universal broadband multipoint teleconferencing service for the 21st century," IEEE Communications Magazine, vol. 33, no. 11, pp. 36-43, Nov. 1995.
[10] J.-N. Hwang, T.-D. Wu, and C.-W. Lin, "Dynamic frame skipping in video transcoding," Proc. IEEE Workshop on Multimedia Signal Processing, Redondo Beach, CA, USA, Dec. 1998.
[11] ITU-T Recommendation H.263, "Video coding for low bit-rate communication," May 1996.
[12] J. Youn, M.-T. Sun, and C.-W. Lin, "Adaptive motion vector refinement for high performance transcoding," IEEE Trans. Multimedia, vol. 1, no. 1, pp. 30-40, Mar. 1999.
[13] N. Bjork and C. Christopoulos, "Transcoder architecture for video coding," IEEE Trans. Consumer Electronics, vol. 44, pp. 88-98, Feb. 1998.
[14] B. Shen, I. K. Sethi, and V. Bhaskaran, "Adaptive motion-vector resampling for compressed video downscaling," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 6, pp. 929-936, Sep. 1999.
[15] M. R. Hashemi, L. Winger, and S. Panchanathan, "Compressed domain vector resampling for downscaling of MPEG video," Proc. IEEE Int. Conf. Image Processing, Kobe, Japan, pp. 276-279, Oct. 1999.
[16] Image Processing Lab, University of British Columbia, "H.263+ encoder/decoder," TMN (H.263) codec, Feb. 1998.
[17] ITU-T/SG16, "Video codec test model, TMN8," Portland, OR, June 1997.