Perceptually-aware Per-title Encoding for Adaptive Video Streaming.pdf

In live streaming applications, a fixed set of bitrate-resolution pairs (known as a bitrate ladder) is used for simplicity and efficiency, avoiding the additional encoding run-time required to find the optimum resolution-bitrate pairs for every video content. However, an optimized bitrate ladder may result in (i) decreased storage or delivery costs and/or (ii) increased Quality of Experience (QoE). This paper introduces a perceptually-aware per-title encoding (PPTE) scheme for video streaming applications. In this scheme, optimized bitrate-resolution pairs are predicted online based on the Just Noticeable Difference (JND) in quality perception, to avoid adding perceptually similar representations to the bitrate ladder. To this end, Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features are used for each video segment. Experimental results show that, on average, PPTE yields bitrate savings of 16.47% and 27.02% to maintain the same PSNR and VMAF, respectively, compared to the reference HTTP Live Streaming (HLS) bitrate ladder. These savings come without any noticeable additional latency in streaming and are accompanied by a 30.69% cumulative decrease in storage space for the various representations.

  1. Perceptually-aware Per-title Encoding for Adaptive Video Streaming. Vignesh V Menon¹, Hadi Amirpour¹, Mohammad Ghanbari¹,², and Christian Timmerer¹. ¹ Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität, Klagenfurt, Austria; ² School of Computer Science and Electronic Engineering, University of Essex, UK. 19 July 2022.
  2. Outline: (1) Introduction, (2) PPTE, (3) Results, (4) Conclusion.
  3. Introduction: Per-title Encoding. In HTTP Adaptive Streaming (HAS), each video is encoded at a fixed set of bitrate-resolution pairs, referred to as a bitrate ladder. This "one-size-fits-all" ladder can be optimized per title to increase the Quality of Experience (QoE) or to decrease the bitrate of the representations, as introduced for VoD services [1].
     Figure: Rate-distortion (RD) curves, using VMAF as the quality metric, of the Dolls and Park sequences of the MCML dataset encoded at 540p and 1080p resolutions (x-axis: bitrate in Mbps; y-axis: VMAF).
     [1] J. De Cock et al. "Complexity-based consistent-quality encoding in the cloud". In: 2016 IEEE International Conference on Image Processing (ICIP). 2016, pp. 1484–1488. doi: 10.1109/ICIP.2016.7532605.
  4. Introduction: Motivation for Perceptually-aware Per-title Encoding. Selecting the bitrate-resolution pairs, i.e., $(r_t, b_t)$ where $t \geq 0$, from the convex hull is a challenging task. Adding more bitrate-resolution pairs to the bitrate ladder may improve QoE, but it increases storage and bandwidth requirements [2]. Furthermore, the bitrate-resolution pairs selected from the convex hull may not always be perceptually different in video quality.
     Figure: The HLS bitrate ladder of the Characters sequence of the MCML dataset, with representations at 360p, 432p, 540p, 720p, 1080p, 1440p, and 2160p (x-axis: bitrate in Mbps; y-axis: VMAF).
     [2] Tianchi Huang et al. "Deep Reinforced Bitrate Ladders for Adaptive Video Streaming". In: NOSSDAV '21. Istanbul, Turkey: Association for Computing Machinery, 2021, pp. 66–73. isbn: 9781450384353. doi: 10.1145/3458306.3458873. url: https://doi.org/10.1145/3458306.3458873.
  5. Introduction: Vision of the paper.
     Figure: The ideal bitrate ladder envisioned in this paper. The ladder points $(r_t, b_t)$, $t = 0, \dots, 6$, have VMAF values $v_t = v_{t-1} + v_J(v_{t-1})$, starting from $v_0$. The blue line denotes the corresponding rate-distortion curve, while the red dotted line denotes VMAF $= v_{max}$. When the VMAF value is greater than $v_{max}$, the video stream is deemed to be perceptually lossless (x-axis: bitrate; y-axis: VMAF).
  6. PPTE.
     Figure: PPTE architecture. The input title is passed as segments to feature extraction, which produces $(E, h)$ pairs. Bitrate ladder prediction takes these pairs together with the resolutions ($R$), the average JND ($v_J$), the bitrate range $\{b_{min}, b_{max}\}$, and the maximum VMAF $\{v_{max}\}$, and outputs $(r, b)$ pairs for per-title encoding.
  7. PPTE, Phase 1: Feature Extraction. Compute the texture energy per block. A DCT-based energy function is used to determine the block-wise feature of each frame, defined as:
     $$H_{p,k} = \sum_{i=0}^{w-1} \sum_{j=0}^{w-1} e^{\left|\left(\frac{ij}{wh}\right)^2 - 1\right|} \, |DCT(i, j)| \quad (1)$$
     where $w \times w$ is the size of the block, $k$ indexes the blocks of frame $p$, and $DCT(i, j)$ is the $(i, j)$th DCT component when $i + j > 0$, and 0 otherwise. The energy values of the blocks in a frame are averaged to determine the energy per frame [3]:
     $$E = \sum_{k=0}^{C-1} \frac{H_{p,k}}{C \cdot w^2} \quad (2)$$
     [3] Michael King et al. "A New Energy Function for Segmentation and Compression". In: 2007 IEEE International Conference on Multimedia and Expo. 2007, pp. 1647–1650. doi: 10.1109/ICME.2007.4284983.
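The texture-energy feature of Eqs. (1) and (2) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the DCT is a naive orthonormal DCT-II (the slides do not specify the normalisation), the weight denominator written as `wh` on the slide is taken to be `w * w` for square blocks, and the function names `dct2`, `block_energies`, and `frame_energy` are hypothetical.

```python
import math


def dct2(block):
    """Naive orthonormal 2-D DCT-II of a square block (assumed normalisation)."""
    w = len(block)
    scale = [math.sqrt(1.0 / w)] + [math.sqrt(2.0 / w)] * (w - 1)
    # basis[u][x] = cos((2x + 1) * u * pi / (2w))
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * w)) for x in range(w)]
             for u in range(w)]
    coeffs = [[0.0] * w for _ in range(w)]
    for i in range(w):
        for j in range(w):
            acc = 0.0
            for x in range(w):
                for y in range(w):
                    acc += block[x][y] * basis[i][x] * basis[j][y]
            coeffs[i][j] = scale[i] * scale[j] * acc
    return coeffs


def block_energy(block):
    """Eq. (1): weighted sum of |DCT(i, j)| with the DC term treated as 0."""
    w = len(block)
    coeffs = dct2(block)
    total = 0.0
    for i in range(w):
        for j in range(w):
            if i + j == 0:
                continue  # DCT(0, 0) contributes nothing
            # Assumption: the slide's "wh" equals w * w for a w x w block.
            weight = math.exp(abs((i * j / (w * w)) ** 2 - 1))
            total += weight * abs(coeffs[i][j])
    return total


def block_energies(frame, w):
    """Energies of all non-overlapping w x w blocks of a frame (list of rows)."""
    rows, cols = len(frame), len(frame[0])
    energies = []
    for top in range(0, rows - w + 1, w):
        for left in range(0, cols - w + 1, w):
            block = [row[left:left + w] for row in frame[top:top + w]]
            energies.append(block_energy(block))
    return energies


def frame_energy(frame, w):
    """Eq. (2): sum of block energies divided by C * w^2."""
    energies = block_energies(frame, w)
    if not energies:
        raise ValueError("frame is smaller than one block")
    return sum(energies) / (len(energies) * w * w)


# Toy luma frames: a flat frame has no AC energy, a checkerboard has a lot.
flat = [[128] * 8 for _ in range(8)]
checker = [[255 * ((x + y) % 2) for y in range(8)] for x in range(8)]
print(frame_energy(flat, 4), frame_energy(checker, 4))
```

A production implementation would use an optimised DCT; the quadruple loop here is kept only so the sketch has no dependencies.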
  8. PPTE, Phase 1: Feature Extraction. $h_p$ is the sum of absolute differences (SAD) between the block-level energy values of frame $p$ and those of the previous frame $p - 1$:
     $$h_p = \sum_{k=0}^{C-1} \frac{|H_{p,k} - H_{p-1,k}|}{C \cdot w^2} \quad (3)$$
     where $C$ denotes the number of blocks in frame $p$.
     Latency: feature extraction runs at 370 fps for UHD video with 8 CPU threads and x86 SIMD optimization.
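As a small worked instance of Eq. (3), the sketch below computes the temporal feature from two lists of per-block energies. The function name `temporal_energy` and the toy energy values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def temporal_energy(curr_blocks, prev_blocks, w):
    """Eq. (3): SAD of block energies of frames p and p - 1, over C * w^2."""
    if len(curr_blocks) != len(prev_blocks):
        raise ValueError("both frames must have the same number of blocks")
    if not curr_blocks:
        raise ValueError("at least one block is required")
    c = len(curr_blocks)
    sad = sum(abs(cur - prev) for cur, prev in zip(curr_blocks, prev_blocks))
    return sad / (c * w * w)


# Toy block energies for frame p and frame p - 1 (C = 4 blocks of size 2 x 2).
h_p = temporal_energy([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 30.0, 44.0], w=2)
print(h_p)  # SAD = 2 + 2 + 0 + 4 = 8, divided by 4 * 2 * 2 = 16
```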
  9. PPTE, Phase 2: Bitrate Ladder Prediction. Step 1:
     $$b_0 = b_{min}$$
     $$v_{r,b_0} = A_{0,r} \log\left(\sqrt{\frac{h}{E}} \cdot b_0^2\right) + A_{1,r}$$
     $$v_0 = \max(v_{r,b_0}), \quad r_0 = \arg\max_{r \in R}(v_{r,b_0})$$
     $(r_0, b_0)$ is the first point of the bitrate ladder. $A_{0,r}$ and $A_{1,r}$ are parameters trained using linear regression.
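Step 1 can be sketched as below, assuming the quality model reads $v = A_{0,r} \log(\sqrt{h/E} \cdot b^2) + A_{1,r}$ with a natural logarithm (consistent with the exponential used in Step 2). The coefficient values, feature values, and function names are hypothetical placeholders, not trained parameters from the paper.

```python
import math


def predict_vmaf(E, h, bitrate, a0, a1):
    """Assumed model: v = A0 * ln(sqrt(h / E) * b^2) + A1."""
    return a0 * math.log(math.sqrt(h / E) * bitrate ** 2) + a1


def first_ladder_point(E, h, b_min, params):
    """Return (r0, b0, v0): the resolution with the highest predicted VMAF at b_min.

    `params` maps each resolution r to its (A0_r, A1_r) pair.
    """
    predictions = {r: predict_vmaf(E, h, b_min, a0, a1)
                   for r, (a0, a1) in params.items()}
    r0 = max(predictions, key=predictions.get)
    return r0, b_min, predictions[r0]


# Hypothetical per-resolution coefficients (A0, A1), for illustration only.
example_params = {360: (5.0, 40.0), 1080: (8.0, 20.0)}
r0, b0, v0 = first_ladder_point(E=64.0, h=16.0, b_min=0.2, params=example_params)
print(r0, b0, v0)
```

With these placeholder coefficients the lowest resolution wins at the minimum bitrate, which mirrors the usual behaviour of a bitrate ladder at its low end.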
  10. PPTE, Phase 2: Bitrate Ladder Prediction. Step 2:
      $t = 1$
      for $t \geq 1$ do
          $v_t = v_{t-1} + v_J(v_{t-1})$
          $b_{r,v_t} = \sqrt{\sqrt{\frac{E}{h}} \, e^{\frac{v_t - A_{1,r}}{A_{0,r}}}}$
          $b_t = \min(b_{r,v_t})$
          $r_t = \arg\min_{r \in R}(b_{r,v_t})$
          if $b_t > b_{max}$ or $v_t > v_{max}$ then
              End of the algorithm
          else
              $(r_t, b_t)$ is the $(t + 1)$th point of the bitrate ladder.
              $t = t + 1$
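The Step 2 loop can be sketched as follows. It assumes the bitrate formula is the inverse of the Step 1 model, $b = \sqrt{\sqrt{E/h} \cdot e^{(v - A_{1,r})/A_{0,r}}}$, and that the stopping test uses strict "greater than" comparisons. The JND function, coefficients, caps, and function names are hypothetical; a constant JND is used only to keep the example short.

```python
import math


def bitrate_for_vmaf(E, h, target_vmaf, a0, a1):
    """Assumed inverse model: b = sqrt(sqrt(E / h) * exp((v - A1) / A0))."""
    return math.sqrt(math.sqrt(E / h) * math.exp((target_vmaf - a1) / a0))


def extend_ladder(E, h, first_point, v0, params, jnd, b_max, v_max):
    """Append ladder points one JND apart until a bitrate or VMAF cap is exceeded.

    `first_point` is (r0, b0) from Step 1, `params` maps r to (A0_r, A1_r),
    and `jnd` is a callable returning the JND step at a given VMAF.
    """
    ladder = [first_point]
    v_prev = v0
    while True:
        step = jnd(v_prev)
        if step <= 0:
            raise ValueError("JND step must be positive, or the loop cannot end")
        v_t = v_prev + step
        candidates = {r: bitrate_for_vmaf(E, h, v_t, a0, a1)
                      for r, (a0, a1) in params.items()}
        r_t = min(candidates, key=candidates.get)  # cheapest resolution for v_t
        b_t = candidates[r_t]
        if b_t > b_max or v_t > v_max:
            break
        ladder.append((r_t, b_t))
        v_prev = v_t
    return ladder


# Hypothetical inputs: two resolutions, constant JND of 6 VMAF points.
example_params = {360: (5.0, 40.0), 1080: (8.0, 20.0)}
ladder = extend_ladder(E=64.0, h=16.0, first_point=(360, math.sqrt(2)), v0=40.0,
                       params=example_params, jnd=lambda v: 6.0,
                       b_max=200.0, v_max=90.0)
for resolution, bitrate in ladder:
    print(resolution, round(bitrate, 3))
```

Because each new target is one JND above the previous one, no two consecutive representations are predicted to be perceptually indistinguishable, and the ladder switches to the higher resolution once it becomes the cheaper way to reach the target VMAF.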
  11. Results.
      Figure: Comparison of RD curves for encoding the (a) IntoTree, (b) DaylightRoad2, and (c) TreeShade sequences using the HLS bitrate ladder and PPTE (the proposed scheme) (x-axis: bitrate in Mbps; y-axis: VMAF).
  12. Results.
      Figure: $\Delta S$ results for various values of $E$ and $h$ (colour scale: $|\Delta S|$).
      Figure: Bjøntegaard delta rate with respect to VMAF ($BDR_V$) results for various values of $E$ and $h$ (colour scale: $|BDR_V|$).
      $$\Delta S = 1 - \frac{\sum b_{opt}}{\sum b_{ref}} \quad (4)$$
      where $\sum b_{ref}$ and $\sum b_{opt}$ represent the sum of bitrates of all representations in the fixed bitrate ladder and the optimized bitrate ladder, respectively.
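Eq. (4) amounts to a one-line computation, sketched below. The function name `storage_reduction` and both bitrate lists are hypothetical and are not taken from the paper's experiments.

```python
def storage_reduction(reference_bitrates, optimized_bitrates):
    """Eq. (4): delta_S = 1 - sum(b_opt) / sum(b_ref)."""
    total_ref = sum(reference_bitrates)
    if total_ref <= 0:
        raise ValueError("reference ladder must have a positive total bitrate")
    return 1.0 - sum(optimized_bitrates) / total_ref


# Hypothetical ladders in Mbps: a fixed 4-rung ladder against a 3-rung optimized one.
delta_s = storage_reduction([1.0, 2.0, 4.0, 8.0], [1.0, 3.0, 6.0])
print(f"{delta_s:.2%}")
```

Here the optimized ladder totals 10 Mbps against 15 Mbps for the reference, so one third of the cumulative bitrate is saved.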
  13. Results.
      Table: Results of PPTE against the HLS bitrate ladder.

      | Dataset | Video | SI | TI | E | h | BDR_V | BDR_P | ∆S | Avg. JND |
      |---|---|---|---|---|---|---|---|---|---|
      | JVET [4] | DaylightRoad2 | 40.51 | 16.21 | 54.78 | 20.35 | -23.84% | -10.88% | -40.32% | 6.99 |
      | JVET | FoodMarket4 | 38.26 | 17.68 | 60.61 | 22.67 | -19.22% | -6.21% | -28.13% | 6.72 |
      | MCML [5] | Characters | 50.43 | 29.85 | 42.66 | 21.06 | -74.60% | -71.70% | -53.69% | 3.82 |
      | MCML | Crowd | 33.76 | 10.13 | 56.74 | 15.89 | -30.12% | -15.63% | -31.06% | 7.85 |
      | MCML | Lake | 42.04 | 11.84 | 47.89 | 21.11 | -38.00% | -0.37% | -44.83% | 5.03 |
      | MCML | Park | 22.63 | 8.17 | 40.55 | 9.22 | -10.47% | -10.50% | -15.35% | 6.28 |
      | SJTU [6] | Fountains | 43.37 | 11.42 | 63.30 | 26.83 | -32.73% | -2.18% | -29.65% | 5.80 |
      | SJTU | RushHour | 29.14 | 16.21 | 56.12 | 25.11 | -20.50% | -7.34% | -42.73% | 6.92 |
      | SJTU | TrafficFlow | 33.57 | 13.8 | 56.64 | 28.00 | -53.34% | -42.89% | -44.83% | 5.95 |
      | SJTU | TreeShade | 52.88 | 5.29 | 60.24 | 11.31 | -48.38% | -39.02% | -31.06% | 6.74 |
      | VGEG [7] | IntoTree | 324.41 | 12.09 | 45.77 | 30.94 | -26.23% | -7.08% | -40.32% | 4.92 |
      | VGEG | OldTownCross | 29.66 | 11.62 | 50.31 | 27.64 | -33.77% | -25.07% | -28.13% | 5.86 |
      | VGEG | ParkJoy | 62.78 | 27.00 | 76.32 | 41.10 | -15.68% | -2.39% | -18.16% | 5.19 |
      | Average | | | | | | -27.02% | -16.47% | -30.69% | 5.85 |

      *These sequences were used for training.
      [4] Jill Boyce et al. JVET-J1010: JVET common test conditions and software reference configurations. July 2018.
      [5] Manri Cheon and Jong-Seok Lee. "Subjective and Objective Quality Assessment of Compressed 4K UHD Videos for Immersive Experience". In: IEEE Transactions on Circuits and Systems for Video Technology 28.7 (2018), pp. 1467–1480. doi: 10.1109/TCSVT.2017.2683504.
      [6] L. Song et al. "The SJTU 4K Video Sequence Dataset". In: Fifth International Workshop on Quality of Multimedia Experience (QoMEX2013). July 2013.
      [7] European Broadcasting Union (EBU). "The SVT High Definition Multi Format Test Set". Feb. 2006. url: https://tech.ebu.ch/docs/hdtv/svt-multiformat-conditions-v10.pdf.
  14. Conclusion. This paper proposed a perceptually-aware online per-title encoding (PPTE) scheme for live streaming applications. PPTE includes an algorithm that predicts the optimal resolution-bitrate pairs for every video segment based on the JND in visual quality perception. Live streaming using PPTE requires 16.47% fewer bits to maintain the same PSNR and 27.02% fewer bits to maintain the same VMAF compared to the reference HLS bitrate ladder. This improvement in compression efficiency is achieved with an average storage reduction of 30.69% compared to the reference HLS bitrate ladder.
  15. Q&A. Thank you for your attention! Vignesh V Menon (vignesh.menon@aau.at)
