
CAPS_Presentation.pdf



In live streaming applications, a fixed set of bitrate-resolution pairs (known as a bitrate ladder) is generally used to avoid the additional pre-processing run-time of analyzing the complexity of every video and determining an optimized bitrate ladder. Furthermore, live encoders use the fastest available preset for encoding to ensure the minimum possible streaming latency. For live encoders, the encoding speed is expected to equal the video framerate. An optimized encoding preset may result in (i) increased Quality of Experience (QoE) and (ii) improved CPU utilization while encoding. In this light, this paper introduces a Content-Adaptive encoder Preset prediction Scheme (CAPS) for adaptive live video streaming applications. In this scheme, the encoder preset is determined using Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features for every video segment, the number of CPU threads allocated to each encoding instance, and the target encoding speed. Experimental results show that CAPS yields an overall quality improvement of 0.83 dB PSNR and 3.81 VMAF at the same bitrate, compared to fastest-preset encoding of the HTTP Live Streaming (HLS) bitrate ladder using the x265 open-source HEVC encoder. This is achieved while maintaining the desired encoding speed and reducing CPU idle time.


  1. Content-adaptive Encoder Preset Prediction for Adaptive Live Streaming. Vignesh V Menon1, Hadi Amirpour1, Prajit T Rajendran2, Mohammad Ghanbari1,3, and Christian Timmerer1. 1 Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität, Klagenfurt, Austria; 2 Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France; 3 School of Computer Science and Electronic Engineering, University of Essex, UK. 9 December 2022. Vignesh V Menon, Content-adaptive Encoder Preset Prediction for Adaptive Live Streaming.
  2. Outline: 1 Introduction; 2 Research Problem; 3 Content-Adaptive Encoder Preset Prediction Scheme (CAPS); 4 Evaluation; 5 Conclusions.
  3. Introduction: HTTP Adaptive Streaming (HAS). HTTP Adaptive Streaming (HAS)1 has become the de-facto standard for delivering video content to clients with various internet speeds and device types. Traditionally, a fixed bitrate ladder, e.g., the HTTP Live Streaming (HLS) bitrate ladder2, is used in live streaming. 1 A. Bentaleb et al. "A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP". In: IEEE Communications Surveys & Tutorials 21.1 (2019), pp. 562–585. doi: 10.1109/COMST.2018.2862938. 2 https://developer.apple.com/documentation/http_live_streaming/hls_authoring_specification_for_apple_devices, last access: Nov 30, 2022.
  4. Introduction: Live encoding in HAS. Figure: Encoding time of the HLS bitrate ladder2 representations of the Wood s000 sequence (5-second duration, 24 fps) of the VCD dataset, using the ultrafast preset of x265 and 8 CPU threads. For every representation, maintaining a fixed encoding speed, equal to the video framerate and independent of the video content, is a key goal for a live encoder. A reduction in encoding speed may lead to the unacceptable outcome of dropped frames during transmission, eventually decreasing the Quality of Experience (QoE).a An increase in encoding speed leads to more CPU idle time! a Pradeep Ramachandran et al. "Content Adaptive Live Encoding with Open Source Codecs". In: Proceedings of the 11th ACM Multimedia Systems Conference. MMSys '20. Istanbul, Turkey: Association for Computing Machinery, 2020, pp. 345–348. isbn: 9781450368452. doi: 10.1145/3339825.3393580.
  5. Introduction: Live encoding in HAS. The preset for the fastest encoding (ultrafast for x2643 and x2654) is used as the encoding preset for all live content, independent of the dynamic complexity of the content. The resulting encode is sub-optimal, especially when the type of content changes dynamically, which is the typical use case for live streams.5 When the content becomes easier to encode, the encoder achieves a higher encoding speed than the target encoding speed. This, in turn, introduces unnecessary CPU idle time as the encoder waits for the video feed. 3 https://www.videolan.org/developers/x264.html, last access: Nov 30, 2022. 4 https://www.videolan.org/developers/x265.html, last access: Nov 30, 2022. 5 Qingxiong Huangyuan et al. "Performance evaluation of H.265/MPEG-HEVC encoders for 4K video sequences". In: Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific. 2014, pp. 1–8. doi: 10.1109/APSIPA.2014.7041782.
  6. Research Problem. For easy-to-encode content, the encoder preset needs to be configured such that the encoding speed can be reduced while remaining compatible with the expected live encoding speed, improving the quality of the encoded content. When the content becomes complex again, the encoder preset needs to be reconfigured back to a faster configuration that achieves live encoding speed.6 This paper targets an encoding scheme that determines the encoding preset configuration dynamically, which: is adaptive to the video content; maximizes CPU utilization for a given target encoding speed; maximizes compression efficiency. 6 Sergey Zvezdakov, Denis Kondranin, and Dmitriy Vatolin. "Machine-Learning-Based Method for Content-Adaptive Video Encoding". In: 2021 Picture Coding Symposium (PCS). 2021, pp. 1–5. doi: 10.1109/PCS50896.2021.9477507.
  7. Content-Adaptive encoder Preset prediction Scheme (CAPS). Figure: The encoding pipeline using CAPS envisioned in this paper: a video segment and a priori information (bitrate set B, resolution set R, target speed, codec, CPU threads) feed video complexity feature extraction (E, h, L) and encoding preset prediction, which configures one encoder instance per representation (rep1 … repN).
  8. CAPS Phase 1: Video Complexity Feature Extraction. Compute texture energy per block: a DCT-based energy function determines the block-wise feature of each frame, defined as:

H_k = Σ_{i=0}^{w−1} Σ_{j=0}^{w−1} e^{|(ij/(wh))² − 1|} · |DCT(i,j)|   (1)

where w×w is the size of the block (so h = w here), and DCT(i,j) is the (i,j)th DCT component when i + j > 0, and 0 otherwise. The energy values of the blocks in a frame are averaged to determine the energy per frame:7,8

E_s = ( Σ_{k=0}^{K−1} H_{s,k} ) / (K · w²)   (2)

where K is the number of blocks in frame s. 7 Michael King, Zinovi Tauber, and Ze-Nian Li. "A New Energy Function for Segmentation and Compression". In: 2007 IEEE International Conference on Multimedia and Expo. 2007, pp. 1647–1650. doi: 10.1109/ICME.2007.4284983. 8 Vignesh V Menon et al. "Efficient Content-Adaptive Feature-Based Shot Detection for HTTP Adaptive Streaming". In: 2021 IEEE International Conference on Image Processing (ICIP). 2021, pp. 2174–2178. doi: 10.1109/ICIP42928.2021.9506092.
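The texture energy of Eqs. (1)-(2) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's VCA implementation: the block size of 32×32, the grayscale `numpy` frame input, and the helper names are our assumptions.

```python
import numpy as np
from scipy.fft import dctn

def block_energy(block: np.ndarray, w: int) -> float:
    """Texture energy H_k of one w x w block (Eq. 1): exponentially
    weighted sum of absolute DCT coefficients, with DCT(0,0) set to 0."""
    coeffs = np.abs(dctn(block, norm="ortho"))
    i, j = np.meshgrid(np.arange(w), np.arange(w), indexing="ij")
    weight = np.exp(np.abs((i * j / (w * w)) ** 2 - 1))
    weight[0, 0] = 0.0  # Eq. (1) defines the DC term as 0
    return float((weight * coeffs).sum())

def frame_energy(frame: np.ndarray, w: int = 32) -> float:
    """Average energy E_s per frame (Eq. 2) over non-overlapping blocks."""
    rows, cols = frame.shape
    energies = [
        block_energy(frame[y:y + w, x:x + w], w)
        for y in range(0, rows - w + 1, w)
        for x in range(0, cols - w + 1, w)
    ]
    return sum(energies) / (len(energies) * w * w)
```

A flat frame has no AC energy, so `frame_energy(np.zeros((64, 64)))` returns 0.0, matching the intent of Eq. (1) as a texture (not brightness) measure.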
  9. CAPS Phase 1: Video Complexity Feature Extraction. h_s: the sum of absolute differences (SAD) between the block-level energy values of frame s and those of the previous frame s − 1:

h_s = ( Σ_{k=0}^{K−1} |H_{s,k} − H_{s−1,k}| ) / (K · w²)   (3)

where K denotes the number of blocks in frame s. The luminescence of the kth non-overlapping block of the sth frame is defined as:

L_{s,k} = √(DCT(0,0))   (4)

The block-wise luminescence is averaged per frame, denoted as L_s:

L_s = ( Σ_{k=0}^{K−1} L_{s,k} ) / (K · w²)   (5)
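Given per-block energies and DC coefficients, Eqs. (3)-(5) reduce to simple array reductions. A sketch, with function names of our own choosing:

```python
import numpy as np

def temporal_feature(H_cur: np.ndarray, H_prev: np.ndarray, w: int) -> float:
    """h_s (Eq. 3): SAD between the block energies of frame s and
    frame s-1, normalised by K * w^2."""
    K = H_cur.size
    return float(np.abs(H_cur - H_prev).sum() / (K * w * w))

def luminescence_feature(dc_coeffs: np.ndarray, w: int) -> float:
    """L_s (Eqs. 4-5): average sqrt of each block's DC coefficient
    DCT(0,0), normalised by K * w^2."""
    K = dc_coeffs.size
    return float(np.sqrt(dc_coeffs).sum() / (K * w * w))
```

For example, with block energies [2, 4] in frame s and [1, 1] in frame s−1 and w = 1, h_s = (1 + 3) / 2 = 2.0.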
  10. CAPS Phase 2: Encoding Preset Prediction. Figure: Encoding preset prediction architecture: the features E, h, L, log(r), and log(b) feed a model set that predicts the encoding times t̂_{p_min} … t̂_{p_max}; preset selection then outputs the optimized preset p̂. Model set: models to predict the encoding time for each preset. Preset selection: select the optimized preset that yields the target encoding time.
  11. CAPS Phase 2: Encoding Preset Prediction, Model Set. Random Forest (RF) models are trained to predict the encoding times for the pre-defined set of encoding presets (P). The minimum and maximum encoder presets (p_min and p_max, respectively) are chosen based on the target encoder. For example, the x265 HEVC9 encoder supports encoding presets ranging from 0 to 9 (i.e., ultrafast to placebo). The model set predicts the encoding times for each of the presets in P as t̂_{p_min} to t̂_{p_max}. 9 G. J. Sullivan et al. "Overview of the high efficiency video coding (HEVC) standard". In: IEEE Transactions on Circuits and Systems for Video Technology 22.12 (2012), pp. 1649–1668.
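The model set can be sketched as one Random Forest regressor per preset over the five features E, h, L, log(r), log(b). The training data below is synthetic and for illustration only; the paper trains on measured encoding times, and the preset range 0-8 follows the experimental results slide.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 5))                 # columns: E, h, L, log(r), log(b)

model_set = {}
for preset in range(9):                  # presets 0 (ultrafast) .. 8
    # synthetic target: slower presets take proportionally longer to encode
    y = (preset + 1) * (X @ np.ones(5)) + rng.normal(0.0, 0.1, 200)
    model_set[preset] = RandomForestRegressor(
        n_estimators=50, random_state=0).fit(X, y)

segment = rng.random((1, 5))             # features of a new video segment
pred_times = {p: float(m.predict(segment)[0]) for p, m in model_set.items()}
```

Each model answers one question, "how long would this segment take at preset p?", so the whole dictionary of predictions is available to the preset-selection step at once.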
  12. CAPS Phase 2: Encoding Preset Prediction, Preset Selection. The preset whose predicted encoding time is closest to the target encoding time T is selected, where T is defined as:

T = n / f   (6)

where n is the number of frames in the segment and f is the video framerate. The constraint ensures that the encoding time does not exceed the target encoding time T:

p̂ = argmin_{p ∈ [p_min, p_max]} |T − t̂_p|  s.t.  t̂_p ≤ T   (7)

where p̂ is the selected optimum preset and t̂_p̂ is its predicted encoding time.
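Eq. (7) is a constrained nearest-match over the predicted times. A sketch; the fallback branch is our assumption, since the slide does not say what happens when even the fastest preset misses the target:

```python
def select_preset(pred_times: dict[int, float], T: float) -> int:
    """Eq. (7): among presets whose predicted encoding time does not
    exceed the target time T, choose the one closest to T."""
    feasible = {p: t for p, t in pred_times.items() if t <= T}
    if not feasible:
        # Assumption (not stated on the slide): if every preset misses
        # the target, fall back to the fastest preset available.
        return min(pred_times)
    return min(feasible, key=lambda p: abs(T - feasible[p]))
```

For a 120-frame segment at 24 fps, T = 120 / 24 = 5 s (Eq. 6); with predicted times {0: 1.0, 3: 3.5, 6: 4.8, 8: 6.0}, preset 6 is selected, since 4.8 s is the feasible time closest to 5 s.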
  13. Evaluation: Test Methodology. Dataset: Video Complexity Dataset (VCD)10 (500 Ultra HD sequences, 5 s duration, 24 fps).

Table: Encoder settings.
Encoder: x265 v3.5
Target encoding speed/time: 24 fps / 5 seconds
CPU threads: 8
Rate control: CBR

Table: Representations considered in this paper2.
Representation ID       01     02     03     04     05     06     07     08     09     10     11     12
r (height in pixels)   360    432    540    540    540    720    720   1080   1080   1440   2160   2160
b (in Mbps)          0.145  0.300  0.600  0.900  1.600  2.400  3.400  4.500  5.800  8.100 11.600 16.800

E, h, and L features are extracted from the video segments using VCA11 run in eight CPU threads. 10 Hadi Amirpour et al. "VCD: Video Complexity Dataset". In: Proceedings of the 13th ACM Multimedia Systems Conference. MMSys '22. Athlone, Ireland: Association for Computing Machinery, 2022, pp. 234–239. isbn: 9781450392839. doi: 10.1145/3524273.3532892. 11 Vignesh V Menon et al. "VCA: Video Complexity Analyzer". In: Proceedings of the 13th ACM Multimedia Systems Conference. MMSys '22. Athlone, Ireland: Association for Computing Machinery, 2022, pp. 259–264. isbn: 9781450392839. doi: 10.1145/3524273.3532896.
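The representation table above transcribes directly into code; the tuple layout and the helper assembling the predictor's input vector are our own illustration, not part of the paper:

```python
import math

# (representation id, height r in pixels, bitrate b in Mbps), from the table above.
HLS_LADDER = [
    ("01", 360, 0.145), ("02", 432, 0.300), ("03", 540, 0.600),
    ("04", 540, 0.900), ("05", 540, 1.600), ("06", 720, 2.400),
    ("07", 720, 3.400), ("08", 1080, 4.500), ("09", 1080, 5.800),
    ("10", 1440, 8.100), ("11", 2160, 11.600), ("12", 2160, 16.800),
]

def rep_features(E: float, h: float, L: float, r: int, b: float) -> list[float]:
    """Feature vector [E, h, L, log(r), log(b)] fed to the preset predictor."""
    return [E, h, L, math.log(r), math.log(b)]
```

One preset prediction is made per ladder entry, so a live deployment would evaluate `rep_features` twelve times per segment, once per representation.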
  14. Experimental Results. Figure: Average relative importance of features in the encoding time prediction of 2160p encoding for all presets.

Table: Average encoding time prediction performance of every preset considered for experimental validation (p = 0: ultrafast).
Preset    0     1     2     3     4     5     6     7     8
R²      0.97  0.96  0.97  0.97  0.97  0.96  0.97  0.98  0.98
MAE     0.06  0.09  0.12  0.17  0.22  0.31  0.33  0.41  0.49
  15. Experimental Results. Figure: Average preset chosen for each representation in the HLS bitrate ladder. On average, representation 01 (0.145 Mbps) chooses the slow preset (p = 6), while representations 11 (11.6 Mbps) and 12 (16.8 Mbps) choose the ultrafast preset (p = 0).
  16. Experimental Results. Figure: (a) Average encoding time, (b) average PSNR, and (c) average VMAF for each representation. Using the ultrafast preset for all representations introduces significant CPU idle time for lower-bitrate representations, whereas CAPS yields lower CPU idle time when the encodings are carried out concurrently. Using CAPS, visual quality improves significantly at lower-bitrate representations: CAPS yields an overall BD-VMAF of 3.81 and a BD-PSNR of 0.83 dB.
  17. Conclusions. This paper proposed CAPS, a content-adaptive encoder preset prediction scheme for adaptive live streaming applications. CAPS predicts the optimized encoder preset for a given target bitrate and resolution for each segment, which helps improve the compression efficiency of video encodings. DCT-energy-based features are used to determine segments' spatial and temporal complexity. CAPS yields lower idle time and an overall quality improvement of 0.83 dB PSNR and 3.81 VMAF with the same bitrate, compared to the fastest preset (ultrafast) x265 encoding of the reference HLS bitrate ladder.
  18. Q & A. Thank you for your attention! Vignesh V Menon (vignesh.menon@aau.at)
