Content-gnostic Bitrate Ladder Prediction for Adaptive Video Streaming

29/09/2020
Angeliki Katsenou

Structure
I. Motivation
II. Compression and Content
III. Proposed Framework
IV. Results
V. Conclusion and Future Work

I. Motivation
Cisco's reports on internet data traffic estimate that the share of video data will reach 80% by 2022 and is expected to keep increasing. [1]
Due to the pandemic and the recent shift towards a remote-working lifestyle, this figure is probably almost a reality already.
Video providers employ adaptive streaming to address the users' specifications.
Traditionally, this is achieved by creating several versions of a video sequence using different encoding parameters, such as resolution.
This, however, requires a huge number of encodings, which impacts time, cost and energy (increased CO2 footprint).
[Pie chart: Distribution of energy consumption for production and use in 2017. Categories: TVs (production), Computers (production), Smartphones (production), Others, Terminals (use), Networks (use), Data Centers (use). Shares shown: 11%, 17%, 11%, 6%, 20%, 16%, 19%.]
“…as of the end of December last year,
the maximum number of daily meeting
participants, both free and paid,
conducted on Zoom was approximately
10 million. In March this year, we reached
more than 200 million daily meeting
participants, both free and paid.” [2]
Eric S. Yuan
Founder and CEO, Zoom

I. Motivation
Fig.1 Sample frames from a dataset of 100 4K sequences.
Fig.2 PSNR-log(Rate) curves across resolutions (RQ curves for 4K, FHD and HD; x-axis: Bitrate (kbps), log scale; y-axis: PSNR (dB)).
One ladder
does not fit all!
Table 1 The encoding ladder presented in Apple Tech Note TN2224.

I. Motivation
How can we find the “best” bitrate ladder per content so that we do not compromise the quality of experience?
How could we make this process more computationally efficient without degrading the delivered video quality?
Table 1 The encoding ladder presented in Apple Tech Note TN2224.
Table 2 Netflix’s per-title encoding can change both the number of rungs and their resolution. [3, 4]
Other per-title approaches: Bitmovin, Mux, CAMBRIA, etc.

I. Motivation
How can we find the “best” bitrate ladder per content so that we do not compromise the quality of experience?
How could we make this process more computationally efficient without degrading the delivered video quality?
Fig.3 RD curves and convex hull (labels: Convex Hull - Optimal Encoding Solution; Sub-optimal Encoding Solution, shown twice; Practical Approach).
Ideally, the optimal solution would be to build the ladder by sampling the convex hull of the RQ curves across resolutions.
We propose a content-gnostic, machine-learning-based approach that predicts the bitrate ladder.
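To make the convex hull idea concrete, here is a minimal sketch of how the upper convex hull of rate-quality (RQ) points pooled across resolutions could be computed. It is an illustration, not the method used in the slides: the function name, the sample points and the choice of working in the (log2 bitrate, PSNR) plane are all assumptions.

```python
from typing import List, Tuple

# (log2 bitrate, PSNR in dB, resolution label); all values below are hypothetical.
Point = Tuple[float, float, str]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of RQ points (monotone-chain scan, left to right)."""
    hull: List[Point] = []
    for p in sorted(points, key=lambda q: (q[0], q[1])):
        # Drop the last hull point while it lies on or below the segment
        # joining its predecessor to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical RQ samples for three resolutions.
rq_points: List[Point] = [
    (9, 28.0, "HD"), (10, 31.0, "HD"), (11, 33.5, "HD"), (12, 35.0, "HD"),
    (11, 33.0, "FHD"), (12, 36.5, "FHD"), (13, 38.5, "FHD"), (14, 39.5, "FHD"),
    (13, 38.0, "4K"), (14, 41.0, "4K"), (15, 43.0, "4K"),
]
hull = upper_convex_hull(rq_points)
```

Reading the resolution labels along the returned hull shows where the resolution switches as the bitrate grows.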

II. Content Features and Compression
Fig.4 Correlation matrix of HM coding statistics to
spatio-temporal features. [5]
Fig.5 Examples of predicted PSNR-Rate curves. [5]

Fig.2 PSNR-log(Rate) curves across resolutions.
Fig.6 Example of RQ curves’ intersection (curves: 4K, FHD, HD and Convex Hull; x-axis: log(Bitrate (kbps)); y-axis: PSNR (dB)). The marked cross-over points are $\{QP_{4K}, QP_{FHD}^{low}\}$ and $\{QP_{FHD}^{high}, QP_{HD}\}$.
Finding the cross-over points helps define where the resolution switches on the convex hull.
We assume that the RQ curves intersect in an ordered, monotonic fashion (e.g. the 2160p curve intersects the 1080p curve, the 1080p curve intersects the 720p curve, etc.).
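As an illustration of what a cross-over point is, the sketch below locates the point where a higher-resolution RQ curve overtakes a lower-resolution one. It assumes each curve is given as (log2 bitrate, PSNR) samples joined by straight lines; the function names and sample values are hypothetical, and mapping the crossing back to a pair of QPs (one per resolution) is a further step not shown here.

```python
from typing import List, Optional, Tuple

Curve = List[Tuple[float, float]]  # (log2 bitrate, PSNR in dB), sorted by bitrate


def interp(curve: Curve, x: float) -> float:
    """Piecewise-linear interpolation of a curve at x (x inside its range)."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("x outside curve range")


def cross_over(low_res: Curve, high_res: Curve) -> Optional[Tuple[float, float]]:
    """First point where high_res goes from below low_res to at or above it."""
    lo = max(low_res[0][0], high_res[0][0])
    hi = min(low_res[-1][0], high_res[-1][0])
    # The difference of two piecewise-linear curves is linear between knots,
    # so checking the merged knots of the overlapping range is sufficient.
    knots = sorted({x for x, _ in low_res + high_res if lo <= x <= hi})
    diffs = [interp(high_res, x) - interp(low_res, x) for x in knots]
    for i in range(len(knots) - 1):
        da, db = diffs[i], diffs[i + 1]
        if da < 0 <= db:
            x = knots[i] + (knots[i + 1] - knots[i]) * (-da) / (db - da)
            return x, interp(low_res, x)
    return None


# Hypothetical HD and FHD curves.
hd: Curve = [(9, 28.0), (10, 31.0), (11, 33.5), (12, 35.0)]
fhd: Curve = [(11, 33.0), (12, 36.5), (13, 38.5), (14, 39.5)]
crossing = cross_over(hd, fhd)
```

The function returns `None` when the higher-resolution curve never overtakes the lower one inside the overlapping bitrate range, which is the case that would violate the ordered-intersection assumption.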

Fig.7 Scatterplots of cross-over QPs.
• $QP_{4K}$ vs. $QP_{FHD}^{low}$: PCC: .9917, SROCC: .9888
• $QP_{FHD}^{high}$ vs. $QP_{HD}$: PCC: .9817, SROCC: .9538
This relation can be used to improve cross-over QP predictions.
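For reference, the two correlation measures quoted for Fig.7 (Pearson's linear correlation coefficient, PCC, and Spearman's rank-order correlation coefficient, SROCC) can be computed as in the sketch below. The QP pairs are made-up values for illustration only; they are not the data behind the figure.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient (PCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation (SROCC): the PCC of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical cross-over QP pairs for five sequences.
qp_4k = [20, 25, 30, 35, 40]
qp_fhd_low = [22, 26, 33, 36, 43]
pcc = pearson(qp_4k, qp_fhd_low)
srocc = spearman(qp_4k, qp_fhd_low)
```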

Fig.8 Proposed method.
Training Process:
• Training Videos @ Native Resolution -> Content Features Extraction -> Spatio-temporal Features of Training Videos
• Training Videos @ Native Resolution -> Downscaling Resolution -> Training Videos @ all considered resolutions -> Video Codec -> Decoded Training Videos @ all considered resolutions -> Upscaling Resolution -> Upscaled Decoded Training Videos @ Native Resolution
• Quality Metrics Computation -> Quality Metric Values for Training Videos -> Ground-truth RQ Convex Hull -> Training Videos Cross-over QPs
• Spatio-temporal Features of Training Videos and Training Videos Cross-over QPs -> Machine Learning-based Regression
Testing Process:
• Testing Videos @ Native Spatial Resolution -> Content Features Extraction -> Spatio-temporal Features of Testing Videos -> Machine Learning-based Regression -> Predicted Cross-over QPs per Resolution
• Video Codec -> Decoded Testing Videos @ Cross-over QPs -> Upscaled Decoded Testing Videos @ Cross-over QPs -> Quality Metric Values for Testing Videos at Cross-over Points; Bitrate of Cross-over Points
• RQ Convex Hull Fitting -> Predicted Bitrate Ladder (RQ Convex Hull Eq.; Rate-QP Eq.; Resolution Switching Rate points)

Fig.9 RQ convex hulls (blue: 2160p, red: 1080p, yellow: 720p, purple: 540p, green: 480p).

We fitted the convex hull with a third-order polynomial.
A third-order polynomial has four coefficients, so after determining the cross-over QPs we need four encodes to determine the polynomial parameters.
Then, we can sample the convex hull and build the bitrate ladder.
Table 3 Fitted Models.
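The reason four encodes suffice can be seen in a small sketch: four (rate, quality) points determine the four coefficients of a cubic exactly. The code below assumes quality is modelled as a cubic in log2(bitrate); the fitted models of Table 3 are not reproduced here, so the variable choice, function names and sample values are all illustrative.

```python
from typing import List, Sequence


def fit_cubic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Coefficients [a0, a1, a2, a3] of y = a0 + a1*x + a2*x^2 + a3*x^3
    passing through exactly four points (Gauss-Jordan elimination)."""
    if len(xs) != 4 or len(ys) != 4:
        raise ValueError("exactly four points are needed")
    # Augmented Vandermonde system [1, x, x^2, x^3 | y].
    m = [[float(x) ** k for k in range(4)] + [float(y)] for x, y in zip(xs, ys)]
    for col in range(4):
        pivot = max(range(col, 4), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("the four x values must be distinct")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(4):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][4] / m[i][i] for i in range(4)]


def eval_cubic(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the cubic with Horner's scheme."""
    a0, a1, a2, a3 = coeffs
    return a0 + x * (a1 + x * (a2 + x * a3))


# Hypothetical (log2 bitrate, PSNR) pairs from four encodes.
log2_rates = [10.0, 12.0, 14.0, 16.0]
psnr = [31.0, 36.5, 41.0, 44.0]
coeffs = fit_cubic(log2_rates, psnr)
```

Once the coefficients are known, `eval_cubic` can be called at any bitrate inside the fitted range to sample the hull.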

Fig.10 PSNR-Rate Ladder (x-axis: log2(Bitrate); y-axis: PSNR (dB)). Fig.11 VMAF-Rate Ladder (x-axis: log2(Bitrate); y-axis: VMAF).
Building the bitrate ladder:
1. Determine the operational bitrate range, $(R_{min}, R_{max})$;
2. Sample the bitrate: $R_{L,i} \simeq 2 R_{L,i-1}$, or $\log_2(R_{L,i}) \simeq 1 + \log_2(R_{L,i-1})$, where $R_{L,i} \in (R_{min}, R_{max})$;
3. Sample the quality: $Q_{L,i}(R_{L,i}) \leq Q_{max}$ and $\frac{dQ_{L,i}}{dR_L} > \epsilon$, where $\epsilon \to 0$.
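The three steps above can be sketched as a short loop. This is only an illustration under stated assumptions: the quality model passed in is a hypothetical stand-in for the fitted convex hull, the derivative is approximated by the finite difference between consecutive rungs, the first rung is placed at the lower end of the range for simplicity, and the function name and default threshold are made up.

```python
from math import log2
from typing import Callable, List, Tuple


def build_ladder(
    quality: Callable[[float], float],
    r_min: float,
    r_max: float,
    q_max: float,
    eps: float = 1e-4,
) -> List[Tuple[float, float]]:
    """Return (bitrate, quality) rungs, doubling the bitrate at every step."""
    ladder: List[Tuple[float, float]] = []
    rate = r_min
    while rate <= r_max:
        q = quality(rate)
        if q > q_max:  # quality cap reached
            break
        if ladder:
            prev_rate, prev_q = ladder[-1]
            # Finite-difference stand-in for dQ/dR (an assumption).
            if (q - prev_q) / (rate - prev_rate) <= eps:
                break  # quality has saturated
        ladder.append((rate, q))
        rate *= 2  # R_i = 2 * R_{i-1}
    return ladder


def toy_quality(rate_kbps: float) -> float:
    """Hypothetical log-linear quality model, in dB."""
    return 30.0 + 3.0 * log2(rate_kbps / 500.0)


ladder = build_ladder(toy_quality, r_min=500, r_max=20000, q_max=45.0)
```

Lowering `q_max` or raising `eps` shortens the ladder, which mirrors the two stopping conditions in step 3.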

IV. Results
Table 4 List of Features [5].
Fig.12 Example of content dependency of cross-over QPs (x-axis: stdTC_std; y-axis: $QP_{4K}$).

IV. Results
Fig.1 Sample frames from a dataset of 100 4K sequences.
Fig.13 Spatial and Temporal Information of the dataset.

IV. Results
From the PCS 2019 paper [6]: We tested the proposed framework with HM16.20, considering the resolutions {2160p, 1080p, 720p}.
A Lanczos-3 filter (ffmpeg implementation) was used for the spatial down/up-sampling.
We compare our method against two state-of-the-art solutions:
• Brute force method: we performed encodings with a QP step equal to 1. The brute force method theoretically creates the optimal convex hull. This is considered our ground truth.
• Interpolation-based method: we performed 7 encodings per resolution (using equidistant QPs to cover the range) and used piecewise cubic Hermite interpolation for the in-between QPs. This method constructs a suboptimal convex hull, but it can provide a good approximation of it while significantly reducing the number of pre-encodes.
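To illustrate the interpolation-based baseline, the sketch below interpolates log2(bitrate) at in-between QPs from 7 equidistant encodes using a monotone piecewise cubic Hermite interpolant. It is a simplified variant written for this example: interior slopes use the weighted harmonic mean of neighbouring secants, while the end slopes are simply set to the end secants, which differs from common PCHIP implementations. The QP grid and rate values are hypothetical.

```python
from typing import List, Sequence


def hermite_slopes(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Knot slopes that keep the interpolant monotone on monotone data."""
    n = len(x)
    d = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(n - 1)]  # secants
    m = [d[0]] + [0.0] * (n - 2) + [d[-1]]  # simplified end slopes (assumption)
    for i in range(1, n - 1):
        if d[i - 1] * d[i] > 0:  # same sign: weighted harmonic mean
            h0, h1 = x[i] - x[i - 1], x[i + 1] - x[i]
            w1, w2 = 2 * h1 + h0, h1 + 2 * h0
            m[i] = (w1 + w2) / (w1 / d[i - 1] + w2 / d[i])
    return m


def interpolate(x: Sequence[float], y: Sequence[float], xq: float) -> float:
    """Evaluate the piecewise cubic Hermite interpolant at xq."""
    m = hermite_slopes(x, y)
    for i in range(len(x) - 1):
        if x[i] <= xq <= x[i + 1]:
            h = x[i + 1] - x[i]
            t = (xq - x[i]) / h
            h00 = 2 * t**3 - 3 * t**2 + 1  # cubic Hermite basis functions
            h10 = t**3 - 2 * t**2 + t
            h01 = -2 * t**3 + 3 * t**2
            h11 = t**3 - t**2
            return h00 * y[i] + h10 * h * m[i] + h01 * y[i + 1] + h11 * h * m[i + 1]
    raise ValueError("query outside the encoded QP range")


# Hypothetical encodes at 7 equidistant QPs for one resolution.
qps = [15, 20, 25, 30, 35, 40, 45]
log2_rate = [17.0, 16.0, 15.2, 14.3, 13.5, 12.9, 12.5]
dense = [interpolate(qps, log2_rate, qp) for qp in range(15, 46)]
```

The same interpolation would be applied to the quality values, so that each in-between QP yields an estimated (rate, quality) point for the convex hull.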

IV. Results
We applied feature selection, specifically Recursive Feature Elimination, to the set of spatio-temporal features.
We perform a sequential prediction of the QPs, starting from the highest resolution:
• For the $QP_{4K}$ prediction, we relied only on spatio-temporal features.
• For the rest of the predictions, we made use of the identified relations and considered the previously predicted QPs (of the higher resolutions) as features.
We tested various regression methods, such as support vector machines (SVMs) with different kernels, random forests (RFs), etc., but Gaussian processes (GPs) were the best-performing models.
To avoid overfitting, we performed a 10-fold cross-validation.
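As a reminder of what 10-fold cross-validation involves, the sketch below generates the train/test index splits for a dataset of 100 sequences. The shuffling, the fixed seed and the function name are assumptions made for this example; the regression models themselves are not shown.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


splits = list(k_fold_indices(100, k=10))
```

Each sequence appears in exactly one test fold, so every prediction is made by a model that did not see that sequence during training.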

IV. Results
Fig.14 Predicted cross QP 4K (x-axis: True $QP_{4K}$; y-axis: Predicted $QP_{4K}$).
Fig.15 Predicted cross QP FHD high (x-axis: True $QP_{FHD}^{high}$; y-axis: Predicted $QP_{FHD}^{high}$).
Fig.16 Predicted cross QP HD (x-axis: True $QP_{HD}^{low}$; y-axis: Predicted $QP_{HD}^{low}$).
Table 5 Results on cross-over QPs prediction.

IV. Results
The different distributions are due to the different reference convex hulls.
Fig.17 BDRate Histogram.
Fig.18 BDPSNR Histogram.
Most outliers refer to sequences that do not comply with the hypothesis that the RQs are intersecting in a resolution-monotonic manner.

IV. Results
Fig.19 Examples of results (x-axis: Bitrate (kbps); y-axis: PSNR (dB); left panels: RQ curves at 2160p, 1080p and 720p with their Convex Hull; right panels: Ground Truth Convex Hull vs. Predicted Convex Hull).
• campfireparty_gop1: BDRate = 0.18%, BDPSNR = -0.004 dB
• barscene_gop1: BDRate = 2.009%, BDPSNR = -0.009 dB

IV. Results
The proposed method requires 94.2% fewer encodings than the brute force method and 80.95% fewer than the interpolation-based method.
Proposed method overhead: the ratio of the average feature extraction time for a sequence at 4K resolution to the average 4K encoding time for a sequence at QP=27 is 0.18.
Table 6 Comparison of the number of encodes required per method.
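The reported savings can be checked with simple arithmetic. The sketch below uses the counts stated in the slides for the proposed method (two encodes per cross-over, i.e. four encodes for three resolutions) and for the interpolation-based method (7 encodes per resolution). The brute-force count of 23 QPs per resolution is an assumption: it is not stated in the text and was chosen because it is consistent with the reported 94.2%.

```python
def percent_fewer(n_proposed: int, n_reference: int) -> float:
    """Reduction in the number of encodes, in percent."""
    return 100.0 * (1.0 - n_proposed / n_reference)


N_RESOLUTIONS = 3                   # {2160p, 1080p, 720p}
N_CROSS_OVERS = N_RESOLUTIONS - 1   # one cross-over per adjacent pair

proposed = 2 * N_CROSS_OVERS        # two encodes per cross-over point: 4
interpolation = 7 * N_RESOLUTIONS   # 7 encodes per resolution: 21
brute_force = 23 * N_RESOLUTIONS    # ASSUMPTION: 23 QPs per resolution: 69

vs_interpolation = percent_fewer(proposed, interpolation)
vs_brute_force = percent_fewer(proposed, brute_force)
```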

V. Conclusion and Future Work
Conclusions:
We proposed a method that can predict the bitrate ladders of the considered resolutions based on spatio-temporal features extracted from the uncompressed videos at their native resolution and with a few video encodings (two encodes per RQ intersection point).
The first results are promising compared to the ground truth, while requiring 94.2% and 81% fewer pre-encodes compared to the brute force method and the interpolation-based method, respectively.
Future Work:
Our focus will be on validating the presented method across different codecs.
We will also work on identifying cross-codec optimization of bitrate ladders.

References
1. “Global Mobile Data Traffic Forecast Update 2017-2022”, White Paper, Cisco, 2018.
2. E. S. Yuan, “A message to our users”, https://blog.zoom.us/a-message-to-our-users/
3. J. De Cock, Z. Li, M. Manohara, and A. Aaron, “Complexity-based consistent quality encoding in the Cloud”, IEEE ICIP 2016.
4. J. Sole, L. Guo, A. Norkin, M. Afonso, K. Swanson, and A. Aaron, “Performance comparison of video coding standards: an adaptive streaming perspective”, https://medium.com/netflix-techblog/performance-comparison-of-video-coding-standards-an-adaptive-streaming-perspective-d45d0183ca95, 2018.
5. A. Katsenou, M. Afonso, D. Agrafiotis, and D. R. Bull, “Predicting Video Rate-Distortion Curves using Textural Features”, in PCS 2016.
6. A. V. Katsenou, J. Sole, and D. R. Bull, “Content-gnostic Bitrate Ladder Prediction for Adaptive Video Streaming”, in PCS 2019.

Thanks to
Dr Joel Sole
Dr Mariana Afonso
Prof David Bull
