FAUST: Fast Per-Scene Encoding Using Entropy-Based Scene Detection and Machine Learning

Anatoliy Zabrovskiy, Prateek Agrawal, Christian Timmerer, Radu Prodan
The 30th IEEE International Conference of the Open Innovations Association FRUCT
October 27-29, 2021, Oulu, Finland.

FAUST Approach. Goal
Goal:
- Develop a Fast Approach for Per-Scene Encoding using Scene Detection and Machine Learning (FAUST).
The FAUST approach is based on four phases:
1) Fast entropy-based scene detection;
2) ANN-based YPSNR quality prediction;
3) Convex Hull calculation and interpolation;
4) Per-scene encoding optimization.
Figure: FAUST pipeline. 1. Scene detection; 2. Quality prediction; 3. Convex Hull, interpolation; 4. Per-scene optimization.

Fast entropy-based scene detection (phase 1)
The fast entropy-based scene detection method consists of the following steps:
1. The input video is encoded at a low bitrate and low resolution (100 kbps, 144p) using FFmpeg and the x264 ultrafast encoding preset.
2. The Temporal Information (TI) and Spatial Information (SI) metrics are calculated for the encoded video sequence.
3. The FAUST approach then detects the scenes using the TI metrics calculated in the previous step.
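Step 1 above could be sketched as a small helper that assembles the FFmpeg command line. This is only an illustration under assumptions: the function name, the file names, the `-an` flag (dropping audio) and the `scale=-2:144` filter (144 lines high, width chosen to keep the aspect ratio) are not taken from the slides; only the codec, the preset, the bitrate and the 144p target are.

```python
def build_proxy_encode_cmd(src, dst):
    """Build an FFmpeg command for the low-bitrate, low-resolution proxy encode.

    Hypothetical helper: the slides only state x264, preset ultrafast,
    100 kbps and 144p. Everything else here is an assumption.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-c:v", "libx264",        # x264 encoder
        "-preset", "ultrafast",   # fastest x264 preset
        "-b:v", "100k",           # target video bitrate of 100 kbps
        "-vf", "scale=-2:144",    # 144p, even width preserving aspect ratio
        "-an",                    # assumption: audio is not needed for SI/TI
        dst,
    ]

# To actually run it (requires FFmpeg on the PATH):
#   import subprocess
#   subprocess.run(build_proxy_encode_cmd("input.mp4", "proxy_144p.mp4"), check=True)
```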
Figure: scene detection workflow. Input video -> FFmpeg (encoding preset: ultrafast; encoding bitrate: 100 kbps; resolution: 144p) -> encoded video -> SITI program (calculating TI and SI metrics) -> FAUST program (avg. TI entropy per second; avg. TI per second; difference threshold; scene detection) -> scenes.
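The slides do not spell out the detection rule itself, so the following is only a minimal sketch of one plausible reading: average the per-frame TI values over each second and start a new scene whenever the average changes by more than a difference threshold from one second to the next. The function names and the threshold value are hypothetical, and the per-second TI entropy mentioned in the workflow is left out because its exact role is not described.

```python
def per_second_means(values, fps):
    """Average a per-frame series over consecutive windows of `fps` frames."""
    means = []
    for start in range(0, len(values), fps):
        window = values[start:start + fps]
        means.append(sum(window) / len(window))
    return means


def detect_scene_starts(ti_per_frame, fps, threshold):
    """Return the seconds at which a new scene is assumed to start.

    Assumption: a scene boundary is declared when the per-second average TI
    differs from the previous second by more than `threshold`.
    """
    means = per_second_means(ti_per_frame, fps)
    starts = [0]  # the first scene always starts at second 0
    for sec in range(1, len(means)):
        if abs(means[sec] - means[sec - 1]) > threshold:
            starts.append(sec)
    return starts
```

With a per-frame TI series that is roughly constant for two seconds and then jumps to a much higher level, the function would report scene starts at seconds 0 and 2.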

ANN-based YPSNR quality prediction (phase 2)
Figure: a 2-second middle segment is taken from each scene (Scene 1, Scene 2, Scene 3, ..., Scene n).
ANN input parameters:
● video height;
● video width;
● encoding bitrate;
● encoding preset;
● SI (for 144p video);
● TI (for 144p video);
● input_segment_size (in bytes);
● 144p_segment_size (in bytes);
● segment duration;
● fps.
Predicting YPSNR using ANN (static/classic bitrate ladder):
Segment no.   Bitrate (kbps)   Resolution   YPSNR (dB)
1             1000             432p         35
1             2300             720p         37
1             4300             1080p        38
1             7000             1440p        40
1             12000            2160p        43
...           ...              ...          ...
For each middle segment for all scenes, the ANN predicts YPSNR values for all possible combinations of resolutions and bitrates from the static bitrate ladder.
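As an illustration of the last sentence, the sketch below builds one ANN input row per bitrate/resolution combination for a single middle segment. It is not the authors' code: the ladder holds only the rows visible in the table above, the widths assume a 16:9 aspect ratio, the default preset is a placeholder, and `model` stands for any callable that maps a feature row to a YPSNR value.

```python
from itertools import product

# Only the ladder rows visible on the slide; the full ladder has more entries.
LADDER_BITRATES_KBPS = [1000, 2300, 4300, 7000, 12000]
# height -> width, assuming a 16:9 aspect ratio (not stated on the slide)
LADDER_RESOLUTIONS = {432: 768, 720: 1280, 1080: 1920, 1440: 2560, 2160: 3840}


def build_feature_rows(segment, preset="medium"):
    """Build one input row per (bitrate, resolution) combination.

    `segment` is a dict with the per-segment inputs listed on the slide;
    `preset` is a hypothetical default, not taken from the slides.
    """
    rows = []
    for bitrate, (height, width) in product(LADDER_BITRATES_KBPS,
                                            LADDER_RESOLUTIONS.items()):
        rows.append({
            "video_height": height,
            "video_width": width,
            "encoding_bitrate": bitrate,
            "encoding_preset": preset,
            "si": segment["si"],
            "ti": segment["ti"],
            "input_segment_size": segment["input_segment_size"],
            "144p_segment_size": segment["144p_segment_size"],
            "segment_duration": segment["segment_duration"],
            "fps": segment["fps"],
        })
    return rows


def predict_ypsnr(rows, model):
    """Apply a trained model (any callable: row -> YPSNR) to every row."""
    return [model(row) for row in rows]
```

Five bitrates and five resolutions give 25 rows per middle segment, so the ANN is queried 25 times per scene in this sketch instead of running 25 test encodings.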

ANN-based YPSNR quality prediction. Results
Based on the results:
On the testing data, the developed ANN model predicts the YPSNR metric with a low mean absolute error (MAE) of 0.15 and a low mean squared error (MSE) of 0.08.
The results for various combinations of ANN input parameters are presented in the table.
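For reference, the two error measures quoted above can be computed as in the following sketch. The function names are chosen for illustration, and the inputs would be the measured and the predicted YPSNR values of the test set.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE: average squared difference between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

MSE penalizes large individual errors more strongly than MAE, which is why both values are commonly reported together.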

Convex Hull calculation and interpolation (phase 3)
Fig. 1. Convex Hull. Tears of Steel video, scene 4.
Fig. 2. Interpolated Convex Hull. Tears of Steel video, scene 4.
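The slides show the hulls only as figures, so the following is a hedged sketch of how such a hull could be computed from predicted (bitrate, YPSNR) points: keep the best quality per bitrate, take the upper convex hull with a monotone-chain scan, and interpolate linearly between hull points. The choice of algorithm and of linear interpolation is an assumption, not something stated in the slides.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_convex_hull(points):
    """Upper convex hull of (bitrate, ypsnr) points, ordered by bitrate."""
    best = {}
    for bitrate, ypsnr in points:
        # Keep only the best quality per bitrate (e.g. the best resolution).
        best[bitrate] = max(ypsnr, best.get(bitrate, ypsnr))
    hull = []
    for p in sorted(best.items()):
        # Pop points that lie on or below the line from hull[-2] to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def interpolate_ypsnr(hull, bitrate):
    """Linearly interpolate YPSNR at `bitrate` along the hull."""
    for (b0, q0), (b1, q1) in zip(hull, hull[1:]):
        if b0 <= bitrate <= b1:
            return q0 + (bitrate - b0) / (b1 - b0) * (q1 - q0)
    raise ValueError("bitrate outside the hull range")
```

Applied to bitrate/YPSNR pairs such as (1000, 35), (2300, 37), (4300, 38), (7000, 40) and (12000, 43), the point (4300, 38) falls below the line between its neighbours and is dropped from the hull.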

Per-scene encoding optimization (phase 4)
Steps:
● For each scene, the FAUST approach selects all points on the interpolated Convex Hull that lie within the YPSNR range [30 dB, 45 dB].
● It uses a YPSNR spacing of 1.5 dB to determine the number of bitrate/resolution pairs for each scene.
● For the selected YPSNR points, the FAUST approach finds the appropriate bitrates and resolutions using the interpolated Convex Hulls.
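The steps above can be sketched as follows, under stated assumptions: the hull is a list of (bitrate, YPSNR) points sorted by bitrate with non-decreasing quality, the first target is placed at the lowest hull quality inside the range, and bitrates are found by linear interpolation. The hull values in the usage note are hypothetical, and the choice of resolution for each target is left out because the slides do not describe it.

```python
def bitrate_for_quality(hull, ypsnr):
    """Linearly interpolate the bitrate that reaches `ypsnr` on the hull."""
    for (b0, q0), (b1, q1) in zip(hull, hull[1:]):
        if q0 <= ypsnr <= q1:
            if q1 == q0:
                return float(b0)
            return b0 + (ypsnr - q0) / (q1 - q0) * (b1 - b0)
    raise ValueError("YPSNR outside the hull range")


def per_scene_ladder(hull, lo=30.0, hi=45.0, step=1.5):
    """Pick YPSNR targets `step` dB apart within [lo, hi] and map them to bitrates.

    Assumption: targets start at the lowest hull quality inside the range.
    """
    q_min = max(lo, min(q for _, q in hull))
    q_max = min(hi, max(q for _, q in hull))
    ladder = []
    k = 0
    while q_min + k * step <= q_max:
        target = q_min + k * step
        ladder.append((target, bitrate_for_quality(hull, target)))
        k += 1
    return ladder
```

For a hypothetical hull `[(1000, 35.0), (2300, 37.0), (7000, 40.0), (12000, 43.0)]` (bitrate in kbps, YPSNR in dB), this yields six targets between 35 dB and 42.5 dB, so the number of ladder entries depends on how much of the [30 dB, 45 dB] range the scene's hull covers.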

Results and analysis. Fast scene detection
The total scene detection time shows that the FAUST approach detects video scenes almost three times faster than FFmpeg and more than three times faster than the PySceneDetect tool.

Performance analysis with classic bitrate ladder
Compared with the classic bitrate ladder, the proposed FAUST approach both reduces the bitrate and improves the overall video quality.

State-of-the-art comparison
● Video scene detection with our FAUST approach (9.9 s) is more than three times faster than with the MiPSO framework algorithm (34 s).
● With our FAUST approach, there is no need to run multiple test (or trial) encodings to build a convex hull.

Conclusions
The key advantages of the FAUST approach:
● Fast scene detection using an entropy-based method.
● No need to run test encodings to build Convex Hulls. On the testing data, the developed ANN model predicts the YPSNR metric with a low mean absolute error (MAE) of 0.15 and a low mean squared error (MSE) of 0.08.
● Per-scene bitrate ladders.

Thank you!
Anatoliy Zabrovskiy
anatoliy.zabrovskiy@aau.at
