In recent years, we have witnessed a growing amount of media transmitted and stored on computers and mobile devices. For this reason, there is a real need for smart compression algorithms that reduce the size of our media files. However, such techniques are often responsible for a severe reduction in perceived quality. In this talk we present several approaches we have developed to restore degraded images and videos to match their original quality, making use of Generative Adversarial Networks. The aim of the talk is to highlight the main features of our research work, including the advantages of our solution, the current challenges and possible directions for future improvements.
Advances in Visual Quality Restoration with Generative Adversarial Networks
1. Advances in Visual Quality Restoration with Generative Adversarial Networks
Leonardo Galteri - PhD
University of Florence, MICC
3. Why does this happen? (in 2019…)
• The first episode of Season 2 had 15M viewers
• Streaming this at reasonable quality costs 0.020 $/GB*
• Total: 1,125,000 $
*Amazon Cloud Outbound Traffic
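The arithmetic behind the figure above can be sketched as follows; the per-viewer data volume is a hypothetical value chosen only so the numbers are consistent with the slide:

```python
# Back-of-the-envelope streaming cost.
# gb_per_viewer is an assumed average stream size, not a figure from the talk.
viewers = 15_000_000
gb_per_viewer = 3.75          # assumed GB delivered per viewer
cost_per_gb = 0.020           # $/GB, Amazon Cloud outbound traffic

total_gb = viewers * gb_per_viewer
total_cost = total_gb * cost_per_gb
print(f"${total_cost:,.0f}")  # -> $1,125,000
```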
4. Working on the Encoder
ICPR'18
• Adaptive video coding approach
5. Working on the Encoder
ICPR'18
• Adaptive video coding approach (x264)
7. Improving Compressed Images
ICCV'17
Given an uncompressed frame x_HQ, the codec C produces
  x_LQ = C(x_HQ; θ)
We want to learn a function
  G(x_LQ) ≈ C⁻¹(x_LQ; θ) = x_HQ
where θ are the codec parameters.
[Figure: x_LQ vs. G(x_LQ)]
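A toy version of this degradation model can be sketched with a uniform-quantization "codec" standing in for JPEG; the real work uses actual codecs, and `codec` and its step size here are illustrative assumptions:

```python
import numpy as np

def codec(x_hq: np.ndarray, step: int = 32) -> np.ndarray:
    """Toy lossy codec C(x_HQ; theta): uniform quantization with step size theta."""
    return (x_hq // step) * step + step // 2

rng = np.random.default_rng(0)
x_hq = rng.integers(0, 256, size=(8, 8), dtype=np.int64)
x_lq = codec(x_hq)                         # x_LQ = C(x_HQ; theta)

# The restoration network G is trained on (x_LQ, x_HQ) pairs so that
# G(x_LQ) ≈ x_HQ, i.e. G approximates the codec inverse.
assert np.all(np.abs(x_lq - x_hq) <= 16)   # quantization error bounded by step/2
```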
8. A Deep Residual Network for Reconstruction
ICCV'17
• We use strided convolutions to reduce feature map size.
• We avoid checkerboard artifacts with NN upsampling followed by 2 more convolutional layers.
• Trained on 128x128-pixel patches extracted from MS-COCO.
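Nearest-neighbour upsampling, which the slide pairs with follow-up convolutions to avoid checkerboard artifacts, is just index repetition; a minimal numpy sketch:

```python
import numpy as np

def nn_upsample(x: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbour upsampling of an (H, W, C) feature map.

    Unlike transposed (strided) convolution, every output pixel is written
    exactly once, so no checkerboard pattern of uneven kernel overlap can
    appear; the convolutions that follow it in the generator refine detail.
    """
    x = np.repeat(x, factor, axis=0)   # repeat rows
    x = np.repeat(x, factor, axis=1)   # repeat columns
    return x

feat = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)
up = nn_upsample(feat)
assert up.shape == (4, 4, 3)
assert np.all(up[0, 0] == up[1, 1])    # each input pixel fills a 2x2 block
```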
9. Limitations of MSE and SSIM Losses
ICCV'17
• SSIM and MSE losses effectively reduce compression artefacts.
• However, reconstructions appear blurry and many details are missing with respect to the uncompressed version of the image.
[Figure: JPEG vs. SSIM Loss vs. Original]
11. The Sub-Patch Discriminator
ICCV'17
• 128x128 patches are split into smaller 16x16 sub-patches, concatenated with the corresponding input sub-patches and processed by the discriminator.
• The discriminator is trained with a binary cross-entropy loss over all the sub-patches.
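Splitting a 128x128 patch into 16x16 sub-patches is a reshape/transpose; a numpy sketch (the channel count and names are illustrative):

```python
import numpy as np

def to_subpatches(patch: np.ndarray, s: int = 16) -> np.ndarray:
    """Split an (H, W, C) patch into (H//s * W//s, s, s, C) sub-patches."""
    h, w, c = patch.shape
    x = patch.reshape(h // s, s, w // s, s, c)
    x = x.transpose(0, 2, 1, 3, 4)            # bring the sub-patch grid up front
    return x.reshape(-1, s, s, c)

patch = np.random.rand(128, 128, 3)
subs = to_subpatches(patch)
assert subs.shape == (64, 16, 16, 3)          # 8 x 8 grid of 16x16 sub-patches

# Each restored sub-patch is concatenated with the corresponding input
# sub-patch along channels before the discriminator scores it:
pair = np.concatenate([subs, to_subpatches(patch)], axis=-1)
assert pair.shape == (64, 16, 16, 6)
```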
12. Generator Loss
ICCV'17
• The new objective of the Generator is:
  L_G = L_P + λ·L_ADV
where
  L_P = Σ_{x,y} (ψ(I_HQ)_{x,y} − ψ(I_REC)_{x,y})²
is called perceptual loss, an MSE loss in VGG19 feature space, and
  L_ADV = −log D(I_REC | I_LQ)
is the adversarial loss, which measures how good the Generator is at fooling the discriminator.
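With pre-computed feature maps and discriminator outputs, the combined objective reduces to a few lines; here `feat_hq`/`feat_rec` stand in for the VGG19 features ψ(I_HQ)/ψ(I_REC), and all tensors are toy placeholders:

```python
import numpy as np

def generator_loss(feat_hq, feat_rec, d_rec, lam=1e-3):
    """L_G = L_P + lambda * L_ADV.

    feat_hq, feat_rec: feature maps psi(I_HQ), psi(I_REC)
    d_rec: discriminator outputs D(I_REC | I_LQ) in (0, 1), one per sub-patch
    lam: illustrative weighting, not the value used in the paper
    """
    l_p = np.mean((feat_hq - feat_rec) ** 2)   # perceptual: feature-space MSE
    l_adv = -np.mean(np.log(d_rec))            # fool-the-discriminator term
    return l_p + lam * l_adv

feat_hq = np.ones((4, 4))
feat_rec = np.ones((4, 4)) * 0.5
d_rec = np.full(64, 0.5)                       # discriminator maximally unsure
loss = generator_loss(feat_hq, feat_rec, d_rec)
```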
13. Effect of Sub-Patch Discriminator
ICCV'17
• This technique reduces the mosquito noise present in reconstructions.
[Figure: W/o Sub-Patch vs. With Sub-Patch]
14. Predicting QF
TMM'19
• We train a CNN regressor, named QF predictor, to drive a finite Ensemble of Generators.
• We use the most appropriate Generator to restore the image.
[Figure: input x → QF Predictor → Model Switcher → G(θ = θ_0; x), …, G(θ = θ_i; x), …, G(θ = θ_N; x)]
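The ensemble dispatch is essentially a nearest-QF lookup; a pure-Python sketch where `predict_qf` and the per-QF generators are hypothetical stand-ins for the trained networks:

```python
# Hypothetical sketch of the QF-predictor / model-switcher pipeline.
# Each "generator" is a placeholder for a network trained at that quality factor.
GENERATORS = {
    10: lambda x: f"restored@QF10({x})",
    20: lambda x: f"restored@QF20({x})",
    40: lambda x: f"restored@QF40({x})",
}

def predict_qf(image) -> float:
    """Stand-in for the CNN regressor estimating the JPEG quality factor."""
    return 23.7  # pretend regressor output

def restore(image):
    qf = predict_qf(image)
    # Model switcher: route to the generator trained closest to the estimated QF.
    best = min(GENERATORS, key=lambda trained_qf: abs(trained_qf - qf))
    return GENERATORS[best](image)

print(restore("x"))  # -> restored@QF20(x)
```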
17. Subjective Evaluation
TMM'19
• DSIS setup: each test image is compared to the original and similarity is scored in 0-100.
• We compare SSIM Loss vs Adversarial Training using the same Generator architecture.
• Subjects have a strong preference for GAN-restored images over SSIM ones.

Method | MOS   | Std. Dev.
SSIM   | 49.51 | 22.72
GAN    | 68.32 | 20.75
18. Object Detection Results
TMM'19
• Use an object detector (Faster R-CNN) to assess the visual quality of restored images.
• Compute mAP on PASCAL VOC using several JPEG quality factors and the corresponding reconstructions.
• Large increase in detector performance.
• Largest gainers are deformable "furry" objects such as animals.

Class | GAN AP gain @QF 20
Dog   | +18.6
Cat   | +16.6
Sheep | +14.3
Cow   | +12.5
19. Enter MobileNetV2
Sandler et al., CVPR'18
• MobileNetV2 was originally proposed to reduce the computational burden of CNNs.
• Depthwise separable convolutions are a drop-in replacement for convolutional layers.
• Inverted residual blocks propagate gradients across layers better and are more memory efficient.
[Figure: Residual Block vs. Inverted Residual Block]
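The saving from depthwise separable convolutions is easy to quantify; for a k×k convolution with C_in input and C_out output channels:

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Depthwise k x k (one filter per channel) + 1x1 pointwise projection."""
    return k * k * c_in + c_in * c_out

c_in, c_out = 128, 128
standard = conv_params(c_in, c_out)                  # 147456 weights
separable = depthwise_separable_params(c_in, c_out)  # 17536 weights
print(f"{standard / separable:.1f}x fewer weights")  # roughly 8-9x for k=3
```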
20. A Deep Residual Network for Reconstruction
CAIP'19
• Keep the Generator identical except for the Inverted Residual Blocks!
• Train on the small DIV2K dataset.
• Augmentation: resizing to 256, 384 and 512; random crops of 224x224; mirror flipping.
[Figure: Inverted Residual Blocks inside the Generator]
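The augmentation recipe on the slide (multi-scale resize, 224x224 random crop, mirror flip) can be sketched as below; the resize step is shown as plain nearest-neighbour scaling only to keep the example numpy-only:

```python
import numpy as np

rng = np.random.default_rng(42)

def resize_nn(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) image to (size, size, C)."""
    h, w, _ = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def augment(img: np.ndarray) -> np.ndarray:
    img = resize_nn(img, int(rng.choice([256, 384, 512])))  # multi-scale resize
    top = rng.integers(0, img.shape[0] - 224 + 1)           # random 224x224 crop
    left = rng.integers(0, img.shape[1] - 224 + 1)
    img = img[top:top + 224, left:left + 224]
    if rng.random() < 0.5:                                  # mirror flip
        img = img[:, ::-1]
    return img

sample = rng.random((480, 640, 3))
out = augment(sample)
assert out.shape == (224, 224, 3)
```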
26. No-Reference Evaluation
CAIP'19
• According to NIQE and BRISQUE, GAN images are valued as "more natural" than the original ones!
• VIIDEO is likely penalizing reconstructions for lack of temporal coherence.
27. Specialized Artifact Removal
ACM MM'19 Best Demo
• GANs are well known to work well when the distribution is simpler.
• Faces are possibly the most interesting objects we are willing to transmit.
• Here is what we can do with a severe degradation and a specialized GAN.
28. Specialized Artifact Removal
ACM MM'19 Best Demo
40. Evaluation using Language
ACM MM ASIA'21 Best Paper Award
• Task: "evaluate the weighted combination of all of the visually significant attributes of an image" (i.e., image Quality).
41. Evaluation using Language
ACM MM ASIA'21 Best Paper Award
• Use an image captioning algorithm to evaluate the fine semantics of the image.
[Figure captions — JPEG: "A statue of a woman wearing a christmas tie"; GAN: "A brown and white dog wearing a tie"; HQ: "A brown and white dog wearing a red tie"]
44. Assessment Methodology
ACM MM ASIA'21 Best Paper Award
• A Language Model produces a pseudo ground-truth caption from the reference image and a predicted caption from the input image; a language metric compares the two (e.g., a score of 0.752).
• The "GT" caption can be generated from the reference image in case captions are not available (the usual case).
45. Evaluating Enhanced Images
ACM MM ASIA'21 Best Paper Award
• Our approach assigns higher scores to GAN reconstructions (REC) of JPEG-compressed images across a wide range of QFs.
• Results are consistent for all captioning metrics across all qualities/enhancements.
46. Changing the Captioning Model
ACM MM ASIA'21 Best Paper Award
• Using a better captioning model [1] increases the correlation with MOS.
• Visual features are shared between [1] and [2].

[1] M. Cornia, M. Stefanini, L. Baraldi, and R. Cucchiara. Meshed-memory transformer for image captioning. In Proc. of CVPR 2020.
[2] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In Proc. of CVPR 2018.
47. Qualitative Analysis
ACM MM ASIA'21 Best Paper Award
[Figure captions — JPEG vs. GAN:
"A man riding a wave on a surfboard in the ocean"
"A couple of people sitting next to a christmas tree."]
48. What's Next for Restoration?
• Innovative technologies for restoration replacing GANs
• Smaller architectures and new ways to train them
• Blind restoration
49. References
L. Galteri, M. Bertini, L. Seidenari, A. Del Bimbo, "Video Compression for Object Detection Algorithms", ICPR 2018
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, "Deep Generative Adversarial Compression Artifact Removal", IEEE ICCV 2017
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, "Deep Universal Generative Adversarial Compression Artifact Removal", IEEE TMM 2019
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, "Towards Real-Time Image Enhancement GANs", CAIP 2019
L. Galteri, L. Seidenari, M. Bertini, T. Uricchio, A. Del Bimbo, "Fast Video Quality Enhancement Using GANs", ACM MM Best Demo 2019
L. Galteri, M. Bertini, L. Seidenari, T. Uricchio, A. Del Bimbo, "Increasing Video Perceptual Quality with GANs and Semantic Coding", ACM MM 2020
L. Galteri, L. Seidenari, P. Bongini, M. Bertini, A. Del Bimbo, "Language Based Image Quality Enhancement", ACM MM Asia Best Paper Award 2021