SlideShare uma empresa Scribd logo
1 de 49
Advances in Visual Quality Restoration
with Generative Adversarial Networks
Leonardo Galteri - PhD
University of Florence, MICC
*Amazon Cloud Outbound Traffic
-1,125,000 $
Why does this happens? (in 2019
)
‱ First Episode of Season 2 had 15M viewers
‱ Stream this at reasonable quality at a cost of
0,020 $/GB*
Working on the Encoder
ICPR’18
‱ Adaptive video coding approach
Working on the Encoder
ICPR’18
X.264
‱ Adaptive video coding approach
Deep CNN
Is there Another Way?
Improving Compressed Images
Given an uncompressed frame đ‘„đ»đ‘„
đ‘„đżđ‘„ = 𝒞(đ‘„đ»đ‘„; 𝜃)
We want to learn a function
đș(đ‘„đżđ‘„) ≈ 𝒞−1(đ‘„đ»đ‘„; 𝜃)
where 𝜃 are codec parameters.
đ‘„đżđ‘„ đș(đ‘„đżđ‘„)
ICCV’17
A Deep Residual Network for
Reconstruction
‱ We use strided convolution to reduce feature map size.
‱ We avoid checkerboard artifacts with NN upsampling followed by 2
more convolutional layers
‱ Trained on patches 128x128 pixel extracted from MS-COCO. ICCV’17
Limitations of MSE and SSIM Losses
JPEG SSIM Loss Original
‱ SSIM and MSE losses are able to reduce effectively compression artefacts.
‱ However, reconstructions appear blurry and there are many missing details with respect to the
uncompressed version of the image.
ICCV’17
Generative Adversarial Network
G
Generator
D
Discriminator
Image
REAL or
FAKE?
High Quality
Images (REAL)
Low Quality
Images
Restored
Images (FAKE)
D is trained to tell apart
real from reconstructed
images
G is trained to fool D
ICCV’17
The Sub-Patch Discriminator
‱ 128 x 128 patches are split into smaller 16x16 sub-patches, concatenated with correspondent
input sub-patches and processed by the discriminator.
‱ The discriminator is trained with a binary cross-entropy loss over all the sub-patches.
ICCV’17
Generator Loss
ICCV’17
‱ The new objective of Generator is:
ℒđș = ℒ𝑃 + đœ†â„’đŽđ·đ‘‰
where: ℒ𝑃 = 𝜙 đŒđ»đ‘„
đ‘„,𝑩 − 𝜙 đŒđ‘…đ‘„
đ‘„,𝑩
2
is called perceptual loss, a MSE loss in VGG19 feature space, and:
â„’đŽđ·đ‘‰ = − log đ· đŒđ‘…đ‘„|đŒđżđ‘„
is the adversarial loss, which measures how good is the
fooling the discriminator.
Effect of Sub-Patch Discriminator
‱ This technique allows to reduce the mosquito noise present in reconstructions.
W/o Sub-Patch With Sub-Patch ICCV’17
Predicting QF
‱ We train a CNN regressor, named QF predictor, to drive a finite Ensemble of Generators
‱ We use the most appropriateGenerator to restore the image
QF
Predictor
đș(𝜃 = 𝜃0; đ‘„)
đș(𝜃 = 𝜃𝑁; đ‘„)




Model
Switcher
TMM’19
đș(𝜃 = 𝜃𝑛; đ‘„)
đ‘„
Quality Prediction Results
TMM’19
Qualitative Results
JPEG AR-CNN GAN ORIGINAL
TMM’19
Subjective Evaluation
‱ DSIS setup test image compared to original
and similarity scored in 0-100
‱ We compare SSIM Loss vs Adversarial
Training using the same Generator
architecture.
‱ Subjects have a strong preference forGAN
restored images over SSIM ones.
Method MOS Std. Dev.
SSIM 49.51 22.72
GAN 68.32 20.75
TMM’19
Object Detection Results
Class
GAN
AP gain
@QF 20
Dog +18.6
Cat +16.6
Sheep +14.3
Cow +12.5
‱ Use an object detector, Faster R-CNN to assess the visual quality of restored images
‱ Compute mAP on PASCALVOC using several JPEG quality factors and the correspondent
reconstructions.
‱ Large increase in detector
performance
‱ Largest gainers are
deformable ’furry’ objects
such as animals
TMM’19
Enters MobileNetV2
‱ MobileNetV2 was originally proposed to reduce computational burden of CNNs
‱ Depthwise separable convolutions drop-in replacement for convolutional layers
‱ Inverted residual blocks better propagate gradients across layers but more memory
efficient
Sandler CVPR’18
Residual Block Inverted Residual Block
A Deep Residual Network for Reconstruction
‱ Keep the Generator identical except for the Inverted Residual Blocks!
‱ ï»ż
Train on the small DIV2K dataset
‱ Augmentation: resizing 256, 384 and 512; random crops of 224x224;
mirror flipping.
Inverted
Residual
Block
Inverted
Residual
Block
CAIP’19
Qualitative Results
ICCV’17 RAW
Bit/Pixel 0.146
FPS 4 onTitan Xp GPU
Bit/Pixel 12
FPS -
All videos 720p
CAIP’19
Qualitative Results
Bit/Pixel 0.146
FPS 42 onTitan Xp GPU
Very Fast RAW
Bit/Pixel 12
FPS -
CAIP’19
All videos 720p
Qualitative Results
Bit/Pixel 0.146
FPS 20 onTitan Xp GPU
Bit/Pixel 12
FPS -
Fast RAW
CAIP’19
All videos 720p
RAW
Bit/Pixel 0.0570
FPS 3 onV100 GPU
Bit/Pixel 12
FPS -
Wave.ONE
Qualitative Results
CAIP’19
All videos 720p
Qualitative Results
Bit/Pixel 0.0570
FPS 1,6 onTitan Xp GPU*
Bit/Pixel 0.146 (x3)
FPS 20 onTitan Xp GPU (x12)
Fast
Wave.ONE
All videos 720p
CAIP’19
No-Reference Evaluation
‱ According to NIQE and BRISQUE value GAN images as ’more natural’ the the
original ones!
‱ VIIDEO is likely penalizing reconstruction for lack of temporal coherence
CAIP’19
‱ GANs are well known to work well when the distribution is simpler
‱ Faces are possibly the most interesting object we are willing to transmit
‱ Here what we can do with a severe degradation and a specialized GAN
Specialized Artifact Removal
ACM MM’19 Best Demo
Specialized Artifact Removal
ACM MM’19 Best Demo
‱ GANs are well known to work well when the distribution is simpler
‱ Faces are possibly the most interesting object we are willing to transmit
‱ Here what we can do with a severe degradation and a specialized GAN
ACM MM’19 Best Demo
Original frame
Semantic mask
Transmitter
Semantic Coding + GAN
ACM MM’20
Semantic Coding + GAN
ACM MM’20
Received frame
Receiver
Semantic Segmentation
ACM MM’20
‱ Use BiSeNet to label each pixel
detected as face or neck as
foreground, the remainder as
background.
Semantic Segmentation
ACM MM’20
+
+
Generator Loss
𝑀
𝑀
face
background
Semantic Segmentation
ACM MM’20
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0 5 10 15 20 25 30 35 40 45
LPIPS
Our Method Baseline
0
5
10
15
20
25
30
35
40
0 5 10 15 20 25 30 35 40 45
BRISQUE
Our Method Baseline
~ 30% improvement ~ 55% improvement
RESULTS – QUALITATIVE 1 min
COMPRESSED
RESULTS – QUALITATIVE 1 min
COMPRESSED +
SALIENCY
RESULTS – QUALITATIVE 1 min
COMPRESSED +
SALIENCY + GAN
Evaluation using Language
‱ Task: “evaluate the weighted combination of all of the visually significant
attributes of an image”
Quality ACM MM ASIA’21
Best Paper Award
A statue of a woman
wearing a christmas tie
A brown and white dog
wearing a tie
A brown and white dog
wearing a red tie
A statue of a woman
wearing a christmas tie
A brown and white dog
wearing a tie
A brown and white dog
wearing a red tie
A statue of a woman
wearing a christmas tie
A brown and white dog
wearing a tie
A brown and white dog
wearing a red tie
JPEG GAN HQ
Evaluation using Language
Use image captioning algorithm to evaluate the fine semantics of the image
ACM MM ASIA’21
Best Paper Award
Assesment Methodolgy
ACM MM ASIA’21
Best Paper Award
Assesment Methodolgy
ACM MM ASIA’21
Best Paper Award
Assesment Methodolgy
ACM MM ASIA’21
Best Paper Award
Language Model
Language
Metric
0.752
Language Model Pseudo Ground Truth Caption
Predicted Caption
Input Image
Reference
Image
‱ “GT” caption can be generated from the reference imagein case captions
are not available (the usual case)
Evaluating Enhanced Images
ACM MM ASIA’21
Best Paper Award
‱ Our approach scores higher GAN
reconstructions (REC) of JPEG
compressed images across a
wide range of QFs
‱ Results are consistent for all
captioning metrics across all
qualities/enhancements
Changing the Captioning Model
ACM MM ASIA’21
Best Paper Award
‱ Using a better captioning
model increases the
correlation with MOS
‱ Visual features are shared
among [1] and [2]
[1] M. Cornia, M. Stefanini, L. Baraldi, and R. Cucchiara. Meshed-memory transformer for image
captioning. In Proc. of CVPR 2020
[2] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-
down attention for image captioning and visual question answering. In Proc. of CVPR 2018
[1]
JPEG GAN
ï»ż
A man riding a wave on a surfboard in the ocea
ï»ż
A couple of people sitting next to a christmas tree.
Qualitative Analysis
ACM MM ASIA’21
Best Paper Award
What’s Next for Restoration?
‱ Innovative technologies for restoration replacingGANs
‱ Smaller architectures and new ways to train them
‱ Blind restoration
References
L. Galteri, M. Bertini , L. Seidenari, A. Del Bimbo, Video Compression for Object Detection Algorithms’, ICPR 2018
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, 'Deep Generative Adversarial Compression Artifact Removal’,
IEEE ICCV 2017
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, ‘Deep Universal Generative Adversarial Compression Artifact
Removal’, IEEE TMM 2019
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, ‘Towards Real-Time Image Enhancement GANs’, CAIP 2019
L. Galteri, L. Seidenari, M. Bertini, T. Uricchio, A. Del Bimbo, ‘Fast Video Quality Enchancement Using GANs’, ACM
MM Best Demo 2019
L. Galteri, M. Bertini, L. Seidenari, T. Uricchio, A. Del Bimbo, ‘Increasing Video Perceptual Quality with GANs and
Semantic Coding’, ACM MM 2020
L. Galteri, L. Seidenari, P. Bongini, M. Bertini, A. Del Bimbo, ‘Language Based Image Quality Enhancement’, ACM
MM Asia Best Paper Award 2021

Mais conteĂșdo relacionado

Semelhante a Advances in Visual Quality Restoration with Generative Adversarial Networks

Single Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional NetworksSingle Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional Networks
Greeshma M.S.R
 
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Universitat PolitĂšcnica de Catalunya
 
Comparative Analysis of image Enhancement Techniques on Real Time images
Comparative Analysis of image Enhancement Techniques on Real Time imagesComparative Analysis of image Enhancement Techniques on Real Time images
Comparative Analysis of image Enhancement Techniques on Real Time images
IJSRED
 
Alex_Vlachos_Advanced_VR_Rendering_Performance_GDC2016
Alex_Vlachos_Advanced_VR_Rendering_Performance_GDC2016Alex_Vlachos_Advanced_VR_Rendering_Performance_GDC2016
Alex_Vlachos_Advanced_VR_Rendering_Performance_GDC2016
Alex Vlachos
 

Semelhante a Advances in Visual Quality Restoration with Generative Adversarial Networks (20)

Single Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional NetworksSingle Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional Networks
 
DALL-E.pdf
DALL-E.pdfDALL-E.pdf
DALL-E.pdf
 
Deep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceDeep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent space
 
Detection of surface flaws in a pipe using vision based technique
Detection of surface flaws in a pipe using vision based techniqueDetection of surface flaws in a pipe using vision based technique
Detection of surface flaws in a pipe using vision based technique
 
Comparitive Analysis for Pre-Processing of Images and Videos using Histogram ...
Comparitive Analysis for Pre-Processing of Images and Videos using Histogram ...Comparitive Analysis for Pre-Processing of Images and Videos using Histogram ...
Comparitive Analysis for Pre-Processing of Images and Videos using Histogram ...
 
IMAGE PROCESSING
IMAGE PROCESSINGIMAGE PROCESSING
IMAGE PROCESSING
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
 
Post processing of jpeg image using MLP
Post processing of jpeg image using MLP Post processing of jpeg image using MLP
Post processing of jpeg image using MLP
 
Single Image Super Resolution Overview
Single Image Super Resolution OverviewSingle Image Super Resolution Overview
Single Image Super Resolution Overview
 
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
 
Enhance and quantify Microstructure using Machine Learning
Enhance and quantify Microstructure using Machine LearningEnhance and quantify Microstructure using Machine Learning
Enhance and quantify Microstructure using Machine Learning
 
Comparative Analysis of image Enhancement Techniques on Real Time images
Comparative Analysis of image Enhancement Techniques on Real Time imagesComparative Analysis of image Enhancement Techniques on Real Time images
Comparative Analysis of image Enhancement Techniques on Real Time images
 
Adaptive Image Compression Using Saliency and KAZE Features
Adaptive Image Compression Using Saliency and KAZE FeaturesAdaptive Image Compression Using Saliency and KAZE Features
Adaptive Image Compression Using Saliency and KAZE Features
 
Ad24210214
Ad24210214Ad24210214
Ad24210214
 
Alex_Vlachos_Advanced_VR_Rendering_Performance_GDC2016
Alex_Vlachos_Advanced_VR_Rendering_Performance_GDC2016Alex_Vlachos_Advanced_VR_Rendering_Performance_GDC2016
Alex_Vlachos_Advanced_VR_Rendering_Performance_GDC2016
 
Interactive Control over Temporal Consistency while Stylizing Video Streams
Interactive Control over Temporal Consistency while Stylizing Video StreamsInteractive Control over Temporal Consistency while Stylizing Video Streams
Interactive Control over Temporal Consistency while Stylizing Video Streams
 
Mmclass5b
Mmclass5bMmclass5b
Mmclass5b
 
Depth estimation do we need to throw old things away
Depth estimation do we need to throw old things awayDepth estimation do we need to throw old things away
Depth estimation do we need to throw old things away
 
Final image processing
Final image processingFinal image processing
Final image processing
 
Photo echance. Problems. Solutions. Ideas
Photo echance. Problems. Solutions. Ideas Photo echance. Problems. Solutions. Ideas
Photo echance. Problems. Solutions. Ideas
 

Mais de Förderverein Technische FakultÀt

The Digital Transformation of Education: A Hyper-Disruptive Era through Block...
The Digital Transformation of Education: A Hyper-Disruptive Era through Block...The Digital Transformation of Education: A Hyper-Disruptive Era through Block...
The Digital Transformation of Education: A Hyper-Disruptive Era through Block...
Förderverein Technische FakultÀt
 
Don't Treat the Symptom, Find the Cause!.pptx
Don't Treat the Symptom, Find the Cause!.pptxDon't Treat the Symptom, Find the Cause!.pptx
Don't Treat the Symptom, Find the Cause!.pptx
Förderverein Technische FakultÀt
 
The Computing Continuum.pdf
The Computing Continuum.pdfThe Computing Continuum.pdf
The Computing Continuum.pdf
Förderverein Technische FakultÀt
 

Mais de Förderverein Technische FakultÀt (20)

Supervisory control of business processes
Supervisory control of business processesSupervisory control of business processes
Supervisory control of business processes
 
The Digital Transformation of Education: A Hyper-Disruptive Era through Block...
The Digital Transformation of Education: A Hyper-Disruptive Era through Block...The Digital Transformation of Education: A Hyper-Disruptive Era through Block...
The Digital Transformation of Education: A Hyper-Disruptive Era through Block...
 
A Game of Chess is Like a Swordfight.pdf
A Game of Chess is Like a Swordfight.pdfA Game of Chess is Like a Swordfight.pdf
A Game of Chess is Like a Swordfight.pdf
 
From Mind to Meta.pdf
From Mind to Meta.pdfFrom Mind to Meta.pdf
From Mind to Meta.pdf
 
Miniatures Design for Tabletop Games.pdf
Miniatures Design for Tabletop Games.pdfMiniatures Design for Tabletop Games.pdf
Miniatures Design for Tabletop Games.pdf
 
Distributed Systems in the Post-Moore Era.pptx
Distributed Systems in the Post-Moore Era.pptxDistributed Systems in the Post-Moore Era.pptx
Distributed Systems in the Post-Moore Era.pptx
 
Don't Treat the Symptom, Find the Cause!.pptx
Don't Treat the Symptom, Find the Cause!.pptxDon't Treat the Symptom, Find the Cause!.pptx
Don't Treat the Symptom, Find the Cause!.pptx
 
Engineering Serverless Workflow Applications in Federated FaaS.pdf
Engineering Serverless Workflow Applications in Federated FaaS.pdfEngineering Serverless Workflow Applications in Federated FaaS.pdf
Engineering Serverless Workflow Applications in Federated FaaS.pdf
 
The Role of Machine Learning in Fluid Network Control and Data Planes.pdf
The Role of Machine Learning in Fluid Network Control and Data Planes.pdfThe Role of Machine Learning in Fluid Network Control and Data Planes.pdf
The Role of Machine Learning in Fluid Network Control and Data Planes.pdf
 
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
 
Towards a data driven identification of teaching patterns.pdf
Towards a data driven identification of teaching patterns.pdfTowards a data driven identification of teaching patterns.pdf
Towards a data driven identification of teaching patterns.pdf
 
Förderverein Technische FakultÀt.pptx
Förderverein Technische FakultÀt.pptxFörderverein Technische FakultÀt.pptx
Förderverein Technische FakultÀt.pptx
 
The Computing Continuum.pdf
The Computing Continuum.pdfThe Computing Continuum.pdf
The Computing Continuum.pdf
 
East-west oriented photovoltaic power systems: model, benefits and technical ...
East-west oriented photovoltaic power systems: model, benefits and technical ...East-west oriented photovoltaic power systems: model, benefits and technical ...
East-west oriented photovoltaic power systems: model, benefits and technical ...
 
Machine Learning in Finance via Randomization
Machine Learning in Finance via RandomizationMachine Learning in Finance via Randomization
Machine Learning in Finance via Randomization
 
IT does not stop
IT does not stopIT does not stop
IT does not stop
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Industriepraktikum_ UnterstĂŒtzung bei Projekten in der Automatisierung.pdf
Industriepraktikum_ UnterstĂŒtzung bei Projekten in der Automatisierung.pdfIndustriepraktikum_ UnterstĂŒtzung bei Projekten in der Automatisierung.pdf
Industriepraktikum_ UnterstĂŒtzung bei Projekten in der Automatisierung.pdf
 
Introduction to 5G from radio perspective
Introduction to 5G from radio perspectiveIntroduction to 5G from radio perspective
Introduction to 5G from radio perspective
 
Förderverein Technische FakultÀt
Förderverein Technische FakultÀt Förderverein Technische FakultÀt
Förderverein Technische FakultÀt
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot Model
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

Advances in Visual Quality Restoration with Generative Adversarial Networks

  • 1. Advances in Visual Quality Restoration with Generative Adversarial Networks Leonardo Galteri - PhD University of Florence, MICC
  • 2.
  • 3. *Amazon Cloud Outbound Traffic -1,125,000 $ Why does this happens? (in 2019
) ‱ First Episode of Season 2 had 15M viewers ‱ Stream this at reasonable quality at a cost of 0,020 $/GB*
  • 4. Working on the Encoder ICPR’18 ‱ Adaptive video coding approach
  • 5. Working on the Encoder ICPR’18 X.264 ‱ Adaptive video coding approach
  • 6. Deep CNN Is there Another Way?
  • 7. Improving Compressed Images Given an uncompressed frame đ‘„đ»đ‘„ đ‘„đżđ‘„ = 𝒞(đ‘„đ»đ‘„; 𝜃) We want to learn a function đș(đ‘„đżđ‘„) ≈ 𝒞−1(đ‘„đ»đ‘„; 𝜃) where 𝜃 are codec parameters. đ‘„đżđ‘„ đș(đ‘„đżđ‘„) ICCV’17
  • 8. A Deep Residual Network for Reconstruction ‱ We use strided convolution to reduce feature map size. ‱ We avoid checkerboard artifacts with NN upsampling followed by 2 more convolutional layers ‱ Trained on patches 128x128 pixel extracted from MS-COCO. ICCV’17
  • 9. Limitations of MSE and SSIM Losses JPEG SSIM Loss Original ‱ SSIM and MSE losses are able to reduce effectively compression artefacts. ‱ However, reconstructions appear blurry and there are many missing details with respect to the uncompressed version of the image. ICCV’17
  • 10. Generative Adversarial Network G Generator D Discriminator Image REAL or FAKE? High Quality Images (REAL) Low Quality Images Restored Images (FAKE) D is trained to tell apart real from reconstructed images G is trained to fool D ICCV’17
  • 11. The Sub-Patch Discriminator ‱ 128 x 128 patches are split into smaller 16x16 sub-patches, concatenated with correspondent input sub-patches and processed by the discriminator. ‱ The discriminator is trained with a binary cross-entropy loss over all the sub-patches. ICCV’17
  • 12. Generator Loss ICCV’17 ‱ The new objective of Generator is: ℒđș = ℒ𝑃 + đœ†â„’đŽđ·đ‘‰ where: ℒ𝑃 = 𝜙 đŒđ»đ‘„ đ‘„,𝑩 − 𝜙 đŒđ‘…đ‘„ đ‘„,𝑩 2 is called perceptual loss, a MSE loss in VGG19 feature space, and: â„’đŽđ·đ‘‰ = − log đ· đŒđ‘…đ‘„|đŒđżđ‘„ is the adversarial loss, which measures how good is the fooling the discriminator.
  • 13. Effect of Sub-Patch Discriminator ‱ This technique allows to reduce the mosquito noise present in reconstructions. W/o Sub-Patch With Sub-Patch ICCV’17
  • 14. Predicting QF ‱ We train a CNN regressor, named QF predictor, to drive a finite Ensemble of Generators ‱ We use the most appropriateGenerator to restore the image QF Predictor đș(𝜃 = 𝜃0; đ‘„) đș(𝜃 = 𝜃𝑁; đ‘„) 
 
 Model Switcher TMM’19 đș(𝜃 = 𝜃𝑛; đ‘„) đ‘„
  • 16. Qualitative Results JPEG AR-CNN GAN ORIGINAL TMM’19
  • 17. Subjective Evaluation ‱ DSIS setup test image compared to original and similarity scored in 0-100 ‱ We compare SSIM Loss vs Adversarial Training using the same Generator architecture. ‱ Subjects have a strong preference forGAN restored images over SSIM ones. Method MOS Std. Dev. SSIM 49.51 22.72 GAN 68.32 20.75 TMM’19
  • 18. Object Detection Results Class GAN AP gain @QF 20 Dog +18.6 Cat +16.6 Sheep +14.3 Cow +12.5 ‱ Use an object detector, Faster R-CNN to assess the visual quality of restored images ‱ Compute mAP on PASCALVOC using several JPEG quality factors and the correspondent reconstructions. ‱ Large increase in detector performance ‱ Largest gainers are deformable ’furry’ objects such as animals TMM’19
  • 19. Enters MobileNetV2 ‱ MobileNetV2 was originally proposed to reduce computational burden of CNNs ‱ Depthwise separable convolutions drop-in replacement for convolutional layers ‱ Inverted residual blocks better propagate gradients across layers but more memory efficient Sandler CVPR’18 Residual Block Inverted Residual Block
  • 20. A Deep Residual Network for Reconstruction ‱ Keep the Generator identical except for the Inverted Residual Blocks! ‱ ï»ż Train on the small DIV2K dataset ‱ Augmentation: resizing 256, 384 and 512; random crops of 224x224; mirror flipping. Inverted Residual Block Inverted Residual Block CAIP’19
  • 21. Qualitative Results ICCV’17 RAW Bit/Pixel 0.146 FPS 4 onTitan Xp GPU Bit/Pixel 12 FPS - All videos 720p CAIP’19
  • 22. Qualitative Results Bit/Pixel 0.146 FPS 42 onTitan Xp GPU Very Fast RAW Bit/Pixel 12 FPS - CAIP’19 All videos 720p
  • 23. Qualitative Results Bit/Pixel 0.146 FPS 20 onTitan Xp GPU Bit/Pixel 12 FPS - Fast RAW CAIP’19 All videos 720p
  • 24. RAW Bit/Pixel 0.0570 FPS 3 onV100 GPU Bit/Pixel 12 FPS - Wave.ONE Qualitative Results CAIP’19 All videos 720p
  • 25. Qualitative Results Bit/Pixel 0.0570 FPS 1,6 onTitan Xp GPU* Bit/Pixel 0.146 (x3) FPS 20 onTitan Xp GPU (x12) Fast Wave.ONE All videos 720p CAIP’19
  • 26. No-Reference Evaluation ‱ According to NIQE and BRISQUE value GAN images as ’more natural’ the the original ones! ‱ VIIDEO is likely penalizing reconstruction for lack of temporal coherence CAIP’19
  • 27. ‱ GANs are well known to work well when the distribution is simpler ‱ Faces are possibly the most interesting object we are willing to transmit ‱ Here what we can do with a severe degradation and a specialized GAN Specialized Artifact Removal ACM MM’19 Best Demo
  • 28. Specialized Artifact Removal ACM MM’19 Best Demo ‱ GANs are well known to work well when the distribution is simpler ‱ Faces are possibly the most interesting object we are willing to transmit ‱ Here what we can do with a severe degradation and a specialized GAN
  • 29.
  • 30.
  • 33. Semantic Coding + GAN ACM MM’20 Received frame Receiver
  • 34. Semantic Segmentation ACM MM’20 ‱ Use BiSeNet to label each pixel detected as face or neck as foreground, the remainder as background.
  • 35. Semantic Segmentation ACM MM’20 + + Generator Loss 𝑀 𝑀 face background
  • 36. Semantic Segmentation ACM MM’20 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0 5 10 15 20 25 30 35 40 45 LPIPS Our Method Baseline 0 5 10 15 20 25 30 35 40 0 5 10 15 20 25 30 35 40 45 BRISQUE Our Method Baseline ~ 30% improvement ~ 55% improvement
  • 37. RESULTS – QUALITATIVE 1 min COMPRESSED
  • 38. RESULTS – QUALITATIVE 1 min COMPRESSED + SALIENCY
  • 39. RESULTS – QUALITATIVE 1 min COMPRESSED + SALIENCY + GAN
  • 40. Evaluation using Language ‱ Task: “evaluate the weighted combination of all of the visually significant attributes of an image” Quality ACM MM ASIA’21 Best Paper Award
  • 41. A statue of a woman wearing a christmas tie A brown and white dog wearing a tie A brown and white dog wearing a red tie A statue of a woman wearing a christmas tie A brown and white dog wearing a tie A brown and white dog wearing a red tie A statue of a woman wearing a christmas tie A brown and white dog wearing a tie A brown and white dog wearing a red tie JPEG GAN HQ Evaluation using Language Use image captioning algorithm to evaluate the fine semantics of the image ACM MM ASIA’21 Best Paper Award
  • 42. Assesment Methodolgy ACM MM ASIA’21 Best Paper Award
  • 43. Assesment Methodolgy ACM MM ASIA’21 Best Paper Award
  • 44. Assesment Methodolgy ACM MM ASIA’21 Best Paper Award Language Model Language Metric 0.752 Language Model Pseudo Ground Truth Caption Predicted Caption Input Image Reference Image ‱ “GT” caption can be generated from the reference imagein case captions are not available (the usual case)
  • 45. Evaluating Enhanced Images ACM MM ASIA’21 Best Paper Award ‱ Our approach scores higher GAN reconstructions (REC) of JPEG compressed images across a wide range of QFs ‱ Results are consistent for all captioning metrics across all qualities/enhancements
  • 46. Changing the Captioning Model ACM MM ASIA’21 Best Paper Award ‱ Using a better captioning model increases the correlation with MOS ‱ Visual features are shared among [1] and [2] [1] M. Cornia, M. Stefanini, L. Baraldi, and R. Cucchiara. Meshed-memory transformer for image captioning. In Proc. of CVPR 2020 [2] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top- down attention for image captioning and visual question answering. In Proc. of CVPR 2018 [1]
  • 47. JPEG GAN ï»ż A man riding a wave on a surfboard in the ocea ï»ż A couple of people sitting next to a christmas tree. Qualitative Analysis ACM MM ASIA’21 Best Paper Award
  • 48. What’s Next for Restoration? ‱ Innovative technologies for restoration replacingGANs ‱ Smaller architectures and new ways to train them ‱ Blind restoration
  • 49. References L. Galteri, M. Bertini , L. Seidenari, A. Del Bimbo, Video Compression for Object Detection Algorithms’, ICPR 2018 L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, 'Deep Generative Adversarial Compression Artifact Removal’, IEEE ICCV 2017 L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, ‘Deep Universal Generative Adversarial Compression Artifact Removal’, IEEE TMM 2019 L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, ‘Towards Real-Time Image Enhancement GANs’, CAIP 2019 L. Galteri, L. Seidenari, M. Bertini, T. Uricchio, A. Del Bimbo, ‘Fast Video Quality Enchancement Using GANs’, ACM MM Best Demo 2019 L. Galteri, M. Bertini, L. Seidenari, T. Uricchio, A. Del Bimbo, ‘Increasing Video Perceptual Quality with GANs and Semantic Coding’, ACM MM 2020 L. Galteri, L. Seidenari, P. Bongini, M. Bertini, A. Del Bimbo, ‘Language Based Image Quality Enhancement’, ACM MM Asia Best Paper Award 2021

Notas do Editor

  1. 1 min