Generative adversarial networks (GANs) are introduced, starting with the basic framework of a generator and a discriminator. Several GAN variants are then discussed, including DCGAN, semi-supervised GANs, and CharacterGAN. The document concludes with a summary of GAN resources and applications such as image-to-image translation and conditional waveform synthesis.
4. GANs
• Generative
• Learn a generative model
• Adversarial
• Trained in an adversarial setting
• Networks
• Use Deep Neural Networks
5. Generative Adversarial Networks
(Ian J. Goodfellow, 10 Jun 2014)
Two components, the generator and the discriminator:
• The generator G needs to capture the data distribution.
• The discriminator D estimates the probability that a sample comes from the training data rather than from G.
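This setup corresponds to the standard minimax objective from the Goodfellow et al. paper, in which D maximizes and G minimizes the value function:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

At the optimum of this game, the generator's distribution matches the data distribution and D(x) = 1/2 everywhere.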
6. The Discriminator
• The discriminator in a GAN is simply a classifier.
• It tries to distinguish real data from the data created by the generator.
• It could use any network architecture appropriate to the type of data it's classifying.
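As a minimal illustration that the discriminator is just a classifier, the sketch below trains a logistic-regression "discriminator" to separate two hypothetical 1-D distributions standing in for real and generated data (the distributions, learning rate, and step count are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D "data": real samples ~ N(4, 1), fake samples ~ N(0, 1).
real = rng.normal(4.0, 1.0, size=(200, 1))
fake = rng.normal(0.0, 1.0, size=(200, 1))

x = np.vstack([real, fake])
y = np.vstack([np.ones((200, 1)), np.zeros((200, 1))])  # 1 = real, 0 = fake

w, b = np.zeros((1, 1)), 0.0  # logistic-regression discriminator D(x)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(500):          # plain gradient descent on binary cross-entropy
    p = sigmoid(x @ w + b)    # D(x): probability the sample is real
    grad = p - y              # dBCE/dlogit
    w -= 0.1 * (x.T @ grad) / len(x)
    b -= 0.1 * grad.mean()

acc = ((sigmoid(x @ w + b) > 0.5) == y).mean()
print("discriminator accuracy:", round(float(acc), 2))
```

Any classifier architecture appropriate to the data (a CNN for images, for instance) could replace this linear model without changing the training logic.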
7. The Generator
1. Produce generator output from sampled random noise.
2. Get the discriminator's "real" or "fake" classification for the generator output.
3. Calculate the loss from the discriminator's classification.
4. Backpropagate through both the discriminator and the generator to obtain gradients.
5. Use the gradients to change only the generator weights.
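The five steps above can be sketched with a toy 1-D example; the linear generator, the frozen logistic discriminator, and all constants here are hypothetical stand-ins for real networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen, hypothetical discriminator: D(x) = sigmoid(wd*x + bd),
# which rates values near 4 and above as "real".
wd, bd = 1.0, -4.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Linear generator G(z) = wg*z + bg, starting in the "fake" region near 0.
wg, bg = 0.1, 0.0
lr = 0.05

for _ in range(300):
    z = rng.normal(size=(64, 1))   # 1. sample random noise, produce G(z)
    x = wg * z + bg
    d = sigmoid(wd * x + bd)       # 2. discriminator's "real" probability
    # 3-4. non-saturating generator loss L = -log D(G(z));
    #      backprop through D into G: dL/dx = -(1 - D(x)) * wd
    dx = -(1.0 - d) * wd
    # 5. update ONLY the generator weights; wd, bd stay frozen
    wg -= lr * float((dx * z).mean())
    bg -= lr * float(dx.mean())

print("final generator bias:", round(bg, 2))
```

After training, the generator's output mean has shifted from the "fake" region toward the region the discriminator scores as real, even though only the generator's weights were updated.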
10. Why Generative Models?
We have only seen discriminative models so far:
• Given an image X, predict a label Y
• They estimate P(Y|X)
Discriminative models have several key limitations:
• They can't model P(X), i.e. the probability of seeing a certain image.
• Thus, they can't sample from P(X), i.e. they can't generate new images.
Generative models (in general) cope with all of the above:
• They can model P(X).
• They can generate new images.
• These generated images can be used to augment training when data is scarce.
11. Semi-Supervised GANs (5 Jun 2016, Augustus Odena)
• In normal GANs we train the generator and discriminator at the same time, and after training we discard the discriminator because it was used only to train the generator.
• For semi-supervised learning, we transform the discriminator into a multi-class classifier.
• For each input image, the discriminator has to learn the probability of it being a one, a two, a three, and so on.
• To help the generator produce realistic images, we still instruct the discriminator to distinguish between real and fake samples.
• The generator's images, together with the labeled and unlabeled training data, are used to help classify the dataset.
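The multi-class discriminator head can be sketched as follows; the feature size, batch, and the linear head are illustrative assumptions, with one extra output (beyond the K real classes) reserved for "fake":

```python
import numpy as np

rng = np.random.default_rng(0)

K = 10        # number of real classes (e.g. digits 0-9)
D_IN = 16     # hypothetical feature size coming out of the discriminator body

# Linear classifier head with K+1 outputs: classes 0..K-1 plus "fake".
W = rng.normal(0, 0.1, size=(D_IN, K + 1))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

x = rng.normal(size=(4, D_IN))        # a batch of hypothetical features
p = softmax(x @ W)                    # shape (4, K+1)

p_fake = p[:, -1]                     # probability the sample is generated
p_real = 1.0 - p_fake                 # used for the usual real/fake GAN loss
class_pred = p[:, :K].argmax(axis=1)  # class prediction among real classes

print(p.shape, class_pred.shape)
```

Labeled data trains the first K outputs with cross-entropy, while unlabeled and generated data only supervise the real-vs-fake split, which is how the two training signals share one network.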
12. Semi-Supervised GANs vs. Normal GANs
Classification accuracy (%) when training with different numbers of labeled samples:

          1000 samples   100 samples   50 samples   25 samples
SGAN          96.4           92.8          88.3         80.2
GAN           96.5           89.5          85.9         75.0
13. What is DCGAN? (19 Nov 2015, Alec Radford)
DCGAN: Deep Convolutional Generative Adversarial Network
It works in the opposite direction of an image classifier (CNN):
a CNN transforms an image into a class label (a list of probabilities),
while a DCGAN generates an image from random parameters.
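The upsampling path of a DCGAN-style generator can be sanity-checked with the standard transposed-convolution output-size formula; the 4x4 starting size and the kernel/stride/padding values below are the commonly used DCGAN settings, not taken from this document:

```python
# Output spatial size of a transposed convolution (no output padding):
# out = (in - 1) * stride - 2 * padding + kernel
def deconv_out(size, kernel, stride, padding):
    return (size - 1) * stride - 2 * padding + kernel

# DCGAN-style generator: project the noise vector to a 4x4 feature map,
# then double the resolution with each stride-2 transposed convolution.
size = 4
for _ in range(4):                  # four upsampling layers: 4 -> 8 -> 16 -> 32 -> 64
    size = deconv_out(size, kernel=4, stride=2, padding=1)
print(size)                         # prints 64
```

This is why a chain of four such layers turns a reshaped noise vector into a 64x64 image, the mirror image of a CNN that repeatedly downsamples.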
14. Training Strategy of DCGAN
We train two models simultaneously.
CNN (discriminator): classifies authentic and fake images; "authentic" images are provided to it as training data.
DCGAN (generator): trained to generate images that the CNN classifies as authentic.
By trying to fool the CNN, the DCGAN learns to generate images similar to the training data.
15. Training Loop of DCGAN
P(A): probability that A is authentic.
P(B): probability that B is authentic.
By repeating this loop, the CNN becomes more accurate and the DCGAN becomes more crafty.
23. Character GAN
• CharacterGAN: a generative model that can be trained on only a few samples (8 to 15) of a given character.
• We train the generative model in this low-data setting to repose and animate characters based on keypoint positions; the model generates novel poses from keypoint locations.
24. Character GAN
• Since we only have limited training samples, one of the key challenges is how to handle (dis)occlusion, e.g. when a hand moves behind or in front of the body.
• To address this problem we introduce a novel layering approach that explicitly splits the input keypoints into different layers, which are processed independently.
• These layers represent different parts of the character and provide a strong implicit bias that helps to obtain realistic results even with strong (dis)occlusions.
• To combine the features of the individual layers, we use an adaptive scaling approach conditioned on all keypoints.
• The model scales to larger datasets when more data is available.
25. Character GAN
• Problem: if some body parts are occluded by others, they still exist even though they may not be visible. Solution: we split our characters into different layers.
• To model the different layers, the generator processes each keypoint layer individually and learns a representation of each layer.
• Adaptive scaling technique: we scale the features of each layer before concatenating them.
• For adaptive scaling we first learn an embedding of the keypoints and their layers. Based on this embedding we then learn scaling parameters for each keypoint layer and use them to scale the features of each layer.
• Model:
  Generator: generates the image based on the keypoint locations.
  Discriminator: trained to distinguish between real and fake image-keypoint pairs.
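A rough sketch of the layering plus adaptive scaling idea; the keypoint counts, feature sizes, and random linear maps below are hypothetical stand-ins for the paper's learned networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 12 keypoints (x, y), split into 3 layers of 4
# (e.g. "left", "middle", "right" parts of the character).
keypoints = rng.uniform(0, 1, size=(12, 2))
layers = [keypoints[0:4], keypoints[4:8], keypoints[8:12]]

F = 8  # per-layer feature size

# Each layer is processed INDEPENDENTLY into its own feature vector.
layer_weights = [rng.normal(0, 0.1, size=(8, F)) for _ in range(3)]
features = [lay.reshape(-1) @ W for lay, W in zip(layers, layer_weights)]

# Adaptive scaling: an embedding of ALL keypoints predicts one scale
# per layer, so the layers are weighted before being concatenated.
W_embed = rng.normal(0, 0.1, size=(24, 3))
scales = 1.0 / (1.0 + np.exp(-(keypoints.reshape(-1) @ W_embed)))  # in (0, 1)

scaled = [s * f for s, f in zip(scales, features)]
combined = np.concatenate(scaled)   # fed to the rest of the generator
print(combined.shape)
```

The key design point is that the scales are conditioned on all keypoints, so the model can down-weight a layer that is currently occluded while still processing each layer independently.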
26. Character GAN - Layering
• Our layered approach is beneficial when several keypoints overlap in 2D space,
• e.g. layers representing the "left" side of the character (left arm and leg), the "middle" part (head and torso), and the "right" side (right arm and leg).
• These layers can be modeled individually and then composed to form the final image.
• To model this, our generator processes each keypoint layer individually and learns a representation of each layer.
27. Character GAN – Patch-Based Refinement
• To further improve the final result we apply a patch-based refinement algorithm that replaces generated patches with their closest real patches.
• Given a real and a generated image, for each patch in the generated image we find the closest patch in the dataset.
• We use the PatchMatch approximate nearest-neighbor algorithm to replace the generated patches with their real equivalents.
• This approach often improves the sharpness and general image quality over the output of the generator.
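The refinement step can be sketched with a brute-force nearest-neighbor search standing in for PatchMatch (the image sizes, patch size, and random images below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

PATCH = 4                          # patch size (illustrative)
real = rng.random((16, 16))        # stand-in for a real training image
generated = rng.random((16, 16))   # stand-in for the generator's output

# Collect every PATCH x PATCH patch of the real image once.
real_patches = np.array([
    real[i:i + PATCH, j:j + PATCH]
    for i in range(16 - PATCH + 1)
    for j in range(16 - PATCH + 1)
])

refined = generated.copy()
# Replace each non-overlapping generated patch by its closest real patch (L2).
for i in range(0, 16, PATCH):
    for j in range(0, 16, PATCH):
        patch = generated[i:i + PATCH, j:j + PATCH]
        dists = ((real_patches - patch) ** 2).sum(axis=(1, 2))
        refined[i:i + PATCH, j:j + PATCH] = real_patches[dists.argmin()]

print(refined.shape)
```

Every patch in the refined image is now a real patch, which is why this step tends to restore sharpness; PatchMatch makes the same search tractable for large images by finding approximate nearest neighbors instead of scanning all patches.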
30. Steps Ahead
• Construct a combination of the GAN models introduced.
• Optimize it for low complexity.
• Create a creativity-based GAN application.
• Launch it as an experimental product.
32. • [1] Goodfellow, I., et al.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D.,
Weinberger, K.Q. (eds.) NIPS 2014, pp. 2672–2680 (2014)
• [2] Bengio, Y. (2009). Learning deep architectures for AI. Now Publishers.
• [3] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative
adversarial networks. In: ICLR 2016 (2016)
• [4] Hinton, G., Deng, L., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.,
and Kingsbury, B. (2012a). Deep neural networks for acoustic modeling in speech recognition. IEEE Signal
Processing Magazine, 29(6), 82–97.
• [5] Krizhevsky, A., Sutskever, I., and Hinton, G. (2012). ImageNet classification with deep convolutional neural
networks. In NIPS’2012.
• [6] Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y. (2009). What is the best multi-stage architecture for
object recognition? In Proc. International Conference on Computer Vision (ICCV’09), pages 2146–2153. IEEE.
• [7] Goodfellow, I. J., Mirza, M., Courville, A., and Bengio, Y. (2013b). Multi-prediction deep Boltzmann machines.
In NIPS’2013.
• [8] Hinz, T., Fisher, M., Wang, O., Shechtman, E., and Wermter, S., “CharacterGAN: Few-Shot Keypoint Character
Animation and Reposing”, 2021. Link: https://arxiv.org/abs/2102.03141
Resources
33. • [9] Choi, H., Park, C. and Lee, K., 2020. From Inference To Generation: End-To-End Fully Self-Supervised Generation Of Human
Face From Speech.
• [10] Odena, Augustus. “Semi-Supervised Learning with Generative Adversarial Networks.” ArXiv abs/1606.01583 (2016): n. pag.
• [11] Chen, Xiang et al. “FTGAN: A Fully-trained Generative Adversarial Networks for Text to Face
Generation.” ArXiv abs/1904.05729 (2019): n. pag.
• [12] Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim and J. Choo, "StarGAN: Unified Generative Adversarial Networks for Multi-domain
Image-to-Image Translation," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018,
pp. 8789-8797, doi: 10.1109/CVPR.2018.00916.
• [13] K. Kumar, R. Kumar, T. de Boissiere, L. Gestin, W. Z. Teoh, J. Sotelo, A. de Brebisson, Y. Bengio, and A. Courville,
“MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis,” arXiv.org, 09-Dec-2019.
• [14] S. Palazzo, C. Spampinato, I. Kavasidis, D. Giordano and M. Shah, "Generative Adversarial Networks
Conditioned by Brain Signals," in 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy
• [15] Shimizu T., Xu J., Tasaka K. (2020) MobileGAN: Compact Network Architecture for Generative Adversarial
Network. In: Palaiahnakote S., Sanniti di Baja G., Wang L., Yan W. (eds) Pattern Recognition. ACPR 2019.
Lecture Notes in Computer Science, vol 12046. Springer, Cham. https://doi.org/10.1007/978-3-030-41404-7_23
• [16] https://developers.google.com/machine-learning/gan/
35. Thank you
2020-12-16 Creativity based Hardware aware Machine Vision System 35
Parham Zilouchian
p.zilouchian@gmil.com
zilouchian.org
36. Summary
GANs are generative models implemented using two neural network modules: a generator and a discriminator.
The generator tries to generate samples from random noise given as input.
The discriminator tries to distinguish the generator's samples from samples drawn from the real data distribution.
The two networks are trained adversarially (in tandem): the generator tries to fool the discriminator, and in the process both models become better at their respective tasks.