Deep Generative Learning for All

Deep Generative
Learning for All
(a.k.a. The GenAI Hype)
Xavier Giro-i-Nieto
@DocXavi
xavigiro.upc@gmail.com
Associate Professor (on leave)
Universitat Politècnica de Catalunya
Institut de Robòtica Industrial
ELLIS Unit Barcelona
Spring 2020
[Summer School website]
Acknowledgements
Santiago Pascual
santi.pascual@upc.edu
@santty128
PhD 2019
Universitat Politecnica de Catalunya
Technical University of Catalonia
Albert Pumarola
apumarola@iri.upc.edu
@AlbertPumarola
PhD 2021
Universitat Politècnica de Catalunya
Technical University of Catalonia
Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
Gerard I. Gállego
PhD Student
Universitat Politècnica de Catalunya
gerard.ion.gallego@upc.edu
@geiongallego
Acknowledgements
Eduard Ramon
Applied Scientist
Amazon Barcelona
@eram1205
Wentong Liao
Applied Scientist
Amazon Barcelona
Ciprian Corneanu
Applied Scientist
Amazon Seattle
Laia Tarrés
PhD Student
Universitat Politècnica de Catalunya
laia.tarres@upc.edu
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Image generation
#StyleGAN3 (NVIDIA) Karras, Tero, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and
Timo Aila. "Alias-free generative adversarial networks." NeurIPS 2021. [code]
#DiT Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." arXiv 2022.
Image generation
#DALL-E-2 (OpenAI) Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen "Hierarchical Text-Conditional
Image Generation with CLIP Latents." 2022. [blog]
Text-to-Image generation
Text-to-Video generation
#Make-a-video (Meta) Singer, Uriel, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu et al.
"Make-a-video: Text-to-video generation without text-video data." arXiv 2022.
“A dog wearing a Superhero
outfit with red cape flying
through the sky”
Synthetic labels to train discriminative models
#BigDatasetGAN Li, Daiqing, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, and Antonio
Torralba. "BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations." arXiv 2022.
Video Super-resolution
#TecoGAN Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. Learning temporal coherence via self-supervision for
GAN-based video generation. ACM Transactions on Graphics 2020.
Human Motion Transfer
#EDN Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. Everybody dance now. ICCV 2019.
Speech Enhancement
Recover lost information/add enhancing details by learning the natural distribution of audio
samples.
original
enhanced
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Discriminative vs Generative Models
Philip Isola, Generative Models of Images. MIT 2023.
Outline
1. Motivation
2. Discriminative vs Generative Models
a. Pθ(Y|X): Discriminative Models
b. Pθ(X): Generative Models
c. Pθ(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Pθ(Y|X): Discriminative Models
Slide credit:
Albert Pumarola (UPC 2019)
Classification Regression
Text Prob. of being a Potential Customer
Image
Audio Speech Translation
Jim Carrey
What Language?
X=Data
Y=Labels
θ = Model parameters
Discriminative Modeling: Pθ(Y|X)
[Figure: input → Network (θ) → output class probabilities (0.01, 0.09, 0.9)]
Figure credit: Javier Ruiz (UPC TelecomBCN)
Discriminative model: Tell me the probability of some ‘Y’ responses given ‘X’ inputs.
Pθ(Y | X = [pixel1, pixel2, …, pixel784])
Pθ(Y|X): Discriminative Models
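As a minimal sketch of what Pθ(Y | X) means in practice, the snippet below assumes a linear softmax classifier over a flattened 28×28 image (784 pixels) and three classes. The sizes, the random weights and the function names are illustrative assumptions, not something taken from the slides.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(x, weights, biases):
    """P_theta(Y | X = x) for a linear model: one logit per class."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(logits)

rng = random.Random(0)
num_pixels, num_classes = 784, 3                      # illustrative sizes
x = [rng.random() for _ in range(num_pixels)]         # a fake "image"
weights = [[rng.gauss(0.0, 0.01) for _ in range(num_pixels)]
           for _ in range(num_classes)]               # theta: weights (untrained)
biases = [0.0] * num_classes                          # theta: biases

probs = predict_proba(x, weights, biases)
print(probs)  # one probability per class
```

The output plays the role of the probability column in the classifier figure: one non-negative number per class, summing to one.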
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Slide Concept: Albert Pumarola (UPC 2019)
Pθ(X): Generative Models
Classification Regression Generative
Text Prob. of being a Potential Customer
“What about Ron magic?” offered Ron.
To Harry, Ron was loud, slow and soft
bird. Harry did not like to think about
birds.
Image
Audio Language Translation
Music Composer and Interpreter
MuseNet Sample
Jim Carrey
What Language?
Discriminative Modeling: Pθ(Y|X)
Generative Modeling: Pθ(X)
X=Data
Y=Labels
θ = Model parameters
Each real sample xi comes from an M-dimensional probability distribution P(X).
X = {x1, x2, …, xN}
Pθ(X): Generative Models
1) We want our model with parameters θ to output samples with distribution Pθ(X), matching the distribution of our training data P(X).
2) We can then sample points from Pθ(X) that plausibly look as if they were drawn from P(X).
Example: Gaussian Mixture Models (GMM)
[Figure: P(X), the distribution of the training data, next to Pλ,μ,σ(X), the distribution of the fitted model]
Pθ(X): Generative Models
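To make the GMM example concrete, here is a small sketch of sampling from a one-dimensional mixture. It assumes that λ denotes the mixture weights, μ the component means and σ the component standard deviations. The numeric values are made up for illustration.

```python
import random

# Hypothetical 1-D mixture: weights (lambda), means (mu), std devs (sigma).
lambdas = [0.3, 0.7]
mus = [-2.0, 3.0]
sigmas = [0.5, 1.0]

def sample_gmm(n, rng):
    """Draw n samples: pick a component by its weight, then sample its Gaussian."""
    samples = []
    for _ in range(n):
        k = rng.choices(range(len(lambdas)), weights=lambdas)[0]
        samples.append(rng.gauss(mus[k], sigmas[k]))
    return samples

rng = random.Random(0)
samples = sample_gmm(10_000, rng)
empirical_mean = sum(samples) / len(samples)
expected_mean = sum(l * m for l, m in zip(lambdas, mus))  # 0.3*(-2) + 0.7*3
print(empirical_mean, expected_mean)
```

Fitting the parameters to data (for example with expectation-maximisation) is the learning step. The sketch only covers sampling once the parameters are known.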
What are the parameters θ we need to estimate in deep neural networks?
θ = (weights & biases)
[Figure: Network (θ) → output, with an unknown input “?”]
Pθ(X): Generative Models
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Pθ(X|Y): Conditioned Generative Models
Conditional probabilities P(X|Y) model the effect of conditioning variables on the generative process:
X = {x1, x2, …, xN}
Y = {y1, y2, …, yN}
DOG
CAT
TRUCK
PIZZA
THRILLER
SCI-FI
HISTORY
/aa/
/e/
/o/
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. Generative Adversarial Networks (GANs)
b. Auto-regressive
c. Variational Autoencoders (VAEs)
d. Diffusion
Our learned model should be able to make up new samples from the distribution,
not just copy and paste existing samples!
Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
Sampling
Philip Isola, Generative Models of Images. MIT 2023.
Sampling
Slide concept: Albert Pumarola (UPC 2019)
Learn
Sample Out
Training Dataset
Generated Samples
Feature
space
Manifold Pθ(X)
“Model the data distribution so that we can sample new points out of the distribution”
Sampling
How could we generate diverse samples from a deterministic deep neural network?
Sample z from a known prior, for example, a multivariate normal distribution N(0, I).
Example: dim(z)=2
[Figure: z → Generator (θ) → x’ (generated samples)]
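The idea can be sketched with a toy generator: the network itself is deterministic, and all the diversity comes from the random latent vector z drawn from N(0, I). The two-layer network, its sizes and its random (untrained) weights are assumptions made for illustration only.

```python
import math
import random

rng = random.Random(0)
LATENT_DIM, HIDDEN_DIM, DATA_DIM = 2, 8, 4   # illustrative sizes

def random_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# theta: fixed weights of a toy generator (random here, learned in practice).
W1 = random_matrix(HIDDEN_DIM, LATENT_DIM)
W2 = random_matrix(DATA_DIM, HIDDEN_DIM)

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def generator(z):
    """Deterministic mapping x' = G_theta(z)."""
    hidden = [math.tanh(h) for h in matvec(W1, z)]
    return matvec(W2, hidden)

def sample_prior():
    """z ~ N(0, I) with dim(z) = 2."""
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

z_a, z_b = sample_prior(), sample_prior()
x_a, x_b = generator(z_a), generator(z_b)
print(x_a)
print(x_b)               # a different z yields a different sample
print(generator(z_a))    # the same z reproduces the same sample
```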
Slide concept: Albert Pumarola (UPC 2019)
Learn
Training Dataset
Interpolated Samples
Feature
space
Manifold Pθ(X)
Traversing the learned manifold through interpolation.
Interpolation
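A minimal sketch of traversing the latent space by linear interpolation between two latent codes. The two endpoint vectors are hypothetical. In a real model each interpolated z would be passed through the generator (or decoder) to obtain the corresponding sample.

```python
def lerp(z_start, z_end, t):
    """Linear interpolation between two latent vectors, t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)]

z_start = [-1.0, 0.5]   # hypothetical latent codes with dim(z) = 2
z_end = [2.0, -1.5]
steps = 5
path = [lerp(z_start, z_end, i / (steps - 1)) for i in range(steps)]
for z in path:
    print(z)   # each z would be fed to the generator to get a sample
```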
Disentanglement
Philip Isola, Generative Models of Images. MIT 2023.
Disentanglement
Philip Isola, Generative Models of Images. MIT 2023.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Auto-regressive
○ Variational Autoencoders (VAEs)
○ Diffusion
Credit: Santiago Pascual [slides] [video]
Generator & Discriminator
We have two modules: Generator (G) and Discriminator (D).
● They “fight” against each other during training → Adversarial Learning
D’s goal: Classify between real samples and those produced by G.
G’s goal: Fool D into misclassifying.
Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and
Yoshua Bengio. "Generative Adversarial Nets." NeurIPS 2014.
Discriminator
Discriminator network D → binary classifier between real (x) and generated (x’) samples.
[Figure: x’ → Discriminator (θ) → Generated (1); x → Discriminator (θ) → Real (0)]
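[Figure: a latent random variable z is sampled and passed through the Generator to produce a generated sample; the Discriminator receives either that sample or a real-world sample from the database, predicts “Real?”, and the prediction feeds the Loss]
Generator & Discriminator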
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Auto-regressive
○ Variational Autoencoders (VAEs)
○ Diffusion
Adversarial Training Analogy: is it fake money?
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake.
● Police, first attempt: “FAKE: It’s not even green”
● Police, second attempt: “FAKE: There is no watermark”
● Police, third attempt: “FAKE: Watermark should be rounded”
● Police, after many attempts: “REAL? FAKE?”
After enough iterations, and if the counterfeiter is good enough (in terms of the G network, this means it “has enough parameters”), the police should be confused.
Figure: Santiago Pascual (UPC)
Adversarial Training
Alternate between training the discriminator and generator.
[Figure: latent random variable → Generator (neural network) → generated sample; real-world images → sample; Discriminator (neural network) → “Real?” → Loss]
Figure: Kevin McGuinness (DCU)
Adversarial Training: Discriminator
1. Fix generator weights, draw samples from both real world and generated images
2. Train discriminator to distinguish between real world and generated images
[Figure: Generator and Discriminator diagram; the error is backpropagated to update the discriminator weights]
Figure: Kevin McGuinness (DCU)
In the setup of the figure, which ground-truth label for a generated image should we use to train the discriminator? Consider a binary encoding of “1” (Real) and “0” (Fake).
Adversarial Training: Generator
1. Fix discriminator weights
2. Sample from generator by injecting noise.
3. Backprop error through discriminator to update generator weights
[Figure: Generator and Discriminator diagram; the error is backpropagated through the discriminator to update the generator weights]
Figure: Kevin McGuinness (DCU)
In the setup of the figure, which ground-truth label for a generated image should we use to train the generator? Consider a binary encoding of “1” (Real) and “0” (Fake).
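The two alternating steps can be sketched in terms of their losses. The snippet assumes binary cross-entropy and the commonly used non-saturating generator objective, in which generated samples are labelled “1” (Real) when updating G. The function names and the example discriminator outputs are hypothetical.

```python
import math

REAL, FAKE = 1.0, 0.0   # binary encoding: 1 = Real, 0 = Fake

def bce(prediction, target, eps=1e-12):
    """Binary cross-entropy for one discriminator output in (0, 1)."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """D step (G fixed): real samples labelled 1, generated samples labelled 0."""
    return bce(d_real, REAL) + bce(d_fake, FAKE)

def generator_loss(d_fake):
    """G step (D fixed): generated samples labelled 1, so fooling D lowers the loss."""
    return bce(d_fake, REAL)

# Hypothetical discriminator outputs D(x) and D(G(z)).
d_real, d_fake = 0.9, 0.2
print(discriminator_loss(d_real, d_fake))
print(generator_loss(d_fake))
```

In a full training loop the first loss would only update the discriminator weights and the second only the generator weights, matching the "fix one, update the other" steps listed above.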
Adversarial Training: How to make it work ?
Soumith Chintala, “How to train a GAN ? Tips and tricks to make GAN work”. Github 2016.
NeurIPS Barcelona 2016
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Non-Conditional GANs
Slide credit: Víctor Garcia
Discriminator
D(·)
Generator
G(·)
Real World
Random
seed (z)
Real/Generated
Conditional GANs (cGAN)
Slide credit: Víctor Garcia
Conditional Adversarial Networks
Real World
Real/Generated
Condition
Discriminator
D(·)
Generator
G(·)
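One common way to inject the condition, sketched here as an assumption rather than as the exact scheme in the figure, is to concatenate an encoding of the condition y to the inputs of both networks. The class list and helper names are illustrative.

```python
import random

LATENT_DIM = 2                            # illustrative size
CLASSES = ["DOG", "CAT", "TRUCK", "PIZZA"]

def one_hot(label):
    """Encode the condition y as a one-hot vector."""
    return [1.0 if c == label else 0.0 for c in CLASSES]

def generator_input(z, label):
    """Condition the generator by concatenating z with the encoded condition."""
    return z + one_hot(label)

def discriminator_input(x, label):
    """The discriminator sees the sample together with the same condition."""
    return x + one_hot(label)

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
g_in = generator_input(z, "CAT")
print(len(g_in), g_in[LATENT_DIM:])
```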
Learn more about GANs
Ian Goodfellow.
NeurIPS Barcelona 2016.
Mihaela Rosca & Jeff Donahue.
UCL x Deepmind 2020.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Auto-Encoder (AE)
[Figure: feature space with manifold Pθ(X); Encode → z → Decode (“Generate”)]
● Learns Pθ(X) with a reconstruction loss.
● Proposed as a pre-training stage for the encoder (“self-supervised learning”).
Auto-Encoder (AE)
Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or the decoder (as in GANs)?
No, because the noise (or encoded noise) would be out of the learned manifold.
[Figure: feature space with manifold Pθ(X); Encode → z → Decode (“Generate”)]
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Variational Auto-Encoder (VAE)
Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013.
Encoder: Predict the mean μ(X) and covariance ∑(X) of a multivariate normal distribution.
A loss term pushes this distribution to follow a normal distribution N(0, I).
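The regularisation term is usually the KL divergence between the encoder's Gaussian and N(0, I). The sketch below assumes a diagonal covariance (one standard deviation σ per latent dimension), for which the KL has a simple closed form. The function name and the sample values are illustrative.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))

print(kl_to_standard_normal([0.0, 0.0], [1.0, 1.0]))   # 0.0: already N(0, I)
print(kl_to_standard_normal([1.0, -2.0], [0.5, 1.5]))  # positive penalty
```

The penalty is zero only when the predicted distribution already matches the prior and grows as the mean drifts from 0 or the variance from 1.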
Source: Wikipedia. Image by Bscan - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=25235145
Maths 101: Multivariate normal distribution
Variational Auto-Encoder (VAE)
Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013.
Decoder: Trained to reconstruct the input data from a z sampled from N(μ, ∑).
[Figure: Encode → z → Decode, with a reconstruction loss term]
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Reparametrization Trick
[Figure: Encode → z → Decode]
Challenge: We cannot backprop through the sampling of z ~ N(μ, ∑) because sampling is not differentiable!
Solution: Reparameterization trick
Sample ε ~ N(0, I) and define z from it, multiplying by σ and summing μ: z = μ + σ ⊙ ε
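A minimal sketch of the trick, assuming a diagonal Gaussian so that σ is a per-dimension standard deviation. The values for μ and σ stand in for hypothetical encoder outputs. The gradients are written out by hand instead of using an autodiff library.

```python
import random

def reparameterize(mu, sigma, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); the randomness lives only in eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    return z, eps

rng = random.Random(0)
mu, sigma = [0.5, -1.0], [0.1, 2.0]     # hypothetical encoder outputs
z, eps = reparameterize(mu, sigma, rng)

# For a fixed eps, z is a plain differentiable function of mu and sigma:
# dz/dmu = 1 and dz/dsigma = eps, so gradients can reach the encoder.
dz_dmu = [1.0 for _ in mu]
dz_dsigma = list(eps)
print(z, dz_dmu, dz_dsigma)
```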
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Generative behaviour
How can we now generate new samples once the underlying generating distribution is learned?
We can sample from our prior N(0,I), discarding the encoder path.
[Figure: latent samples z1, z2, z3 drawn from the prior and passed through the decoder]
Generative behaviour
Example: P(X) can be modelled by mapping a simple normal distribution N(0, I) through a powerful non-linear function g(z).
#NVAE Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." NeurIPS 2020. [code]
Walking around z manifold dimensions gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc.
Generative behaviour
Learn more about VAEs
Andriy Mnih (UCL - Deepmind 2020)
Max Welling - University of Amsterdam (2020)
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
■ Forward diffusion process
■ Reverse denoising process
○ Auto-regressive
Forward Diffusion Process
Philip Isola, Generative Models of Images. MIT 2023.
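The forward process can be sketched under the assumption of a DDPM-style formulation with a linear noise schedule. The slides do not state the schedule, so the values of T and β below are assumptions. With that choice, x_t can be sampled directly from x_0 in closed form.

```python
import math
import random

T = 1000
# Assumed linear noise schedule (beta from 1e-4 to 0.02 over T steps).
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar[t] = product of (1 - beta) up to and including step t.
alpha_bar = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bar.append(running)

def diffuse(x0, t, rng):
    """Sample x_t from x_0: sqrt(alpha_bar)*x0 + sqrt(1 - alpha_bar)*eps."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]          # a toy "image" with four pixels
for t in (0, 250, 999):
    print(t, alpha_bar[t], diffuse(x0, t, rng))
```

As t grows, alpha_bar shrinks towards zero, so the signal term vanishes and x_t approaches pure Gaussian noise.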
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
■ Forward diffusion process
■ Reverse denoising process
○ Auto-regressive
Denoising Autoencoder (DAE)
Encode Decode
“Generate”
#DAE Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. "Extracting and composing robust
features with denoising autoencoders." ICML 2008.
Philip Isola, Generative Models of Images. MIT 2023.
Reverse Denoising process
[Figure: data manifold Pθ(x0); starting from noise xT, a network (CNN, U-net) learns to denoise step by step until it reaches the image x0]
What is the dimension of the latent variable in diffusion models?
Same dimensionality as the diffused data.
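The step-by-step denoising loop can be sketched as follows, assuming DDPM-style ancestral sampling. The schedule, the number of steps and the `predict_noise` placeholder (standing in for a trained network such as a U-Net) are all hypothetical. The point of the sketch is that x_T and x_0 have the same dimensionality.

```python
import math
import random

T = 50                                   # few steps, for illustration only
betas = [1e-4 + (0.2 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bar, running = [], 1.0
for a in alphas:
    running *= a
    alpha_bar.append(running)

def predict_noise(x_t, t):
    """Placeholder for the trained network eps_theta(x_t, t)."""
    return [0.0 for _ in x_t]            # hypothetical stand-in, not a real model

def reverse_step(x_t, t, rng):
    """One denoising step x_t -> x_{t-1}."""
    eps_hat = predict_noise(x_t, t)
    coef = betas[t] / math.sqrt(1.0 - alpha_bar[t])
    mean = [(x - coef * e) / math.sqrt(alphas[t]) for x, e in zip(x_t, eps_hat)]
    if t == 0:
        return mean                      # no noise is added at the last step
    sigma = math.sqrt(betas[t])
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

rng = random.Random(0)
data_dim = 4                             # dimensionality of x_0
x = [rng.gauss(0.0, 1.0) for _ in range(data_dim)]   # x_T ~ N(0, I)
for t in reversed(range(T)):
    x = reverse_step(x, t, rng)
print(len(x), x)                         # same number of entries as x_T
```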
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive Models (AR)
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Motivation
PixelRNN
An RNN predicts the probability of each sample xi with a categorical output distribution: Softmax
#PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
PixelRNN
#PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
Why are not all completions identical ?
(aka how can AR offer a generative behaviour ?)
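One way to see why completions differ: the model outputs a categorical distribution at each step, and the next value is sampled from it instead of always taking the most likely one. The sketch below uses a made-up three-symbol vocabulary and a hypothetical table of logits in place of a trained network.

```python
import math
import random

VOCAB = ["a", "b", "c"]
# Hypothetical next-symbol logits given the previous symbol (a toy AR model).
LOGITS = {
    "a": [0.1, 2.0, 1.5],
    "b": [1.0, 0.2, 1.2],
    "c": [2.0, 1.0, 0.1],
}

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def complete(prefix, length, rng=None):
    """Extend prefix one symbol at a time; sample if rng is given, else argmax."""
    seq = list(prefix)
    for _ in range(length):
        probs = softmax(LOGITS[seq[-1]])
        if rng is None:
            nxt = VOCAB[probs.index(max(probs))]          # greedy: deterministic
        else:
            nxt = rng.choices(VOCAB, weights=probs)[0]    # stochastic
        seq.append(nxt)
    return "".join(seq)

print(complete("a", 8))                      # greedy: abcabcabc every time
print(complete("a", 8, random.Random(1)))    # sampled completions depend on
print(complete("a", 8, random.Random(2)))    # the random draws
```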
PixelCNN
#PixelCNN Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., & Graves, A. Conditional image generation with
pixelcnn decoders. NeurIPS 2016.
Wavenet
Wavenet used dilated convolutions to produce synthetic audio, sample by sample, conditioned on the previous samples within a receptive field of size T.
#Wavenet Oord, Aaron van den, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal
Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. "Wavenet: A generative model for raw audio." arXiv 2016. [blog]
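To illustrate why dilation matters, the following sketch computes the receptive field of a stack of dilated causal convolutions. It assumes kernel size 2 and dilations that double at each layer (1, 2, 4, ..., 512). The helper name is hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Number of input samples seen by a stack of dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

dilations = [2 ** i for i in range(10)]      # 1, 2, 4, ..., 512
print(receptive_field(2, dilations))         # 1024
print(receptive_field(2, [1] * 10))          # 11 without dilation
```

With the same ten layers, doubling the dilation grows the receptive field exponentially with depth instead of linearly.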
The Transformer
Figure: Jay Alammar, “The illustrated Transformer” (2018)
#Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention
is all you need. NeurIPS 2017.
Auto-regressive (at test).
The Transformer
Figure: Jay Alammar, “The illustrated Transformer” (2018)
Text completion
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
“GPT-2 is trained with a simple objective: predict the next word, given all of the
previous words within some text.”
Condition Generated completions
In a shocking finding, scientist
discovered a herd of unicorns
living in a remote, previously
unexplored valley, in the Andes
Mountains. Even more surprising to
the researchers was the fact that
the unicorns spoke perfect
English.
The scientist named the population,
after their distinctive horn, Ovid’s
Unicorn. These four-horned, silver-white
unicorns were previously unknown to
science.
Now, after almost two centuries, the
mystery of what sparked this odd
phenomenon is finally solved.
Zero-shot learning
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
GPT-2/3 can also solve tasks for which they were not trained (zero-shot learning).
Text Reading Comprehension
The 2008 Summer Olympics torch relay was run from March 24
until August 8, 2008, prior to the 2008 Summer Olympics,
with the theme of “one world, one dream”. Plans for the
relay were announced on April 26, 2007, in Beijing, China.
The relay, also called by the organizers as the “Journey of
Harmony”, lasted 129 days and carried the torch 137,000 km
(85,000 mi) – the longest distance of any Olympic torch
relay since the tradition was started ahead of the 1936
Summer Olympics.
After being lit at the birthplace of the Olympic Games in
Olympia, Greece on March 24, the torch traveled to the
Panathinaiko Stadium in Athens, and then to Beijing,
arriving on March 31. From Beijing, the torch was following
a route passing through six continents. The torch has
visited cities along the Silk Road, symbolizing ancient
links between China and the rest of the world. The relay
also included an ascent with the flame to the top of Mount
Everest on the border of Nepal and Tibet, China from the
Chinese side, which was closed specially for the event.
Q: What was the theme?
A: “one world, one dream”.
Q: What was the length of the race?
A: 137,000 km
Q: Was it larger than previous ones?
A: No
Q: Where did the race begin?
A: Olympia, Greece
Zero-shot learning
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
“GPT-2 is trained with a simple objective: predict the next word, given all of the
previous words within some text.”
Zero-shot task performances
(GPT-2 was never trained for these tasks)
#iGPT Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., & Sutskever, I. Generative Pretraining from Pixels. ICML
2020.
GPT-2 / GPT-3
#ChatGPT [blog]
#GPT-4 (OpenAI) GPT-4 Technical Report. arXiv 2023. [blog]
ChatGPT / GPT-4
Discussion
Learn more about AR models
Nal Kalchbrenner, Mediterranean Machine Learning
Summer School 2022.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive Models (AR)
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Source: David Foster
Recommended books
Interview of David Foster for Machine
Learning Street Talk (2023)
Recommended courses
Deep Unsupervised Learning
(UC Berkeley CS294-158-SP2020)
Codiax161 visualizações
ISBA 2022 Susie Bayarri lecture por Pierre Jacob
ISBA 2022 Susie Bayarri lectureISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lecture
Pierre Jacob448 visualizações
Gf o2014talk por Bob O'Hara
Gf o2014talkGf o2014talk
Gf o2014talk
Bob O'Hara721 visualizações
GAN in medical imaging por Cheng-Bin Jin
GAN in medical imagingGAN in medical imaging
GAN in medical imaging
Cheng-Bin Jin585 visualizações
Deep Generative Models por Chia-Wen Cheng
Deep Generative Models Deep Generative Models
Deep Generative Models
Chia-Wen Cheng1.7K visualizações
Striving to Demystify Bayesian Computational Modelling por Marco Wirthlin
Striving to Demystify Bayesian Computational ModellingStriving to Demystify Bayesian Computational Modelling
Striving to Demystify Bayesian Computational Modelling
Marco Wirthlin280 visualizações
Dirty data science machine learning on non-curated data por Gael Varoquaux
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated data
Gael Varoquaux20K visualizações

Mais de Universitat Politècnica de Catalunya

Towards Sign Language Translation & Production | Xavier Giro-i-Nieto por
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
290 visualizações94 slides
The Transformer - Xavier Giró - UPC Barcelona 2021 por
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021Universitat Politècnica de Catalunya
258 visualizações53 slides
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI... por
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
183 visualizações92 slides
Open challenges in sign language translation and production por
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and productionUniversitat Politècnica de Catalunya
187 visualizações83 slides
Generation of Synthetic Referring Expressions for Object Segmentation in Videos por
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
522 visualizações42 slides
Discovery and Learning of Navigation Goals from Pixels in Minecraft por
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftUniversitat Politècnica de Catalunya
193 visualizações40 slides

Mais de Universitat Politècnica de Catalunya(20)

Último

Indian council for child welfare por
Indian council for child welfareIndian council for child welfare
Indian council for child welfareRenuWaghmare2
7 visualizações21 slides
selection of preformed arch wires during the alignment stage of preadjusted o... por
selection of preformed arch wires during the alignment stage of preadjusted o...selection of preformed arch wires during the alignment stage of preadjusted o...
selection of preformed arch wires during the alignment stage of preadjusted o...MaherFouda1
7 visualizações100 slides
Assessment and Evaluation GROUP 3.pdf por
Assessment and Evaluation GROUP 3.pdfAssessment and Evaluation GROUP 3.pdf
Assessment and Evaluation GROUP 3.pdfkimberlyndelgado18
10 visualizações10 slides
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F... por
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...SwagatBehera9
5 visualizações36 slides
Radioactive and Non- radioactive probes por
Radioactive and Non- radioactive probesRadioactive and Non- radioactive probes
Radioactive and Non- radioactive probesNathiya .T Nathiya.T
6 visualizações14 slides
Vegetable grafting: A new crop improvement approach.pptx por
Vegetable grafting: A new crop improvement approach.pptxVegetable grafting: A new crop improvement approach.pptx
Vegetable grafting: A new crop improvement approach.pptxHimul Suthar
8 visualizações69 slides

Último(20)

Indian council for child welfare por RenuWaghmare2
Indian council for child welfareIndian council for child welfare
Indian council for child welfare
RenuWaghmare27 visualizações
selection of preformed arch wires during the alignment stage of preadjusted o... por MaherFouda1
selection of preformed arch wires during the alignment stage of preadjusted o...selection of preformed arch wires during the alignment stage of preadjusted o...
selection of preformed arch wires during the alignment stage of preadjusted o...
MaherFouda17 visualizações
Assessment and Evaluation GROUP 3.pdf por kimberlyndelgado18
Assessment and Evaluation GROUP 3.pdfAssessment and Evaluation GROUP 3.pdf
Assessment and Evaluation GROUP 3.pdf
kimberlyndelgado1810 visualizações
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F... por SwagatBehera9
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
SwagatBehera95 visualizações
Radioactive and Non- radioactive probes por Nathiya .T Nathiya.T
Radioactive and Non- radioactive probesRadioactive and Non- radioactive probes
Radioactive and Non- radioactive probes
Nathiya .T Nathiya.T6 visualizações
Vegetable grafting: A new crop improvement approach.pptx por Himul Suthar
Vegetable grafting: A new crop improvement approach.pptxVegetable grafting: A new crop improvement approach.pptx
Vegetable grafting: A new crop improvement approach.pptx
Himul Suthar8 visualizações
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance... por InsideScientific
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...
InsideScientific115 visualizações
Note on the Riemann Hypothesis por vegafrank2
Note on the Riemann HypothesisNote on the Riemann Hypothesis
Note on the Riemann Hypothesis
vegafrank28 visualizações
vitamine B1.pptx por ajithkilpart
vitamine B1.pptxvitamine B1.pptx
vitamine B1.pptx
ajithkilpart29 visualizações
Applications of Large Language Models in Materials Discovery and Design por Anubhav Jain
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
Anubhav Jain14 visualizações
Evaluation and Standardization of the Marketed Polyherbal drug Patanjali Divy... por Anmol Vishnu Gupta
Evaluation and Standardization of the Marketed Polyherbal drug Patanjali Divy...Evaluation and Standardization of the Marketed Polyherbal drug Patanjali Divy...
Evaluation and Standardization of the Marketed Polyherbal drug Patanjali Divy...
Anmol Vishnu Gupta7 visualizações
BLOTTING TECHNIQUES SPECIAL por MuhammadImranMirza2
BLOTTING TECHNIQUES SPECIALBLOTTING TECHNIQUES SPECIAL
BLOTTING TECHNIQUES SPECIAL
MuhammadImranMirza27 visualizações
Best Hybrid Event Platform.pptx por Harriet Davis
Best Hybrid Event Platform.pptxBest Hybrid Event Platform.pptx
Best Hybrid Event Platform.pptx
Harriet Davis8 visualizações
ALGAL PRODUCTS.pptx por RASHMI M G
ALGAL PRODUCTS.pptxALGAL PRODUCTS.pptx
ALGAL PRODUCTS.pptx
RASHMI M G 7 visualizações
Exploring the nature and synchronicity of early cluster formation in the Larg... por Sérgio Sacani
Exploring the nature and synchronicity of early cluster formation in the Larg...Exploring the nature and synchronicity of early cluster formation in the Larg...
Exploring the nature and synchronicity of early cluster formation in the Larg...
Sérgio Sacani1.4K visualizações
ZEBRA FISH: as model organism.pptx por mahimachoudhary0807
ZEBRA FISH: as model organism.pptxZEBRA FISH: as model organism.pptx
ZEBRA FISH: as model organism.pptx
mahimachoudhary080711 visualizações
TF-FAIR.pdf por Dirk Roorda
TF-FAIR.pdfTF-FAIR.pdf
TF-FAIR.pdf
Dirk Roorda6 visualizações
Study on Drug Drug Interaction Through Prescription Analysis of Type II Diabe... por Anmol Vishnu Gupta
Study on Drug Drug Interaction Through Prescription Analysis of Type II Diabe...Study on Drug Drug Interaction Through Prescription Analysis of Type II Diabe...
Study on Drug Drug Interaction Through Prescription Analysis of Type II Diabe...
Anmol Vishnu Gupta28 visualizações
Oral_Presentation_by_Fatma (2).pdf por fatmaalmrzqi
Oral_Presentation_by_Fatma (2).pdfOral_Presentation_by_Fatma (2).pdf
Oral_Presentation_by_Fatma (2).pdf
fatmaalmrzqi8 visualizações
Presentation on experimental laboratory animal- Hamster por Kanika13641
Presentation on experimental laboratory animal- HamsterPresentation on experimental laboratory animal- Hamster
Presentation on experimental laboratory animal- Hamster
Kanika136416 visualizações

Deep Generative Learning for All

  • 1. Deep Generative Learning for All (a.k.a. The GenAI Hype) Xavier Giro-i-Nieto @DocXavi xavigiro.upc@gmail.com Associate Professor (on leave) Universitat Politècnica de Catalunya Institut de Robòtica Industrial ELLIS Unit Barcelona Spring 2020 [Summer School website]
  • 2. Acknowledgements Santiago Pascual santi.pascual@upc.edu @santty128 PhD 2019 Universitat Politècnica de Catalunya Technical University of Catalonia Albert Pumarola apumarola@iri.upc.edu @AlbertPumarola PhD 2021 Universitat Politècnica de Catalunya Technical University of Catalonia Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University Gerard I. Gállego PhD Student Universitat Politècnica de Catalunya gerard.ion.gallego@upc.edu @geiongallego
  • 3. Acknowledgements Eduard Ramon Applied Scientist Amazon Barcelona @eram1205 Wentong Liao Applied Scientist Amazon Barcelona Ciprian Corneanu Applied Scientist Amazon Seattle Laia Tarrés PhD Student Universitat Politècnica de Catalunya laia.tarres@upc.edu
  • 4. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Latent variable 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 5. Image generation #StyleGAN3 (NVIDIA) Karras, Tero, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. "Alias-free generative adversarial networks." NeurIPS 2021. [code]
  • 6. Image generation #DiT Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." arXiv 2022.
  • 7. Text-to-Image generation #DALL-E-2 (OpenAI) Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen "Hierarchical Text-Conditional Image Generation with CLIP Latents." 2022. [blog]
  • 8. Text-to-Video generation #Make-a-video (Meta) Singer, Uriel, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu et al. "Make-a-video: Text-to-video generation without text-video data." arXiv 2022. Example prompt: “A dog wearing a Superhero outfit with red cape flying through the sky”
  • 9. Synthetic labels to train discriminative models #BigDatasetGAN Li, Daiqing, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, and Antonio Torralba. "BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations." arXiv 2022.
  • 10. Video Super-resolution #TecoGAN Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. Learning temporal coherence via self-supervision for GAN-based video generation. ACM Transactions on Graphics 2020.
  • 11. Human Motion Transfer #EDN Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. Everybody dance now. ICCV 2019.
  • 12. Speech Enhancement Recover lost information or add enhancing details by learning the natural distribution of audio samples. Audio examples: original, enhanced.
  • 13. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Latent variable 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 14. 14 Discriminative vs Generative Models Philip Isola, Generative Models of Images. MIT 2023.
  • 15. Outline 1. Motivation 2. Discriminative vs Generative Models a. Pθ (Y|X): Discriminative Models b. Pθ (X): Generative Models c. Pθ (X|Y): Conditioned Generative Models 3. Latent variable 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 16. Pθ (Y|X): Discriminative Models Discriminative Modeling Pθ (Y|X), with X = Data, Y = Labels, θ = Model parameters. Classification and regression examples: Text (Prob. of being a Potential Customer), Image (Jim Carrey), Audio (What Language?), Speech Translation. Slide credit: Albert Pumarola (UPC 2019)
  • 17. Pθ (Y|X): Discriminative Models Discriminative model: Tell me the probability of some ‘Y’ responses given ‘X’ inputs: Pθ (Y | X = [pixel1 , pixel2 , …, pixel784 ]). Figure labels: input, Network (θ), output class probabilities (0.01, 0.09, 0.9). Figure credit: Javier Ruiz (UPC TelecomBCN)
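A minimal sketch of how a network's raw output scores can be turned into the class probabilities Pθ (Y|X) shown on the slide. The softmax function is a common choice for this step, not something the slide states, and the logits below are hypothetical values picked so that the result lands close to the slide's 0.01 / 0.09 / 0.9.

```python
import math

def softmax(logits):
    """Map raw scores to a probability vector that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores produced by the network for three classes.
logits = [0.5, 2.7, 5.0]
probs = softmax(logits)

# The predicted class is the one with the highest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

Here `probs` plays the role of Pθ (Y|X) for one input X, and `predicted_class` is the index of the most likely label.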
  • 18. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Sampling 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 19. 19 Slide Concept: Albert Pumarola (UPC 2019) Pθ (X): Generative Models Classification Regression Generative Text Prob. of being a Potential Customer “What about Ron magic?” offered Ron. To Harry, Ron was loud, slow and soft bird. Harry did not like to think about birds. Image Audio Language Translation Music Composer and Interpreter MuseNet Sample Jim Carrey What Language? Discriminative Modeling Pθ (Y|X) Generative Modeling Pθ (X) X=Data Y=Labels θ = Model parameters
  • 20. Each real sample xi comes from an M-dimensional probability distribution P(X). X = {x1 , x2 , …, xN } Pθ (X): Generative Models
  • 21. Pθ (X): Generative Models 1) We want our model with parameters θ to output samples with distribution Pθ (X), matching the distribution of our training data P(X). 2) We can then sample new points from Pθ (X) that plausibly look as if they were drawn from P(X). Example: Gaussian Mixture Models (GMM), where P(X) is the distribution of the training data and Pλ,μ,σ (X) is the distribution of the fitted model.
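As an illustration of sampling from a fitted Pλ,μ,σ (X), here is a small sketch of drawing points from a one-dimensional Gaussian mixture. The mixture weights, means and standard deviations are made-up values standing in for parameters that would normally be estimated from training data.

```python
import random

def sample_gmm(weights, means, stds, n, rng):
    """Draw n samples from a 1-D Gaussian mixture model."""
    samples = []
    for _ in range(n):
        # Pick a mixture component according to its weight (lambda) ...
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # ... then sample from that component's Gaussian (mu, sigma).
        samples.append(rng.gauss(means[k], stds[k]))
    return samples

rng = random.Random(0)
weights = [0.3, 0.7]   # lambda (hypothetical)
means = [-2.0, 3.0]    # mu (hypothetical)
stds = [0.5, 1.0]      # sigma (hypothetical)

samples = sample_gmm(weights, means, stds, 5000, rng)
sample_mean = sum(samples) / len(samples)
```

The sample mean should approach the mixture mean 0.3 · (-2) + 0.7 · 3 = 1.5 as more points are drawn.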
  • 22. Pθ (X): Generative Models What are the parameters θ we need to estimate in deep neural networks? θ = (weights & biases). Figure labels: ?, Network (θ), output.
  • 23. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Sampling 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 24. Pθ (X|Y): Conditioned Generative Models Conditional probabilities P(X|Y) model the effect of conditioning variables on the generative process: X = {x1 , x2 , …, xN } Y = {y1 , y2 , …, yN } Example conditions: DOG, CAT, TRUCK, PIZZA; THRILLER, SCI-FI, HISTORY; /aa/, /e/, /o/
  • 25. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Sampling 4. Architectures a. Generative Adversarial Networks (GANs) b. Auto-regressive c. Variational Autoencoders (VAEs) d. Diffusion
  • 26. Sampling Our learned model should be able to make up new samples from the distribution, not just copy and paste existing samples! Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
  • 27. Philip Isola, Generative Models of Images. MIT 2023. Sampling
  • 28. Slide concept: Albert Pumarola (UPC 2019) Learn Sample Out Training Dataset Generated Samples Feature space Manifold Pθ (X) “Model the data distribution so that we can sample new points out of the distribution” Sampling
  • 29. Sampling How could we generate diverse samples from a deterministic deep neural network? Figure labels: z, Generator (θ), Generated Samples.
  • 30. Sampling How could we generate diverse samples from a deterministic deep neural network? Sample z from a known prior, for example, a multivariate normal distribution N(0, I), and feed it to the Generator (θ) to obtain a generated sample x’. Example: dim(z)=2.
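The idea can be sketched in a few lines: the network itself stays deterministic, and diversity comes only from the random latent z. The `generator` function below is a toy, hand-written mapping that stands in for a trained network; its formula is purely hypothetical.

```python
import math
import random

def generator(z):
    """Toy deterministic 'generator' mapping a 2-D latent z to a 2-D sample x'.
    A trained neural network would take its place in practice."""
    z1, z2 = z
    return (math.tanh(z1 + 0.5 * z2), math.sin(z2) + 0.1 * z1)

rng = random.Random(42)

# Draw latents from the prior N(0, I) with dim(z) = 2.
latents = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5)]

# Same deterministic function, different z, hence different outputs.
generated = [generator(z) for z in latents]
```

Calling `generator` twice with the same z returns the same x’, while different draws of z give different samples.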
  • 31. Slide concept: Albert Pumarola (UPC 2019) Learn Training Dataset Interpolated Samples Feature space Manifold Pθ (X) Traversing the learned manifold through interpolation. Interpolation
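A minimal sketch of latent-space interpolation, assuming simple linear interpolation between two latent codes (the slide does not say which interpolation scheme is used). Each point on the path would then be passed through the generator to obtain the interpolated samples; the two endpoint latents are hypothetical.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between latent vectors z_a (t=0) and z_b (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]

z_start = [0.0, 1.0]   # hypothetical latent of a first sample
z_end = [2.0, -1.0]    # hypothetical latent of a second sample

steps = 5
path = [lerp(z_start, z_end, i / (steps - 1)) for i in range(steps)]
```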
  • 32. Disentanglement Philip Isola, Generative Models of Images. MIT 2023.
  • 33. Disentanglement Philip Isola, Generative Models of Images. MIT 2023.
  • 34. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ■ Generator & Discriminator Networks ■ Adversarial Training ■ Conditional GANs ○ Auto-regressive ○ Variational Autoencoders (VAEs) ○ Diffusion
  • 35. Credit: Santiago Pascual [slides] [video]
  • 36. Generator & Discriminator We have two modules: Generator (G) and Discriminator (D). ● They “fight” against each other during training → Adversarial Learning. D’s goal: classify between real samples and those produced by G. G’s goal: fool D into misclassifying. Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative Adversarial Nets." NeurIPS 2014.
  • 37. Discriminator Discriminator network D → binary classifier between real (x) and generated (x’) samples. Figure labels: x’, Discriminator (θ), Generated (1); x, Discriminator (θ), Real (0).
  • 39. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ■ Generator & Discriminator Networks ■ Adversarial Training ■ Conditional GANs ○ Auto-regressive ○ Variational Autoencoders (VAEs) ○ Diffusion
  • 40. Adversarial Training Analogy: is it fake money? Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. Police verdict: “FAKE: It’s not even green”. Figure: Santiago Pascual (UPC)
  • 41. Adversarial Training Analogy: is it fake money? Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. Police verdict: “FAKE: There is no watermark”. Figure: Santiago Pascual (UPC)
  • 42. Adversarial Training Analogy: is it fake money? Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. Police verdict: “FAKE: Watermark should be rounded”. Figure: Santiago Pascual (UPC)
  • 43. Adversarial Training Analogy: is it fake money? Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. After enough iterations, and if the counterfeiter is good enough (for the G network this means “has enough parameters”), the police should be confused: REAL? FAKE? Figure: Santiago Pascual (UPC)
  • 44. Adversarial Training Alternate between training the discriminator and generator. Figure labels: Latent random variable, Sample, Generator (Neural Network), Generated, Real world images, Sample, Discriminator (Neural Network), Real, Loss. Figure: Kevin McGuinness (DCU)
  • 45. Adversarial Training: Discriminator 1. Fix generator weights, draw samples from both real world and generated images. 2. Train discriminator to distinguish between real world and generated images. Backprop error to update discriminator weights. Figure: Kevin McGuinness (DCU)
  • 46. Adversarial Training: Discriminator In the setup of the figure, which ground truth label for a generated image should we use to train the discriminator? Consider a binary encoding of “1” (Real) and “0” (Fake). Backprop error to update discriminator weights. Figure: Kevin McGuinness (DCU)
  • 47. Adversarial Training: Generator 1. Fix discriminator weights. 2. Sample from generator by injecting noise. 3. Backprop error through discriminator to update generator weights. Figure: Kevin McGuinness (DCU)
  • 48. Adversarial Training: Generator In the setup of the figure, which ground truth label for a generated image should we use to train the generator? Consider a binary encoding of “1” (Real) and “0” (Fake). Backprop error to update generator weights. Figure: Kevin McGuinness (DCU)
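One common way to answer the two label questions, sketched with a binary cross-entropy loss and the encoding proposed above (1 = real, 0 = fake). This follows the widely used non-saturating formulation and is an assumption, not necessarily the exact recipe behind the slides; the discriminator outputs below are hypothetical numbers.

```python
import math

REAL, FAKE = 1.0, 0.0

def bce(prediction, target):
    """Binary cross-entropy for one discriminator output in (0, 1)."""
    eps = 1e-12
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

# Hypothetical discriminator outputs D(x) and D(G(z)).
d_on_real = 0.9
d_on_fake = 0.2

# Discriminator step: real images are labelled 1, generated images 0.
discriminator_loss = bce(d_on_real, REAL) + bce(d_on_fake, FAKE)

# Generator step: generated images are labelled 1 ("real"), so the
# generator is rewarded when the discriminator is fooled.
generator_loss = bce(d_on_fake, REAL)
```

With these numbers the discriminator is doing well (low loss) and the generator is not (high loss), which is the signal that gets backpropagated through the fixed discriminator into the generator weights.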
  • 49. Adversarial Training: How to make it work ? Soumith Chintala, “How to train a GAN ? Tips and tricks to make GAN work”. Github 2016. NeurIPS Barcelona 2016
  • 50. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ■ Generator & Discriminator Networks ■ Adversarial Training ■ Conditional GANs ○ Variational Autoencoders (VAEs) ○ Diffusion ○ Auto-regressive
  • 51. Non-Conditional GANs Figure labels: Random seed (z), Generator G(·), Real World, Discriminator D(·), Real/Generated. Slide credit: Víctor Garcia
  • 52. Conditional GANs (cGAN) Conditional Adversarial Networks. Figure labels: Condition, Generator G(·), Real World, Discriminator D(·), Real/Generated. Slide credit: Víctor Garcia
  • 53. Learn more about GANs Ian Goodfellow. NeurIPS Barcelona 2016. Mihaela Rosca & Jeff Donahue. UCL x Deepmind 2020.
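A small sketch of one simple way to inject the condition: concatenating a one-hot class vector to the random seed z before it enters the generator. The slide does not specify the conditioning mechanism, so the concatenation, the number of classes and the latent size are all assumptions; in many conditional GANs the discriminator receives the condition as well.

```python
import random

NUM_CLASSES = 4   # hypothetical label set, e.g. DOG, CAT, TRUCK, PIZZA
LATENT_DIM = 8    # hypothetical latent size

def one_hot(index, size):
    """Encode a class index as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def conditioned_input(z, class_index):
    """Build the generator input by concatenating z with the condition."""
    return list(z) + one_hot(class_index, NUM_CLASSES)

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
g_input = conditioned_input(z, class_index=2)
```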
  • 54. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 55. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 56. Auto-Encoder (AE) ● Learns Pθ (X) with a reconstruction loss. ● Proposed as a pre-training stage for the encoder (“self-supervised learning”). Figure labels: Encode, z, Decode (“Generate”), Feature space, Manifold Pθ (X).
  • 57. Auto-Encoder (AE) Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or the decoder (as in GANs)? Figure labels: Encode, z, Decode (“Generate”), Feature space, Manifold Pθ (X).
  • 58. Auto-Encoder (AE) Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or the decoder (as in GANs)? No, because the noise (or encoded noise) would be out of the learned manifold. Figure labels: Encode, z, Decode (“Generate”), Feature space, Manifold Pθ (X).
  • 59. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 60. Variational Auto-Encoder (VAE) Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013. Encoder: predicts the mean μ(X) and covariance ∑(X) of a multivariate normal distribution. A loss term encourages this distribution to follow a normal distribution N(0, I).
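The loss term that pulls the encoder's distribution towards N(0, I) is usually a KL divergence. A minimal sketch, assuming a diagonal covariance parameterised by per-dimension log-variances (a common simplification that the slide does not spell out), uses the closed form 0.5 · Σ (μ² + σ² − 1 − log σ²):

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv
        for m, lv in zip(mu, log_var)
    )

# The penalty is zero when the encoder already outputs N(0, I) ...
kl_at_prior = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# ... and grows as the predicted mean moves away from the origin.
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
```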
  • 61. Maths 101: Multivariate normal distribution Source: Wikipedia. Image by Bscan - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=25235145
  • 62. Variational Auto-Encoder (VAE) Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013. Decoder: trained to reconstruct the input data from a z sampled from N(μ, ∑), with a reconstruction loss term. Figure labels: Encode, z, Decode.
  • 63. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 64. Reparametrization Trick Challenge: We cannot backprop through the sampling of z ~ N(μ, ∑) because “sampling” is not differentiable! Figure labels: Encode, z, Decode.
  • 65. Reparametrization Trick Solution: Sample ε ~ N(0, I) and define z from it, multiplying by the standard deviation σ and adding the mean μ: z = μ + σ · ε.
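The reparametrization trick is usually written as z = μ + σ · ε with ε ~ N(0, I), so that the randomness lives in ε and z becomes a differentiable function of μ and σ. A minimal sketch, assuming a diagonal covariance stored as log-variances; the encoder outputs below are hypothetical values.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Return z = mu + sigma * eps, with eps drawn from N(0, I)."""
    z = []
    for m, lv in zip(mu, log_var):
        eps = rng.gauss(0.0, 1.0)     # the only random quantity
        sigma = math.exp(0.5 * lv)    # standard deviation from log-variance
        z.append(m + sigma * eps)     # deterministic in mu and sigma
    return z

rng = random.Random(0)
mu = [1.0, -2.0]        # hypothetical encoder mean
log_var = [0.0, -1.0]   # hypothetical encoder log-variance
z = reparameterize(mu, log_var, rng)
```

In an autodiff framework the gradient then flows through `m + sigma * eps` to the encoder, while `eps` is treated as a constant input.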
  • 66. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive
  • 67. Generative behaviour How can we now generate new samples once the underlying generating distribution is learned?
  • 68. Generative behaviour We can sample z1, z2, z3, … from our prior N(0, I), discarding the encoder path.
  • 69. Generative behaviour Example: P(X) can be modelled by mapping a simple normal distribution N(0, I) through a powerful non-linear function g(z).
  • 70. Generative behaviour #NVAE Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." NeurIPS 2020. [code]
  • 71. Generative behaviour Walking around the dimensions of the z manifold gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc.
  • 72. Learn more about VAEs Andriy Mnih (UCL - Deepmind 2020) Max Welling - University of Amsterdam (2020)
  • 73. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 74. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ■ Forward diffusion process ■ Reverse denoising process ○ Auto-regressive
  • 75. Forward Diffusion Process Philip Isola, Generative Models of Images. MIT 2023.
  • 76. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ■ Forward diffusion process ■ Reverse denoising process ○ Auto-regressive
  • 77. Denoising Autoencoder (DAE) Encode Decode “Generate” #DAE Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. "Extracting and composing robust features with denoising autoencoders." ICML 2008.
  • 78. Philip Isola, Generative Models of Images. MIT 2023. Reverse Denoising process
  • 79. Reverse Denoising process The network (CNN U-net) learns to denoise step by step, from the noise image xT to a sample x0 on the data manifold Pθ (x0 ). What is the dimension of the latent variable in diffusion models? Same dimensionality as the diffused data.
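The reverse process starts from xT, which the forward process produces by gradually adding noise to x0. A minimal sketch of that forward step, assuming the DDPM-style closed form x_t = sqrt(ᾱ_t) · x0 + sqrt(1 − ᾱ_t) · ε with a linear noise schedule; the schedule values and the toy 4-value "image" are assumptions, not taken from the slides. It also makes the last point concrete: xT has the same dimensionality as x0.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances for each step, linearly spaced (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def signal_fraction(t, betas):
    """alpha_bar_t: product of (1 - beta_s) over steps 0..t (0-indexed)."""
    alpha_bar = 1.0
    for beta in betas[: t + 1]:
        alpha_bar *= 1.0 - beta
    return alpha_bar

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t from x_0 in one shot, without simulating every step."""
    alpha_bar = signal_fraction(t, betas)
    keep = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [keep * v + noise * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)
betas = linear_beta_schedule(1000)
x0 = [0.8, -0.3, 0.1, 0.5]   # toy "image" with four pixel values
x_T = forward_diffuse(x0, t=999, betas=betas, rng=rng)
```

At the last step almost no signal from x0 survives, so x_T is close to pure Gaussian noise.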
  • 80. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 81. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ○ Auto-regressive Models (AR) Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 83. PixelRNN An RNN predicts the probability of each sample xi with a categorical output distribution (softmax). #PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
  • 84. PixelRNN Why are not all completions identical? (a.k.a. how can AR offer a generative behaviour?) #PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
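One way to see why completions differ: at every step the model outputs a categorical distribution, and the next value is sampled from it rather than taken as the single most likely one. The sketch below uses a toy, hand-written scoring function in place of a trained RNN; its behaviour and the four-value alphabet are hypothetical.

```python
import math
import random

def softmax(logits):
    """Map raw scores to a categorical distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_next_logits(prefix):
    """Stand-in for the RNN: deterministic scores for the next value,
    mildly favouring a repeat of the previous value."""
    last = prefix[-1] if prefix else 0
    return [1.0 if v == last else 0.0 for v in range(4)]

def complete(prefix, length, rng):
    """Extend the prefix one value at a time by sampling from the softmax."""
    seq = list(prefix)
    while len(seq) < length:
        probs = softmax(toy_next_logits(seq))
        seq.append(rng.choices(range(len(probs)), weights=probs)[0])
    return seq

prefix = [2, 2]
completions = [complete(prefix, 10, random.Random(seed)) for seed in range(5)]
```

All completions share the same conditioning prefix and the same deterministic model, yet they diverge because each step draws a random sample.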
  • 85. PixelCNN #PixelCNN Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., & Graves, A. Conditional image generation with pixelcnn decoders. NeurIPS 2016.
  • 86. Wavenet Wavenet uses dilated convolutions to produce synthetic audio, sample by sample, conditioned on a receptive field of size T. #Wavenet Oord, Aaron van den, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. "Wavenet: A generative model for raw audio." arXiv 2016. [blog]
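A short sketch of how dilation makes the receptive field T grow quickly with depth. It assumes a stack of causal convolutions with kernel size 2 and dilations doubling from 1 to 512, which is the kind of configuration described in the Wavenet paper; the helper name is made up for illustration.

```python
def receptive_field(kernel_size, dilations):
    """Number of past samples visible to one output of a stack of
    dilated causal convolutions (one layer per dilation value)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

dilations = [2 ** i for i in range(10)]   # 1, 2, 4, ..., 512
T = receptive_field(kernel_size=2, dilations=dilations)

# The same ten layers without dilation see far fewer samples.
T_no_dilation = receptive_field(kernel_size=2, dilations=[1] * 10)
```

Ten dilated layers cover 1024 samples, while ten undilated layers cover only 11.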
  • 87. The Transformer Figure: Jay Alammar, “The illustrated Transformer” (2018) #Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention is all you need. NeurIPS 2017. Auto-regressive (at test).
  • 88. The Transformer Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 89. Text completion #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019. “GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.” Condition Generated completions In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.
  • 90. Zero-shot learning #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019. GPT-2 and GPT-3 can also solve tasks they were not trained for (zero-shot learning). Task: reading comprehension. Text: The 2008 Summer Olympics torch relay was run from March 24 until August 8, 2008, prior to the 2008 Summer Olympics, with the theme of “one world, one dream”. Plans for the relay were announced on April 26, 2007, in Beijing, China. The relay, also called by the organizers as the “Journey of Harmony”, lasted 129 days and carried the torch 137,000 km (85,000 mi) – the longest distance of any Olympic torch relay since the tradition was started ahead of the 1936 Summer Olympics. After being lit at the birthplace of the Olympic Games in Olympia, Greece on March 24, the torch traveled to the Panathinaiko Stadium in Athens, and then to Beijing, arriving on March 31. From Beijing, the torch was following a route passing through six continents. The torch has visited cities along the Silk Road, symbolizing ancient links between China and the rest of the world. The relay also included an ascent with the flame to the top of Mount Everest on the border of Nepal and Tibet, China from the Chinese side, which was closed specially for the event. Questions and generated answers: Q: What was the theme? A: “one world, one dream”. Q: What was the length of the race? A: 137,000 km Q: Was it larger than previous ones? A: No Q: Where did the race begin? A: Olympia, Greece
  • 91. Zero-shot learning #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019. “GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.” Zero-shot task performance (GPT-2 was never trained on these tasks).
  • 92. GPT-2 / GPT-3 #iGPT Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., & Sutskever, I. Generative Pretraining from Pixels. ICML 2020.
  • 93. ChatGPT / GPT-4 #ChatGPT [blog] #GPT-4 (OpenAI) GPT-4 Technical Report. arXiv 2023. [blog]
  • 95. Learn more about AR models Nal Kalchbrenner, Mediterranean Machine Learning Summer School 2022.
  • 96. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ○ Auto-regressive Models (AR) Figure source: Lilian Weng, What are diffusion models?, Lil’Log 2021.
  • 98. Recommended books Interview of David Foster for Machine Learning Street Talk (2023)
  • 99. Recommended courses Deep Unsupervised Learning (UC Berkeley CS294-158-SP2020)