As proposed in the paper "High-Resolution Image Synthesis with Latent Diffusion Models", latent diffusion models are a simple and efficient way to improve both the training and sampling efficiency of denoising diffusion models while retaining their quality.
2. Image Generation/Synthesis
The task of generating new images that resemble those in an existing dataset.
For example, GANs can create images that look like photographs of human faces, even though the faces don't
belong to any real person.
3. Why it is important: Application areas
❖ Generating synthetic training data when real data is insufficient or too costly to collect,
and generating human faces and objects in 2D and 3D.
❖ With AI now ubiquitous, applications extend to using image reconstruction to identify
whether someone has undergone surgery to change their appearance.
❖ Editing photographs by denoising images, enhancing the existing image data.
❖ In the drug discovery process.
❖ Tumor detection in human bodies, and applying filters on Instagram, Faceapp, etc.
5. Generative adversarial networks (GANs)
GANs achieve realism by pairing a generator, which learns to produce the target output,
with a discriminator, which learns to distinguish true data from the output of the generator. The
generator tries to fool the discriminator, and the discriminator tries to keep from being fooled.
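The generator/discriminator tug-of-war above can be sketched with the two standard loss terms. This is a minimal illustration with hand-picked toy discriminator outputs (the function names and values are illustrative, not from the paper):

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator maximizes log D(x) + log(1 - D(G(z))),
    # i.e. minimizes the negative of that sum.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    # Non-saturating generator loss: maximize log D(G(z)).
    return -np.mean(np.log(d_fake))

# Toy discriminator outputs (probabilities that the input is real).
d_real = np.array([0.9, 0.8])   # confident on real samples
d_fake = np.array([0.1, 0.2])   # confident that fakes are fake
print(d_loss(d_real, d_fake))   # low: discriminator is winning
print(g_loss(d_fake))           # high: generator is being caught
```

Training alternates gradient steps on these two objectives; the instability noted in the next section comes from this adversarial coupling.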
6. Drawbacks of GANs and autoregressive (AR) models
❖ GANs suffer from unstable training and mode collapse.
❖ Autoregressive models, by contrast, generally suffer from slow synthesis speed.
7. Diffusion Models
❖ Diffusion models, originally proposed in 2015, have seen a recent revival in interest due to
their training stability and their promising sample quality results on image and audio
generation.
❖ Diffusion models work by corrupting the training data by progressively adding Gaussian noise,
slowly wiping out details in the data until it becomes pure noise, and then training a neural
network to reverse this corruption process.
❖ Running this reversed corruption process synthesizes data from pure noise by gradually
denoising it until a clean sample is produced.
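The forward corruption process described above has a well-known closed form: a noisy sample at step t mixes the clean data and Gaussian noise in proportions set by the noise schedule. A minimal sketch, assuming a DDPM-style linear schedule (the schedule values here are illustrative defaults, not the paper's exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)    # cumulative signal fraction at step t

def q_sample(x0, t):
    # Closed-form forward process:
    # x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = rng.standard_normal((8, 8))        # stand-in for an image
x_early = q_sample(x0, 10)              # mostly signal
x_late = q_sample(x0, T - 1)            # almost pure noise
print(alphas_bar[10], alphas_bar[T - 1])
```

A network is then trained to predict the added noise at each step, and sampling runs the chain in reverse from pure noise.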
8. The debate: which is better?
❖ Being likelihood-based models that make heavy use of parameter sharing, diffusion models can
model highly complex distributions of natural images and overcome the drawbacks of AR models and GANs.
❖ Still, evaluating and optimizing these models in pixel space has the downside of slow
inference and very high training costs.
❖ The paper addresses both drawbacks with the proposed LDMs, which operate on a compressed
latent space of lower dimensionality.
9. Latent Diffusion Models
Just like any likelihood-based model, learning can be divided into two stages:
1. Perceptual Image Compression
2. Generative Modeling of Latent Representations
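The two stages above can be sketched structurally: stage 1 compresses an image into a much smaller latent, and stage 2 would run the diffusion process on that latent. The `encode`/`decode` functions here are hypothetical stand-ins (naive pooling/upsampling), not the paper's learned autoencoder:

```python
import numpy as np

def encode(x, f=8):
    # Stage 1 (perceptual compression): map an H x W x C image to a
    # smaller H/f x W/f x C latent. Here: naive average pooling as a
    # placeholder for the paper's learned encoder.
    H, W, C = x.shape
    return x.reshape(H // f, f, W // f, f, C).mean(axis=(1, 3))

def decode(z, f=8):
    # Map the latent back to pixel space (nearest-neighbour upsample
    # as a placeholder for the learned decoder).
    return np.repeat(np.repeat(z, f, axis=0), f, axis=1)

x = np.random.default_rng(1).standard_normal((256, 256, 3))
z = encode(x)   # stage 2 (the diffusion model) would operate on z
print(x.size, z.size, x.size // z.size)   # 196608 3072 64
```

With a downsampling factor f = 8, every diffusion step touches 64x fewer elements than in pixel space, which is the source of the efficiency gains listed next.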
11. Advantages:
❖ By leaving the high-dimensional image space, we obtain DMs which are computationally much
more efficient because sampling is performed on a low-dimensional space.
❖ We exploit the inductive bias of DMs inherited from their UNet architecture which makes them
particularly effective for data with spatial structure.
❖ Finally, we obtain general-purpose compression models whose latent space can be used to train
multiple generative models and which can also be utilized for other downstream applications
such as single-image CLIP-guided synthesis
12. Experiments and results:
❖ After training unconditional models on CelebA-HQ, FFHQ, LSUN-Churches,
and LSUN-Bedrooms [102], sample quality and coverage of the data manifold were
evaluated using i) FID and ii) Precision-and-Recall.
❖ On CelebA-HQ, LDM achieves a new state-of-the-art FID of 5.11, outperforming previous
likelihood-based models and GANs.
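FID, the metric quoted above, is the Fréchet distance between two Gaussians fitted to feature statistics of real and generated images. A simplified sketch assuming diagonal covariances (the real metric uses full covariance matrices of Inception-network features; `fid_diag` is an illustrative name):

```python
import numpy as np

def fid_diag(mu1, var1, mu2, var2):
    # Frechet distance between two Gaussians with diagonal covariances:
    # ||mu1 - mu2||^2 + sum(var1 + var2 - 2 * sqrt(var1 * var2))
    diff = mu1 - mu2
    trace_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(diff @ diff + trace_term)

# Identical statistics -> distance 0; a unit mean shift -> distance 1.
mu, var = np.zeros(2), np.ones(2)
print(fid_diag(mu, var, mu, var))                    # 0.0
print(fid_diag(mu, var, np.array([1.0, 0.0]), var))  # 1.0
```

Lower FID means the generated feature distribution sits closer to the real one, which is why 5.11 on CelebA-HQ is a strong result.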
16. Conclusion
As proposed by the paper, latent diffusion models are a simple and efficient way to improve
both the training and sampling efficiency of denoising diffusion models while retaining their quality.