As proposed in the paper "High-Resolution Image Synthesis with Latent Diffusion Models", latent diffusion models are a simple and efficient way to improve both the training and sampling efficiency of denoising diffusion models while retaining their quality.
2. Image Generation/Synthesis
The task of generating new images that resemble those in an existing dataset.
For example, GANs can create images that look like photographs of human faces, even though the faces don't
belong to any real person.
3. Why it is important: Application areas
❖ Generating synthetic training data when real data is insufficient or too costly to collect,
and generating human faces and objects in 2D and 3D.
❖ With AI now ubiquitous, applications extend to using image reconstruction to identify
whether someone has undergone surgery to change their appearance.
❖ Editing photographs by denoising images, enhancing the existing image data.
❖ In the drug discovery process.
❖ Tumor detection in human bodies, and applying filters on Instagram, Faceapp, etc.
5. Generative adversarial networks (GANs)
GANs achieve realism by pairing a generator, which learns to produce the target output,
with a discriminator, which learns to distinguish true data from the output of the generator. The
generator tries to fool the discriminator, and the discriminator tries to keep from being fooled.
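The generator/discriminator tug-of-war above can be sketched with the two standard loss terms. This is a minimal illustration with hand-picked toy discriminator outputs (the function names and values are illustrative, not from the paper):

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator maximizes log D(x) + log(1 - D(G(z))),
    # i.e. minimizes the negative of that sum.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    # Non-saturating generator loss: maximize log D(G(z)).
    return -np.mean(np.log(d_fake))

# Toy discriminator outputs (probabilities that the input is real).
d_real = np.array([0.9, 0.8])   # confident on real samples
d_fake = np.array([0.1, 0.2])   # confident that fakes are fake
print(d_loss(d_real, d_fake))   # low: discriminator is winning
print(g_loss(d_fake))           # high: generator is being caught
```

Training alternates gradient steps on these two objectives; the instability noted in the next section comes from this adversarial coupling.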
6. Drawbacks of GANs and autoregressive (AR) models
❖ GANs suffer from unstable training and mode collapse.
❖ Autoregressive models, by contrast, generally suffer from slow synthesis speed.
7. Diffusion Models
❖ Diffusion models, originally proposed in 2015, have seen a recent revival in interest due to
their training stability and their promising sample quality results on image and audio
generation.
❖ Diffusion models work by corrupting the training data by progressively adding Gaussian noise,
slowly wiping out details in the data until it becomes pure noise, and then training a neural
network to reverse this corruption process.
❖ Running this reversed corruption process synthesizes data from pure noise by gradually
denoising it until a clean sample is produced.
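The forward corruption process described above has a well-known closed form: a noisy sample at step t mixes the clean data and Gaussian noise in proportions set by the noise schedule. A minimal sketch, assuming a DDPM-style linear schedule (the schedule values here are illustrative defaults, not the paper's exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)    # cumulative signal fraction at step t

def q_sample(x0, t):
    # Closed-form forward process:
    # x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = rng.standard_normal((8, 8))        # stand-in for an image
x_early = q_sample(x0, 10)              # mostly signal
x_late = q_sample(x0, T - 1)            # almost pure noise
print(alphas_bar[10], alphas_bar[T - 1])
```

A network is then trained to predict the added noise at each step, and sampling runs the chain in reverse from pure noise.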
8. The debate: which is better?
❖ Being likelihood-based models that make heavy use of parameter sharing, diffusion models can
model highly complex distributions of natural images and overcome the drawbacks of AR models and GANs.
❖ Still, evaluating and optimizing these models in pixel space has the downside of slow
inference and very high training costs.
❖ The paper addresses both drawbacks with the proposed LDMs, which operate on a compressed
latent space of lower dimensionality.
9. Latent Diffusion Models
Just like any likelihood-based model, learning can be divided into two stages:
1. Perceptual Image Compression
2. Generative Modeling of Latent Representations
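The two stages above can be sketched structurally: stage 1 compresses an image into a much smaller latent, and stage 2 would run the diffusion process on that latent. The `encode`/`decode` functions here are hypothetical stand-ins (naive pooling/upsampling), not the paper's learned autoencoder:

```python
import numpy as np

def encode(x, f=8):
    # Stage 1 (perceptual compression): map an H x W x C image to a
    # smaller H/f x W/f x C latent. Here: naive average pooling as a
    # placeholder for the paper's learned encoder.
    H, W, C = x.shape
    return x.reshape(H // f, f, W // f, f, C).mean(axis=(1, 3))

def decode(z, f=8):
    # Map the latent back to pixel space (nearest-neighbour upsample
    # as a placeholder for the learned decoder).
    return np.repeat(np.repeat(z, f, axis=0), f, axis=1)

x = np.random.default_rng(1).standard_normal((256, 256, 3))
z = encode(x)   # stage 2 (the diffusion model) would operate on z
print(x.size, z.size, x.size // z.size)   # 196608 3072 64
```

With a downsampling factor f = 8, every diffusion step touches 64x fewer elements than in pixel space, which is the source of the efficiency gains listed next.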
11. Advantages:
❖ By leaving the high-dimensional image space, we obtain DMs which are computationally much
more efficient because sampling is performed on a low-dimensional space.
❖ We exploit the inductive bias of DMs inherited from their UNet architecture which makes them
particularly effective for data with spatial structure.
❖ Finally, we obtain general-purpose compression models whose latent space can be used to train
multiple generative models and which can also be utilized for other downstream applications
such as single-image CLIP-guided synthesis
12. Experiments and results:
❖ After training unconditional models on CelebA-HQ, FFHQ, LSUN-Churches,
and LSUN-Bedrooms [102], sample quality and coverage of the data manifold were
evaluated using i) FID and ii) Precision-and-Recall.
❖ On CelebA-HQ, LDM achieves a new state-of-the-art FID of 5.11, outperforming previous
likelihood-based models and GANs.
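FID, the metric quoted above, is the Fréchet distance between two Gaussians fitted to feature statistics of real and generated images. A simplified sketch assuming diagonal covariances (the real metric uses full covariance matrices of Inception-network features; `fid_diag` is an illustrative name):

```python
import numpy as np

def fid_diag(mu1, var1, mu2, var2):
    # Frechet distance between two Gaussians with diagonal covariances:
    # ||mu1 - mu2||^2 + sum(var1 + var2 - 2 * sqrt(var1 * var2))
    diff = mu1 - mu2
    trace_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(diff @ diff + trace_term)

# Identical statistics -> distance 0; a unit mean shift -> distance 1.
mu, var = np.zeros(2), np.ones(2)
print(fid_diag(mu, var, mu, var))                    # 0.0
print(fid_diag(mu, var, np.array([1.0, 0.0]), var))  # 1.0
```

Lower FID means the generated feature distribution sits closer to the real one, which is why 5.11 on CelebA-HQ is a strong result.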
16. Conclusion
As proposed by the paper, latent diffusion models are a simple and efficient way to improve
both the training and sampling efficiency of denoising diffusion models while retaining their quality.