Introduction to Diffusion Models in deep learning

Introduction
➔ Reminder on generative models
➔ Diffusion Model vs VAE
Denoising Diffusion Probabilistic Models
➔ Principle
➔ Forward and Reverse Diffusion
➔ Training and Sampling
Example: Fashion MNIST
➔ Generation of Fashion MNIST
DDPM improvements
➔ Beta scheduling and Variance learning
➔ Fast sampling
➔ Latent diffusion
DDPM applications
➔ Text-to-image
➔ Other tasks: inpainting / outpainting / super-resolution

Reminder - VAE - 1
Source: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/

Reminder - VAE - 2
(Figures: VAE cost function, sampling process, training process, metric)
Decoder:
- Gaussian
- Mixture of Gaussians
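As a reminder of what the cost-function figure expresses, here is a minimal Python sketch of the usual VAE loss (negative ELBO). It assumes a diagonal Gaussian encoder, a standard normal prior and a Gaussian decoder with fixed variance, so the reconstruction term reduces to a squared error up to constants. Function names are illustrative, not taken from the course material.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions (closed form)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, eps):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1) drawn outside."""
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]


def vae_loss(x, x_rec, mu, logvar):
    """Negative ELBO up to constants, for a Gaussian decoder with fixed variance:
    squared reconstruction error + KL divergence to the prior."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return reconstruction + gaussian_kl(mu, logvar)


# A posterior equal to the prior costs nothing in KL; a shifted mean does.
print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))  # 0.0
print(gaussian_kl([1.0], [0.0]))            # 0.5
```

The KL term pulls the latent code towards the prior, which is what makes sampling from N(0, 1) at generation time meaningful.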

Reminder - GAN - 1
Two networks in opposition:
- Generator
- Discriminator
Ideal solution:
- Generator ~ P(x|z)
- Discriminator = ½
Adding a supervision concept to an unsupervised task!
(Figure: data to train a classification network)
Source: https://www.kdnuggets.com/2017/01/generative-adversarial-networks-hot-topic-machine-learning.html

Reminder - GAN - 2
(Figures: sampling process with the generator; training process: train the discriminator, then train the generator; GAN cost function)
- Uniform
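For reference, the standard GAN cost function (Goodfellow et al., 2014) is the min-max objective below, where $p_z$ is the noise prior (we assume it is the uniform distribution listed on the slide) and $p_g$ is the distribution of generated samples. The second line shows why the ideal discriminator outputs ½.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)},
\qquad p_g = p_{\text{data}} \;\Rightarrow\; D^*(x) = \tfrac{1}{2}
```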

Reminder - GAN - 3: GAN convergence problems!
- Vanishing gradient: the discriminator becomes too good, so the generator can no longer train.
- Mode collapse: the generator learns only a few good examples instead of the whole data distribution.
- No convergence, due to the nature of the problem (min-max game).
(Figures: true data vs. generated data)
Source: https://lilianweng.github.io/posts/2017-08-20-gan/

VAE vs GAN
- GAN: generates high-quality data, but is hard to train
- VAE: has more diversity
How can we compare generative models?

VAE vs DPM
- VAE: low-dimensional representation of the input (latent Z), then a decoder
- DPM: fixed encoder, high-dimensional representation of the input, then a decoder

DPM - Landscape
Source: https://github.com/bentoml/stable-diffusion-bentoml
Dhariwal & Nichol, 2021
Source: DALL-E 2

DDPM - Principle - 1
After training, the diffusion model generates images from Gaussian noise.

There are three processes that characterize diffusion models:
1. Forward Diffusion Process
2. Reverse Diffusion Process
3. Sampling Process

Forward Diffusion Process
This process gradually adds noise to any image, over time steps 0 ≤ t ≤ T, where T is a hyperparameter.
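A minimal pure-Python sketch of the forward process applied one step at a time. It assumes the linear β schedule of the original DDPM paper (β from 0.0001 to 0.02 over T = 1000 steps, the values shown on the beta-scheduling slide) and represents images as flattened lists of pixels; all names are illustrative.

```python
import math
import random


def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule beta_1..beta_T (stored at indices 0..T-1)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def forward_step(x_prev, beta, rng):
    """One forward step: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise."""
    a, b = math.sqrt(1.0 - beta), math.sqrt(beta)
    return [a * x + b * rng.gauss(0.0, 1.0) for x in x_prev]


def forward_trajectory(x0, betas, seed=0):
    """Apply the T forward steps one by one and return [x_0, x_1, ..., x_T]."""
    rng = random.Random(seed)
    xs = [list(x0)]
    for beta in betas:
        xs.append(forward_step(xs[-1], beta, rng))
    return xs


# Toy "image": 1000 pixels all equal to 1.0 (pixel values assumed scaled to [-1, 1]).
xs = forward_trajectory([1.0] * 1000, linear_betas())
print(len(xs))             # 1001 (x_0 .. x_T)
print(sum(xs[-1]) / 1000)  # close to 0: x_T is almost pure noise
```

Each step shrinks the signal slightly and adds a little noise, so after T steps almost nothing of $x_0$ remains.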

Forward Diffusion Process
Examples of images at different times t.
Here we choose T = 1000, but other values are possible (T is a hyperparameter).

Reverse Diffusion Process
We train a model to predict $x_{t-1}$ from $x_t$, where:
- $x_0$ is any image from the dataset
- $x_{t-1}$ is a bit less noisy than $x_t$
- $x_t$ is a bit more noisy than $x_{t-1}$

Reverse Diffusion Process
The same model must predict every $x_{t-1}$ from $x_t$.

Sampling Process
Starting from random noise, we can generate an image.

DDPM - Forward Diffusion - 1
So we can sample a noisy image at any time step directly from the original image.
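The closed form behind this statement is $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0, (1-\bar\alpha_t) I)$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. The sketch below (assumed linear schedule, illustrative names) samples $x_t$ in one shot and checks that the noise variance accumulated step by step matches $1 - \bar\alpha_t$.

```python
import math
import random


def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def alpha_bars(betas):
    """abar_t = prod_{s=1..t} (1 - beta_s), stored at index t-1."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, abars, rng):
    """Jump straight to step t (1 <= t <= T): x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) z."""
    a, b = math.sqrt(abars[t - 1]), math.sqrt(1.0 - abars[t - 1])
    return [a * x + b * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_betas()
abars = alpha_bars(betas)

# Consistency with the step-by-step process: the variance of the accumulated noise
# follows var_t = (1 - beta_t) * var_{t-1} + beta_t, which should equal 1 - abar_t.
var = 0.0
for t, beta in enumerate(betas, start=1):
    var = (1.0 - beta) * var + beta
    assert abs(var - (1.0 - abars[t - 1])) < 1e-9

x_50 = q_sample([0.5] * 16, 50, abars, random.Random(0))  # one noisy sample at t = 50
```

This shortcut is what makes training cheap: a random t can be drawn and $x_t$ produced without simulating the t previous steps.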

DDPM - Reverse Diffusion - 1
We predict only the mean; the variance is fixed, so we know the rest.
We can predict $x_{t-1}$ by predicting the noise $z_t$.
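Written out with $\alpha_t = 1 - \beta_t$, the DDPM reverse step that turns a noise prediction $z_\theta$ into $x_{t-1}$ is given below. This is the form from the DDPM paper, shown here with the fixed variance choice $\sigma_t^2 = \beta_t$.

```latex
\alpha_t = 1 - \beta_t, \qquad \bar\alpha_t = \prod_{s=1}^{t} \alpha_s

p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\right)

\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}
  \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar\alpha_t}}\, z_\theta(x_t, t) \right)

x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z, \qquad z \sim \mathcal{N}(0, I), \qquad \sigma_t^2 = \beta_t
```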

A little bit of explanation (a tiny bit):
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/

DDPM - Training - 0
https://arxiv.org/abs/2006.11239

DDPM - Training - 1
Draw an image $x_0$ from the dataset.

DDPM - Training - 3
Draw a time step t from a uniform distribution between 1 and T (here t = 50).

DDPM - Training - 4
Draw the noise $z_t$ (= $\epsilon$) from a Gaussian distribution, with the same shape as $x_0$.

DDPM - Training - 5
From $x_0$, t = 50 and $z_t$, compute the noisy image $x_t$.

DDPM - Training - 5
The model takes $x_t$ and t = 50 as input and predicts the noise $z_\theta$ (= $\epsilon_\theta$).

DDPM - Training - 6
Loss: $\lVert z_t - z_\theta \rVert^2$

DDPM - Training - 7
Backward pass on the loss (gradient $\nabla_\theta$), then weight update.

DDPM - Training - 8
And repeat! That was just one iteration.
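The training iteration above can be sketched in pure Python as a loss function. This is a minimal sketch under assumptions: `model` is a hypothetical callable `(x_t, t) -> predicted noise`, images are flattened lists, and the linear β schedule is used. The backward pass and weight update are left to a deep learning framework's automatic differentiation and optimizer.

```python
import math
import random


def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    abars, prod = [], 1.0
    for i in range(T):
        prod *= 1.0 - (beta_start + (beta_end - beta_start) * i / (T - 1))
        abars.append(prod)
    return abars


def ddpm_loss(x0, model, abars, rng):
    """Loss of one DDPM training iteration, for one image x0 (flattened list of pixels)."""
    T = len(abars)
    t = rng.randint(1, T)                        # t ~ Uniform{1, ..., T}
    z = [rng.gauss(0.0, 1.0) for _ in x0]        # z_t ~ N(0, I), same shape as x0
    a, b = math.sqrt(abars[t - 1]), math.sqrt(1.0 - abars[t - 1])
    xt = [a * p + b * n for p, n in zip(x0, z)]  # noisy image x_t
    z_pred = model(xt, t)                        # z_theta(x_t, t)
    # ||z_t - z_theta||^2, averaged over pixels (mean squared error)
    return sum((n - p) ** 2 for n, p in zip(z, z_pred)) / len(x0)


abars = make_alpha_bars()
zero_model = lambda xt, t: [0.0] * len(xt)       # hypothetical model that predicts no noise
print(ddpm_loss([0.3] * 10000, zero_model, abars, random.Random(1)))  # about 1
```

A model that always predicts zero noise gets a loss close to 1 (the variance of the noise), so a useful network has to do better than that baseline.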

DDPM - Sampling - 0
https://arxiv.org/abs/2006.11239

DDPM - Sampling - 1
Draw $x_T$ from a Gaussian distribution, with the same shape as the training dataset images.

DDPM - Sampling - 2
Feed $x_T$ and T to the model: $z_\theta(x_T, T) \approx z_T$
Reminder:

DDPM - Sampling - 3
Draw a noise z from a Gaussian distribution, with the same shape as the training dataset images.

DDPM - Sampling - 5
And repeat! Don't generate a new $x_T$: replace it with $x_{T-1}$, and T with T-1, and do it again, T times in total.
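Putting the sampling steps together gives the DDPM sampling loop. A minimal sketch, assuming the same hypothetical noise predictor `model(x_t, t)`, the linear β schedule and the fixed variance $\sigma_t^2 = \beta_t$; as in the DDPM paper's sampling algorithm, no noise is added at the last step.

```python
import math
import random


def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    abars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        abars.append(prod)
    return betas, abars


def ddpm_sample(model, size, betas, abars, seed=0):
    """DDPM sampling: start from x_T ~ N(0, I) and apply T reverse steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]          # x_T
    for t in range(len(betas), 0, -1):                      # t = T, T-1, ..., 1
        beta, ab = betas[t - 1], abars[t - 1]
        z_pred = model(x, t)                                # z_theta(x_t, t)
        coef = beta / math.sqrt(1.0 - ab)
        mean = [(xi - coef * zi) / math.sqrt(1.0 - beta) for xi, zi in zip(x, z_pred)]
        if t > 1:
            sigma = math.sqrt(beta)                         # fixed variance sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean                                        # last step: no noise added
    return x                                                # x_0
```

With a trained network, `model` would wrap the denoising U-Net; the loop needs T network evaluations per image, which is the "long sampling process" mentioned on the next slide.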

DDPM vs VAE vs GAN
- GAN: generates high-quality data, but is hard to train
- VAE: has more diversity
- DDPM: long sampling process

DDPM improvements
- Improving the log-likelihood metrics (Improved DDPM, 2021)
- Improving image synthesis (Dhariwal & Nichol, 2021)
- Faster sampling (Denoising Diffusion Implicit Models, 2021 / Latent Diffusion Models, 2021)

Beta scheduling - Cosine scheduling (IDDPM, 2021)
(Figure: linear schedule with β going from 0.0001 to 0.02 over T = 1000 steps; strong noising, and the last steps are almost pure noise)
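A sketch comparing the linear schedule with the IDDPM cosine schedule, $\bar\alpha_t = f(t)/f(0)$ with $f(t) = \cos^2\big(\frac{t/T + s}{1 + s} \cdot \frac{\pi}{2}\big)$, using the offset s = 0.008 and the clipping of β at 0.999 proposed in the IDDPM paper (check the paper before reusing these constants). With the cosine schedule, $\bar\alpha_t$ decreases more slowly in the middle of the process, so the image is destroyed less abruptly.

```python
import math


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    abars, prod = [], 1.0
    for i in range(T):
        prod *= 1.0 - (beta_start + (beta_end - beta_start) * i / (T - 1))
        abars.append(prod)
    return abars


def cosine_alpha_bars(T=1000, s=0.008, max_beta=0.999):
    """Cosine schedule: abar_t = f(t) / f(0), built through betas clipped at max_beta."""
    def f(t):
        return math.cos(((t / T + s) / (1.0 + s)) * math.pi / 2) ** 2

    abars, prod = [], 1.0
    for t in range(1, T + 1):
        beta = min(1.0 - f(t) / f(t - 1), max_beta)  # clipping avoids beta = 1 near t = T
        prod *= 1.0 - beta
        abars.append(prod)
    return abars


lin, cos_ = linear_alpha_bars(), cosine_alpha_bars()
# Remaining signal fraction abar_t at t = T/2 for both schedules.
print(lin[499], cos_[499])
```

At t = T/2 the linear schedule has already removed most of the signal, while the cosine schedule keeps roughly half of it.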

Variance learning (IDDPM, 2021)
Limitations: "(...) learning reverse process variances (...) leads to unstable training and poorer sample quality compared to fixed variance." (DDPM, 2020)
(Figure: DDPM vs IDDPM)
The early stages of diffusion are very important.
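IDDPM learns the variance as an interpolation, in log space, between the two fixed choices of DDPM: $\beta_t$ and the posterior variance $\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\beta_t$. A minimal sketch, assuming the linear schedule; `v` stands for the extra network output and is just a number here. Printing both quantities shows that they differ most at the first steps, where $\tilde\beta_t$ is noticeably smaller than $\beta_t$.

```python
import math


def linear_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    abars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        abars.append(prod)
    return betas, abars


def beta_tilde(betas, abars, t):
    """Posterior variance beta~_t = (1 - abar_{t-1}) / (1 - abar_t) * beta_t, for t >= 2."""
    return (1.0 - abars[t - 2]) / (1.0 - abars[t - 1]) * betas[t - 1]


def interpolated_variance(v, betas, abars, t):
    """Sigma_t = exp(v log beta_t + (1 - v) log beta~_t): v = 1 gives beta_t, v = 0 gives beta~_t."""
    log_beta = math.log(betas[t - 1])
    log_beta_tilde = math.log(beta_tilde(betas, abars, t))
    return math.exp(v * log_beta + (1.0 - v) * log_beta_tilde)


betas, abars = linear_schedule()
for t in (2, 10, 1000):
    print(t, betas[t - 1], beta_tilde(betas, abars, t))
```

Since $\tilde\beta_1 = 0$, the sketch only covers t ≥ 2; implementations typically handle the first step separately.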

Diffusion & reverse process (DDIM, 2021)
(Figure: Mr. Gaussian and Mr. Data)
Limitations: "For example, it takes around 20 hours to sample 50k images of size 32 x 32 from a DDPM, but less than a minute to do so from a GAN on a Nvidia 2080 Ti GPU." (DDIM, 2021)

Extension of DDPM (DDIM, 2021)
Generalization to a larger class of inverse processes (non-Markovian).
Important: same network and training as a DDPM.

Generation process (DDIM, 2021)
(Figure: DDPM vs DDIM)
(...) can be computed!!!
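A minimal sketch of DDIM sampling (Song et al.), assuming the same hypothetical noise predictor `model(x_t, t)` trained as a plain DDPM. At each step the predicted $x_0$ is computed from $x_t$ and the predicted noise, then moved to the previous timestep of a shorter sub-sequence (here 50 of the 1000 steps). The parameter `eta` scales the injected noise: 0 gives the deterministic DDIM.

```python
import math
import random


def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    abars, prod = [], 1.0
    for i in range(T):
        prod *= 1.0 - (beta_start + (beta_end - beta_start) * i / (T - 1))
        abars.append(prod)
    return abars


def ddim_sample(model, size, abars, num_steps=50, eta=0.0, seed=0):
    """DDIM sampling on num_steps timesteps (assumes num_steps divides T)."""
    rng = random.Random(seed)
    T = len(abars)
    taus = list(range(T, 0, -(T // num_steps)))            # e.g. 1000, 980, ..., 20
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]          # x_T ~ N(0, I)
    for i, t in enumerate(taus):
        t_prev = taus[i + 1] if i + 1 < len(taus) else 0
        ab_t = abars[t - 1]
        ab_prev = abars[t_prev - 1] if t_prev > 0 else 1.0  # abar_0 = 1
        z_pred = model(x, t)                                # z_theta(x_t, t)
        # Predicted x_0, computed from x_t and the predicted noise.
        x0_pred = [(xi - math.sqrt(1.0 - ab_t) * zi) / math.sqrt(ab_t)
                   for xi, zi in zip(x, z_pred)]
        sigma = eta * math.sqrt((1.0 - ab_prev) / (1.0 - ab_t) * (1.0 - ab_t / ab_prev))
        dir_coef = math.sqrt(max(0.0, 1.0 - ab_prev - sigma ** 2))  # "direction to x_t" term
        x = [math.sqrt(ab_prev) * x0i + dir_coef * zi + sigma * rng.gauss(0.0, 1.0)
             for x0i, zi in zip(x0_pred, z_pred)]
    return x
```

Only 50 network evaluations are needed here instead of 1000, without retraining: this is the speed-up DDIM brings on top of a DDPM model.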

Finding a better inverse process (DDIM, 2021)
(Figure: FID comparison => DDIM)

Noise interpolation (DDIM, 2021)
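Because DDIM sampling can be deterministic, $x_T$ behaves like a latent code, and two noise tensors can be interpolated and decoded. To our understanding, the DDIM paper uses spherical linear interpolation (slerp) for this; here is a minimal pure-Python sketch with illustrative names. Slerp keeps the norm typical of Gaussian noise, whereas a straight linear mix of two independent noise vectors shrinks it (by about 1/√2 at the midpoint).

```python
import math
import random


def slerp(z0, z1, alpha):
    """Spherical linear interpolation between two noise vectors (alpha in [0, 1])."""
    dot = sum(a * b for a, b in zip(z0, z1))
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    theta = math.acos(max(-1.0, min(1.0, dot / (n0 * n1))))  # angle between z0 and z1
    if math.sin(theta) < 1e-8:                               # (almost) parallel: plain lerp
        return [(1.0 - alpha) * a + alpha * b for a, b in zip(z0, z1)]
    w0 = math.sin((1.0 - alpha) * theta) / math.sin(theta)
    w1 = math.sin(alpha * theta) / math.sin(theta)
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


rng = random.Random(0)
z0 = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # two x_T candidates
z1 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# Each interpolated x_T would then go through the deterministic DDIM sampler.
path = [slerp(z0, z1, k / 10) for k in range(11)]
```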

Latent diffusion: concept of latent space (reminder)
Latent space of MNIST database for AE and VAE
Source: https://thilospinner.com/towards-an-interpretable-latent-space/
● Similar objects are close to one another in the latent space
● Usually lower dimension than the original data (so it also performs compression)
● Usually impossible for a human to visualize

Latent diffusion: concept of latent space (example)
Example: word embeddings
Actor - Pierre Curie + Marie Curie ≈ Actress
Turn sparse data (for instance words) into vectors
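The analogy above can be reproduced with vector arithmetic and a cosine-similarity nearest-neighbour search. The 4-dimensional vectors below are hypothetical and hand-made for illustration; real embeddings are learned and have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings; dimensions: (person, female, acting, physics).
EMB = {
    "actor":        [1.0, 0.0, 1.0, 0.0],
    "actress":      [1.0, 1.0, 1.0, 0.0],
    "Pierre Curie": [1.0, 0.0, 0.0, 1.0],
    "Marie Curie":  [1.0, 1.0, 0.0, 1.0],
    "physics":      [0.0, 0.0, 0.0, 1.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, emb=EMB):
    """Word closest (cosine) to emb[a] - emb[b] + emb[c], excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


print(analogy("actor", "Pierre Curie", "Marie Curie"))  # actress
```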

DPM
(Figure: image space, + noise / + diffusion)

Latent diffusion model
(Figure: image space → encoder → latent space, + noise / + diffusion → decoder → image space)
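To see why diffusing in the latent space is cheaper, count the values the denoiser has to process at each step. This sketch assumes a Stable Diffusion v1-style autoencoder (downsampling factor f = 8, 4 latent channels) and a 512 × 512 RGB image; other latent diffusion models use other factors.

```python
def num_values(h, w, c):
    """Number of scalar values in an h x w image or feature map with c channels."""
    return h * w * c


image = num_values(512, 512, 3)   # RGB image space
f, latent_channels = 8, 4         # assumed autoencoder settings
latent = num_values(512 // f, 512 // f, latent_channels)
print(image, latent, image // latent)  # 786432 16384 48
```

Every one of the (many) denoising steps then works on about 48 times fewer values, while the encoder and decoder run only once.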

Conditional diffusion
A picture of GENCI’s supercomputer Jean Zay. On the storage bays, a picture
of the eponymous minister, with a background representing a simulation of a
turbulent flow of liquid sodium, and a quote from Jean Zay’s memoirs.
Alongside the bays, the cooling equipment with the logo of the manufacturer
and the owner of the supercomputer.
How can we control the output of a diffusion model and make sure it generates what we want?

Conditional diffusion: text → image
(Figure: the prompt above is mapped to the latent space)

Conditional diffusion: text → image

Conditional diffusion: cross-attention
Stable Diffusion uses cross-attention to make the denoising process consistent with the provided sentence embedding.
Source: Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models."
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
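A minimal single-head cross-attention sketch in pure Python: queries come from the image (U-Net) features, keys and values from the text token embeddings, and each image position receives a softmax-weighted mix of text values, $\text{softmax}(QK^\top/\sqrt{d})\,V$. The weight matrices `Wq`, `Wk`, `Wv` and the toy sizes are hypothetical; real models use learned projections, several heads and batched tensors.

```python
import math


def matmul(a, b):
    """Matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)                          # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def cross_attention(img_feats, txt_feats, Wq, Wk, Wv):
    """One output row per image position: softmax(Q K^T / sqrt(d)) V."""
    Q = matmul(img_feats, Wq)             # queries from the image features
    K = matmul(txt_feats, Wk)             # keys from the text tokens
    V = matmul(txt_feats, Wv)             # values from the text tokens
    d = len(Q[0])
    KT = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, KT)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V)


# Toy sizes: 3 image positions, 4 text tokens, feature dimension 2.
img = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
txt = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
out = cross_attention(img, txt, I2, I2, I2)   # one 2-d vector per image position
```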

Conditional diffusion: other method
(Figure: U-Net block with a standard U-Net layer (conv, maxpool, upsample, …), spatial self-attention and a dense layer)

Other tasks
Diffusion models can solve a variety of tasks. We already know about image generation, as well as conditional image generation (for instance from a short paragraph describing the picture).
Other tasks:
➔ Inpainting
➔ Super-resolution
➔ Outpainting
Source: Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

Inpainting through masking
We can solve many of these tasks through the use of a mask:
➔ Wide mask
➔ Thin mask
➔ Outer mask for expanding the image
➔ Right side mask for halving the image
➔ Every second row of pixels for alternating lines
➔ Every second pixel in both directions for super-resolution
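Several of these masks can be built directly. A sketch using the convention 1 = known pixel, 0 = pixel to generate (the convention and function names are assumptions); wide and thin masks are usually random strokes or blobs and are not reproduced here.

```python
def right_half_unknown(h, w):
    """Known = left half; the right half is generated (halving the image)."""
    return [[1 if j < w // 2 else 0 for j in range(w)] for _ in range(h)]


def alternating_lines(h, w):
    """Known = every second row of pixels."""
    return [[1 if i % 2 == 0 else 0 for _ in range(w)] for i in range(h)]


def super_resolution(h, w):
    """Known = every second pixel in both directions (2x upscaling of an h/2 x w/2 image)."""
    return [[1 if i % 2 == 0 and j % 2 == 0 else 0 for j in range(w)] for i in range(h)]


def outer_expand(h, w, border):
    """Known = the center; a frame of width `border` around it is generated (outpainting)."""
    return [[1 if border <= i < h - border and border <= j < w - border else 0
             for j in range(w)] for i in range(h)]


for row in super_resolution(4, 4):
    print(row)
```

The same inpainting procedure then handles all of these tasks; only the mask changes.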

Inpainting through masking
Step t:

Inpainting through masking: step t
(Figure: $x_t$, $x_0$, $x_{t-1}$; + noise, + diffusion)
New artifacts were added (in this coarse example, our diffusion model drew a sun), so we force the known background again!

Inpainting through masking: step t
(Figure: $x_t$, $x_0$, $x_{t-1}$; + noise, + diffusion)
This operation does not take the generated information into account.

Inpainting deharmonization
Picture deharmonization: the generated image has a satisfying texture but is wrong semantically.
The suggested solution is to resample.

Inpainting resampling
$x_{t-1} = \text{mask} \times \text{noising}(x_0) + (1 - \text{mask}) \times \text{diffusion}(x_t)$, then resampling: $x_{t-1}$ is noised back to $x_t$.
This loop is performed several times (a hyperparameter) before moving on to the next step.
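A minimal sketch of one RePaint step with resampling, following the loop described above. `reverse_step` is a hypothetical stand-in for one DDPM reverse step of the trained model, images and masks are flattened lists (1 = known pixel), and `n_resample` plays the role of the number of resampling loops.

```python
import math
import random


def repaint_step(x_t, t, x0_known, mask, betas, abars, reverse_step, rng, n_resample=10):
    """One RePaint-style step x_t -> x_{t-1}, with resampling."""
    ab_prev = abars[t - 2] if t >= 2 else 1.0        # abar_{t-1}, with abar_0 = 1
    beta = betas[t - 1]
    for u in range(n_resample):
        # Known part: noise the original image directly to level t-1 ("x0 noising").
        known = [math.sqrt(ab_prev) * a + math.sqrt(1.0 - ab_prev) * rng.gauss(0.0, 1.0)
                 for a in x0_known]
        # Unknown part: one reverse diffusion step from x_t ("diffusion").
        unknown = reverse_step(x_t, t, rng)
        x_prev = [m * k + (1 - m) * g for m, k, g in zip(mask, known, unknown)]
        if u < n_resample - 1:
            # Resampling: diffuse x_{t-1} back to x_t with one forward step, then redo it.
            x_t = [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0)
                   for v in x_prev]
    return x_prev


betas, abars = [0.1, 0.2], [0.9, 0.72]               # toy 2-step schedule
dummy_reverse = lambda x, t, rng: [5.0] * len(x)     # hypothetical stand-in for the model
out = repaint_step([0.0] * 4, 1, [1.0, 2.0, 3.0, 4.0], [1, 1, 0, 0],
                   betas, abars, dummy_reverse, random.Random(0), n_resample=3)
print(out)  # [1.0, 2.0, 5.0, 5.0]
```

Going back and forth between t-1 and t lets the generated region react to the known region, at the cost of n times more network evaluations.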

Inpainting resampling
n is the number of times the resampling loop was performed
Disadvantage: the number of required denoising steps is much higher
Advantage: it produces much more satisfying results
Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

Sources
Papers:
- Deep Unsupervised Learning using Nonequilibrium Thermodynamics (DPM) (https://arxiv.org/abs/1503.03585)
- Denoising Diffusion Probabilistic Models (DDPM) (https://arxiv.org/abs/2006.11239)
- Improved Denoising Diffusion Probabilistic Models (IDDPM) (https://arxiv.org/abs/2102.09672)
- Denoising Diffusion Implicit Models (DDIM) (https://arxiv.org/abs/2010.02502)
- Diffusion Models Beat GANs on Image Synthesis (https://arxiv.org/abs/2105.05233)
- High-Resolution Image Synthesis with Latent Diffusion Models (LDM) (https://arxiv.org/abs/2112.10752)
- Repaint: Inpainting using denoising diffusion probabilistic models (https://arxiv.org/pdf/2201.09865)
- Diffusion Models in Vision: A Survey (https://arxiv.org/abs/2209.04747)
- Diffusion Models: A Comprehensive Survey of Methods and Applications (https://arxiv.org/abs/2209.00796)
Other resources:
- Lilian Weng’s article (https://lilianweng.github.io/posts/2021-07-11-diffusion-models)
- Yang Song’s article (https://yang-song.net/blog/2021/score)
- Outlier video (https://www.youtube.com/watch?v=HoKDTa5jHvg)

Question break #5 & Practice

Episode 15:
AI, law, society and ethics
● Interpretability, reproducibility, bias
● Legal framework
● Privacy
● Interactive session
Duration: 2h
Next, on Fidle: Thursday, March 23, 2:00 pm

To be continued...
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
https://creativecommons.org/licenses/by-nc-nd/4.0/
https://fidle.cnrs.fr
Contact@fidle.cnrs.fr
https://fidle.cnrs.fr/youtube
Thank you!
