Variational Autoencoders
Generative Models
• They take in data as input and learn to generate new data points from the same data distribution.
• They learn the hidden representations using unsupervised learning techniques.
Variational Autoencoder
• As the name suggests, it is an auto-encoder, which learns attributes from the data points and represents them in terms of latent variables.
Problems
• How can we make use of the auto-encoder architecture to generate new data points?
• Assuming we can pass a vector from that learnt latent space to the decoder, how can we guarantee it is not going to result in a garbage output?
• VAEs address both of the above problems.
1. How do we use the auto-encoder architecture for generation?
If we train the auto-encoder network and somehow learn the data distribution of the latent space, we can then sample from that latent space, pass the sample to the decoder, and generate new data points.
But, there is a problem.
2. How can we make sure that if we sample from our latent space, we are going to get new and meaningful output?
• VAEs achieve this by constraining the latent space: the encoder's distribution over latent codes is pushed towards a known prior, so that samples drawn from that prior decode into meaningful outputs (see the sketch below).
• The encoder and decoder parameters are tuned to accommodate this setup.
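To make this concrete, here is a minimal sketch of generation once the latent space has been constrained, assuming a standard normal prior over the latent space; the decoder below is an untrained stand-in with illustrative layer sizes, not an architecture taken from the slides:

import torch
import torch.nn as nn

latent_dim, data_dim = 2, 784   # illustrative sizes only

# Stand-in decoder: in a real VAE this network is trained jointly with the encoder.
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                        nn.Linear(256, data_dim), nn.Sigmoid())

# Because the latent space is constrained to match a standard normal prior,
# sampling z ~ N(0, I) and decoding it should yield a meaningful new data point.
z = torch.randn(1, latent_dim)
x_new = decoder(z)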
But computing the required posterior over the latent variables is not straightforward.
Calculating Marginal Probability
• If X = (x1, x2, x3) and Z = (z1, z2), then by Bayes' rule
P(Z | X) = P(X | Z) · P(Z) / P(X)
Here, P(X) is very difficult to calculate, especially in higher dimensions.
It takes the form of the double integral
P(X) = ∫∫ P(x1, x2, x3, z1, z2) dz1 dz2
(integrating over z1 and z2), and is intractable.
There are two common ways of addressing this:
1. Monte Carlo integration techniques (sketched below)
2. Variational inference
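As a rough illustration of option 1 (not taken from the slides), the marginal can in principle be estimated by Monte Carlo integration: draw samples from the prior and average the likelihood. The sketch below assumes a toy, hypothetical likelihood p(x|z) just to show the mechanics:

import torch

def log_likelihood(x, z):
    # Hypothetical p(x|z): a Gaussian centred on a simple function of z,
    # standing in for a decoder-defined density. Purely illustrative.
    mean = z.sum(dim=-1, keepdim=True).expand_as(x)
    return torch.distributions.Normal(mean, 1.0).log_prob(x).sum(dim=-1)

def estimate_log_marginal(x, latent_dim=2, n_samples=10000):
    # p(x) = E_{z ~ p(z)}[ p(x|z) ]  ~  (1/N) * sum_i p(x | z_i),  with z_i ~ N(0, I)
    z = torch.randn(n_samples, latent_dim)
    log_px_given_z = log_likelihood(x.expand(n_samples, -1), z)
    # log-sum-exp for numerical stability
    return torch.logsumexp(log_px_given_z, dim=0) - torch.log(torch.tensor(float(n_samples)))

x = torch.randn(1, 3)                    # a toy 3-dimensional data point, as in the slide
print(estimate_log_marginal(x).item())   # naive estimate of log p(x)

The catch is that the variance of this estimator explodes as the dimensionality grows, which is why VAEs rely on variational inference instead.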
Variational Inference
Approximate the intractable posterior p(z|x) with a simpler distribution q(z|x). The log evidence then splits into two terms:
log p(x) = KL( q(z|x) || p(z|x) ) + L(q)
• As x is already given, log p(x) is a constant.
• KL( q(z|x) || p(z|x) ) is what we wanted to minimize, and it is always >= 0.
• Since 0 <= p(x) <= 1 and KL >= 0, we get L(q) <= log p(x) <= 0.
• So minimizing the KL term is equivalent to maximizing the remaining term L(q), which is called the variational lower bound.
This lower bound is nothing but
L(q) = E_{q(z|x)}[ log p(x|z) ] - KL( q(z|x) || p(z) )
So, maximizing the lower bound means:
• minimizing KL( q(z|x) || p(z) ) (which is >= 0), i.e. keeping the approximate posterior close to the prior, and
• for z drawn from q(z|x), maximizing the likelihood of observing x.
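In code, the negative of this lower bound is used directly as the training loss. A minimal sketch, assuming a Bernoulli decoder over data scaled to [0, 1] and a diagonal Gaussian encoder q(z|x) = N(mu, sigma^2) with a standard normal prior (these modelling choices are assumptions, not stated in the slides):

import torch
import torch.nn.functional as F

def vae_loss(x, x_reconstructed, mu, logvar):
    # Reconstruction term: -E_{q(z|x)}[log p(x|z)], estimated with a single sample of z,
    # written as binary cross-entropy for data in [0, 1].
    recon = F.binary_cross_entropy(x_reconstructed, x, reduction="sum")
    # KL( N(mu, sigma^2) || N(0, I) ) in closed form (see the KL divergence formula at the end).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # Minimizing recon + kl maximizes the variational lower bound L(q).
    return recon + kl

Here mu and logvar come from the encoder and x_reconstructed from the decoder; the sample of z that links them is obtained with the reparameterization trick described next.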
Reparameterization Trick
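In brief, sampling z directly from q(z|x) = N(mu, sigma^2) is a stochastic operation that gradients cannot flow through, so the sample is rewritten as z = mu + sigma * eps with eps ~ N(0, I): the randomness is moved into eps, and mu and sigma remain differentiable. A minimal sketch of this step:

import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps ~ N(0, I).
    # The randomness lives in eps, so gradients can flow back through mu and logvar.
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps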
VAE Architecture
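A minimal sketch of the overall architecture in PyTorch, tying the encoder, the reparameterization step, and the decoder together; the fully connected layers and their sizes are illustrative assumptions, not taken from the slides:

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, data_dim=784, hidden_dim=256, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim, hidden_dim), nn.ReLU())
        self.to_mu = nn.Linear(hidden_dim, latent_dim)       # mean of q(z|x)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)   # log variance of q(z|x)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, data_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)   # reparameterization trick from the previous sketch
        return self.decoder(z), mu, logvar     # plug these into the loss sketch above

During training, the reconstruction and KL terms of the loss pull against each other: the encoder must keep q(z|x) close to the prior while still giving the decoder enough information to reconstruct x.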
References
• Lecture by Ali Ghodsi: https://www.youtube.com/watch?v=uaaqyVS9-rM
• Lecture by Pascal Poupart: https://www.youtube.com/watch?v=DWVlEw0D3gA
Hierarchy of Generative Models
Figure from Ian Goodfellow’s tutorial on GANs, NIPS 2016
Internals of a VAE’s learning algorithm
KL Divergence
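Assuming the common choice of a diagonal Gaussian encoder q(z|x) = N(mu, sigma^2) and a standard normal prior p(z) = N(0, I), the KL term that appears in the loss above has the closed form

KL\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\big) = -\tfrac{1}{2} \sum_{j=1}^{d} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)

which is exactly the kl line computed in the loss sketch above.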
