Latent variable models assume that observed data is generated from hidden (unobserved) variables, and learning means inferring those hidden variables and their relationship to the data.
Here is an latent/unobserved/hidden random variable. Typically is jointly estimated along with the model parameters . For each we assume that there exists a corresponding .
- If is discrete, . Because for every there would be a corresponding , we can create such buckets for each . This is basically what K-Means Clustering and Gaussian Mixture Models aim to do.
- If is continuous, . Typically . Here represents a feature vector corresponding to the given . In doing so, the model is attempting to find a “hidden space”, usually lower dimension, in which the data lives.
TL;DR - If the latent variable is discrete, then the model can be used for clustering. If the latent variable is continuous, then the model can be used for this feature extraction task.
Principle for Learning LVMs
Suppose we have a dataset and a Latent Variable Model . Our goal here would be to estimate the model parameter given . This can be done by minimizing the KL Divergence.
Here is the log-likelihood function of . Thus this optimization problem is called Maximum Likelihood Estimation. Because expectation is a linear function and we are calculating the expectation w.r.t not , we can try to maximize each term independently and then take the expectation of all to get the final output. So let’s just consider the following,
By Jensen’s Inequality we know that . So applying this on the above equation for we get,
Here is called the evidence and thus is called the Evidence Lower Bound (ELBO). is function of both the model parameters and the density on , . is called the Variational Latent Posterior. Similar to VDM, here too we maximize a lower-bound of a value in order to optimize it.
Gaussian Mixture Models (GMM)
In GMMs is discrete, .
In a GMM, , .
Parameters of a GMM are,
Here , , and and (convex combination of Gaussian Distributions). Since our goal is to estimate via ELBO optimization, we can use the Expectation Maximization Algorithm (EM Algorithm) which updates both alternatively.
It can be shown that EM ensures that . This doesn’t ensure that the likelihood function will keep on increasing, but it ensures that it won’t decrease as the parameters get updated.
Applying EM algorithm for the GMM, it can be shown analytically (try once) that,
ELBO can be optimized for an LVM using the EM algorithm, provided can be computed. If can’t be computed, then EM fails.