Latent variable models assume that observed data is generated from hidden (unobserved) variables, and learning means inferring those hidden variables and their relationship to the data.
Here $z$ is a latent/unobserved/hidden random variable. Typically $z$ is jointly estimated along with the model parameters $\theta$. For each $x$ we assume that there exists a corresponding $z$.
- If $z$ is discrete, $z \in \{1, 2, \dots, K\}$. Because for every $x$ there would be a corresponding $z$, we can create $K$ such buckets for the $x$'s. This is basically what K-Means Clustering and Gaussian Mixture Models aim to do.
- If $z$ is continuous, $z \in \mathbb{R}^d$. Typically $z \sim \mathcal{N}(0, I)$. Here $z$ represents a feature vector corresponding to the given $x$, as sketched below.
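To make the two cases concrete, here is a minimal NumPy sketch of the generative story in each case. The specific values of `pi`, `mus`, `W`, and `b` are arbitrary placeholders chosen for illustration, not parameters of any fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete latent: pick one of K components, then draw x from that component's Gaussian.
K = 3
pi = np.array([0.5, 0.3, 0.2])            # mixing weights, sum to 1
mus = np.array([[-2.0], [0.0], [3.0]])    # component means (1-D here for simplicity)
z = rng.choice(K, p=pi)                   # z in {0, ..., K-1}
x_discrete = rng.normal(loc=mus[z], scale=1.0)

# Continuous latent: draw z from a standard normal, then map it to x-space.
d = 2
z_cont = rng.standard_normal(d)           # z ~ N(0, I)
W, b = rng.standard_normal((5, d)), np.zeros(5)
x_continuous = W @ z_cont + b + 0.1 * rng.standard_normal(5)
```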
Principle for Learning LVMs
Suppose we have a dataset $\mathcal{D} = \{x_1, x_2, \dots, x_N\}$ and a Latent Variable Model $p_\theta(x, z)$. Our goal here would be to estimate the model parameter $\theta$ given $\mathcal{D}$. This can be done by minimizing the KL Divergence between the empirical data distribution and the model marginal $p_\theta(x)$,

$$\theta^* = \arg\min_\theta\, D_{KL}\!\left(\hat{p}_{\text{data}}(x)\,\|\,p_\theta(x)\right) = \arg\max_\theta\, \sum_{i=1}^{N} \log p_\theta(x_i)$$
Here $\sum_{i=1}^{N} \log p_\theta(x_i)$ is the log-likelihood function of $\mathcal{D}$. Thus this optimization problem is called Maximum Likelihood Estimation. Instead of solving the entire optimization problem at once, let's just consider the contribution of a single data point,

$$\log p_\theta(x) = \log \int p_\theta(x, z)\, dz = \log \int q(z)\, \frac{p_\theta(x, z)}{q(z)}\, dz$$

where $q(z)$ is any density over the latent variable (with the integral replaced by a sum for discrete $z$).
By Jensen's Inequality we know that $\log \mathbb{E}[X] \geq \mathbb{E}[\log X]$, since $\log$ is concave. So applying this on the above equation for $\log p_\theta(x)$ we get,

$$\log p_\theta(x) \;\geq\; \mathbb{E}_{z \sim q(z)}\!\left[\log \frac{p_\theta(x, z)}{q(z)}\right] \;=\; \mathcal{L}(\theta, q)$$
Here $\mathcal{L}(\theta, q)$ is called the Evidence Lower Bound (ELBO). It is a function of both the model parameters $\theta$ and the density on $z$, $q(z)$. $q(z)$ is called the Variational Latent Posterior. Similar to VDM, here too we maximize a lower bound of a quantity in order to optimize it.
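As a sanity check on the bound, the sketch below estimates the ELBO by Monte Carlo for a toy model assumed purely for illustration: $z \sim \mathcal{N}(0, 1)$ and $x \mid z \sim \mathcal{N}(z, 1)$, with a Gaussian variational posterior $q(z) = \mathcal{N}(m, s^2)$. For this toy model the true posterior is $\mathcal{N}(x/2, 1/2)$ and $\log p_\theta(x) = \log \mathcal{N}(x;\, 0, 2)$, so the ELBO should match the exact log-likelihood when $q$ equals the true posterior and fall below it otherwise.

```python
import numpy as np
from scipy.stats import norm

def elbo_estimate(x, m, s, n_samples=10_000, rng=None):
    """Monte Carlo estimate of ELBO = E_{z~q}[log p(x, z) - log q(z)]
    for the toy model z ~ N(0, 1), x | z ~ N(z, 1), with q(z) = N(m, s^2)."""
    if rng is None:
        rng = np.random.default_rng(0)
    z = rng.normal(m, s, size=n_samples)                          # samples from q(z)
    log_p_xz = norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x, z, 1.0)  # log p(z) + log p(x|z)
    log_q = norm.logpdf(z, m, s)
    return np.mean(log_p_xz - log_q)

x = 1.5
print("ELBO with q = prior:     ", elbo_estimate(x, m=0.0, s=1.0))
print("ELBO with q = posterior: ", elbo_estimate(x, m=x / 2, s=np.sqrt(0.5)))
print("True log p(x):           ", norm.logpdf(x, 0.0, np.sqrt(2.0)))
```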
Gaussian Mixture Models (GMM)
In GMMs $z$ is discrete, $z \in \{1, 2, \dots, K\}$.
In a GMM, $p_\theta(z = k) = \pi_k$ and $p_\theta(x \mid z = k) = \mathcal{N}(x;\, \mu_k, \Sigma_k)$.
Parameters of a GMM are,

$$\theta = \{\pi_1, \dots, \pi_K,\ \mu_1, \dots, \mu_K,\ \Sigma_1, \dots, \Sigma_K\}$$

Here $\pi_k \geq 0$ with $\sum_{k=1}^{K} \pi_k = 1$, and $\mu_k \in \mathbb{R}^d$ and $\Sigma_k \succeq 0$ (positive semi-definite). Since our goal is to estimate $\theta$ via ELBO optimization, we can use the Expectation Maximization Algorithm (EM Algorithm), which alternately updates the variational posterior $q(z)$ and the parameters $\theta$.
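As a quick reference for what the model density looks like in code, here is a short sketch of the marginal log-likelihood $\sum_i \log p_\theta(x_i)$ using SciPy; the function `gmm_log_likelihood` and its argument names are my own, not from any library.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pis, mus, Sigmas):
    """Dataset log-likelihood: sum_i log( sum_k pi_k * N(x_i; mu_k, Sigma_k) )."""
    # Shape (N, K): per-point, per-component weighted densities.
    comp = np.stack([pis[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                     for k in range(len(pis))], axis=1)
    return np.sum(np.log(comp.sum(axis=1)))
```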
It can be shown that EM ensures that $\log p_{\theta^{(t+1)}}(\mathcal{D}) \geq \log p_{\theta^{(t)}}(\mathcal{D})$. This doesn't ensure that the likelihood function will keep strictly increasing, but it does ensure that it won't decrease as the parameters get updated.
Applying the EM algorithm to the GMM, it can be shown analytically (try once) that the E-step computes the responsibilities

$$\gamma_{ik} = p_\theta(z_i = k \mid x_i) = \frac{\pi_k\, \mathcal{N}(x_i;\, \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_i;\, \mu_j, \Sigma_j)}$$

and the M-step updates the parameters as

$$\pi_k = \frac{1}{N}\sum_{i=1}^{N}\gamma_{ik}, \qquad \mu_k = \frac{\sum_{i=1}^{N}\gamma_{ik}\, x_i}{\sum_{i=1}^{N}\gamma_{ik}}, \qquad \Sigma_k = \frac{\sum_{i=1}^{N}\gamma_{ik}\,(x_i - \mu_k)(x_i - \mu_k)^\top}{\sum_{i=1}^{N}\gamma_{ik}}$$
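A minimal NumPy/SciPy sketch of these updates is given below, assuming full covariance matrices and a fixed number of iterations; the function name `em_gmm`, the initialization scheme, and the small diagonal jitter added for numerical stability are illustrative choices, not part of the derivation above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a full-covariance GMM. A minimal sketch, not production code."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Initialization: uniform weights, K random data points as means,
    # the overall data covariance (plus jitter) for every component.
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigmas = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)

    for _ in range(n_iters):
        # E-step: responsibilities gamma_{ik} = p(z_i = k | x_i) under current parameters.
        gamma = np.stack([pis[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                          for k in range(K)], axis=1)           # shape (N, K)
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: closed-form updates that maximize the ELBO for fixed responsibilities.
        Nk = gamma.sum(axis=0)                                   # effective component sizes
        pis = Nk / N
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)

    return pis, mus, Sigmas
```

In practice the E-step is usually computed in log space with a log-sum-exp to avoid underflow, and convergence is monitored via the log-likelihood rather than running a fixed number of iterations.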
ELBO can be optimized for an LVM using the EM algorithm whenever the latent posterior $p_\theta(z \mid x)$ can be computed exactly.