Bayesian models for prediction problems require probability structures whose evaluation involves integration. These integrals are rarely tractable analytically, and applying Markov chain Monte Carlo (MCMC) methods to approximate them is slow, especially when the parameter space is high dimensional. The key idea behind Bayesian inference is to marginalize over unknown parameters rather than to make point estimates. This avoids severe over-fitting and allows direct model comparison.
Variational methods extend the practicality of Bayesian inference to complex Bayesian models and “medium-sized” data sets. Variational Bayesian inference (VBI) aims to approximate the posterior distribution by a simpler distribution for which marginalization is tractable. Maximum likelihood (ML) estimation is one of the most popular techniques used in modern classification problems. In general, inference can be defined as the process of drawing a conclusion from the information available, of which the observed data form a subset. Bayesian inference, then, is inference that uses Bayes’ theorem to update the plausibility of a hypothesis in the light of new information. This process produces information that adds to organizational knowledge.
Variational Bayesian inference has proved powerful in many applications, including brain mapping; digital signal processing (white-light diffraction tomography of unlabeled live cells, myoblast alignment on 2D wavy patterns, multimedia problems, space-variant blur deconvolution and denoising in the dual-exposure problem, sparse additive matrix factorization); linear and logistic regression (automatic relevance determination, non-conjugate model building); and fMRI time-series analysis based on general linear models with autoregressive (AR) error processes. Nevertheless, the full advantages of the VBI model have not yet been realized.
There are many opportunities to realize the full potential of VBI. Its main advantage over supervised and semi-supervised learning techniques is that VBI does not require any labeled data. Moreover, VBI often achieves higher accuracy than supervised and semi-supervised techniques. For instance, for the identification and interpretation of non-standard words (NSW), which are relevant to text normalization, automatic speech recognition, and text-to-speech conversion, VBI gives an overall accuracy of 98.95% on a Bengali news corpus, whereas naïve Bayes classification (supervised) and naïve Bayes with the Expectation-Maximization algorithm (semi-supervised) achieve 96.74% and 97.52%, respectively.
Bayesian Inference basics
Assume that x = {x1, x2, …, xn} are the observations and θ the unknown parameters of the model that generated x. One of the most popular approaches for parameter estimation is ML. The ML estimate is given by
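θ_ML = arg max_θ p(x|θ)    (1)

As a minimal illustration (assuming, purely for concreteness, that the model is a single univariate Gaussian, which the text does not specify), the ML estimates have closed forms and can be computed directly; all names and values below are hypothetical:

```python
import numpy as np

# Illustrative sketch: ML fit of a univariate Gaussian N(mu, sigma^2).
# The log-likelihood is maximized by the sample mean and the (biased) sample variance.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # synthetic observations
mu_ml = x.mean()        # ML estimate of the mean
var_ml = x.var()        # ML estimate of the variance (divides by n, not n - 1)
print(mu_ml, var_ml)
```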
In many cases, the likelihood function p(x|θ) is complex and either difficult or impossible to compute or optimize directly. In such cases, the computation of the likelihood is greatly facilitated by introducing hidden variables zi (i = 1, …, n). These random variables act as links that connect the observations to the unknown parameters via Bayes’ law.
The selection of hidden variables is problem specific. As their name suggests, these variables are not observed, yet they carry enough information about the observations that the conditional probability p(z|x) is easy to compute. Apart from this role, hidden variables play another important part in statistical modeling: they belong to the probabilistic mechanism that is assumed to have generated the observations, a mechanism that can be described very succinctly by a graph termed a ‘graphical model’.
Once hidden variables and a prior probability p(z|θ) have been introduced, one obtains the likelihood, or marginal likelihood as it is sometimes called, by integrating out (marginalizing) the hidden variables:
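p(x|θ) = ∫ p(x|z, θ) p(z|θ) dz    (2)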
Now, using Bayes’ rule, the posterior of the hidden variables can be written in terms of the likelihood and the prior:
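p(z|x; θ) = p(x|z, θ) p(z|θ) / p(x|θ)    (3)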
Once the posterior is available, inference about the hidden variables becomes possible. Despite the simplicity of the above formulation, the integral in (2) is generally impossible or very difficult to compute in closed form. Thus, the main effort in Bayesian inference is concentrated on techniques that bypass or approximately evaluate this integral. As discussed earlier, the EM algorithm is a Bayesian inference methodology that uses the posterior p(z|x;θ) to iteratively maximize the likelihood function without computing it explicitly. A major drawback of this approach is that in many cases the posterior itself is not available. Recent advances in Bayesian inference allow us to bypass this difficulty by approximating the posterior; this approximation technique is known as ‘Variational Bayesian approximation’.
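To make the variational idea concrete, the following sketch (an illustrative assumption, not a method described in this text) applies a mean-field approximation q(μ, τ) = q(μ)q(τ) to a univariate Gaussian model with unknown mean μ and precision τ, using a conjugate Gaussian prior on μ and a Gamma prior on τ. The coordinate-ascent updates follow from the standard conjugate derivations; all function names and hyperparameter values are hypothetical.

```python
import numpy as np

def cavi_gaussian(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, iters=100):
    """Mean-field VB for x_i ~ N(mu, 1/tau), with priors
    mu | tau ~ N(mu0, 1/(lam0*tau)) and tau ~ Gamma(a0, b0).
    The intractable posterior p(mu, tau | x) is approximated by q(mu) q(tau)."""
    n, xbar = len(x), np.mean(x)
    e_tau = a0 / b0                      # initial guess for E_q[tau]
    a_n = a0 + (n + 1) / 2.0             # shape of q(tau) depends only on n
    for _ in range(iters):
        # Update q(mu) = N(mu_n, 1/lam_n) given the current E_q[tau]
        mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
        lam_n = (lam0 + n) * e_tau
        # Update q(tau) = Gamma(a_n, b_n) given the moments of q(mu)
        b_n = b0 + 0.5 * (np.sum((x - mu_n) ** 2) + n / lam_n
                          + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
        e_tau = a_n / b_n                # refresh E_q[tau] for the next sweep
    return mu_n, lam_n, a_n, b_n

# Usage on synthetic data: the approximate posterior mean of mu should be close
# to the sample mean, and a_n / b_n close to the true precision 1/1.5**2.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)
print(cavi_gaussian(data))
```

Each sweep updates q(μ) given the current expectation of τ and then q(τ) given the moments of q(μ); the exact posterior is never computed, only its tractable factorized approximation.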
We have discussed the variational Bayesian inference model, an unsupervised machine learning approach, along with its importance and application areas. The mathematical foundation is discussed in the next part.