mixture models

Committees

Preliminaries: basic machine learning concepts; probability theory concepts (expectation, correlated random variables). Analysis of Committees: A committee is a naive way to combine several models (more precisely, to combine the outputs of several models). For example, we can average the predictions of all \(M\) models: \[ y_{COM}(X)=\frac{1}{M}\sum_{m=1}^{M}y_m(X)\tag{1} \] We then want to find out whether this averaged prediction is better than each of the individual models...
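The averaging in Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the post's own code: the function name `committee_predict`, the sine target, and the noise level are assumptions made up for the example.

```python
import numpy as np

def committee_predict(models, X):
    """Average the outputs of M models, as in Eq. (1)."""
    preds = np.stack([m(X) for m in models], axis=0)  # shape (M, N)
    return preds.mean(axis=0)

# Hypothetical toy setup: every model is the true function plus its own noise.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 200)
target = np.sin(2 * np.pi * X)
noises = [rng.normal(scale=0.3, size=X.shape) for _ in range(10)]
models = [lambda x, e=e: np.sin(2 * np.pi * x) + e for e in noises]

y_com = committee_predict(models, X)

# Compare the committee's squared error with the average member's error.
avg_member_mse = np.mean([np.mean((m(X) - target) ** 2) for m in models])
committee_mse = np.mean((y_com - target) ** 2)
print(avg_member_mse, committee_mse)
```

For squared error, the committee's error never exceeds the average error of its members, although it is not guaranteed to beat the best single member.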

EM Algorithm

Preliminaries: Gaussian distribution; log-likelihood; calculus (partial derivative, Lagrange multiplier). EM Algorithm for Gaussian Mixture: Analysis. Maximum likelihood cannot be applied to the Gaussian mixture model directly, because of the severe defects we came across in ‘Maximum Likelihood of Gaussian Mixtures’. Inspired by K-means, a two-step algorithm was developed. The objective function is the log-likelihood function: \[ \begin{aligned} \ln \Pr(\mathbf{x}|\mathbf{\pi},\mathbf{\mu},\Sigma)&=\ln \left(\prod_{n=1}^{N}\sum_{j=1}^{K}\pi_j\mathcal{N}(\mathbf{x}_n|\mathbf{\mu}_j,\Sigma_j)\right)\\ &=\sum_{n=1}^{N}\ln \sum_{j=1}^{K}\pi_j\mathcal{N}(\mathbf{x}_n|\mathbf{\mu}_j,\Sigma_j)\\ \end{aligned}\tag{1} \]...
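As a rough sketch, the log-likelihood objective above can be evaluated with NumPy alone. The function names `log_gaussian` and `gmm_log_likelihood` are hypothetical, and the log-sum-exp trick is an implementation choice added here for numerical stability; it is not part of the formula itself.

```python
import numpy as np

def log_gaussian(X, mu, Sigma):
    """Log density of N(x | mu, Sigma) for every row of X; returns shape (N,)."""
    D = X.shape[1]
    L = np.linalg.cholesky(Sigma)              # Sigma = L @ L.T
    z = np.linalg.solve(L, (X - mu).T)         # shape (D, N)
    maha = np.sum(z ** 2, axis=0)              # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (D * np.log(2.0 * np.pi) + log_det + maha)

def gmm_log_likelihood(X, pis, mus, Sigmas):
    """Sum over n of ln sum_j pi_j N(x_n | mu_j, Sigma_j)."""
    log_terms = np.stack(
        [np.log(p) + log_gaussian(X, m, S) for p, m, S in zip(pis, mus, Sigmas)],
        axis=1,
    )                                          # shape (N, K)
    # Log-sum-exp over components, shifted by the row maximum for stability.
    shift = log_terms.max(axis=1, keepdims=True)
    per_point = shift[:, 0] + np.log(np.sum(np.exp(log_terms - shift), axis=1))
    return float(per_point.sum())

# Hypothetical two-component example in two dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
pis = np.array([0.4, 0.6])
mus = np.array([[0.0, 0.0], [3.0, 3.0]])
Sigmas = np.array([np.eye(2), 2.0 * np.eye(2)])
print(gmm_log_likelihood(X, pis, mus, Sigmas))
```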

Maximum Likelihood of Gaussian Mixtures

Preliminaries: probability theory (multiplication principle, joint distribution, Bayesian theory); Gaussian distribution; log-likelihood function; ‘Maximum Likelihood Estimation’. Maximum Likelihood: Gaussian mixtures were discussed in ‘Mixtures of Gaussians’. Once we have a training data set and a certain hypothesis, the next step is to estimate the parameters of the model. A mixture of Gaussians \(\Pr(\mathbf{x})= \sum_{k=1}^{K}\pi_k\mathcal{N}(\mathbf{x}|\mathbf{\mu}_k,\Sigma_k)\) involves two kinds of parameters: the parameters of the Gaussians, \(\mathbf{\mu}_k,\Sigma_k\), and the latent variables, \(\mathbf{z}\)...
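One way to see the role of the latent variable is to sample from the mixture: first draw a component index, then draw a point from that component's Gaussian. The sketch below assumes the latent variable is represented as an integer index `z` rather than a one-hot vector, and the name `sample_gmm` is hypothetical.

```python
import numpy as np

def sample_gmm(n, pis, mus, Sigmas, rng):
    """Draw n points: z ~ Categorical(pis), then x ~ N(mus[z], Sigmas[z])."""
    K, D = mus.shape
    z = rng.choice(K, size=n, p=pis)           # latent component index per point
    X = np.empty((n, D))
    for k in range(K):
        idx = z == k
        X[idx] = rng.multivariate_normal(mus[k], Sigmas[k], size=int(idx.sum()))
    return X, z

# Hypothetical parameters for a two-component mixture.
rng = np.random.default_rng(0)
pis = np.array([0.3, 0.7])
mus = np.array([[0.0, 0.0], [5.0, 5.0]])
Sigmas = np.array([np.eye(2), np.eye(2)])
X, z = sample_gmm(1000, pis, mus, Sigmas, rng)
print(X.shape, np.bincount(z) / len(z))
```

In estimation the situation is reversed: only `X` is observed, and both the Gaussian parameters and the assignments `z` have to be inferred.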

Mixtures of Gaussians

Preliminaries: probability theory (multiplication principle, joint distribution, Bayesian theory); Gaussian distribution; Calculus 1, 2. A Formal Introduction to Mixtures of Gaussians: We introduced mixture distributions in the post ‘An Introduction to Mixture Models’, where the example was a two-component Gaussian mixture. In this post, we discuss Gaussian mixtures formally, which also serves to motivate the development of the expectation-maximization (EM) algorithm...

K-means Clustering

Preliminaries: numerical optimization (necessary conditions for a maximum); K-means algorithm; Fisher Linear Discriminant. Clustering Problem: The first thing we should do before introducing the algorithm is to make the task clear, and a mathematical formulation is usually the best way to do so. Clustering is a kind of unsupervised learning task, so there is no correct or incorrect solution, because the task has no teacher or target. Clustering resembles classification at prediction time, since the outputs of both are discrete...