Maximum Likelihood of Gaussian Mixtures

Preliminaries: Probability Theory; the multiplication principle; joint distributions; Bayes’ theorem; the Gaussian distribution; the log-likelihood function; ‘Maximum Likelihood Estimation’. Maximum Likelihood. Gaussian mixtures were discussed in ‘Mixtures of Gaussians’. Once we have a training data set and a hypothesis, the next step is to estimate the parameters of the model. A mixture of Gaussians \(\Pr(\mathbf{x})= \sum_{k=1}^{K}\pi_k\mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k,\Sigma_k)\) has two kinds of parameters: the parameters of the Gaussians, \(\boldsymbol{\mu}_k,\Sigma_k\), and the latent variables \(\mathbf{z}\)...

March 5, 2020 · (Last Modification: April 28, 2022) · Anthony Tan
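As a minimal sketch of the mixture density and its log-likelihood mentioned in this entry (my illustration, not code from the post), assuming NumPy and SciPy are available; the mixing coefficients, means, covariances, and toy data below are hypothetical values:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, pis, mus, sigmas):
    """Pr(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=sigma)
               for pi, mu, sigma in zip(pis, mus, sigmas))

def log_likelihood(X, pis, mus, sigmas):
    """Log-likelihood of a data set X under the mixture."""
    return float(np.sum([np.log(gmm_density(x, pis, mus, sigmas)) for x in X]))

# Hypothetical two-component mixture in two dimensions.
pis = [0.4, 0.6]
mus = [np.zeros(2), np.array([3.0, 3.0])]
sigmas = [np.eye(2), np.eye(2)]
X = np.random.default_rng(0).normal(size=(5, 2))  # toy data, for illustration
print(log_likelihood(X, pis, mus, sigmas))
```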

Mixtures of Gaussians

Preliminaries: Probability Theory; the multiplication principle; joint distributions; Bayes’ theorem; the Gaussian distribution; Calculus 1, 2. A Formal Introduction to Mixtures of Gaussians. We introduced mixture distributions in the post ‘An Introduction to Mixture Models’, where the example was a two-component Gaussian mixture. In this post, we discuss Gaussian mixtures formally; this serves to motivate the development of the expectation-maximization (EM) algorithm....

March 5, 2020 · (Last Modification: April 28, 2022) · Anthony Tan
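To make the two-component example concrete, here is a small sampling sketch (my addition, assuming NumPy; the mixture parameters are hypothetical). It draws the latent component first and then the observation given that component, which is the generative view that motivates EM:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical two-component mixture in one dimension.
pis = np.array([0.3, 0.7])      # mixing coefficients (sum to 1)
mus = np.array([-2.0, 2.0])     # component means
sigmas = np.array([0.5, 1.0])   # component standard deviations

def sample_mixture(n):
    """Ancestral sampling: draw the latent component z, then x given z."""
    z = rng.choice(len(pis), size=n, p=pis)   # latent assignments
    x = rng.normal(mus[z], sigmas[z])         # one Gaussian draw per assignment
    return x, z

x, z = sample_mixture(1000)
print(x[:5], z[:5])
```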

K-means Clustering

Preliminaries: Numerical Optimization; necessary conditions for a maximum; the K-means algorithm; Fisher Linear Discriminant. Clustering Problem. The first thing we should do before introducing the algorithm is to make the task clear, and a mathematical formulation is usually the best way to do so. Clustering is an unsupervised learning task, so there is no correct or incorrect solution: the task has no teacher or target. Clustering is similar to classification at prediction time, since the outputs of both clustering and classification are discrete....

March 4, 2020 · (Last Modification: April 28, 2022) · Anthony Tan
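For readers who want the algorithm in code form, a bare-bones Lloyd’s-style K-means sketch follows (my illustration assuming NumPy, not the post’s implementation); note that the cluster assignments are discrete, echoing the point about discrete outputs:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy data: two well-separated blobs (hypothetical, for illustration).
X = np.vstack([np.random.default_rng(1).normal(m, 0.3, size=(50, 2))
               for m in ([0, 0], [3, 3])])
labels, centers = kmeans(X, k=2)
print(centers)
```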

An Introduction to Mixture Models

Preliminaries: linear regression; Maximum Likelihood Estimation; Gaussian Distribution; Conditional Distribution. From Supervised to Unsupervised Learning. We have discussed many machine learning algorithms so far, including linear regression, linear classification, neural network models, etc. However, most of them are supervised learning methods, meaning a teacher leads the model to bias toward a certain task. In these problems our attention was on the probability distribution of the parameters given the inputs, outputs, and model:...

March 4, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

Logistic Regression

Preliminaries: ‘An Introduction to Probabilistic Generative Models for Linear Classification’. Idea of logistic regression. The logistic sigmoid function (logistic function for short) was introduced in the post ‘An Introduction to Probabilistic Generative Models for Linear Classification’. It has an elegant form: \[ \delta(a)=\frac{1}{1+e^{-a}}\tag{1} \] When \(a=0\), \(\delta(a)=\frac{1}{2}\), the midpoint of the logistic function’s range. This strongly suggests that we can set \(a\) equal to some function \(y(\mathbf{x})\), and then...

February 20, 2020 · (Last Modification: April 28, 2022) · Anthony Tan
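A quick numerical check of equation (1) (my addition, assuming NumPy); the weight vector and bias used to form \(a=y(\mathbf{x})\) as a linear function are hypothetical:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid from equation (1)."""
    return 1.0 / (1.0 + np.exp(-a))

# The midpoint property noted above: sigmoid(0) = 1/2.
print(sigmoid(0.0))  # 0.5

# Setting a = y(x), e.g. a linear function w^T x + b (hypothetical w, b, x).
w, b = np.array([1.5, -0.5]), 0.2
x = np.array([0.4, 1.0])
print(sigmoid(w @ x + b))
```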