Boosting and AdaBoost

Preliminaries Committee Boosting A committee gives equal weight to the prediction of every model, and it yields little improvement over a single model. Boosting was built to address this problem. Boosting is a technique of combining multiple ‘base’ classifiers to produce a form of committee that performs better than any of the base classifiers and in which each base classifier has its own weight. AdaBoost AdaBoost is short for adaptive boosting....
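As a rough illustration of the idea, here is a minimal AdaBoost sketch with decision stumps as base classifiers; the helper names (`fit_stump`, `adaboost`) and the re-weighting details are simplifications for illustration, not code from the post.

```python
import numpy as np

def fit_stump(X, y, w):
    """Pick the single-feature threshold stump with minimal weighted error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, M=10):
    """y must be in {-1, +1}; returns a list of weighted stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start with equal data weights
    ensemble = []
    for _ in range(M):
        err, j, thr, sign = fit_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)    # weight of this base classifier
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)           # up-weight misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    """Weighted vote of the base classifiers."""
    score = sum(a * s * np.where(X[:, j] > t, 1, -1) for a, j, t, s in ensemble)
    return np.sign(score)
```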

March 7, 2020 · (Last Modification: August 4, 2022) · Anthony Tan

Committees

Preliminaries Basic machine learning concepts Probability Theory concepts expectation correlated random variables Analysis of Committees1 The committee is a natural starting point for combining several models (or, more precisely, for combining the outputs of several models). For example, we can combine all the models by: \[ y_{COM}(X)=\frac{1}{M}\sum_{m=1}^My_m(X)\tag{1} \] Then we want to find out whether this average prediction is better than every individual model....
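A minimal sketch of equation (1): the committee simply averages the outputs of the individual models with equal weight. It assumes `models` is a list of fitted regressors exposing a `predict` method (an assumption for illustration, not something defined in the post).

```python
import numpy as np

def committee_predict(models, X):
    """Equation (1): average the predictions of all M base models."""
    predictions = np.stack([m.predict(X) for m in models])  # shape (M, N)
    return predictions.mean(axis=0)                          # y_COM, shape (N,)
```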

March 7, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

Bayesian Model Averaging(BMA) and Combining Models

Preliminaries Bayesian Theorem Bayesian Model Averaging (BMA)1 Bayesian model averaging (BMA) is another widely used method that looks very much like combining models. However, the difference between BMA and combining models is also significant. Bayesian model averaging is a Bayesian formulation in which the random variables are the models (hypotheses) \(h=1,2,\cdots,H\) with prior probability \(\Pr(h)\); the marginal distribution over the data \(X\) is then: \[ \Pr(X)=\sum_{h=1}^{H}\Pr(X|h)\Pr(h) \] And BMA is used to select the model (hypothesis) that explains the data best through Bayesian theory....
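A tiny numeric sketch of the marginal \(\Pr(X)=\sum_h \Pr(X|h)\Pr(h)\) and the resulting posterior over models; the prior and likelihood values below are made-up numbers for illustration only.

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])          # Pr(h) for hypotheses h = 1, 2, 3
likelihood = np.array([0.10, 0.40, 0.05])  # Pr(X|h) for the observed data X

marginal = np.sum(likelihood * prior)      # Pr(X), the BMA marginal
posterior = likelihood * prior / marginal  # Pr(h|X), used to pick the best model

print(marginal, posterior)
```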

March 7, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

An Introduction to Combining Models

Preliminaries ‘Mixtures of Gaussians’ Basic machine learning concepts Combining Models1 The mixture of Gaussians was discussed in the post ‘Mixtures of Gaussians’. It was used to introduce the ‘EM algorithm’, but it also inspired a way of improving model performance. All models we have studied so far, apart from neural networks, are single-distribution models. That is like inviting one expert who is very good at this kind of problem and then doing whatever the expert says....

March 7, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

EM Algorithm

Preliminaries Gaussian distribution log-likelihood Calculus partial derivative Lagrange multiplier EM Algorithm for Gaussian Mixture1 Analysis Maximizing the likelihood cannot be applied to the Gaussian mixture model directly because of the severe defects we came across in ‘Maximum Likelihood of Gaussian Mixtures’. Inspired by K-means, a two-step algorithm was developed. The objective function is the log-likelihood: \[ \begin{aligned} \ln \Pr(\mathbf{X}|\mathbf{\pi},\mathbf{\mu},\Sigma)&=\ln \left(\prod_{n=1}^N\sum_{j=1}^{K}\pi_j\mathcal{N}(\mathbf{x}_n|\mathbf{\mu}_j,\Sigma_j)\right)\\ &=\sum_{n=1}^{N}\ln \sum_{j=1}^{K}\pi_j\mathcal{N}(\mathbf{x}_n|\mathbf{\mu}_j,\Sigma_j) \end{aligned}\tag{1} \]...
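A short sketch, assuming standard NumPy/SciPy tools, of how the objective in equation (1) and the E-step responsibilities could be evaluated; the parameter layout and function names are illustrative, not taken from the post.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pis, mus, Sigmas):
    """Equation (1): sum over n of log sum over k of pi_k N(x_n | mu_k, Sigma_k)."""
    # densities[n, k] = pi_k * N(x_n | mu_k, Sigma_k)
    densities = np.stack(
        [pi * multivariate_normal.pdf(X, mean=mu, cov=S)
         for pi, mu, S in zip(pis, mus, Sigmas)],
        axis=1)
    return np.sum(np.log(densities.sum(axis=1)))

def responsibilities(X, pis, mus, Sigmas):
    """E step: gamma(z_nk), the normalized component densities."""
    densities = np.stack(
        [pi * multivariate_normal.pdf(X, mean=mu, cov=S)
         for pi, mu, S in zip(pis, mus, Sigmas)],
        axis=1)
    return densities / densities.sum(axis=1, keepdims=True)
```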

March 5, 2020 · (Last Modification: April 30, 2022) · Anthony Tan

Maximum Likelihood of Gaussian Mixtures

Preliminaries Probability Theory multiplication principle joint distribution the Bayesian theory Gaussian distribution log-likelihood function ‘Maximum Likelihood Estimation’ Maximum Likelihood1 Gaussian mixtures were discussed in ‘Mixtures of Gaussians’. Once we have a training data set and a certain hypothesis, the next step is to estimate the parameters of the model. A mixture of Gaussians \(\Pr(\mathbf{x})= \sum_{k=1}^{K}\pi_k\mathcal{N}(\mathbf{x}|\mathbf{\mu}_k,\Sigma_k)\) has two kinds of unknowns: the parameters of the Gaussians, \(\mathbf{\mu}_k,\Sigma_k\), and the latent variables \(\mathbf{z}\)...
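A one-dimensional, two-component sketch of evaluating the mixture density \(\Pr(\mathbf{x})=\sum_{k=1}^{K}\pi_k\mathcal{N}(\mathbf{x}|\mathbf{\mu}_k,\Sigma_k)\); all parameter values below are placeholders chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

pis = np.array([0.4, 0.6])     # mixing coefficients (must sum to 1)
mus = np.array([-2.0, 3.0])    # component means
sigmas = np.array([1.0, 0.5])  # component standard deviations

x = 0.7
density = np.sum(pis * norm.pdf(x, loc=mus, scale=sigmas))
print(density)                 # Pr(x) under the two-component mixture
```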

March 5, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

Mixtures of Gaussians

Preliminaries Probability Theory multiplication principle joint distribution the Bayesian theory Gaussian distribution Calculus 1,2 A Formal Introduction to Mixtures of Gaussians1 We introduced mixture distributions in the post ‘An Introduction to Mixture Models’, and the example in that post was just a two-component Gaussian mixture. In this post, we discuss Gaussian mixtures formally, which also serves to motivate the development of the expectation-maximization (EM) algorithm....

March 5, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

K-means Clustering

Preliminaries Numerical Optimization necessary conditions for maximum K-means algorithm Fisher Linear Discriminant Clustering Problem1 The first thing to do before introducing the algorithm is to make the task clear, and a mathematical formulation is usually the best way. Clustering is a kind of unsupervised learning task, so there is no correct or incorrect solution because there is no teacher or target. Clustering is similar to classification at prediction time, since the outputs of both clustering and classification are discrete....
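For concreteness, here is a minimal K-means sketch under the usual two-step formulation (assign each point to its nearest centre, then recompute the centres); the names and initialization choice are illustrative assumptions, not the post's code.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Cluster the rows of X into K groups; returns labels and centres."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # random initial centres
    for _ in range(n_iters):
        # assignment step: index of the nearest centre for every point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centre moves to the mean of its assigned points
        new_centers = np.array(
            [X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
             for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```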

March 4, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

An Introduction to Mixture Models

Preliminaries linear regression Maximum Likelihood Estimation Gaussian Distribution Conditional Distribution From Supervised to Unsupervised Learning1 So far we have discussed many machine learning algorithms, including linear regression, linear classification, and neural network models. However, most of them are supervised learning methods, which means a teacher guides the model toward a certain task. In these problems our attention was on the probability distribution of the parameters given the inputs, outputs, and model:...

March 4, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

Logistic Regression

Preliminaries ‘An Introduction to Probabilistic Generative Models for Linear Classification’ Idea of logistic regression1 The logistic sigmoid function (logistic function for short) was introduced in the post ‘An Introduction to Probabilistic Generative Models for Linear Classification’. It has an elegant form: \[ \delta(a)=\frac{1}{1+e^{-a}}\tag{1} \] When \(a=0\), \(\delta(a)=\frac{1}{2}\), which is the midpoint of the logistic function's range. This strongly suggests that we can set \(a\) equal to some function \(y(\mathbf{x})\), and then...
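A small sketch of equation (1) and the property \(\delta(0)=\tfrac{1}{2}\), together with the idea of feeding a linear score \(y(\mathbf{x})\) into the logistic function; the weights below are placeholders, not parameters from the post.

```python
import numpy as np

def logistic(a):
    """Equation (1): the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-a))

print(logistic(0.0))                  # 0.5, the midpoint of the (0, 1) range

w, b = np.array([0.8, -1.2]), 0.1     # illustrative linear-model parameters
x = np.array([0.5, 0.3])
print(logistic(w @ x + b))            # Pr(class 1 | x) when a = y(x) = w.x + b
```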

February 20, 2020 · (Last Modification: April 28, 2022) · Anthony Tan