Boosting and AdaBoost

Preliminaries Committee Boosting A committee gives equal weight to the prediction of every model, and it yields little improvement over a single model. Boosting was built to address this problem. Boosting is a technique of combining multiple ‘base’ classifiers to produce a form of committee that performs better than any of the base classifiers and in which each base classifier has its own weight. AdaBoost AdaBoost is short for adaptive boosting....
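As a rough illustration of the idea, here is a minimal AdaBoost sketch with decision stumps as base classifiers; the helper names (`fit_stump`, `adaboost`) and the re-weighting details are simplifications for illustration, not code from the post.

```python
import numpy as np

def fit_stump(X, y, w):
    """Pick the single-feature threshold stump with minimal weighted error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, M=10):
    """y must be in {-1, +1}; returns a list of weighted stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start with equal data weights
    ensemble = []
    for _ in range(M):
        err, j, thr, sign = fit_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)    # weight of this base classifier
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)           # up-weight misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    """Weighted vote of the base classifiers."""
    score = sum(a * s * np.where(X[:, j] > t, 1, -1) for a, j, t, s in ensemble)
    return np.sign(score)
```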

March 7, 2020 · (Last Modification: August 4, 2022) · Anthony Tan

Committees

Preliminaries Basic machine learning concepts Probability Theory concepts expectation correlated random variables Analysis of Committees1 The committee is a natural starting point for combining several models (or, more precisely, for combining the outputs of several models). For example, we can combine all the models by: \[ y_{COM}(X)=\frac{1}{M}\sum_{m=1}^My_m(X)\tag{1} \] Then we want to find out whether this average prediction is better than every individual model....
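A minimal sketch of equation (1): the committee simply averages the outputs of the individual models with equal weight. It assumes `models` is a list of fitted regressors exposing a `predict` method (an assumption for illustration, not something defined in the post).

```python
import numpy as np

def committee_predict(models, X):
    """Equation (1): average the predictions of all M base models."""
    predictions = np.stack([m.predict(X) for m in models])  # shape (M, N)
    return predictions.mean(axis=0)                          # y_COM, shape (N,)
```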

March 7, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

Bayesian Model Averaging(BMA) and Combining Models

Preliminaries Bayesian Theorem Bayesian Model Averaging (BMA)1 Bayesian model averaging (BMA) is another widely used method that looks very much like combining models. However, the difference between BMA and combining models is also significant. Bayesian model averaging is a Bayesian formulation in which the random variables are the models (hypotheses) \(h=1,2,\cdots,H\) with prior probability \(\Pr(h)\); the marginal distribution over the data \(X\) is then: \[ \Pr(X)=\sum_{h=1}^{H}\Pr(X|h)\Pr(h) \] And BMA is used to select the model (hypothesis) that explains the data best through Bayesian theory....
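A tiny numeric sketch of the marginal \(\Pr(X)=\sum_h \Pr(X|h)\Pr(h)\) and the resulting posterior over models; the prior and likelihood values below are made-up numbers for illustration only.

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])          # Pr(h) for hypotheses h = 1, 2, 3
likelihood = np.array([0.10, 0.40, 0.05])  # Pr(X|h) for the observed data X

marginal = np.sum(likelihood * prior)      # Pr(X), the BMA marginal
posterior = likelihood * prior / marginal  # Pr(h|X), used to pick the best model

print(marginal, posterior)
```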

March 7, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

An Introduction to Combining Models

Preliminaries ‘Mixtures of Gaussians’ Basic machine learning concepts Combining Models1 The mixture of Gaussians was discussed in the post ‘Mixtures of Gaussians’. It was used to introduce the ‘EM algorithm’, but it also inspired a way of improving model performance. All models we have studied so far, apart from neural networks, are single-distribution models. That is like inviting one expert who is very good at this kind of problem and then doing whatever the expert says....

March 7, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

EM Algorithm

Preliminaries Gaussian distribution log-likelihood Calculus partial derivative Lagrange multiplier EM Algorithm for Gaussian Mixture1 Analysis Maximizing the likelihood cannot be applied to the Gaussian mixture model directly because of the severe defects we came across in ‘Maximum Likelihood of Gaussian Mixtures’. Inspired by K-means, a two-step algorithm was developed. The objective function is the log-likelihood: \[ \begin{aligned} \ln \Pr(\mathbf{X}|\mathbf{\pi},\mathbf{\mu},\Sigma)&=\ln \left(\prod_{n=1}^N\sum_{j=1}^{K}\pi_j\mathcal{N}(\mathbf{x}_n|\mathbf{\mu}_j,\Sigma_j)\right)\\ &=\sum_{n=1}^{N}\ln \sum_{j=1}^{K}\pi_j\mathcal{N}(\mathbf{x}_n|\mathbf{\mu}_j,\Sigma_j) \end{aligned}\tag{1} \]...
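A short sketch, assuming standard NumPy/SciPy tools, of how the objective in equation (1) and the E-step responsibilities could be evaluated; the parameter layout and function names are illustrative, not taken from the post.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pis, mus, Sigmas):
    """Equation (1): sum over n of log sum over k of pi_k N(x_n | mu_k, Sigma_k)."""
    # densities[n, k] = pi_k * N(x_n | mu_k, Sigma_k)
    densities = np.stack(
        [pi * multivariate_normal.pdf(X, mean=mu, cov=S)
         for pi, mu, S in zip(pis, mus, Sigmas)],
        axis=1)
    return np.sum(np.log(densities.sum(axis=1)))

def responsibilities(X, pis, mus, Sigmas):
    """E step: gamma(z_nk), the normalized component densities."""
    densities = np.stack(
        [pi * multivariate_normal.pdf(X, mean=mu, cov=S)
         for pi, mu, S in zip(pis, mus, Sigmas)],
        axis=1)
    return densities / densities.sum(axis=1, keepdims=True)
```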

March 5, 2020 · (Last Modification: April 30, 2022) · Anthony Tan

Maximum Likelihood of Gaussian Mixtures

Preliminaries Probability Theory multiplication principle joint distribution the Bayesian theory Gaussian distribution log-likelihood function ‘Maximum Likelihood Estimation’ Maximum Likelihood1 Gaussian mixtures were discussed in ‘Mixtures of Gaussians’. Once we have a training data set and a certain hypothesis, the next step is to estimate the parameters of the model. A mixture of Gaussians \(\Pr(\mathbf{x})= \sum_{k=1}^{K}\pi_k\mathcal{N}(\mathbf{x}|\mathbf{\mu}_k,\Sigma_k)\) has two kinds of unknowns: the parameters of the Gaussians, \(\mathbf{\mu}_k,\Sigma_k\), and the latent variables \(\mathbf{z}\)...
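A one-dimensional, two-component sketch of evaluating the mixture density \(\Pr(\mathbf{x})=\sum_{k=1}^{K}\pi_k\mathcal{N}(\mathbf{x}|\mathbf{\mu}_k,\Sigma_k)\); all parameter values below are placeholders chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

pis = np.array([0.4, 0.6])     # mixing coefficients (must sum to 1)
mus = np.array([-2.0, 3.0])    # component means
sigmas = np.array([1.0, 0.5])  # component standard deviations

x = 0.7
density = np.sum(pis * norm.pdf(x, loc=mus, scale=sigmas))
print(density)                 # Pr(x) under the two-component mixture
```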

March 5, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

Mixtures of Gaussians

Preliminaries Probability Theory multiplication principle joint distribution the Bayesian theory Gaussian distribution Calculus 1,2 A Formal Introduction to Mixtures of Gaussians1 We introduced mixture distributions in the post ‘An Introduction to Mixture Models’, and the example in that post was just a two-component Gaussian mixture. In this post, we discuss Gaussian mixtures formally, which also serves to motivate the development of the expectation-maximization (EM) algorithm....

March 5, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

K-means Clustering

Preliminaries Numerical Optimization necessary conditions for maximum K-means algorithm Fisher Linear Discriminant Clustering Problem1 The first thing to do before introducing the algorithm is to make the task clear, and a mathematical formulation is usually the best way. Clustering is a kind of unsupervised learning task, so there is no correct or incorrect solution because there is no teacher or target. Clustering is similar to classification at prediction time, since the outputs of both clustering and classification are discrete....
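For concreteness, here is a minimal K-means sketch under the usual two-step formulation (assign each point to its nearest centre, then recompute the centres); the names and initialization choice are illustrative assumptions, not the post's code.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Cluster the rows of X into K groups; returns labels and centres."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # random initial centres
    for _ in range(n_iters):
        # assignment step: index of the nearest centre for every point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centre moves to the mean of its assigned points
        new_centers = np.array(
            [X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
             for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```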

March 4, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

An Introduction to Mixture Models

Preliminaries linear regression Maximum Likelihood Estimation Gaussian Distribution Conditional Distribution From Supervised to Unsupervised Learning1 So far we have discussed many machine learning algorithms, including linear regression, linear classification, and neural network models. However, most of them are supervised learning methods, which means a teacher guides the model toward a certain task. In these problems our attention was on the probability distribution of the parameters given the inputs, outputs, and model:...

March 4, 2020 · (Last Modification: April 28, 2022) · Anthony Tan

Logistic Regression

Preliminaries ‘An Introduction to Probabilistic Generative Models for Linear Classification’ Idea of logistic regression1 The logistic sigmoid function (logistic function for short) was introduced in the post ‘An Introduction to Probabilistic Generative Models for Linear Classification’. It has an elegant form: \[ \delta(a)=\frac{1}{1+e^{-a}}\tag{1} \] When \(a=0\), \(\delta(a)=\frac{1}{2}\), which is the midpoint of the logistic function's range. This strongly suggests that we can set \(a\) equal to some function \(y(\mathbf{x})\), and then...
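A small sketch of equation (1) and the property \(\delta(0)=\tfrac{1}{2}\), together with the idea of feeding a linear score \(y(\mathbf{x})\) into the logistic function; the weights below are placeholders, not parameters from the post.

```python
import numpy as np

def logistic(a):
    """Equation (1): the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-a))

print(logistic(0.0))                  # 0.5, the midpoint of the (0, 1) range

w, b = np.array([0.8, -1.2]), 0.1     # illustrative linear-model parameters
x = np.array([0.5, 0.3])
print(logistic(w @ x + b))            # Pr(class 1 | x) when a = y(x) = w.x + b
```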

February 20, 2020 · (Last Modification: April 28, 2022) · Anthony Tan