Anthony's Blogs
https://anthony-tan.com/
Recent content on Anthony's Blogs (Hugo -- gohugo.io)
Thu, 11 Aug 2022 19:21:20 +0800

How to Read Papers
https://anthony-tan.com/how-to-read-papers/
Thu, 11 Aug 2022 19:21:20 +0800

The following workflow is a summary of a video by Mu Li of Amazon:
```mermaid
flowchart TB
    next("another paper")
    subgraph p1 ["Pass 1 (Selection)"]
        direction LR
        title[Title] --> abs[Abstract] --> conc[Conclusion] --> FaT["Figures and Tables"]
    end
    p1 -.- dec{Decision}
    dec -.-|"stop (low quality or\nlow relevance)"| next
    dec -.-|"go on"| p2
    subgraph p2 ["Pass 2 (Selection)"]
        direction TB
        int_cit["Introduction:\nsome important citations"]
        method_ME["Method:\nwithout math and engineering;\nfocus on figures and tables"]
    end
    p2 -.- dec2{Decision}
    dec2 -.-|"too hard to understand,\ngo to one of the citations\nin the introduction"|
```

Boosting and AdaBoost
https://anthony-tan.com/Boosting-and-AdaBoost/
Sat, 07 Mar 2020 15:40:46 +0000

Preliminaries: Committee

Boosting: A committee gives equal weight to every prediction from all of its models, and it offers little improvement over a single model. Boosting was built for this problem. Boosting is a technique that combines multiple 'base' classifiers to produce a form of committee that:
- performs better than any of the base classifiers, and
- gives each base classifier a different weight.

AdaBoost: AdaBoost is short for adaptive boosting.

Committees
https://anthony-tan.com/Committees/
Sat, 07 Mar 2020 13:55:21 +0000

Preliminaries: basic machine learning concepts; probability theory (expectation, correlated random variables)

Analysis of Committees: A committee is a natural inspiration for how to combine several models (or rather, how to combine the outputs of several models). For example, we can combine all the models by averaging:
\[ y_{COM}(X)=\frac{1}{M}\sum_{m=1}^My_m(X)\tag{1} \]
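As a toy illustration of equation (1) (the component models below are hypothetical stand-ins, not from the post), the committee prediction is just the arithmetic mean of the individual predictions:

```python
# Committee prediction: average the outputs of M component models (eq. 1).
# The three "models" are deliberately biased toy estimators of f(x) = x^2.

def committee_predict(models, x):
    """Average the predictions of all models at input x."""
    return sum(m(x) for m in models) / len(models)

models = [
    lambda x: x ** 2 + 0.3,   # overestimates
    lambda x: x ** 2 - 0.2,   # underestimates
    lambda x: x ** 2 + 0.1,   # slight overestimate
]

y_com = committee_predict(models, 2.0)  # individual errors partly cancel
```

The individual biases partly cancel in the average, which is the effect the post goes on to analyze.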
Then we want to find out whether this averaged prediction is better than each individual model.

Bayesian Model Averaging (BMA) and Combining Models
https://anthony-tan.com/Bayesian-Model-Averaging-and-Combining-Models/
Sat, 07 Mar 2020 13:10:39 +0000

Preliminaries: Bayes' theorem

Bayesian Model Averaging (BMA): Bayesian model averaging (BMA) is another widely used method that looks very much like a combining model. However, the difference between BMA and combining models is also significant.
In Bayesian model averaging, the random variable ranges over models (hypotheses) \(h=1,2,\cdots,H\) with prior probabilities \(\Pr(h)\); the marginal distribution over the data \(X\) is then:
\[ \Pr(X)=\sum_{h=1}^{H}\Pr(X|h)\Pr(h) \]
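With made-up numbers for three hypothetical models, the marginal likelihood and the resulting posterior over models look like this:

```python
# Marginal likelihood of data X under Bayesian model averaging:
# Pr(X) = sum_h Pr(X|h) Pr(h), with hypothetical numbers for three models.

priors = [0.5, 0.3, 0.2]          # Pr(h); must sum to 1
likelihoods = [0.10, 0.40, 0.25]  # Pr(X|h): how well each model explains X

pr_x = sum(l * p for l, p in zip(likelihoods, priors))

# Posterior over models by Bayes' theorem: Pr(h|X) = Pr(X|h) Pr(h) / Pr(X)
posteriors = [l * p / pr_x for l, p in zip(likelihoods, priors)]
```

Here the second model ends up with the largest posterior even though it had a smaller prior, which is the sense in which BMA "selects" a model from the data.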
BMA is used to select the model (hypothesis) that explains the data best, through Bayes' theorem.

An Introduction to Combining Models
https://anthony-tan.com/An-Introduction-to-Combining-Models/
Sat, 07 Mar 2020 12:04:00 +0000

Preliminaries: 'Mixtures of Gaussians'; basic machine learning concepts

Combining Models: The mixture of Gaussians was discussed in the post 'Mixtures of Gaussians'. It was used there to introduce the EM algorithm, but it also gives us inspiration for improving model performance.
All the models we have studied so far, apart from neural networks, are single-distribution models. That is like inviting one expert who is very good at this kind of problem and then doing whatever the expert says.

EM Algorithm
https://anthony-tan.com/EM-Algorithm/
Thu, 05 Mar 2020 20:04:15 +0000

Preliminaries: Gaussian distribution; log-likelihood; calculus (partial derivatives, Lagrange multipliers)

EM Algorithm for Gaussian Mixtures: Maximum likelihood cannot be used on the Gaussian mixture model directly, because of the severe defects we came across in 'Maximum Likelihood of Gaussian Mixtures'. Inspired by K-means, a two-step algorithm was developed.
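As a minimal sketch of those two alternating steps (a hypothetical 1-D, two-component example, not the post's full derivation):

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(data, pi, mu, var):
    # E step: responsibilities gamma[n][k] proportional to pi_k * N(x_n | mu_k, var_k)
    gamma = []
    for x in data:
        w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
        s = sum(w)
        gamma.append([wk / s for wk in w])
    # M step: re-estimate pi, mu, var from the responsibilities
    n_k = [sum(g[k] for g in gamma) for k in range(2)]
    pi = [n_k[k] / len(data) for k in range(2)]
    mu = [sum(g[k] * x for g, x in zip(gamma, data)) / n_k[k] for k in range(2)]
    var = [sum(g[k] * (x - mu[k]) ** 2 for g, x in zip(gamma, data)) / n_k[k]
           for k in range(2)]
    return pi, mu, var

# Two well-separated clusters around -3 and +3 (made-up data)
data = [-3.2, -2.9, -3.1, 2.8, 3.0, 3.3]
pi, mu, var = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(20):
    pi, mu, var = em_step(data, pi, mu, var)
```

After a few iterations the two component means settle near the two cluster centers; each iteration is guaranteed not to decrease the log-likelihood below.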
The objective function is the log-likelihood function:
\[ \begin{aligned} \ln \Pr(\mathbf{X}|\mathbf{\pi},\mathbf{\mu},\Sigma)&=\ln \prod_{n=1}^N\sum_{j=1}^{K}\pi_j\mathcal{N}(\mathbf{x}_n|\mathbf{\mu}_j,\Sigma_j)\\ &=\sum_{n=1}^{N}\ln \sum_{j=1}^{K}\pi_j\mathcal{N}(\mathbf{x}_n|\mathbf{\mu}_j,\Sigma_j) \end{aligned}\tag{1} \]

Maximum Likelihood of Gaussian Mixtures
https://anthony-tan.com/Maximum-Likelihood-of-Gaussian-Mixtures/
Thu, 05 Mar 2020 18:54:20 +0000

Preliminaries: probability theory (multiplication principle, joint distributions, Bayes' theorem); Gaussian distribution; log-likelihood function; 'Maximum Likelihood Estimation'

Maximum Likelihood: Gaussian mixtures were discussed in 'Mixtures of Gaussians'. Once we have a training data set and a certain hypothesis, what we should do next is estimate the parameters of the model. A mixture of Gaussians \(\Pr(\mathbf{x})= \sum_{k=1}^{K}\pi_k\mathcal{N}(\mathbf{x}|\mathbf{\mu}_k,\Sigma_k)\) has two kinds of parameters:

- the parameters of the Gaussians: \(\mathbf{\mu}_k,\Sigma_k\)
- the latent variables: \(\mathbf{z}\)

Mixtures of Gaussians
https://anthony-tan.com/Mixtures-of-Gaussians/
Thu, 05 Mar 2020 16:05:50 +0000

Preliminaries: probability theory (multiplication principle, joint distributions, Bayes' theorem); Gaussian distribution; calculus 1, 2

A Formal Introduction to Mixtures of Gaussians: We introduced a mixture distribution in the post 'An Introduction to Mixture Models', where the example was just a two-component Gaussian mixture. In this post we discuss Gaussian mixtures formally, which serves to motivate the development of the expectation-maximization (EM) algorithm.

K-means Clustering
https://anthony-tan.com/K-means-Clustering/
Wed, 04 Mar 2020 22:08:03 +0000

Preliminaries: numerical optimization (necessary conditions for a maximum); K-means algorithm; Fisher linear discriminant

Clustering Problem: The first thing we should do before introducing the algorithm is to make the task clear. A mathematical form is usually the best way.
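For instance, K-means states the task as minimizing \(J=\sum_n \min_k \lVert \mathbf{x}_n-\mathbf{\mu}_k\rVert^2\). A minimal 1-D sketch of the resulting two-step (Lloyd) iteration, with made-up data:

```python
# A sketch of K-means on toy 1-D data: alternate an assignment step and an
# update step until the centers stop moving (hypothetical example).

def assign(data, centers):
    """Assignment step: each point goes to its nearest center."""
    return [min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            for x in data]

def update(data, labels, k):
    """Update step: each center moves to the mean of its assigned points."""
    return [sum(x for x, l in zip(data, labels) if l == j) /
            max(1, sum(1 for l in labels if l == j)) for j in range(k)]

data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers = [0.0, 10.0]
for _ in range(5):
    labels = assign(data, centers)
    centers = update(data, labels, 2)
```

Each of the two steps can only decrease (or keep) the objective \(J\), so the iteration converges, though possibly to a local minimum.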
Clustering is a kind of unsupervised learning task, so there is no correct or incorrect solution, because there is no teacher or target. Clustering is similar to classification at prediction time, since the outputs of both are discrete.

An Introduction to Mixture Models
https://anthony-tan.com/An-Introduction-to-Mixture-Models/
Wed, 04 Mar 2020 19:30:08 +0000

Preliminaries: linear regression; maximum likelihood estimation; Gaussian distribution; conditional distributions

From Supervised to Unsupervised Learning: We have discussed many machine learning algorithms so far, including linear regression, linear classification, and neural network models. However, most of them are supervised learning, which means a teacher leads the model toward a certain task. In these problems our attention was on the probability distribution of the parameters given inputs, outputs, and models:

Logistic Regression
https://anthony-tan.com/Logistic-Regression/
Thu, 20 Feb 2020 21:02:47 +0000

Preliminaries: 'An Introduction to Probabilistic Generative Models for Linear Classification'

Idea of logistic regression: The logistic sigmoid function (logistic function for short) was introduced in the post 'An Introduction to Probabilistic Generative Models for Linear Classification'. It has an elegant form:
\[ \delta(a)=\frac{1}{1+e^{-a}}\tag{1} \]
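Equation (1) can be checked numerically; note \(\delta(0)=\tfrac12\) and the symmetry \(\delta(-a)=1-\delta(a)\):

```python
import math

# The logistic sigmoid of equation (1): delta(a) = 1 / (1 + exp(-a)).
# It maps the whole real line into (0, 1), with delta(0) = 1/2.

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))
```

The symmetry means the two class probabilities \(\delta(a)\) and \(\delta(-a)\) always sum to one, which is what makes it usable as a posterior probability.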
When \(a=0\), \(\delta(a)=\frac{1}{2}\), the midpoint of the logistic function's range. This strongly suggests setting \(a\) equal to some function \(y(\mathbf{x})\), and then

An Introduction to Probabilistic Generative Models
https://anthony-tan.com/An-Introduction-to-Probabilistic-Generative-Models/
Thu, 20 Feb 2020 16:13:30 +0000

Preliminaries: probability; Bayes' formula; calculus

Probabilistic Generative Models: The generative model used for making decisions contains an inference step and a decision step:
1. The inference step calculates \(\Pr(\mathcal{C}_k|\mathbf{x})\), the probability that \(\mathbf{x}\) belongs to class \(\mathcal{C}_k\).
2. The decision step makes a decision based on the \(\Pr(\mathcal{C}_k|\mathbf{x})\) calculated in step 1.

In this post, we just give an introduction and a framework for the probabilistic generative model in classification.

Fisher Linear Discriminant (LDA)
https://anthony-tan.com/Fisher-Linear-Discriminant/
Wed, 19 Feb 2020 17:01:38 +0000

Preliminaries: linear algebra (inner product, projection)

Idea of the Fisher linear discriminant: The least-squares method in classification can only deal with a small set of tasks, because it was designed for regression. So we come to the famous Fisher linear discriminant. This method is discriminative, for it directly gives the class to which the input \(\mathbf{x}\) belongs. Assume the linear function
\[ y=\mathbf{w}^T\mathbf{x}+w_0\tag{1} \]

Discriminant Functions and Decision Boundary
https://anthony-tan.com/Discriminant-Functions-and-Decision-Boundary/
Mon, 17 Feb 2020 16:15:28 +0000

Preliminaries: definition of convexity; linear algebra (vector length, vector direction)

Discriminant Function in Classification: The discriminant function, or discriminant model, stands on the other side from the generative model. Here we look at the behavior of the discriminant function in linear classification.
In the post 'Least Squares in Classification', we saw that in a linear classification task the decision boundary is a line or hyperplane by which we separate two classes.

Least Squares in Classification
https://anthony-tan.com/Least-Squares-in-Classification/
Mon, 17 Feb 2020 12:39:31 +0000

Preliminaries: 'A Simple Linear Regression'; 'Least Squares Estimation'; 'From Linear Regression to Linear Classification'; pseudo-inverse

Least Squares for Classification: Least squares for linear regression was discussed in 'A Simple Linear Regression'. In this post, we want to find out whether this powerful algorithm can also be used in classification.
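A toy sketch of the idea (hypothetical 1-D data; the post itself works with target vectors and the pseudo-inverse): fit targets \(t\in\{-1,+1\}\) by ordinary least squares and classify by the sign of the fitted value:

```python
# Least squares used as a classifier: fit y ~ w*x + b to targets in {-1, +1}
# with the closed-form 1-D least-squares solution, then take the sign.

xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]   # made-up inputs
ts = [-1, -1, -1, 1, 1, 1]               # class labels coded as -1 / +1

n = len(xs)
x_bar = sum(xs) / n
t_bar = sum(ts) / n
w = sum((x - x_bar) * (t - t_bar) for x, t in zip(xs, ts)) / \
    sum((x - x_bar) ** 2 for x in xs)
b = t_bar - w * x_bar

def classify(x):
    """Decision rule: which side of the fitted line the point falls on."""
    return 1 if w * x + b >= 0 else -1
```

This works on a clean separable set like the one above, but, as the post goes on to show, squared error penalizes points that are "too correct", which is the method's weakness in classification.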
Recalling the distinction between classification and regression, two points need to be emphasized again ('From Linear Regression to Linear Classification'):

From Linear Regression to Linear Classification
https://anthony-tan.com/From-Linear-Regression-to-Linear-Classification/
Mon, 17 Feb 2020 11:20:11 +0000

Preliminaries: 'An Introduction to Linear Regression'; 'A Simple Linear Regression'; Bayes' theorem; feature extraction

Recall Linear Regression: The goal of a regression problem is to find a function, or hypothesis, that, given an input \(\mathbf{x}\), makes a prediction \(\hat{y}\) to estimate the target. Both the target \(y\) and the prediction \(\hat{y}\) here are continuous. They have the properties of numbers:
Consider 3 inputs \(\mathbf{x}_1\), \(\mathbf{x}_2\), and \(\mathbf{x}_3\) whose corresponding targets are \(y_1=0\), \(y_2=1\), and \(y_3=2\).

Polynomial Regression and Features-Extension of Linear Regression
https://anthony-tan.com/Polynomial-Regression-and-Features-Extension-of-Linear-Regression/
Sat, 15 Feb 2020 22:00:40 +0000

Preliminaries: 'A Simple Linear Regression'; 'Least Squares Estimation'

Extending Linear Regression with Features: The original linear regression has the form:
\[ \begin{aligned} y(\mathbf{x})&= b + \mathbf{w}^T \mathbf{x}\\ &=w_0\cdot 1 + w_1x_1+ w_2x_2+\cdots + w_{m}x_{m} \end{aligned}\tag{1} \]
where the input vector \(\mathbf{x}\) and the parameter vector \(\mathbf{w}\) are \((m+1)\)-dimensional vectors whose first components are \(1\) and the bias \(w_0=b\), respectively. This equation is linear in both the input vector and the parameter vector. Then an idea comes to us: if we set \(x_i=\phi_i(\mathbf{x})\), equation (1) converts to:

Maximum Likelihood Estimation
https://anthony-tan.com/Maximum-Likelihood-Estimation/
Sat, 15 Feb 2020 00:41:25 +0000

Preliminaries: 'A Simple Linear Regression'; 'Least Squares Estimation'; linear algebra

Square Loss Function for Regression: For any input \(\mathbf{x}\), our goal in a regression task is to give a prediction \(\hat{y}=f(\mathbf{x})\) that approximates the target \(t\), where the function \(f(\cdot)\) is the chosen hypothesis or model, as mentioned in the post https://anthony-tan.com/A-Simple-Linear-Regression/.
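The square loss named in the heading is simply \((t-\hat{y})^2\), accumulated over the data set; a minimal sketch with made-up numbers:

```python
# The square loss between a target t and a prediction y_hat, summed over a
# toy data set (hypothetical numbers, for illustration only).

def square_loss(t, y_hat):
    return (t - y_hat) ** 2

targets = [1.0, 2.0, 3.0]
preds = [1.1, 1.9, 3.2]   # predictions from some hypothetical model
total = sum(square_loss(t, y) for t, y in zip(targets, preds))
```

Minimizing this total over the model's parameters is exactly the estimation problem the post connects to maximum likelihood under Gaussian noise.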
The difference between \(t\) and \(\hat{y}\) can be called 'error' or, more precisely, 'loss'. In an approximation task the 'error' occurs by chance and always exists, so 'loss' is a better word for the difference.

Least Squares Estimation
https://anthony-tan.com/Least-Squares-Estimation/
Fri, 14 Feb 2020 11:33:36 +0000

Preliminaries: 'A Simple Linear Regression'; the column space

Another Example of Linear Regression: In the blog 'A Simple Linear Regression', the squared difference between the output of a predictor and the target was used as the loss function of a regression problem. It can also be written as:
\[ \ell(\hat{\mathbf{y}}_i,\mathbf{y}_i)=(\hat{\mathbf{y}}_i-\mathbf{y}_i)^T(\hat{\mathbf{y}}_i-\mathbf{y}_i) \tag{1} \]
The linear regression model in a matrix form is:
\[ y=\mathbf{w}^T\mathbf{x}+b\tag{2} \]
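For a 1-D input, minimizing loss (1) for model (2) has a closed form: solve the normal equations \((X^TX)\begin{pmatrix}w\\ b\end{pmatrix}=X^T\mathbf{y}\) with design-matrix rows \((x_i, 1)\). A sketch with toy data (not from the post):

```python
# Least squares for model (2) in one dimension: minimize sum_i (w*x_i + b - y_i)^2
# by solving the 2x2 normal equations directly.

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # made-up data lying exactly on y = 2x + 1

# Entries of X^T X and X^T y for design-matrix rows [x_i, 1]
sxx = sum(x * x for x in xs)
sx = sum(xs)
n = len(xs)
sxy = sum(x * y for x, y in zip(xs, ys))
sy = sum(ys)

# Cramer's rule on the 2x2 system [[sxx, sx], [sx, n]] [w, b]^T = [sxy, sy]^T
det = sxx * n - sx * sx
w = (n * sxy - sx * sy) / det
b = (sxx * sy - sx * sxy) / det
```

Since the toy data are exactly linear, the recovered parameters are \(w=2\), \(b=1\); the column-space view mentioned in the preliminaries interprets this same solution as a projection.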
What we do in this post is analyze the least-squares method from two different viewpoints.

Drawbacks of Backpropagation
https://anthony-tan.com/Drawbacks-of-Backpropagation/
Tue, 07 Jan 2020 10:14:53 +0000

Preliminaries: 'An Introduction to Backpropagation and Multilayer Perceptrons'; 'The Backpropagation Algorithm'

Speeding Backpropagation Up: The BP algorithm was described in 'An Introduction to Backpropagation and Multilayer Perceptrons', and its implementation was recorded in 'The Backpropagation Algorithm'. BP has worked in many applications for many years, but the process has many drawbacks. The basic BP algorithm is so slow for most practical applications that training might take days or even weeks.

Backpropagation, Batch Training, and Incremental Training
https://anthony-tan.com/Backpropagation-Batch-Training-and-Incremental-Training/
Thu, 02 Jan 2020 17:49:55 +0000

Preliminaries: calculus 1, 2; linear algebra

Batch vs. Incremental Training: In both the LMS and BP algorithms, the error used in each update step is not the MSE but the single-point error \(e=t_i-a_i\), calculated from just one data point of the training set. This is called stochastic gradient descent: 'stochastic' because the error at each iterative step is approximated from randomly selected training points rather than the whole data set.

The Backpropagation Algorithm
https://anthony-tan.com/The-Backpropagation-Algorithm/
Wed, 01 Jan 2020 14:26:55 +0000

Preliminaries: 'An Introduction to Backpropagation and Multilayer Perceptrons'; calculus 1, 2; linear algebra (Jacobian matrix)

Architecture and Notations: We have seen that a three-layer network is flexible in approximating functions ('An Introduction to Backpropagation and Multilayer Perceptrons'). With more than three layers, a network could approximate functions as accurately as we want. However, the next trouble is the learning rule. This problem almost killed neural networks in the 1970s.

An Introduction to Backpropagation and Multilayer Perceptrons
https://anthony-tan.com/An-Introduction-to-Backpropagation-and-Multilayer-Perceptrons/
Tue, 31 Dec 2019 10:29:33 +0000

Preliminaries: performance learning; perceptron learning rule; supervised Hebbian learning; LMS

From LMS to Backpropagation: The LMS algorithm is a kind of 'performance learning'. We have studied several learning rules (algorithms) so far, such as the perceptron learning rule and supervised Hebbian learning, which were based on the physical mechanisms of biological neural networks.
Then performance learning was presented. Because of its outstanding performance, we move further and further away from natural intelligence toward performance learning.

Widrow-Hoff Learning
https://anthony-tan.com/Widrow-Hoff-Learning/
Mon, 23 Dec 2019 18:51:59 +0000

Preliminaries: 'Performance Surfaces and Optimum Points'; linear algebra; stochastic approximation; probability theory

ADALINE, LMS, and Widrow-Hoff learning: Performance learning has been discussed, but we have not yet used it in any neural network. In this post, we talk about an important application of performance learning. This neural network was invented by Bernard Widrow and his graduate student Marcian Hoff in 1960, at almost the same time as the perceptron, which was discussed in 'Perceptron Learning Rule'.

Conjugate Gradient
https://anthony-tan.com/Conjugate-Gradient/
Sat, 21 Dec 2019 13:40:24 +0000

Preliminaries: 'steepest descent method'; "Newton's method"

Conjugate Gradient: We have learned the 'steepest descent method' and "Newton's method". The main advantage of Newton's method is speed: it converges quickly. The main advantage of the steepest descent method is that it is guaranteed to converge to a local minimum. But the limit of Newton's method is that it needs too many resources, for both computation and storage, when the number of parameters is large.

Newton's Method
https://anthony-tan.com/Newton_s-Method/
Sat, 21 Dec 2019 11:39:56 +0000

Preliminaries: 'steepest descent algorithm'; linear algebra; calculus 1, 2

Newton's Method: The Taylor series gives us conditions for minimum points based on both the first-order and second-order terms. The first-order approximation of a performance index function produced a powerful algorithm for locating minimum points, which we call the 'steepest descent algorithm'.
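Using the second-order term as well gives the Newton update \(\mathbf{x}_{k+1}=\mathbf{x}_k-A_k^{-1}\mathbf{g}_k\) (Hessian inverse times gradient); a 1-D sketch on a toy quadratic, not taken from the post:

```python
# One Newton step in 1-D: x_{k+1} = x_k - F'(x_k) / F''(x_k).
# On a quadratic such as F(x) = (x - 3)^2 it lands on the minimum in one step.

def newton_step(x, grad, hess):
    return x - grad(x) / hess(x)

grad = lambda x: 2 * (x - 3)   # F'(x) for the toy quadratic
hess = lambda x: 2.0           # F''(x), constant for a quadratic

x = 10.0
x = newton_step(x, grad, hess)  # jumps straight to the minimizer x = 3
```

The one-step convergence on quadratics is exactly the speed advantage noted above; the cost is forming and inverting the Hessian, which is what limits the method at scale.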
Now we want to look into the second-order approximation of a function, to find out whether there is an algorithm that can also work as a guide to the minimum points.

Steepest Descent Method
https://anthony-tan.com/Steepest-Descent-Method/
Fri, 20 Dec 2019 11:39:19 +0000

Preliminaries: 'An Introduction to Performance Optimization'; linear algebra; calculus 1, 2

Direction-Based Algorithms and a Variation: This post describes a direction-searching algorithm for the iterates \(\mathbf{x}_{k}\), and a variation of it that gives a way to estimate the step length \(\alpha_k\).
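A minimal sketch of the iteration \(\mathbf{x}_{k+1}=\mathbf{x}_k-\alpha\mathbf{g}_k\) with a fixed step length, on a toy 1-D performance index (hypothetical example, not the post's):

```python
# Steepest descent on the toy performance index F(x) = x^2, whose gradient
# is 2x: repeatedly move against the gradient with a fixed step length alpha.

def steepest_descent(grad, x0, alpha, steps):
    x = x0
    for _ in range(steps):
        x = x - alpha * grad(x)   # step downhill
    return x

x_min = steepest_descent(lambda x: 2 * x, x0=4.0, alpha=0.1, steps=50)
```

With \(\alpha=0.1\) each step multiplies \(x\) by \(0.8\), so the iterates shrink geometrically toward the minimum at \(0\); too large an \(\alpha\) would instead make them diverge, which motivates the step-length estimation discussed in the post.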
Steepest Descent: To find the minimum points of a performance index by an iterative algorithm, we want to decrease the value of the performance index step by step, like walking down from the top of a hill.

An Introduction to Performance Optimization
https://anthony-tan.com/An-Introduction-to-Performance-Optimization/
Fri, 20 Dec 2019 11:38:50 +0000

Preliminaries: nothing

Performance Optimization: The Taylor series has been used for analyzing the performance surface and locating the optimum points of a certain performance index. This short post is a brief introduction to performance optimization, and the following posts are examples of three categories of optimization algorithms:
- 'Steepest Descent'
- "Newton's Method"
- 'Conjugate Gradient'

Recall the analysis of the performance index, which is a function of the parameters of the model.

Quadratic Functions
https://anthony-tan.com/Quadratic-Functions/
Thu, 19 Dec 2019 15:45:37 +0000

Preliminaries: linear algebra; calculus 1, 2; Taylor series

Quadratic Functions: The quadratic function, a type of performance index, is universal. One of its key properties is that it can be represented exactly by a second-order Taylor series:
\[ F(\mathbf{x})=\frac{1}{2}\mathbf{x}^TA\mathbf{x}+\mathbf{d}^T\mathbf{x}+c\tag{1} \]
where \(A\) is a symmetric matrix (if it is not symmetric, it can easily be converted into one). Recall the properties of the gradient:
\[ \nabla (\mathbf{h}^T\mathbf{x})=\nabla (\mathbf{x}^T\mathbf{h})=\mathbf{h}\tag{2} \]
and
\[ \nabla (\mathbf{x}^TQ\mathbf{x})=Q\mathbf{x}+Q^T\mathbf{x}=2Q\mathbf{x}\quad\text{(for symmetric }Q\text{)}\tag{3} \]

Performance Surfaces and Optimum Points
https://anthony-tan.com/Performance-Surfaces-and-Optimum-Points/
Thu, 19 Dec 2019 08:57:53 +0000

Preliminaries: perceptron learning algorithm; Hebbian learning algorithm; linear algebra

Neural Network Training Techniques: Several neural network architectures have been introduced, and each had its own learning rule, such as the perceptron learning algorithm and the Hebbian learning algorithm. As more and more architectures were designed, general training methods became necessary. Up to now, we can classify all training rules into three general categories:

Supervised Hebbian Learning
https://anthony-tan.com/Supervised-Hebbian-Learning/
Tue, 17 Dec 2019 18:24:40 +0000

Preliminaries: linear algebra

Hebb Rule: The Hebb rule is one of the earliest neural network learning laws. It was published in 1949 by Donald O. Hebb, a Canadian psychologist, in his work 'The Organization of Behavior'. In this great book, he proposed a possible mechanism for synaptic modification in the brain, and this rule was later used in training artificial neural networks for pattern recognition.
'The Organization of Behavior': The main premise of the book is that behavior can be explained by the action of neurons.

Implement of Perceptron
https://anthony-tan.com/Implement-of-Perceptron/
Thu, 12 Dec 2019 13:10:28 +0000

Preliminaries: 'An Introduction to Neural Networks'; 'Neuron Model and Network Architecture'; 'Perceptron Learning Rule'

Implement of Perceptron: What we need to do next is implement the algorithm described in 'Perceptron Learning Rule' and observe the effect of:

1. different parameters,
2. different training sets,
3. and different transfer functions.
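A minimal sketch of that rule on a toy separable problem (an AND gate, chosen here for illustration): with \(a=\text{hardlim}(\mathbf{w}\cdot\mathbf{p}+b)\) and error \(e=t-a\), update \(\mathbf{w}\leftarrow\mathbf{w}+e\,\mathbf{p}\), \(b\leftarrow b+e\):

```python
# The perceptron learning rule on a linearly separable toy set (AND gate).
# Because the data are separable, the rule is guaranteed to converge.

def hardlim(n):
    """Threshold transfer function: 1 if n >= 0 else 0."""
    return 1 if n >= 0 else 0

def train(data, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for p, t in data:
            a = hardlim(w[0] * p[0] + w[1] * p[1] + b)
            e = t - a                              # error drives the update
            w = [w[0] + e * p[0], w[1] + e * p[1]] # w <- w + e*p
            b = b + e                              # b <- b + e
    return w, b

data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(data)
```

Swapping the data for a non-separable set (e.g. XOR) makes the rule cycle forever, which is one of the effects of "different training sets" worth observing.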
A single-neuron perceptron simply consists of a linear combination and a threshold operation, so its capacity is close to that of a linear classifier.

Learning Rules and Perceptron Learning Rule
https://anthony-tan.com/Learning-Rules-and-Perceptron-Learning-Rule/
Wed, 11 Dec 2019 21:30:42 +0000

Preliminaries: supervised learning; unsupervised learning; reinforcement learning; 'An Introduction to Neural Networks'

Learning Rules: We built some neural network models in the post 'An Introduction to Neural Networks', and, as we know, architectures and learning rules are the two main aspects of designing a useful network. The architectures we have introduced cannot be used yet. What we are going to do is investigate the learning rules for the different architectures.

Neuron Model and Network Architecture
https://anthony-tan.com/Neuron-Model-and-Network-Architecture/
Tue, 10 Dec 2019 10:54:57 +0000

Preliminaries: linear classifier; 'An Introduction to Neural Networks'

Theory and Notation: We are not yet able to build artificial cells, and it seems impossible to build a neural network from biological materials manually, either. So, to investigate the ability of neurons, we build mathematical models of the neuron. These models have been assigned a number of neuron-like properties. However, there must be a balance between the number of properties contained in the mathematical models and the current computational abilities of machines.

An Introduction to Neural Networks
https://anthony-tan.com/An-Introduction-to-Neural-Networks/
Sun, 08 Dec 2019 19:01:32 +0000

Preliminaries: nothing

Neural Networks: Neural networks are a model of our brain, which is built of neurons and considered the source of intelligence. There are about \(10^{11}\) neurons in the human brain, and each neuron has about \(10^4\) connections to other neurons. Some of these brilliant structures are given when we are born. Other structures are established by experience, and this process is called learning. Learning can also be seen as the establishment or modification of the connections between neurons.

A Simple Linear Regression
https://anthony-tan.com/A-Simple-Linear-Regression/
Fri, 11 Oct 2019 20:35:27 +0000

Preliminaries: linear algebra (the concepts of space and vectors); calculus; 'An Introduction to Linear Regression'

Notations of Linear Regression: We have already created a simple linear model in the post 'An Introduction to Linear Regression'. According to the definition of linearity, we can develop the simplest linear regression model:
\[ Y\sim w_1X+w_0\tag{1} \]
where the symbol \(\sim\) is read as "is approximately modeled as". Equation (1) can also be described as "regressing \(Y\) on \(X\)" (or "\(Y\) onto \(X\)").

An Introduction to Linear Regression
https://anthony-tan.com/An-Introduction-to-Linear-Regression/
Wed, 09 Oct 2019 18:36:40 +0000

Preliminaries: linear algebra (the concepts of space and vectors); calculus

What is Linear Regression: Linear regression is a basic idea in statistics and machine learning based on the linear combination. It is usually used to predict responses to inputs (predictors).
Machine Learning and Statistical Learning: Machine learning and statistical learning are similar but have some distinctions. In machine learning, models, whether regression models or classification models, are used to predict the outputs for new incoming inputs.

About
https://anthony-tan.com/about/
Mon, 01 Jan 0001 00:00:00 +0000

About Me: Hi, I'm Anthony Tan (谭升). I now live in Shenzhen, China. I'm a full-time computer vision algorithm engineer and a part-time individual reinforcement learning researcher. I have had a great interest in artificial intelligence since I watched the movie "Iron Man" as a middle school student. To get deeper into these subjects, I'd like to apply for a Ph.D. program in reinforcement learning in the coming years.