Preliminaries ‘An Introduction to Backpropagation and Multilayer Perceptrons’ ‘The Backpropagation Algorithm’ Speed Backpropagation up 1 BP algorithm has been described in ‘An Introduction to Backpropagation and Multilayer Perceptrons’. And the implementation of the BP algorithm has been recorded at ‘The Backpropagation Algorithm’. BP has worked in many applications for many years, but there are too many drawbacks in the process. The basic BP algorithm is too slow for most practical applications that it might take days or even weeks in training....

Preliminaries Calculus 1,2 Linear Algebra Batch v.s. Incremental Training1 In both LMS and BP algorithms, the error in each update process step is not MSE but SE \(e=t_i-a_i\) which is calculated just by a data point of the training set. This is called a stochastic gradient descent algorithm. And why it is called ‘stochastic’ is because error at every iterative step is approximated by randomly selected train data points but not the whole data set....

Preliminaries An Introduction to Backpropagation and Multilayer Perceptrons Culculus 1,2 Linear algebra Jacobian matrix Architecture and Notations1 We have seen a three-layer network is flexible in approximating functions(An Introduction to Backpropagation and Multilayer Perceptrons). If we had a more-than-three-layer network, it could be used to approximate any functions as accurately as we want. However, another trouble that came to us is the learning rules. This problem almost killed neural networks in the 1970s....

Preliminaries Performance learning Perceptron learning rule Supervised Hebbian learning LMS Form LMS to Backpropagation1 The LMS algorithm is a kind of ‘performance learning’. And we have studied several learning rules(algorithms) till now, such as ‘Perceptron learning rule’ and ‘Supervised Hebbian learning’. And they were based on the idea of the physical mechanism of biological neuron networks.
Then performance learning was represented. Because of its outstanding performance, we go further and further away from natural intelligence into performance learning....

Preliminaries ‘Performance Surfaces and Optimum Points’ Linear algebra stochastic approximation Probability Theory ADALINE, LMS, and Widrow-Hoff learning1 Performance learning had been discussed. But we have not used it in any neural network. In this post, we talk about an important application of performance learning. And this new neural network was invented by Frank Widrow and his graduate student Marcian Hoff in 1960. It was almost the same time as Perceptron was developed which had been discussed in ‘Perceptron Learning Rule’....

Preliminaries ‘steepest descent method’ “Newton’s method” Conjugate Gradient1 We have learned ‘steepest descent method’ and “Newton’s method”. The main advantage of Newton’s method is the speed, it converges quickly. And the main advantage of the steepest descent method guarantees to converge to a local minimum. But the limit of Newton’s method is that it needs too many resources for both computation and storage when the number of parameters is large....

Preliminaries ‘steepest descent algorithm’ Linear Algebra Calculus 1,2 Newton’s Method1 Taylor series gives us the conditions for minimum points based on both first-order items and the second-order item. And first-order item approximation of a performance index function produced a powerful algorithm for locating the minimum points which we call ‘steepest descent algorithm’.
Now we want to have an insight into the second-order approximation of a function to find out whether there is an algorithm that can also work as a guide to the minimum points....

Preliminaries ‘An Introduction to Performance Optimization’ Linear algebra Calculus 1,2 Direction Based Algorithm and a Variation1 This post describes a direction searching algorithm(\(\mathbf{x}_{k}\)). And its variation gives a way to estimate step length (\(\alpha_k\)).
Steepest Descent To find the minimum points of a performance index by an iterative algorithm, we want to decrease the value of the performance index step by step which looks like going down from the top of the hill....

Preliminaries Nothing Performance Optimization1 Taylor series had been used for analyzing the performance surface and locating the optimum points of a certain performance index. This short post is a brief introduction to performance optimization and the following posts are the samples of three optimization algorithms categories:
‘Steepest Descent’ “Newton’s Method” ‘Conjugate Gradient’ Recall the analysis of the performance index, which is a function of the parameters of the model....

Preliminaries Linear algebra Calculus 1,2 Taylor series Quadratic Functions1 Quadratic function, a type of performance index, is universal. One of its key properties is that it can be represented in a second-order Taylor series precisely.
\[ F(\mathbf{x})=\frac{1}{2}\mathbf{x}^TA\mathbf{x}+\mathbf{d}\mathbf{x}+c\tag{1} \]
where \(A\) is a symmetric matrix(if it is not symmetric, it can be easily converted into symmetric). And recall the property of gradient:
\[ \nabla (\mathbf{h}^T\mathbf{x})=\nabla (\mathbf{x}^T\mathbf{h})=\mathbf{h}\tag{2} \]
and
\[ \nabla (\mathbf{x}^TQ\mathbf{x})=Q\mathbf{x}+Q^T\mathbf{x}=2Q\mathbf{x}\tag{3} \]...