1. Using neural nets to recognize handwritten digits
Michael Nielsen, Jan 2016
Humans recognize digits effortlessly because the human brain has a primary visual cortex containing millions of neurons with billions of connections between them; this is not the case for machines. Making a machine recognize digits by writing an explicit algorithm is not an easy task. Instead, a neural network learns to recognize digits by training on examples. This is done by choosing a type of artificial neuron, either a perceptron or a sigmoid neuron, together with a learning algorithm called stochastic gradient descent (SGD). To make the neural network work efficiently, we need to choose a number of parameters carefully, including the type of neuron, and apply ideas that minimize the cost function.
Types of Neurons:
Perceptron: A perceptron takes several binary inputs and produces a single binary output. Each input has a real-valued weight, and the output is either 0 or 1: the neuron outputs 0 if the weighted sum of its inputs is at or below a threshold value, and 1 if the weighted sum is above the threshold.
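As a minimal sketch (not from the original text), a perceptron can be written in a few lines of Python; the weights and threshold below are illustrative values chosen by hand:

    # A minimal perceptron: binary inputs, real-valued weights, binary output.
    def perceptron(inputs, weights, threshold):
        weighted_sum = sum(w * x for w, x in zip(weights, inputs))
        return 1 if weighted_sum > threshold else 0

    # Example: this perceptron behaves like an AND gate.
    print(perceptron([1, 1], [0.6, 0.6], threshold=1.0))  # prints 1
    print(perceptron([1, 0], [0.6, 0.6], threshold=1.0))  # prints 0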
Sigmoid: A sigmoid neuron is designed so that a small change in its weights or bias causes only a small corresponding change in its output. Unlike a perceptron, its inputs can take any value between 0 and 1, not just 0 or 1, and its output is not a binary number but the real value σ(w·x + b), where σ is the sigmoid function.
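A short sketch of the sigmoid function and a sigmoid neuron's output (the function names here are illustrative, and NumPy is assumed):

    import numpy as np

    def sigmoid(z):
        # The sigmoid (logistic) function: squashes any real z into (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_neuron(inputs, weights, bias):
        # Output is sigmoid(w . x + b): a smooth real number, not a hard 0/1.
        return sigmoid(np.dot(weights, inputs) + bias)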
Introduction to the architecture of neural networks:
A network usually includes three kinds of layers: the first layer is the input layer, the middle layers are the hidden layers, and the last layer is the output layer. Somewhat confusingly, networks with multiple layers are called multilayer perceptrons (MLPs) even when they are built from sigmoid neurons rather than perceptrons. Networks with many such layers are called deep neural networks.
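For digit recognition, the book's example network takes the 28 x 28 = 784 pixel intensities of an image as input, uses one hidden layer, and has 10 output neurons, one per digit. Below is a minimal sketch of how the weights and biases of such a layered network could be set up and applied; the layer sizes follow the book's MNIST example, while the plain standard-normal initialization and the function name are assumptions of this sketch, not the book's exact code:

    import numpy as np

    sizes = [784, 30, 10]  # input layer, one hidden layer, output layer

    # One bias vector per non-input layer, one weight matrix per adjacent layer pair.
    biases = [np.random.randn(n, 1) for n in sizes[1:]]
    weights = [np.random.randn(n, m) for m, n in zip(sizes[:-1], sizes[1:])]

    def feedforward(a):
        # Propagate the activation a through the network, layer by layer.
        for b, w in zip(biases, weights):
            a = 1.0 / (1.0 + np.exp(-(np.dot(w, a) + b)))
        return a

Calling feedforward on a 784 x 1 column vector of pixel intensities returns a 10 x 1 vector of output activations, one per digit class.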
Stochastic Gradient Descent Algorithm:
To train a network, a training data set is needed. How well the network's outputs approximate the desired outputs across all training inputs is measured by a cost function; the quadratic cost used here is also known as the mean squared error (MSE). The training algorithm, called gradient descent, aims to make the cost function as small as possible, since a high cost means the network performs poorly. For gradient descent to work correctly, the learning rate η (eta) has to be small enough that the updates do not overshoot the minimum; but if eta is too small, gradient descent works slowly.
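As a toy illustration (not from the text), here is gradient descent on a one-parameter cost C(w) = (w - 3)^2, whose gradient is 2(w - 3); each step applies the update rule w -> w - eta * gradient:

    def gradient_descent(w, eta, steps):
        # Minimize C(w) = (w - 3)**2 by repeatedly stepping against the gradient.
        for _ in range(steps):
            grad = 2 * (w - 3)   # dC/dw
            w = w - eta * grad   # update rule: w -> w - eta * gradient
        return w

    print(gradient_descent(w=0.0, eta=0.1, steps=50))  # approaches the minimum at w = 3

With eta too large (say 1.5) the iterates overshoot and diverge; with eta very small the same accuracy takes many more steps, which is the trade-off described above.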
To compute the gradient exactly, a separate gradient has to be computed for each training input and then averaged, which takes a long time when the training set is very large, so learning becomes slow. To speed up learning, stochastic gradient descent is used: the SGD algorithm computes the gradient only for a small mini-batch of randomly chosen training inputs. Averaging over the mini-batch gives a good estimate of the true gradient, which in turn speeds up learning. The backpropagation algorithm is what computes the gradient of the cost function for a single input quickly.
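A schematic of the SGD loop, assuming a hypothetical grad_cost(w, x, y) helper that returns the gradient of the cost for one training example (in the book this per-example gradient is computed by backpropagation):

    import random

    def sgd(w, training_data, eta, epochs, mini_batch_size, grad_cost):
        # training_data is a list of (x, y) pairs; grad_cost returns dC/dw for one pair.
        for _ in range(epochs):
            random.shuffle(training_data)
            for k in range(0, len(training_data), mini_batch_size):
                batch = training_data[k:k + mini_batch_size]
                # Average the per-example gradients over the mini-batch...
                grad = sum(grad_cost(w, x, y) for x, y in batch) / len(batch)
                # ...and take one gradient-descent step using that estimate.
                w = w - eta * grad
        return w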
By properly selecting these parameters, such as the network architecture, the learning rate, and the mini-batch size, a neural network can be trained effectively, achieving high performance and a low cost.