3. Improving the way neural networks learn
Michael Nielsen, Jan 2016
The implementation of the backpropagation algorithm can be improved in ways that improve how the network learns. The techniques include: a better choice of cost function; four regularization methods (L1 and L2 regularization, dropout, and artificial expansion of the training data), which help the network generalize beyond the training data; a better method for initializing the weights; and heuristics for choosing good hyper-parameters for the network.
Cost Function:
The quadratic cost function is replaced with the cross-entropy cost function because a neuron should learn faster when its error is large. Unlike the quadratic cost, cross-entropy avoids the problem of learning slowdown, and it is the better choice provided the output neurons are sigmoid neurons.
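A minimal sketch of why cross-entropy avoids the slowdown (Python with NumPy; the function names and single-neuron setup are illustrative, following the book's notation where z is the weighted input, a the output, and y the target):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

# Output-layer error for a single sigmoid neuron with target y:
# quadratic cost:  delta = (a - y) * sigma'(z), which vanishes when the
#                  neuron saturates, even if the error (a - y) is large;
# cross-entropy:   delta = (a - y), which stays large while the error is large.
def delta_quadratic(z, a, y):
    return (a - y) * sigmoid_prime(z)

def delta_cross_entropy(z, a, y):
    return a - y

z = 5.0                 # a badly saturated neuron
a, y = sigmoid(z), 0.0  # output ~0.993, target 0: the error is large
print(delta_quadratic(z, a, y))      # ~0.0066 -> learning slowdown
print(delta_cross_entropy(z, a, y))  # ~0.993  -> fast learning
```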
Softmax:
Softmax is a new type of output layer for neural networks. Instead of applying the sigmoid function to get the output, the softmax function is applied. The outputs from a softmax layer form a probability distribution (positive values summing to 1), whereas the outputs from a sigmoid layer need not sum to 1 and so do not form a probability distribution.
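A minimal sketch of a softmax output (NumPy; the max-shift is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def softmax(z):
    # Subtracting the max does not change the result but avoids overflow.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])
a = softmax(z)
print(a, a.sum())   # positive entries summing to 1: a probability distribution

# Elementwise sigmoid outputs, by contrast, need not sum to 1:
print(1.0 / (1.0 + np.exp(-z)))
```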
Overfitting & Regularization:
If the cost on the training data keeps improving but the accuracy on the test data stops improving, the network is overfitting, or overtraining. Overfitting can be reduced by increasing the size of the training data or by reducing the size of the network.
With fixed training data and a fixed network, regularization techniques such as weight decay (L2 regularization) can be used to reduce overfitting.
L2 regularization adds an extra term to the cost function: the sum of the squares of all the weights, scaled by the regularization parameter lambda. The effect is to make the network prefer to learn small weights. When lambda is small, preference is given to minimizing the original cost function; when lambda is large, preference is given to small weights.
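A minimal sketch of one gradient-descent step under L2 regularization (the function name and arguments are illustrative; eta is the learning rate, n the training-set size, and grad_C0 the gradient of the unregularized cost, following the book's notation):

```python
def l2_weight_update(w, grad_C0, eta, lmbda, n):
    # C = C0 + (lmbda / (2 * n)) * sum(w ** 2), so the update becomes
    # w -> (1 - eta * lmbda / n) * w - eta * dC0/dw.
    # The (1 - eta * lmbda / n) factor rescales w toward zero on every step,
    # which is why L2 regularization is also called weight decay.
    return (1 - eta * lmbda / n) * w - eta * grad_C0
```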
L1 regularization modifies the unregularized cost function by adding the sum of the absolute values of the weights, again scaled by lambda. The effect is likewise to shrink the weights.
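The corresponding sketch for L1, under the same illustrative conventions as above:

```python
import numpy as np

def l1_weight_update(w, grad_C0, eta, lmbda, n):
    # C = C0 + (lmbda / n) * sum(|w|), so the update becomes
    # w -> w - (eta * lmbda / n) * sign(w) - eta * dC0/dw.
    # L1 shrinks each weight by a constant amount rather than a constant
    # proportion, so small weights tend to be driven all the way to zero.
    return w - (eta * lmbda / n) * np.sign(w) - eta * grad_C0
```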
Dropout modifies the network itself rather than the cost function. Half of the hidden neurons are randomly and temporarily deleted, while the input and output neurons are left untouched; the deleted neurons remain part of the network (ghosted) and are only inactive for that pass. The process is repeated by first restoring the dropped-out neurons, then choosing a new random subset of hidden neurons to delete, estimating the gradient for a different mini-batch, and updating the weights and biases in the network.
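A minimal sketch of the forward-pass masking, assuming dropout is applied to one hidden layer's activations (the mask handling here is one common way to realize the idea, not the book's exact code; the book also notes that at test time the full network is run with the weights out of the hidden neurons halved to compensate):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_hidden(activations, p_drop=0.5):
    # Temporarily "delete" a random half of the hidden neurons by zeroing
    # their activations; input and output layers are left untouched.
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask

# A fresh mask is drawn for each mini-batch: restore the dropped neurons,
# delete a new random subset, estimate the gradient, update weights/biases.
h = np.array([0.2, 0.9, 0.5, 0.7])
print(dropout_hidden(h))
```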
Altering the cost function and introducing regularization techniques in these ways improves how well the neural network learns.