3. Improving the way neural networks learn
Michael Nielsen, Jan 2016
The implementation of the backpropagation algorithm can be improved in ways that improve how the network learns. The techniques include: a better choice of cost function; four regularization methods (L1 and L2 regularization, dropout, and artificial expansion of the training data), which help the network generalize beyond the training data; a better method for initializing the weights; and heuristics for choosing good hyper-parameters for the network.
Cost Function:
The quadratic cost function is replaced with the cross-entropy cost function because a neuron should learn faster when its error is large. Unlike the quadratic cost, cross-entropy avoids the problem of learning slowdown, and it is the better choice provided the output neurons are sigmoid neurons.
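A minimal sketch of why cross-entropy avoids the slowdown (Python with NumPy; the function names and single-neuron setup are illustrative, following the book's notation where z is the weighted input, a the output, and y the target):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

# Output-layer error for a single sigmoid neuron with target y:
# quadratic cost:  delta = (a - y) * sigma'(z), which vanishes when the
#                  neuron saturates, even if the error (a - y) is large;
# cross-entropy:   delta = (a - y), which stays large while the error is large.
def delta_quadratic(z, a, y):
    return (a - y) * sigmoid_prime(z)

def delta_cross_entropy(z, a, y):
    return a - y

z = 5.0                 # a badly saturated neuron
a, y = sigmoid(z), 0.0  # output ~0.993, target 0: the error is large
print(delta_quadratic(z, a, y))      # ~0.0066 -> learning slowdown
print(delta_cross_entropy(z, a, y))  # ~0.993  -> fast learning
```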
Softmax:
Softmax is a new type of output layer for neural networks. Instead of applying the sigmoid function to get the output, the softmax function is applied. The outputs from a softmax layer form a probability distribution (positive values summing to 1), whereas the outputs from a sigmoid layer need not sum to 1 and so do not form a probability distribution.
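A minimal sketch of a softmax output (NumPy; the max-shift is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def softmax(z):
    # Subtracting the max does not change the result but avoids overflow.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])
a = softmax(z)
print(a, a.sum())   # positive entries summing to 1: a probability distribution

# Elementwise sigmoid outputs, by contrast, need not sum to 1:
print(1.0 / (1.0 + np.exp(-z)))
```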
Overfitting & Regularization:
If the cost on the training data keeps improving but the accuracy on the test data stops improving, the network is overfitting, or overtraining. Overfitting can be reduced by increasing the size of the training data or by reducing the size of the network.
With fixed training data and a fixed network, regularization techniques such as weight decay (L2 regularization) can be used to reduce overfitting.
L2 regularization adds an extra term to the cost function: the sum of the squares of all the weights, scaled by the regularization parameter lambda. The effect is to make the network prefer to learn small weights. When lambda is small, preference is given to minimizing the original cost function; when lambda is large, preference is given to small weights.
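A minimal sketch of one gradient-descent step under L2 regularization (the function name and arguments are illustrative; eta is the learning rate, n the training-set size, and grad_C0 the gradient of the unregularized cost, following the book's notation):

```python
def l2_weight_update(w, grad_C0, eta, lmbda, n):
    # C = C0 + (lmbda / (2 * n)) * sum(w ** 2), so the update becomes
    # w -> (1 - eta * lmbda / n) * w - eta * dC0/dw.
    # The (1 - eta * lmbda / n) factor rescales w toward zero on every step,
    # which is why L2 regularization is also called weight decay.
    return (1 - eta * lmbda / n) * w - eta * grad_C0
```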
L1 regularization modifies the unregularized cost function by adding the sum of the absolute values of the weights, again scaled by lambda. The effect is likewise to shrink the weights.
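The corresponding sketch for L1, under the same illustrative conventions as above:

```python
import numpy as np

def l1_weight_update(w, grad_C0, eta, lmbda, n):
    # C = C0 + (lmbda / n) * sum(|w|), so the update becomes
    # w -> w - (eta * lmbda / n) * sign(w) - eta * dC0/dw.
    # L1 shrinks each weight by a constant amount rather than a constant
    # proportion, so small weights tend to be driven all the way to zero.
    return w - (eta * lmbda / n) * np.sign(w) - eta * grad_C0
```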
Dropout modifies the network itself rather than the cost function. Half of the hidden neurons are randomly and temporarily deleted, while the input and output neurons are left untouched; the deleted neurons remain part of the network (ghosted) and are only inactive for that pass. The process is repeated by first restoring the dropped-out neurons, then choosing a new random subset of hidden neurons to delete, estimating the gradient for a different mini-batch, and updating the weights and biases in the network.
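A minimal sketch of the forward-pass masking, assuming dropout is applied to one hidden layer's activations (the mask handling here is one common way to realize the idea, not the book's exact code; the book also notes that at test time the full network is run with the weights out of the hidden neurons halved to compensate):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_hidden(activations, p_drop=0.5):
    # Temporarily "delete" a random half of the hidden neurons by zeroing
    # their activations; input and output layers are left untouched.
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask

# A fresh mask is drawn for each mini-batch: restore the dropped neurons,
# delete a new random subset, estimate the gradient, update weights/biases.
h = np.array([0.2, 0.9, 0.5, 0.7])
print(dropout_hidden(h))
```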
Altering the cost function and introducing regularization techniques in these ways improves how well the neural network learns.