A Neural Network is built from interconnected neurons, a structure loosely inspired by the human brain. Mathematically, it is a function that maps input variables to a target variable, and it consists of a stack of multiple layers.
If you want to know more about Neural Network architecture, please refer to this tutorial.
This tutorial explains how a Neural Network works through an example. Training the network involves five steps:
- Initialize the weight parameters of the model
- Forward Propagation
- Measure the loss
- Backward Propagation
- Update weights
Below, we look at each step briefly, with a worked example.
Data
Let’s define the data to train the model. We will use the data below for demonstration purposes.
X | Y(target) |
-10 | 14 |
0 | 32 |
8 | 46 |
15 | 59 |
22 | 72 |
38 | 100 |
Prepare Neural Network Architecture
A Neural Network architecture consists of a stack of layers: an input layer first, then one or more hidden layers, followed by an output layer. Each layer consists of one or more neurons.
The two main hyperparameters that control the architecture of a Neural Network are:
- The number of layers in the network.
- The number of neurons (nodes) in each hidden layer.
The input layer is the start of the Neural Network: it brings the input data into the model for processing by the subsequent layers. The number of neurons in the input and output layers is fixed by the problem, whereas the number of neurons in the hidden layers can vary.
The input layer has as many neurons as there are input features, and the output layer has as many neurons as there are outputs to predict. In this example there is one input feature (X) and one output (Y), so each of these layers has a single neuron.
Each connection between neurons carries a weight parameter that is learned during training. The image below depicts our Neural Network architecture.
Here, w01 and w02 are the weights from the input to hidden nodes 1 and 2, and w11 and w21 are the weights from hidden nodes 1 and 2 to the output node. b0 and b1 are the bias parameters of the input layer and hidden layer, respectively: b0 is added at both hidden nodes and b1 at the output node.
X is the input variable that is fed to the input layer. z1 and z2 are the intermediate results (weighted sum plus bias) of the two hidden neurons, and z3 is the intermediate result of the output neuron. To learn a non-linear pattern between the input and the target output value, we apply an activation function to each intermediate result: a1 and a2 are the activation values of the hidden neurons, and a3 is the activation value of the output neuron.
Let’s write out the intermediate result and activation value of each neuron in the hidden and output layers.
z1 = w01*x + b0               # Neuron value of hidden node-1
a1 = σ(z1)                    # Activation value of hidden node-1
z2 = w02*x + b0               # Neuron value of hidden node-2
a2 = σ(z2)                    # Activation value of hidden node-2
z3 = w11*a1 + w21*a2 + b1     # Neuron value of output node
a3 = σ(z3)                    # Activation value of output node
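The six equations above can be sketched in Python as a single forward-pass function. This is a minimal sketch, assuming a scalar input x and parameters stored in a dictionary keyed by the names used in this tutorial; the function names `sigmoid` and `forward` and the dictionary layout are our own illustrative choices.

```python
import math

def sigmoid(z: float) -> float:
    """Sigmoid activation: 1 / (1 + e^(-z)), always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(p: dict, x: float) -> dict:
    """Forward pass of the 1-input, 2-hidden, 1-output network.

    Returns every intermediate value, since backpropagation reuses them.
    """
    z1 = p["w01"] * x + p["b0"]                     # hidden node 1
    a1 = sigmoid(z1)
    z2 = p["w02"] * x + p["b0"]                     # hidden node 2 (same bias b0)
    a2 = sigmoid(z2)
    z3 = p["w11"] * a1 + p["w21"] * a2 + p["b1"]    # output node
    a3 = sigmoid(z3)
    return {"z1": z1, "a1": a1, "z2": z2, "a2": a2, "z3": z3, "a3": a3}
```

Returning the z and a values, rather than only a3, is deliberate: the backward pass needs them to compute the gradients.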
Sigmoid Activation Function
The activation function allows the neural network to learn a non-linear pattern between the inputs and the target output variable. To learn more about the activation function, please refer to this tutorial.
Here, we will use the sigmoid activation function. Its mathematical equation is as follows: σ(z) = 1 / (1 + e^(-z))
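A direct translation of the sigmoid formula into Python, as a minimal sketch using only the standard library (the function name `sigmoid` is our own choice):

```python
import math

def sigmoid(z: float) -> float:
    """Sigmoid activation: sigma(z) = 1 / (1 + e^(-z)), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# sigma(0) is exactly 0.5; large positive inputs approach 1, large negative approach 0.
for z in (-10.0, -0.16, 0.0, 10.0):
    print(f"sigmoid({z}) = {sigmoid(z):.4f}")
```

Because the output always lies strictly between 0 and 1, the sigmoid behaves like a smooth on/off switch, which is what lets a stack of such neurons model non-linear patterns.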
Initialize the weight and bias parameters:
Let’s initialize the weight and bias parameters randomly:
w01 = 0.06
w02 = 0.50
w11 = 0.25
w21 = 0.12
b0 = 0.44
b1 = 0.21
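The parameters can be held in a plain Python dictionary. The sketch below shows one way to draw random starting values with the standard-library `random` module; the uniform range [0, 1), the seed, and the helper name `init_params` are assumptions for illustration, not details from the tutorial. The fixed values listed above are kept separately, since the worked example uses them.

```python
import random

PARAM_NAMES = ("w01", "w02", "w11", "w21", "b0", "b1")

def init_params(seed: int = 0) -> dict:
    """Draw each weight and bias uniformly from [0, 1) (an assumed range)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    return {name: rng.random() for name in PARAM_NAMES}

# The specific starting values used in this tutorial's worked example.
tutorial_params = {"w01": 0.06, "w02": 0.50, "w11": 0.25,
                   "w21": 0.12, "b0": 0.44, "b1": 0.21}

print(init_params(seed=42))
```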
Forward propagation
Forward propagation calculates the output of each neuron. Each neuron computes a weighted sum of its inputs plus a bias, applies the activation function, and passes the result on to the neurons of the next layer.
Let’s take the first sample of the data for illustration: X = -10 and Y = 14.
z1 = w01*x + b0
z1 = 0.06*(-10) + 0.44
z1 = -0.16
a1 = σ(z1) = 1 / (1 + e^(-z1)) ≈ 0.4601
z2 = w02*x + b0
z2 = 0.50*(-10) + 0.44
z2 = -4.56
a2 = σ(z2) = 1 / (1 + e^(-z2)) ≈ 0.0104
z3 = w11*a1 + w21*a2 + b1
z3 = 0.25*0.4601 + 0.12*0.0104 + 0.21
z3 ≈ 0.3263
pred_y = a3 = σ(z3) = 1 / (1 + e^(-z3)) ≈ 0.5808
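To double-check the arithmetic, here is a minimal, self-contained sketch that recomputes the forward pass for X = -10 with the initial parameters above. Values are printed to four decimals; a difference in the last digit compared with a hand calculation can come from rounding intermediate results.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Initial parameters from the tutorial
w01, w02, w11, w21, b0, b1 = 0.06, 0.50, 0.25, 0.12, 0.44, 0.21
x = -10.0  # first training sample

z1 = w01 * x + b0                 # hidden node 1
a1 = sigmoid(z1)
z2 = w02 * x + b0                 # hidden node 2
a2 = sigmoid(z2)
z3 = w11 * a1 + w21 * a2 + b1     # output node
pred_y = a3 = sigmoid(z3)

for name, value in [("z1", z1), ("a1", a1), ("z2", z2),
                    ("a2", a2), ("z3", z3), ("a3", a3)]:
    print(f"{name} = {value:.4f}")
```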
Measure Loss(Error)
Next, we calculate the error between the predicted value and the actual target value. The goal of training is to minimize this error, so that the predicted value comes as close as possible to the actual target value.
Let’s calculate the Mean Squared Error (MSE). For a single sample, it is simply the squared difference between the true and predicted values:
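MSE = (true_y - pred_y)^2 = (14 - 0.5808)^2 ≈ 180.0749
The loss is this large because the sigmoid squashes pred_y into the range (0, 1), while the target is 14. With a sigmoid on the output neuron, the targets would need to be scaled into that range (or the output neuron given a linear activation) for the network to be able to fit them.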
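The squared-error calculation is easy to check in code. The sketch below also shows the general MSE averaged over several samples; the helper names `squared_error` and `mse` are hypothetical. For a single sample the two coincide. The predicted value 0.5808 is taken from the forward pass above.

```python
def squared_error(true_y: float, pred_y: float) -> float:
    """Squared error for one sample."""
    return (true_y - pred_y) ** 2

def mse(true_ys: list[float], pred_ys: list[float]) -> float:
    """Mean squared error: average of the per-sample squared errors."""
    if len(true_ys) != len(pred_ys) or not true_ys:
        raise ValueError("need two non-empty lists of equal length")
    return sum(squared_error(t, p) for t, p in zip(true_ys, pred_ys)) / len(true_ys)

print(f"{squared_error(14, 0.5808):.4f}")  # 180.0749
```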
Backward propagation
Backward propagation, or backpropagation, is the process of propagating the error (loss) backward through the neural network and then adjusting each weight and bias parameter accordingly.
Backpropagation plays a central role in training a Neural Network. The update for each parameter is determined by the partial derivative of the loss function with respect to that parameter.
A derivative measures how fast the loss changes as a parameter changes. By computing these derivatives, we can determine how sensitive the loss function is to each weight and bias parameter.
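One way to see what "sensitivity" means is to nudge a single parameter by a tiny amount and watch how the loss changes. The sketch below does this for the weight w11 with a central finite difference, using the single-sample squared-error loss and the initial parameters from this tutorial. This numerical approximation is only an illustration; backpropagation computes the same quantity exactly with the chain rule. The step size `h` is an assumed value.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(params, x, y):
    """Squared error of the 1-2-1 network for one sample (x, y)."""
    w01, w02, w11, w21, b0, b1 = params
    a1 = sigmoid(w01 * x + b0)
    a2 = sigmoid(w02 * x + b0)
    a3 = sigmoid(w11 * a1 + w21 * a2 + b1)
    return (y - a3) ** 2

params = [0.06, 0.50, 0.25, 0.12, 0.44, 0.21]  # w01, w02, w11, w21, b0, b1
x, y, h = -10.0, 14.0, 1e-6

# Central difference: (L(w11 + h) - L(w11 - h)) / (2h) approximates dL/dw11
plus, minus = params.copy(), params.copy()
plus[2] += h
minus[2] -= h
dL_dw11 = (loss(plus, x, y) - loss(minus, x, y)) / (2 * h)
print(f"dL/dw11 approx. {dL_dw11:.4f}")
```

The result is negative, which means that increasing w11 slightly would reduce the loss: a1 is positive, so a larger w11 raises z3 and pushes pred_y toward the target.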
Let’s perform backpropagation for weight w11. We need to calculate the partial derivative of the loss function with respect to w11.
The loss does not depend on w11 directly: w11 affects z3, z3 affects a3 (pred_y), and a3 affects the loss. Hence we cannot compute this derivative in a single step and need to use the chain rule.
The chain rule requires the derivative of the sigmoid activation function, which is: σ'(z) = σ(z)(1 - σ(z))
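Putting the pieces together, the chain rule for w11 can be written out as below. This is a reconstruction consistent with the equations above, assuming the single-sample squared-error loss L = (y - a3)^2 with y = true_y; the last line plugs in the forward-pass values for X = -10.

```latex
\begin{aligned}
\frac{\partial L}{\partial w_{11}}
  &= \frac{\partial L}{\partial a_3}\cdot\frac{\partial a_3}{\partial z_3}\cdot\frac{\partial z_3}{\partial w_{11}} \\
\frac{\partial L}{\partial a_3} &= -2\,(y - a_3) \\
\frac{\partial a_3}{\partial z_3} &= \sigma(z_3)\,\bigl(1 - \sigma(z_3)\bigr) = a_3\,(1 - a_3) \\
\frac{\partial z_3}{\partial w_{11}} &= a_1 \\
\frac{\partial L}{\partial w_{11}}
  &= -2\,(14 - 0.5808)\cdot 0.5808\,(1 - 0.5808)\cdot 0.4601 \approx -3.006
\end{aligned}
```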
Weight Update
The weight and bias parameters are updated by subtracting the partial derivative of the loss function with respect to each parameter, scaled by the learning rate α: w = w - α * ∂Loss/∂w
Here α is the learning rate, which sets the step size: it controls how much each parameter is updated. The value of α is typically between 0 and 1.
If you want to understand more about the learning rate, please refer to this tutorial.
Let’s update the weight parameter w11 using a learning rate α = 0.1:
w11 = 0.25 - 0.1 * (-3.006)   # Previous value of w11 = 0.25
w11 ≈ 0.5506
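The same numbers can be reproduced in code. This minimal sketch computes dL/dw11 analytically with the chain rule (single-sample squared-error loss, sigmoid output, as above) and applies one gradient-descent step with α = 0.1; the variable names are our own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w01, w02, w11, w21, b0, b1 = 0.06, 0.50, 0.25, 0.12, 0.44, 0.21
x, y, alpha = -10.0, 14.0, 0.1

# Forward pass
a1 = sigmoid(w01 * x + b0)
a2 = sigmoid(w02 * x + b0)
a3 = sigmoid(w11 * a1 + w21 * a2 + b1)

# Chain rule: dL/dw11 = dL/da3 * da3/dz3 * dz3/dw11
dL_da3 = -2.0 * (y - a3)
da3_dz3 = a3 * (1.0 - a3)       # sigmoid derivative
dz3_dw11 = a1
grad_w11 = dL_da3 * da3_dz3 * dz3_dw11

new_w11 = w11 - alpha * grad_w11  # gradient-descent update
print(f"grad_w11 = {grad_w11:.4f}, new w11 = {new_w11:.4f}")
```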
Following the same procedure, all the other weight and bias parameters are updated through backpropagation. A single iteration is complete once every parameter has been updated.
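Applying the chain rule to every parameter gives the gradients below. This is a minimal sketch for one sample, assuming the squared-error loss and a sigmoid on every neuron as in this tutorial. Because b0 feeds both hidden nodes, its gradient collects a contribution from each. The function name `backward` is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backward(p, x, y):
    """Return the loss and dL/dparam for every parameter, for one sample."""
    # Forward pass
    a1 = sigmoid(p["w01"] * x + p["b0"])
    a2 = sigmoid(p["w02"] * x + p["b0"])
    a3 = sigmoid(p["w11"] * a1 + p["w21"] * a2 + p["b1"])
    loss = (y - a3) ** 2

    # Error signal at the output pre-activation z3
    d3 = -2.0 * (y - a3) * a3 * (1.0 - a3)
    # Error signals at hidden pre-activations z1, z2 (sent back through w11, w21)
    d1 = d3 * p["w11"] * a1 * (1.0 - a1)
    d2 = d3 * p["w21"] * a2 * (1.0 - a2)

    grads = {
        "w11": d3 * a1, "w21": d3 * a2, "b1": d3,
        "w01": d1 * x, "w02": d2 * x,
        "b0": d1 + d2,              # b0 is used by both hidden nodes
    }
    return loss, grads

params = {"w01": 0.06, "w02": 0.50, "w11": 0.25, "w21": 0.12, "b0": 0.44, "b1": 0.21}
alpha = 0.1
loss, grads = backward(params, -10.0, 14.0)
# One iteration: update every parameter with the same rule
params = {k: v - alpha * grads[k] for k, v in params.items()}
print(params)
```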
The Neural Network repeats this process over the training samples for a number of iterations to reduce the loss and improve its predictions. This process is called training the neural network.
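The full training procedure (all five steps, repeated over every sample for several epochs) can be sketched as below. This is a minimal illustration under stated assumptions: one update per sample (stochastic gradient descent), α = 0.1, 1000 epochs, and the same sigmoid network. Because a sigmoid output cannot exceed 1, the sketch divides Y by 100 before training and multiplies predictions back afterwards; this scaling is our assumption, not part of the original tutorial.

```python
import math

def sigmoid(z):
    # Numerically stable form: avoids overflow in math.exp for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(p, x):
    a1 = sigmoid(p["w01"] * x + p["b0"])
    a2 = sigmoid(p["w02"] * x + p["b0"])
    a3 = sigmoid(p["w11"] * a1 + p["w21"] * a2 + p["b1"])
    return a1, a2, a3

def sgd_step(p, x, y, alpha):
    """Forward pass, loss, backpropagation and weight update for one sample."""
    a1, a2, a3 = forward(p, x)
    d3 = -2.0 * (y - a3) * a3 * (1.0 - a3)   # dL/dz3
    d1 = d3 * p["w11"] * a1 * (1.0 - a1)      # dL/dz1
    d2 = d3 * p["w21"] * a2 * (1.0 - a2)      # dL/dz2
    grads = {"w11": d3 * a1, "w21": d3 * a2, "b1": d3,
             "w01": d1 * x, "w02": d2 * x, "b0": d1 + d2}
    for k in p:                               # update after all grads are known
        p[k] -= alpha * grads[k]
    return (y - a3) ** 2

def dataset_mse(p, data):
    return sum((y - forward(p, x)[2]) ** 2 for x, y in data) / len(data)

X = [-10, 0, 8, 15, 22, 38]
Y = [14, 32, 46, 59, 72, 100]
SCALE = 100.0  # assumption: shrink targets into the sigmoid's (0, 1) range
data = [(float(x), y / SCALE) for x, y in zip(X, Y)]

params = {"w01": 0.06, "w02": 0.50, "w11": 0.25, "w21": 0.12, "b0": 0.44, "b1": 0.21}
initial_loss = dataset_mse(params, data)

for epoch in range(1000):        # assumed number of epochs
    for x, y in data:            # one update per sample
        sgd_step(params, x, y, alpha=0.1)

final_loss = dataset_mse(params, data)
print(f"MSE before: {initial_loss:.4f}, after: {final_loss:.4f}")
for x, y in zip(X, Y):
    print(x, y, round(forward(params, float(x))[2] * SCALE, 1))
```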