How Does a Neural Network Work?

A neural network is built from interconnected neurons, loosely modeled on the neurons in our brain. Mathematically, it is a function that maps input variables to a target variable. It consists of a stack of multiple layers.

If you want to know more about Neural Network architecture, please refer to this tutorial.

This tutorial explains how a neural network works with an example. Training the network involves a total of 5 steps:

Initialize the weight parameters of the model
Forward Propagation
Measure the loss
Backward Propagation
Update weights

Here, we will look briefly at each step with an example.

Data

Let’s define the data to train the model. We will use the data below for demonstration purposes.

X	Y(target)
-10	14
0	32
8	46
15	59
22	72
38	100
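
To follow along in code, the table above can be written as two plain Python lists. This is only a minimal sketch; the names `X`, `Y` and `samples` are chosen here for illustration.

```python
# Training data from the table above: inputs X and targets Y.
X = [-10, 0, 8, 15, 22, 38]
Y = [14, 32, 46, 59, 72, 100]

# Pair each input with its target so samples can be fed one at a time.
samples = list(zip(X, Y))
print(samples[0])  # (-10, 14)
```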

Prepare Neural Network Architecture

A neural network architecture consists of a stack of layers. The first layer is the input layer, followed by one or more hidden layers and then an output layer. Each layer of the network consists of one or more neurons.

The two main hyperparameters that control the architecture of a neural network are:

The number of layers in the network.
The number of neurons (nodes) in each hidden layer.

The input layer is the entry point of the neural network: it passes the input data to the subsequent layers for further processing. The number of neurons in the input layer and the output layer is fixed by the problem, whereas the number of neurons in the hidden layers may vary.

The input layer contains as many neurons as there are input features, and the output layer has as many neurons as there are outputs to predict (for example, one per class in a classification problem, or a single neuron for a single target value as in our data).

Each connection between neurons carries a weight parameter that is learned during training. The image below depicts our neural network architecture.

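Here, w₀₁ and w₀₂ are the weights on the connections from the input layer to the two hidden neurons, and w₁₁ and w₂₁ are the weights on the connections from the hidden neurons to the output neuron. b₀ and b₁ are the bias parameters of the input layer and the hidden layer respectively: b₀ is added when computing both hidden neurons, and b₁ is added when computing the output neuron.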

X is the input variable that is fed to the input layer. z₁ and z₂ are the intermediate results of the hidden-layer neurons, and z₃ is the intermediate result of the output neuron. To learn a non-linear pattern between the input and the target output value, we apply an activation function to each intermediate result. a₁ and a₂ are the activation values of the hidden-layer neurons, and a₃ is the activation value of the output neuron.

Let’s calculate the intermediate result and activation values at each neuron of the hidden and output layer.

z₁ = w₀₁x + b₀                        # Neuron value of hidden node-1
a₁ = σ(z₁)                            # Activation value of hidden node-1

z₂ = w₀₂x + b₀                        # Neuron value of hidden node-2
a₂ = σ(z₂)                            # Activation value of hidden node-2

z₃ = w₁₁a₁ + w₂₁a₂ + b₁               # Neuron value of output node
a₃ = σ(z₃)                            # Activation value of output node

Sigmoid Activation Function

The activation function allows the neural network to learn a non-linear pattern between the inputs and the target output variable. To learn more about the activation function, please refer to this tutorial.

Here, we will use the sigmoid activation function. The mathematical equation of the sigmoid function is as follows:

σ(z) = 1 / (1 + e^-z)
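
A minimal Python sketch of this function, using the same form 1 / (1 + e^-z) that appears in the forward-propagation step. The function name `sigmoid` is chosen here for illustration, and this simple version is only intended for the small values of z used in this walkthrough.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))  # 0.5
```

Large positive inputs give values close to 1 and large negative inputs give values close to 0.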

Initialize the weight and bias parameters:

Let’s initialize the weight and bias parameters randomly.

w₀₁ = 0.06
w₀₂ = 0.50
w₁₁ = 0.25
w₂₁ = 0.12
b₀ = 0.44
b₁ = 0.21
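
In code, these parameters could be kept in a dictionary. The sketch below stores the values listed above and, as a hypothetical alternative, draws fresh random values. The key names, the fixed seed and the range 0 to 1 are assumptions made for illustration, not part of the walkthrough.

```python
import random

# Values used in this walkthrough.
params = {"w01": 0.06, "w02": 0.50, "w11": 0.25, "w21": 0.12,
          "b0": 0.44, "b1": 0.21}

# Hypothetical alternative: draw random starting values instead.
random.seed(0)  # fixed seed so that runs are repeatable
random_params = {name: random.uniform(0.0, 1.0) for name in params}
print(random_params)
```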

Forward propagation

Forward propagation calculates the output of each neuron. Each neuron performs a mathematical operation on its input and generates an output, which is then passed to the neurons of the next layer.

Let’s consider the first sample of the data for illustration: X = -10 and Y = 14.

z₁ = w₀₁x + b₀
z₁ = 0.06(-10) + 0.44
z₁ = -0.16

a₁ = σ(z₁) = 1 / (1+e^-z₁) = 0.4601

z₂ = w₀₂x + b₀
z₂ = 0.50(-10)+0.44
z₂ = -4.56

a₂ = σ(z₂) = 1 / (1+e^-z₂) = 0.0104

z₃ = w₁₁a₁ + w₂₁a₂ + b₁
z₃ = 0.25(0.4601) + 0.12(0.0104) + 0.21
z₃ = 0.3262
 
pred_y = a₃ = σ(z₃) = 1 / (1+e^-z₃) = 0.5808
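
The hand calculation above can be checked with a short Python sketch. The function names `sigmoid` and `forward` and the argument names are assumptions made for illustration; the arithmetic follows the equations for z₁, z₂ and z₃ exactly as written.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w01, w02, w11, w21, b0, b1):
    """Return the activations (a1, a2, a3) of the 1-2-1 network."""
    z1 = w01 * x + b0            # hidden node-1
    a1 = sigmoid(z1)
    z2 = w02 * x + b0            # hidden node-2 (same bias b0)
    a2 = sigmoid(z2)
    z3 = w11 * a1 + w21 * a2 + b1  # output node
    a3 = sigmoid(z3)
    return a1, a2, a3

a1, a2, a3 = forward(-10, w01=0.06, w02=0.50, w11=0.25, w21=0.12,
                     b0=0.44, b1=0.21)
print(a1, a2, a3)
```

The printed values should agree with the hand-computed activations up to rounding in the last decimal place.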

Measure Loss (Error)

In the next step, the error between the predicted target value and the actual target value is calculated. The goal of training is to minimize this error, so that the predicted value is as close as possible to the actual target value.

Let’s calculate the Mean Squared Error (MSE). With a single sample, it reduces to the squared error of that sample:

MSE  = (true_y - pred_y)²
     = (14 - 0.5808)²
     =  180.0749
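
As a sketch, the same loss in Python. The helper names `squared_error` and `mean_squared_error` are chosen here for illustration; the second function shows how the per-sample errors would be averaged over several samples.

```python
def squared_error(true_y, pred_y):
    """Squared error for a single sample."""
    return (true_y - pred_y) ** 2

def mean_squared_error(true_ys, pred_ys):
    """Average of the squared errors over several samples."""
    errors = [squared_error(t, p) for t, p in zip(true_ys, pred_ys)]
    return sum(errors) / len(errors)

loss = squared_error(14, 0.5808)
print(round(loss, 4))  # 180.0749
```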

Backward propagation

Backward propagation, or backpropagation, is the process of propagating the error (loss) back through the neural network and then adjusting the weight and bias parameters of each neuron accordingly.

Back-propagation plays an important role in training a neural network. The size of each parameter update is determined by the partial derivative of the loss function with respect to that parameter.

The derivative represents the rate at which the loss changes as the parameter changes. By computing it, we can determine how sensitive the loss function is to each weight and bias parameter.

Let’s perform the back-propagation process for weight w₁₁. We need to calculate the partial derivative of the loss function with respect to the weight parameter w₁₁.

We cannot calculate this partial derivative directly, because the loss depends on w₁₁ only through z₃ and a₃. Hence we need to use the chain rule, where L denotes the loss:

∂L/∂w₁₁ = (∂L/∂a₃) × (∂a₃/∂z₃) × (∂z₃/∂w₁₁)

The derivative of the sigmoid activation function is as follows: σ'(z) = σ(z)(1 - σ(z))
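
The chain rule for w₁₁ multiplies three factors: how the loss changes with a₃, how a₃ changes with z₃ (the sigmoid derivative), and how z₃ changes with w₁₁ (which is simply a₁). A minimal sketch, assuming the squared-error loss and the rounded forward-pass values from above; the variable names are chosen here for illustration.

```python
# Rounded values from the forward pass for the sample x = -10, y = 14.
true_y = 14
a1 = 0.4601  # activation of hidden node-1
a3 = 0.5808  # network prediction (pred_y)

dL_da3 = -2 * (true_y - a3)  # derivative of (true_y - a3)^2 w.r.t. a3
da3_dz3 = a3 * (1 - a3)      # sigmoid derivative evaluated at z3
dz3_dw11 = a1                # because z3 = w11*a1 + w21*a2 + b1

dL_dw11 = dL_da3 * da3_dz3 * dz3_dw11
print(dL_dw11)
```

The result comes out close to -3.0. The negative sign means that increasing w₁₁ would reduce the loss. The later decimal places depend on how the intermediate values are rounded.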

Weight Update

Each weight and bias parameter is updated by subtracting the partial derivative of the loss function with respect to that parameter, multiplied by a factor α:

w_new = w_old - α × (∂L/∂w)

Here, α is the learning rate, which sets the step size: it controls how much each parameter changes per update. The value of α is typically between 0 and 1.

If you want to understand more about the learning rate, please refer to this tutorial.

Let’s update the weight parameter w₁₁, using a learning rate α = 0.1 and ∂L/∂w₁₁ = -3.0055:

w₁₁ = 0.25 - 0.1 * (-3.0055)       # Previous value of w₁₁ = 0.25
w₁₁ = 0.5505
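
The same update as a small Python sketch. The helper name `gradient_step` is hypothetical; the learning rate 0.1 and the gradient -3.0055 are the values used in the calculation above.

```python
def gradient_step(param, grad, learning_rate):
    """Move a parameter a small step against its gradient."""
    return param - learning_rate * grad

w11 = gradient_step(0.25, -3.0055, learning_rate=0.1)
print(w11)  # approximately 0.55
```

Because the gradient is negative, the update increases w₁₁.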

Using the same procedure, update all the remaining weight and bias parameters through back-propagation. A single iteration is finished once every parameter has been updated.

The neural network needs to repeat this process on different samples a number of times to achieve better accuracy. This process is called training the neural network.
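
Putting the five steps together, a training loop for this small network might look like the sketch below. It is an illustration under stated assumptions, not a definitive implementation: the function names, the number of epochs (100) and the per-sample update order are choices made here, and the gradients for the remaining parameters are derived with the same chain rule used for w₁₁. Note that b₀ is shared by both hidden nodes, so its gradient is the sum of the two hidden-node contributions. The sigmoid is written in a numerically stable form to avoid overflow.

```python
import math

def sigmoid(z):
    # Numerically stable sigmoid: never calls exp() on a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(x, p):
    a1 = sigmoid(p["w01"] * x + p["b0"])
    a2 = sigmoid(p["w02"] * x + p["b0"])
    a3 = sigmoid(p["w11"] * a1 + p["w21"] * a2 + p["b1"])
    return a1, a2, a3

def gradients(x, y, p):
    """Partial derivatives of (y - a3)^2 with respect to every parameter."""
    a1, a2, a3 = forward(x, p)
    d3 = -2.0 * (y - a3) * a3 * (1.0 - a3)  # dL/dz3
    d1 = d3 * p["w11"] * a1 * (1.0 - a1)    # dL/dz1
    d2 = d3 * p["w21"] * a2 * (1.0 - a2)    # dL/dz2
    return {
        "w11": d3 * a1, "w21": d3 * a2, "b1": d3,
        "w01": d1 * x, "w02": d2 * x,
        "b0": d1 + d2,  # b0 feeds both hidden nodes
    }

def mean_loss(data, p):
    return sum((y - forward(x, p)[2]) ** 2 for x, y in data) / len(data)

data = [(-10, 14), (0, 32), (8, 46), (15, 59), (22, 72), (38, 100)]
params = {"w01": 0.06, "w02": 0.50, "w11": 0.25, "w21": 0.12,
          "b0": 0.44, "b1": 0.21}
alpha = 0.1  # learning rate

loss_before = mean_loss(data, params)
for epoch in range(100):           # assumed number of passes over the data
    for x, y in data:              # one update per sample
        grads = gradients(x, y, params)
        for name in params:
            params[name] -= alpha * grads[name]
loss_after = mean_loss(data, params)

print(loss_before, loss_after)
```

The loss goes down, but only slightly: the sigmoid on the output node keeps every prediction between 0 and 1, while the targets here range from 14 to 100. In practice one would typically scale the targets or use a linear output node for data like this.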

. . .