How Does a Neural Network Work?

A neural network has a sophisticated neuron structure, loosely inspired by the human brain. It is a mathematical function that maps input variables to a target variable, and it is built as a stack of multiple layers.

If you want to know more about Neural Network architecture, please refer to this tutorial.

This tutorial explains how a neural network works with an example. There are a total of five steps performed while training the network:

  • Initialize the weight parameters of the model
  • Forward Propagation
  • Measure the loss
  • Backward Propagation
  • Update weights

 

Here, we will look at each step in detail with an example.

Data

Let’s define the data to train the model. We will consider the data below for demonstration purposes.

X      Y (target)
-10    14
0      32
8      46
15     59
22     72
38     100
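
As a minimal sketch (assuming Python with NumPy, which this tutorial series uses elsewhere), the data above can be defined as arrays. Incidentally, the targets approximately follow y = 1.8x + 32:

import numpy as np

# Training data from the table above (targets ≈ 1.8 * X + 32)
X = np.array([-10, 0, 8, 15, 22, 38], dtype=float)
Y = np.array([14, 32, 46, 59, 72, 100], dtype=float)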

Prepare Neural Network Architecture

A neural network architecture consists of a stack of layers. The first layer is the input layer, followed by one or more hidden layers and then an output layer. Each layer of the network consists of multiple neurons.

The two main hyperparameters that control the entire Neural network are:

  • The number of layers in the network.
  • The number of neurons (nodes) in each hidden layer.

 

The input layer is the beginning of the neural network; it brings the input data into the model for further processing by the subsequent layers. The number of neurons in the input and output layers is pre-specified, whereas the number of neurons in the hidden layers may vary.

The input layer contains as many neurons as there are input features, and the output layer has as many neurons as there are outputs of the problem.

Each neuron is assigned weight parameters that need to be learned. The image below depicts our neural network architecture.

Here, w01 and w02 are the weights connecting the input to the two hidden neurons, and w11 and w21 are the weights connecting the hidden neurons to the output neuron. b0 and b1 are the bias parameters of the hidden layer and the output layer, respectively.

X is the input variable that will be fed to the input layer. z1 and z2 are the intermediate results of the hidden-layer neurons, and z3 is the intermediate result of the output neuron. To learn a non-linear pattern between the input and the target output value, we apply an activation function to the intermediate result of each neuron. a1 and a2 are the activation values of the hidden neurons, and a3 is the activation value of the output neuron.

Let’s calculate the intermediate results and activation values at each neuron of the hidden and output layers.

z1 = w01x + b0                         # Neuron value of hidden node-1
a1 = σ(z1)                            #  Activation value of hidden node-1

z2 = w02x + b0                         # Neuron value of hidden node-2
a2 = σ(z2)                            # Activation value of hidden node-2

z3 = w11a1 + w21a2 + b1                # Neuron value of output node
a3 = σ(z3)                           # Activation value of output node

Sigmoid Activation Function

The activation function allows the neural network to learn a non-linear pattern between the inputs and the target output variable. To learn more about activation functions, please refer to this tutorial.

Here, we will use the sigmoid activation function. The mathematical equation of the sigmoid function is as follows:

σ(z) = 1 / (1 + e^(-z))
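
As a code sketch (continuing the NumPy setup from above):

def sigmoid(z):
    """Sigmoid activation: squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))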

Initialize the weight and bias parameters

Let’s initialize the weight and bias parameters randomly.

w01 = 0.06
w02 = 0.50
w11 = 0.25
w21 = 0.12
b0 = 0.44
b1 = 0.21
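
The same initialization in code (values copied from above; a real implementation would draw them randomly):

# Hidden-layer parameters
w01, w02, b0 = 0.06, 0.50, 0.44
# Output-layer parameters
w11, w21, b1 = 0.25, 0.12, 0.21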

Forward propagation

Forward propagation calculates the output of each neuron. Each neuron performs a mathematical operation on its input and generates an output, which is then passed to the neurons of the next layer.

Let’s consider the first sample of the data for illustration: X = -10 and Y = 14.

z1 = w01x + b0
z1 = 0.06(-10) + 0.44
z1 = -0.16

a1 = σ(z1) = 1 / (1 + e^(-z1)) = 0.4601

z2 = w02x + b0
z2 = 0.50(-10)+0.44
z2 = -4.56

a2 = σ(z2) = 1 / (1 + e^(-z2)) = 0.0103

z3 = w11a1 + w21a2 + b1
z3 = 0.25(0.4601) + 0.12(0.0103) + 0.21
z3 = 0.3262

pred_y = a3 = σ(z3) = 1 / (1 + e^(-z3)) = 0.5808
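
The same forward pass in code, reproducing the numbers above (using the sigmoid and parameters defined earlier):

x, y = -10.0, 14.0                 # first training sample

z1 = w01 * x + b0                  # -0.16
a1 = sigmoid(z1)                   # ≈ 0.4601

z2 = w02 * x + b0                  # -4.56
a2 = sigmoid(z2)                   # ≈ 0.0103

z3 = w11 * a1 + w21 * a2 + b1      # ≈ 0.3262
pred_y = a3 = sigmoid(z3)          # ≈ 0.5808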

Measure Loss (Error)

In the next step, the error between the predicted target value and the actual target value is calculated. The goal of the network is to minimize this error, i.e., to make the predicted value nearly equal to the actual target value.

Let’s calculate the Mean Squared Error (MSE); for a single sample, it reduces to the squared error:

MSE = (true_y - pred_y)²
    = (14 - 0.5808)²
    = 180.0749
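
In code:

loss = (y - pred_y) ** 2           # (14 - 0.5808)² ≈ 180.0749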

Backward propagation

Backward propagation, or backpropagation, is the process of propagating the error (loss) back through the neural network and updating the weight and bias parameters of each neuron accordingly.

Back-propagation plays an important role in the neural network. It performs several mathematical operations. The parameter update is determined by finding the partial derivative of the loss function with respect to each parameter.

The derivative represents the rate of change of the loss relative to a change in a parameter. By computing derivatives, we can determine how sensitive the loss function is to each weight and bias parameter.

Let’s perform the back-propagation process for the weight w11. We need to calculate the partial derivative of the loss function with respect to the weight parameter w11.

The loss function does not depend on w11 directly, so we cannot compute this partial derivative in a single step. Hence, we need to use the chain rule:

∂L/∂w11 = (∂L/∂a3) · (∂a3/∂z3) · (∂z3/∂w11)

The partial derivative of the sigmoid activation function is as follows: σ'(z) = σ(z)(1 - σ(z))

From the definitions above, ∂L/∂a3 = -2(true_y - a3), ∂a3/∂z3 = σ'(z3) = a3(1 - a3), and ∂z3/∂w11 = a1. Substituting the forward-pass values:

∂L/∂w11 = -2(14 - 0.5808) · 0.5808(1 - 0.5808) · 0.4601 ≈ -3.0055
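
The same chain-rule computation as a code sketch (variable names continue from the forward pass above):

dL_da3   = -2.0 * (y - a3)          # ∂L/∂a3   = -2(true_y - a3)
da3_dz3  = a3 * (1.0 - a3)          # ∂a3/∂z3  = σ'(z3)
dz3_dw11 = a1                       # ∂z3/∂w11 = a1
dL_dw11  = dL_da3 * da3_dz3 * dz3_dw11
# ≈ -3.006 at full precision; the text uses -3.0055 after rounding intermediates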

 

Weight Update

The weight and bias parameters are updated by subtracting the partial derivative of the loss function with respect to each parameter, scaled by the learning rate:

w_new = w_old - α · (∂L/∂w)

Here, α is the learning rate, which represents the step size; it controls how much the parameter is updated. The value of α is between 0 and 1.

If you want to understand more about the learning rate, please refer to this tutorial.

Let’s update the weight parameter w11, taking the learning rate α = 0.1:

w11 = 0.25 - 0.1 * (-3.0055)       # Previous value of w11 = 0.25
w11 = 0.5505
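
In code, the same update looks like this (α = 0.1 as in the calculation above):

alpha = 0.1                         # learning rate
w11 = w11 - alpha * dL_dw11         # 0.25 - 0.1 * (-3.0055) ≈ 0.5505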

Following the same procedure, update all the weight and bias parameters through back-propagation. A single iteration is finished after every parameter has been updated.

The neural network needs to repeat this process on different samples a certain number of times to achieve better accuracy. This process is called training the neural network.
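
Putting all five steps together, a minimal training loop might look like the sketch below. This is our sketch, not code from the original tutorial; the gradients for the remaining parameters are derived with the same chain-rule procedure shown above for w11, and the number of epochs is a free choice.

alpha = 0.1
for epoch in range(1000):                       # repeat over the data many times
    for x, y in zip(X, Y):                      # one sample at a time
        # 1-2) Forward propagation
        z1 = w01 * x + b0; a1 = sigmoid(z1)
        z2 = w02 * x + b0; a2 = sigmoid(z2)
        z3 = w11 * a1 + w21 * a2 + b1; a3 = sigmoid(z3)

        # 3) Loss for this sample: (y - a3) ** 2

        # 4) Backward propagation (chain rule, as derived for w11 above)
        dL_dz3 = -2.0 * (y - a3) * a3 * (1.0 - a3)
        dL_dz1 = dL_dz3 * w11 * a1 * (1.0 - a1)
        dL_dz2 = dL_dz3 * w21 * a2 * (1.0 - a2)

        # 5) Update every weight and bias parameter
        w11 -= alpha * dL_dz3 * a1
        w21 -= alpha * dL_dz3 * a2
        b1  -= alpha * dL_dz3
        w01 -= alpha * dL_dz1 * x
        w02 -= alpha * dL_dz2 * x
        b0  -= alpha * (dL_dz1 + dL_dz2)        # b0 is shared by both hidden nodes

One caveat worth noting: because the sigmoid output a3 is bounded in (0, 1), this exact network cannot reach targets such as 100; in practice, a regression output neuron would typically use a linear activation instead.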

.     .     .
