Implementation of a Neural Network from Scratch Using NumPy

A neural network is a sophisticated architecture built from a stack of layers, each containing multiple neurons. If you are new to neural networks, please refer to these tutorials, which will help you grasp the terminology.

From the outside, a neural network may look like a magical black box involving a lot of mathematical calculation. However, once you gain a deeper understanding of it, it becomes much clearer.

It is very easy to build a complex model using a high-level API such as TensorFlow, Keras, or PyTorch. However, it is worth building your own neural network to gain a clear understanding of how it works.

This tutorial explains how to develop a neural network from scratch using the NumPy library. A neural network mainly consists of two processes: forward-propagation and back-propagation.

Please refer to this tutorial on how to derive the equations of forward-propagation and back-propagation. The sections below list the equations of forward-propagation and back-propagation that we will implement in Python code.

Forward-propagation
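
Written out from the forward_prop implementation below (X holds the training samples column-wise and $\sigma$ denotes the sigmoid function):

$$Z^{[1]} = W^{[1]} X + b^{[1]}, \qquad A^{[1]} = \tanh\left(Z^{[1]}\right)$$

$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}, \qquad A^{[2]} = \sigma\left(Z^{[2]}\right) = \frac{1}{1 + e^{-Z^{[2]}}}$$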

Back-Propagation
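
Written out from the backward_prop and update_parameters implementations below, where m is the number of training samples, $\odot$ denotes element-wise multiplication and $\alpha$ is the learning rate:

$$dZ^{[2]} = A^{[2]} - Y, \qquad dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]T}, \qquad db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}$$

$$dZ^{[1]} = \left(W^{[2]T} dZ^{[2]}\right) \odot \left(1 - \left(A^{[1]}\right)^{2}\right), \qquad dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{T}, \qquad db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}$$

The parameters are then updated with gradient descent:

$$W^{[l]} := W^{[l]} - \alpha\, dW^{[l]}, \qquad b^{[l]} := b^{[l]} - \alpha\, db^{[l]}, \qquad l \in \{1, 2\}$$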

Implementation 

Let’s implement a 2-layer neural network (an input layer, one hidden layer, and an output layer).

Data – Here, we will use this dataset for demonstration purposes. It has two input features, x1 and x2, and an output variable called target. It is a binary classification problem.

Download the data file in your current working directory.

In [1]:
# Import required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(2)
In [2]:
# Read the data
df = pd.read_csv("dataset.csv")
df.shape
Out[2]:
(200, 3)
In [3]:
df.head()
Out[3]:
         x1        x2  target
0  1.065701  1.645795     1.0
1  0.112153  1.005711     1.0
2 -1.469113  0.598036     1.0
3 -1.554499  1.034249     1.0
4 -0.097040 -0.146800     0.0
In [4]:
# Let's print the distribution of the target variable across classes 0 & 1
df['target'].value_counts()
Out[4]:
0.0    103
1.0     97
Name: target, dtype: int64
In [5]:
# Let's plot the distribution of the target variable
plt.scatter(df['x1'], df['x2'], c=df['target'].values.reshape(200,), s=40, cmap=plt.cm.Spectral)
plt.title('Distribution of the target variable')

In [6]:
# Let's prepare the data for model training
X = df[['x1','x2']].values.T
Y = df['target'].values.reshape(1,-1)
X.shape,Y.shape

Out[6]:
((2, 200), (1, 200))
In [7]:
m = X.shape[1]             # m - No. of training samples

# Set the hyperparameters
n_x = 2                    # No. of neurons in first layer
n_h = 10                   # No. of neurons in hidden layer
n_y = 1                    # No. of neurons in output layer
num_of_iters = 1000
learning_rate = 0.3
In [8]:
# Define the sigmoid activation function
def sigmoid(z):
    return 1/(1 + np.exp(-z))
In [9]:
# Initialize weight & bias parameters
def initialize_parameters(n_x, n_h, n_y):
    W1 = np.random.randn(n_h, n_x)
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h)
    b2 = np.zeros((n_y, 1))

    parameters = {
        "W1": W1,
        "b1" : b1,
        "W2": W2,
        "b2" : b2
      }
    return parameters
In [10]:
# Function for forward propagation
def forward_prop(X, parameters):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    cache = {
      "A1": A1,
      "A2": A2
     }
    return A2, cache
In [11]:
# Function to calculate the cost (binary cross-entropy)
def calculate_cost(A2, Y):
    cost = -np.sum(np.multiply(Y, np.log(A2)) +  np.multiply(1-Y, np.log(1-A2)))/m
    cost = np.squeeze(cost)
    return cost
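
In equation form, this is the binary cross-entropy cost averaged over the m training samples, as computed by calculate_cost above:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{[2](i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{[2](i)}\right) \right]$$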
In [12]:
# Function for back-propagation
def backward_prop(X, Y, cache, parameters):
    A1 = cache["A1"]
    A2 = cache["A2"]

    W2 = parameters["W2"]

    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T)/m
    db2 = np.sum(dZ2, axis=1, keepdims=True)/m
    dZ1 = np.multiply(np.dot(W2.T, dZ2), 1-np.power(A1, 2))
    dW1 = np.dot(dZ1, X.T)/m
    db1 = np.sum(dZ1, axis=1, keepdims=True)/m

    grads = {
    "dW1": dW1,
    "db1": db1,
    "dW2": dW2,
    "db2": db2
    }

    return grads
In [13]:
# Function to update the weight & bias parameters
def update_parameters(parameters, grads, learning_rate):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    W1 = W1 - learning_rate*dW1
    b1 = b1 - learning_rate*db1
    W2 = W2 - learning_rate*dW2
    b2 = b2 - learning_rate*db2

    new_parameters = {
    "W1": W1,
    "W2": W2,
    "b1" : b1,
    "b2" : b2
    }

    return new_parameters
In [14]:
# Define the Model
def model(X, Y, n_x, n_h, n_y, num_of_iters, learning_rate,display_loss=False):
    parameters = initialize_parameters(n_x, n_h, n_y)

    for i in range(0, num_of_iters+1):
        a2, cache = forward_prop(X, parameters)

        cost = calculate_cost(a2, Y)

        grads = backward_prop(X, Y, cache, parameters)

        parameters = update_parameters(parameters, grads, learning_rate)
        
        if display_loss:
            if(i%100 == 0):
                print('Cost after iteration# {:d}: {:f}'.format(i, cost))

    return parameters
In [15]:
trained_parameters = model(X, Y, n_x, n_h, n_y, num_of_iters, learning_rate,display_loss=True)

Out[15]:
Cost after iteration# 0: 0.727895
Cost after iteration# 100: 0.438707
Cost after iteration# 200: 0.308236
Cost after iteration# 300: 0.239390
Cost after iteration# 400: 0.200191
Cost after iteration# 500: 0.175058
Cost after iteration# 600: 0.157424
Cost after iteration# 700: 0.144189
Cost after iteration# 800: 0.133626
Cost after iteration# 900: 0.124717
Cost after iteration# 1000: 0.116933
In [16]:
# Define function for prediction
def predict(parameters, X):
    A2, cache = forward_prop(X,parameters)
    predictions = A2 > 0.5
    
    return predictions
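
As a quick sanity check (not part of the original notebook), predict can be applied to the training data to measure training accuracy; a minimal sketch:

# Sketch: compare the boolean predictions against the 0/1 labels
train_predictions = predict(trained_parameters, X)
train_accuracy = np.mean(train_predictions == Y)
print('Training accuracy: {:.2%}'.format(train_accuracy))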
In [17]:
# Define function to plot the decision boundary
def plot_decision_boundary(model, X, y):
    # Set min and max values and give it some padding
    x_min, x_max = X[0, :].min() - 1, X[0, :].max() + 1
    y_min, y_max = X[1, :].min() - 1, X[1, :].max() + 1
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = model(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.ylabel('x2')
    plt.xlabel('x1')
    plt.scatter(X[0, :], X[1, :], c=y.reshape(200,), cmap=plt.cm.Spectral)
In [18]:
# Plot the decision boundary
plot_decision_boundary(lambda x: predict(trained_parameters, x.T), X, Y)

In [19]:
# Let's see how our neural network works with different hidden layer sizes
plt.figure(figsize=(15, 10))
hidden_layer_sizes = [1, 2, 3, 5, 10, 20]
for i, n_h in enumerate(hidden_layer_sizes):
    plt.subplot(2, 3, i+1)
    plt.title('Hidden Layer of size %d' % n_h)
    
    parameters = model(X, Y, n_x, n_h, n_y, num_of_iters, learning_rate)
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)

Out[19]:
[Decision boundary plots for hidden layer sizes 1, 2, 3, 5, 10 and 20]

From the above results, we can say that the model performs better with more hidden units. However, too many hidden units can sometimes overfit the data.

An overfitted model performs very well on the training data but poorly on the test data. The appropriate model architecture (number of hidden layers and number of neurons in each hidden layer) also depends on the training dataset.

Finding a suitable number of hidden units is a tedious task. In the above example, the three isolated red data points might be outliers. If they are, the model overfits with hidden layer sizes 10 and 20, and the best hidden layer size seems to be 3.
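
One way to verify the overfitting behaviour for a given hidden layer size would be to hold out part of the data as a test set and compare training and test accuracy. The sketch below (not part of the original notebook) reuses the model and predict functions defined above; since calculate_cost and backward_prop divide by the global m, m is reassigned to the training set size first, and the 160/40 split is an arbitrary choice for illustration.

# Sketch: train/test split to compare training and test accuracy for a few hidden layer sizes
idx = np.random.permutation(X.shape[1])
train_idx, test_idx = idx[:160], idx[160:]
X_train, Y_train = X[:, train_idx], Y[:, train_idx]
X_test, Y_test = X[:, test_idx], Y[:, test_idx]

m = X_train.shape[1]       # calculate_cost and backward_prop read the global m
for n_h in [3, 10, 20]:
    params = model(X_train, Y_train, n_x, n_h, n_y, num_of_iters, learning_rate)
    train_acc = np.mean(predict(params, X_train) == Y_train)
    test_acc = np.mean(predict(params, X_test) == Y_test)
    print('n_h = {}: train accuracy {:.2%}, test accuracy {:.2%}'.format(n_h, train_acc, test_acc))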
