TensorFlow: Prepare a Custom Neural Network Model with Custom Layers

TensorFlow is a Deep Learning library. Deep Learning practitioners generally use the Keras Sequential or Functional API to build a deep neural network architecture, since Keras makes it easy to create a model by stacking multiple layers. However, the built-in Keras layers come with fixed default behaviour, and you have little control over it.

The good news is that TensorFlow also supports more flexible customization: by subclassing the Model class, you can create your own feed-forward model with your own custom layer designs. The beauty of model customization is that you get full control over every nuance of the model. Developing a customized model is harder, but it is worth it for the complete control you gain.

In this tutorial, you will learn how to create a custom model with custom layers in TensorFlow. You will also walk through custom training and evaluation of the model. This entire tutorial works under TensorFlow version 2.1.0. Let’s start building the model by importing the TensorFlow package.

Import Required Packages

import tensorflow as tf
print(tf.__version__)
2.1.0

Import the Dataset

For experimental purposes, we use the Iris flower dataset, which consists of 3 different types of irises (Setosa, Versicolour, and Virginica). The dataset has 4 features: the length and width of the sepals and petals. It is a multi-class classification problem: we need to prepare a Machine Learning model that classifies Iris flowers by species.

Let’s load the Iris dataset. It has 50 samples for each class.

from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

print(f"X.shape = {X.shape}")
print(f"y.shape = {y.shape}")
X.shape = (150, 4)
y.shape = (150,)
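
The load_iris helper also exposes the feature names, which is handy for a quick look at the raw data (an optional check, not required for the rest of the tutorial):

print(iris.feature_names)
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']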

Build the Model

In this section, we create a custom linear layer and a custom model using TensorFlow’s Keras API. To create the custom layer, we use the Layer class, in which the weights w and b are initialized and the computation is defined. We then use the Model class to define the custom neural network architecture.

The Layer class

To create a dense (linear) layer, we inherit from the Layer class of the Keras API. The layer has a weight w and a bias b as parameters. You can initialize the weight and bias in either the __init__ or the build method; however, they are ideally initialized in the build method, while the computation is defined in the call method.

This custom dense layer takes the number of neuron units as a parameter. The input dimension is inferred from input_shape inside the build method, and together these define the shapes of the weight and bias coefficients.

class Dense_Layer(tf.keras.layers.Layer):

    def __init__(self, units):
        super(Dense_Layer, self).__init__()
        self.units = units

    def build(self, input_shape):
        # Create the weight matrix and bias vector once the input
        # shape is known (i.e. on the layer's first call).
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        # Linear transformation: inputs @ w + b
        return tf.matmul(inputs, self.w) + self.b
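
A quick sanity check (an optional step, not part of the model itself): calling the layer on a dummy batch triggers build, which creates the weights from the inferred input shape.

layer = Dense_Layer(3)                # weights are not created yet
out = layer(tf.ones((2, 4)))          # first call runs build with input_shape (2, 4)
print(out.shape)                      # (2, 3)
print(len(layer.trainable_weights))   # 2 --> w and b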

The Model class

TensorFlow’s Keras API provides the Model class, which is used to define the model architecture. A Model is composed of multiple inner layers built by subclassing the Layer class.

Let’s create the custom model by inheriting from the Model class of the Keras API. As with the Layer class, the inner layers are defined in the __init__ or build method, and the computation is defined in the call method.

The example below is a three-layer architecture: the layers are initialized in __init__ with their neuron sizes and stacked in the call method, with dropout applied after the first layer only during training.

class Custom_Model(tf.keras.Model):

    def __init__(self):
        super(Custom_Model, self).__init__()

        self.dense1 = Dense_Layer(50)
        self.dense2 = Dense_Layer(12)
        self.dense3 = Dense_Layer(3)
        self.dropout = tf.keras.layers.Dropout(0.2)

    def call(self, input_tensor, training=False):

        x = self.dense1(input_tensor)
        x = tf.nn.relu(x)

        if training:
            x = self.dropout(x, training=training)
        x = self.dense2(x)
        x = tf.nn.relu(x)

        x = self.dense3(x)
        x = tf.nn.softmax(x)

        return x

model = Custom_Model()
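
Because the weights are created lazily, the model has no parameters until its first call. A quick forward pass on a few samples (an optional sanity check) builds all the inner layers:

sample_output = model(tf.convert_to_tensor(X[:5], dtype=tf.float32))
print(sample_output.shape)   # (5, 3) --> class probabilities for 5 samples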

Prepare the Training Data

We are going to train the model on the Iris data, which is a multi-class classification problem, so we need to apply one-hot encoding to the target variable.

from tensorflow.keras.utils import to_categorical
one_hot_Y = to_categorical(y)
print("One hot Encoding --> target shape :",one_hot_Y.shape)
One hot Encoding --> target shape : (150, 3)
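
Equivalently, you could stay inside TensorFlow and use tf.one_hot; this is just an alternative that produces the same encoding:

one_hot_Y_tf = tf.one_hot(y, depth=3)
print(one_hot_Y_tf.shape)   # (150, 3)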

Split Train & Validation Data

from sklearn.model_selection import train_test_split
X_train, X_val, Y_train, Y_val = train_test_split(X, one_hot_Y, test_size=0.2, random_state=42)

print(f"Training data   : X_train.shape : {X_train.shape}  &  Y_train.shape : {Y_train.shape}")
print(f"Validation data : X_val.shape   : {X_val.shape}   &  Y_val.shape   : {Y_val.shape}")
Training data   : X_train.shape : (120, 4)  &  Y_train.shape : (120, 3)
Validation data : X_val.shape   : (30, 4)   &  Y_val.shape   : (30, 3)

Define the Loss Function

To train the model, we need a loss function that helps evaluate the model’s performance by measuring the difference between the model’s predicted target value and the actual target value. Our goal is to minimize this loss value while training the model.

Here, we use the CategoricalCrossentropy loss function, which takes the model’s predicted class probabilities and the actual target values as input and returns the loss value. Since our model’s call already applies softmax, the default from_logits=False is appropriate.

loss_fn = tf.keras.losses.CategoricalCrossentropy()
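
A tiny worked example with dummy values, just to illustrate the computation: when the predicted probability of the true class is 0.8, the loss is -ln(0.8) ≈ 0.223.

y_true = [[0., 1., 0.]]                  # one-hot target
y_pred = [[0.1, 0.8, 0.1]]               # predicted class probabilities
print(float(loss_fn(y_true, y_pred)))    # ~0.223, i.e. -ln(0.8)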

Define a Gradient Function

The gradients of the loss function are its partial derivatives with respect to the model’s parameters; the optimizer uses them to update the model. tf.GradientTape() is used to calculate the gradients. Let’s set up the gradient computation (x_batch_train and y_batch_train are placeholders for the batch variables defined in the training loop below):

with tf.GradientTape() as tape:
    logits = model(x_batch_train)                           # Model's prediction
    loss_value = loss_fn(y_batch_train, logits)             # Calculate loss

grads = tape.gradient(loss_value, model.trainable_weights)  # Calculate Gradients
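
If tf.GradientTape() is new to you, here is a minimal standalone example, independent of the model: for y = x², the tape recovers dy/dx = 2x = 6 at x = 3.

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x
print(tape.gradient(y, x))   # tf.Tensor(6.0, shape=(), dtype=float32)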

Define an Optimizer

We require an optimizer to minimize the loss; it uses the computed gradients of the loss function to update the model parameters. TensorFlow provides various optimization algorithms. Here, we use the Adam optimizer with a learning rate of 0.01.

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
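
A single optimizer step looks like this (a toy example with a separate optimizer instance, so the model’s optimizer state stays untouched): apply_gradients moves the variable against its gradient.

toy_opt = tf.keras.optimizers.Adam(learning_rate=0.01)
v = tf.Variable(1.0)
with tf.GradientTape() as tape:
    loss = v ** 2
toy_opt.apply_gradients(zip(tape.gradient(loss, [v]), [v]))
print(v.numpy())   # ~0.99 after one Adam step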

Define an Accuracy Metric

Let’s set up accuracy metrics for training and validation to observe the performance of the model.

train_acc_metric = tf.keras.metrics.CategoricalAccuracy()
val_acc_metric = tf.keras.metrics.CategoricalAccuracy()
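
These metrics are stateful: each call accumulates batch statistics, result() reads the running value, and reset_states() clears it. A tiny illustration with dummy values (resetting afterwards so real training starts clean):

train_acc_metric([[0., 1., 0.]], [[0.1, 0.8, 0.1]])   # one correct prediction
print(float(train_acc_metric.result()))               # 1.0
train_acc_metric.reset_states()                       # clear before real training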

Define Batch Datasets for Train & Validation

During training, the model processes a fixed number of samples at a time; this group of samples is called a batch. We iteratively execute the training step on each batch. Let’s set the batch size to 16, meaning 16 samples per batch.

# Prepare the training dataset.
batch_size = 16
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, Y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((X_val, Y_val))
val_dataset = val_dataset.batch(batch_size)
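
You can peek at one batch to confirm the shapes (an optional check):

for x_batch, y_batch in train_dataset.take(1):
    print(x_batch.shape, y_batch.shape)   # (16, 4) (16, 3)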

Define the Training Loop

Let’s set up the training loop, which performs the following steps:

  1. Iterate over each epoch; one epoch is one pass through the entire training dataset.
  2. Within each epoch, perform the training computation on each training batch.
  3. Run the forward pass and compute the model’s predictions.
  4. Calculate the loss and the gradients of the model.
  5. Run the optimizer to update the parameters of the model.
  6. Log the loss value.
  7. Repeat the same procedure for each epoch.
# define epochs
epochs = 5

for epoch in range(epochs):

    print("\n")
    print(f"Epoch : {epoch+1}")

    # Iterate over the batches of the train dataset
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset): 

        # During forward pass, Open GradientTape to calculate the gradient
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)  # Calculate model's prediction
            loss_value = loss_fn(y_batch_train, logits)  # Calculate Loss value

        # Retrieve Gradient Calculation
        grads = tape.gradient(loss_value, model.trainable_weights)  

        # Run the Optimizer, that update the model parameters to minimize the loss
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Update training accuracy metric
        train_acc_metric(y_batch_train, logits)

        # Print Log of loss value at every 5th step
        if step % 5 == 0:
            print(f"Training Loss at step {step} : {loss_value:.3f}")

    print()      
   
    # print training accuracy at the end of each epoch
    train_acc = train_acc_metric.result()
    print(f"Training Accuracy   : {train_acc:.3f}")
    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()

    # Run model on validation data at the end of each epoch
    for x_batch_val, y_batch_val in val_dataset:
        val_logits = model(x_batch_val, training=False)
        val_acc_metric(y_batch_val, val_logits) 

    # Display validation accuracy 
    val_acc = val_acc_metric.result()
    # Reset validation metric
    val_acc_metric.reset_states()
    print(f"Validation Accuracy : {val_acc:.3f}")
Epoch : 1
Training Loss at step 0 : 1.092
Training Loss at step 5 : 1.049

Training Accuracy   : 0.367
Validation Accuracy : 0.533

Epoch : 2
Training Loss at step 0 : 0.978
Training Loss at step 5 : 0.837

Training Accuracy   : 0.642
Validation Accuracy : 0.700

Epoch : 3
Training Loss at step 0 : 0.740
Training Loss at step 5 : 0.545

Training Accuracy   : 0.658
Validation Accuracy : 0.700

Epoch : 4
Training Loss at step 0 : 0.466
Training Loss at step 5 : 0.441

Training Accuracy   : 0.783
Validation Accuracy : 0.967

Epoch : 5
Training Loss at step 0 : 0.332
Training Loss at step 5 : 0.291

Training Accuracy   : 0.917
Validation Accuracy : 0.967
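
As an optional speed-up (not used in the run above), the per-batch computation can be wrapped in a tf.function so TensorFlow compiles it into a graph. A minimal sketch, assuming the same model, loss_fn, optimizer, and train_acc_metric defined earlier:

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        probs = model(x, training=True)            # forward pass
        loss_value = loss_fn(y, probs)             # compute loss
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric(y, probs)                     # update running accuracy
    return loss_value

Calling train_step(x_batch_train, y_batch_train) inside the batch loop would then replace the eager block above.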

Evaluate the Model

Let’s evaluate the model’s performance on a test dataset. Here are the test samples.

test_dataset = tf.convert_to_tensor([
    [5.8, 2.7, 3.9, 1.2],
    [4.7, 3.2, 1.6, 0.2],
    [7.7, 2.6, 6.9, 2.3],
    [4.8, 3. , 1.4, 0.1],
    [6.7, 2.5, 5.8, 1.8]
])
test_dataset.shape
TensorShape([5, 4])
Class_Label = ['Iris setosa', 'Iris versicolor', 'Iris virginica']

# If the model contains layers that behave differently during
# training and inference, such as Dropout, pass training=False.
# Note: the model's call already applies softmax, so its output
# can be used directly as class probabilities.

test_probabilities = model(test_dataset, training=False)
predicted_class = tf.argmax(test_probabilities, 1)
predicted_class_label = tf.gather(Class_Label, predicted_class)

for ex,pred in zip(test_dataset,predicted_class_label):
    tf.print(ex,pred)
[5.8 2.7 3.9 1.2] "Iris versicolor"
[4.7 3.2 1.6 0.2] "Iris setosa"
[7.7 2.6 6.9 2.3] "Iris virginica"
[4.8 3 1.4 0.1] "Iris setosa"
[6.7 2.5 5.8 1.8] "Iris virginica"
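
If you also want the model’s confidence for each prediction, take the maximum probability per row (an optional addition, reusing test_probabilities from above):

confidence = tf.reduce_max(test_probabilities, axis=1)
for label, conf in zip(predicted_class_label, confidence):
    tf.print(label, conf)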

In this tutorial, you have explored how to build a custom neural network model with custom layers using TensorFlow’s Keras API.
