This tutorial explains how to build a Convolutional Neural Network (CNN) for the MNIST handwritten digits dataset using the Keras deep learning library. MNIST is the standard dataset used as an introduction to neural networks for image classification in computer vision and deep learning.
The MNIST dataset contains 28×28-pixel grayscale images of handwritten digits from 0 to 9. It has 60,000 samples for training and 10,000 samples for testing.
. . .
Develop a Baseline Model
The Keras API provides the MNIST dataset as a built-in. Let's load it in Python.
In [1]: from keras.datasets import mnist

        (trainX, trainy), (testX, testy) = mnist.load_data()
        print('Train Data : X={} y={}'.format(trainX.shape, trainy.shape))
        print('Test Data  : X={} y={}'.format(testX.shape, testy.shape))

Out[1]: Train Data : X=(60000, 28, 28) y=(60000,)
        Test Data  : X=(10000, 28, 28) y=(10000,)
Let's plot a few samples from the dataset.
In [2]: import matplotlib.pyplot as plt

        # plot the first nine training images in a 3x3 grid
        for i in range(9):
            plt.subplot(330 + 1 + i)
            plt.imshow(trainX[i], cmap=plt.get_cmap('gray'))
        plt.show()

Out[2]: [figure: 3×3 grid of the first nine handwritten digit samples]
To develop a baseline model for handwritten digit recognition, we further divide the training dataset into two parts: a training set and a validation set. The Keras API supports this through the "validation_data" parameter of the model.fit() method.
The Keras API also provides a "validation_split" parameter in model.fit(), which splits the training dataset into train and validation sets directly, so we do not need to supply a validation dataset explicitly.
# Validation by specifying validation_data
model.fit(..., validation_data=(valX, valY))

# Validation by specifying validation_split
# (splits the training dataset and uses 20% of it for validation)
model.fit(..., validation_split=0.2)
The convolutional layers expect the input to have a channel dimension, so we need to reshape the data arrays to have a single color channel.
In [3]: # add a channel dimension: (samples, 28, 28) -> (samples, 28, 28, 1)
        trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
        testX = testX.reshape((testX.shape[0], 28, 28, 1))
        print('trainX : {}'.format(trainX.shape))
        print('testX  : {}'.format(testX.shape))

Out[3]: trainX : (60000, 28, 28, 1)
        testX  : (10000, 28, 28, 1)
There are 10 classes in total, one for each digit from 0 to 9. We use one-hot encoding for the class labels. The Keras API provides the utility function to_categorical() for one-hot encoding.
In [4]: from keras.utils import to_categorical

        trainY = to_categorical(trainy)
        testY = to_categorical(testy)
        print('trainY shape : {}'.format(trainY.shape))
        print('testY shape  : {}'.format(testY.shape))

Out[4]: trainY shape : (60000, 10)
        testY shape  : (10000, 10)
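To see what this encoding looks like, we can print the first training label both ways (the first label in MNIST happens to be a 5):

# the first training label as a raw digit and as a one-hot vector
print(trainy[0])   # 5
print(trainY[0])   # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]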
Pixel values of an image are in the range 0 to 255. Generally, a neural network performs better when it is fed normalized input values.
Let's normalize the pixel values to the range [0, 1]: we first convert the data type to float and then divide the pixel values by the maximum value, 255.
In [5]: # convert from integers to floats
        train_norm = trainX.astype('float32')
        test_norm = testX.astype('float32')
        # normalize to range [0, 1]
        train_norm = train_norm / 255.0
        test_norm = test_norm / 255.0
Let's define a baseline convolutional neural network model and train it.
In [6]: from keras.models import Sequential
        from keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Flatten

        num_classes = 10

        def prepare_model():
            model = Sequential()
            model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
            model.add(Conv2D(64, (3, 3), activation='relu'))
            model.add(MaxPooling2D(pool_size=(2, 2)))
            model.add(Dropout(0.25))
            model.add(Flatten())
            model.add(Dense(128, activation='relu'))
            model.add(Dropout(0.5))
            model.add(Dense(num_classes, activation='softmax'))
            model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
            return model

In [7]: model = prepare_model()
        history = model.fit(train_norm, trainY, batch_size=128, validation_split=0.2, epochs=3, verbose=1)

Out[7]: Train on 48000 samples, validate on 12000 samples
        Epoch 1/3
        48000/48000 [==============================] - 163s 3ms/step - loss: 0.2748 - acc: 0.9157 - val_loss: 0.0685 - val_acc: 0.9801
        Epoch 2/3
        48000/48000 [==============================] - 159s 3ms/step - loss: 0.0937 - acc: 0.9721 - val_loss: 0.0457 - val_acc: 0.9872
        Epoch 3/3
        48000/48000 [==============================] - 181s 4ms/step - loss: 0.0707 - acc: 0.9784 - val_loss: 0.0420 - val_acc: 0.9879
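If you want to sanity-check the architecture, every Keras model exposes a summary() method that prints each layer's output shape and parameter count:

# print a layer-by-layer summary of the network
model.summary()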
Learning Curves
Let’s take a look at the learning curves of the training and validation accuracy and loss.
In [8]: acc = history.history['acc']
        val_acc = history.history['val_acc']
        loss = history.history['loss']
        val_loss = history.history['val_loss']

        plt.figure(figsize=(8, 8))
        plt.subplot(2, 1, 1)
        plt.plot(acc, label='Training Accuracy')
        plt.plot(val_acc, label='Validation Accuracy')
        plt.legend(loc='lower right')
        plt.ylabel('Accuracy')
        plt.ylim([min(plt.ylim()), 1])
        plt.title('Training and Validation Accuracy')

        plt.subplot(2, 1, 2)
        plt.plot(loss, label='Training Loss')
        plt.plot(val_loss, label='Validation Loss')
        plt.legend(loc='upper right')
        plt.ylabel('Cross Entropy')
        plt.ylim([0, 1.0])
        plt.title('Training and Validation Loss')
        plt.xlabel('epoch')
        plt.show()

Out[8]: [figure: training and validation accuracy (top) and loss (bottom) curves over epochs]
Model Evaluation
Let's evaluate the trained model on the test data and observe the accuracy.
In [9]: score = model.evaluate(test_norm, testY, verbose=0)
        print('Test loss:', score[0])
        print('Test accuracy:', score[1])

Out[9]: Test loss: 0.0346724861223207
        Test accuracy: 0.9887
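Beyond the aggregate score, it can be useful to inspect individual predictions. As a minimal sketch, we can run a few test images through model.predict() and compare the argmax of the output probabilities against the true labels:

import numpy as np

# predicted class = index of the highest softmax probability
probs = model.predict(test_norm[:5])
pred_labels = np.argmax(probs, axis=1)
true_labels = np.argmax(testY[:5], axis=1)
print('Predicted :', pred_labels)
print('Actual    :', true_labels)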
Training a very deep neural network on a large dataset takes a long time, sometimes days or even weeks. Instead of training the model from scratch each time, we should save the trained model and reuse it for prediction.
Please refer to this tutorial to learn how to save the trained model and load it to make a prediction on a new test sample.
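As a quick sketch of the idea, Keras can serialize a whole model (architecture, weights, and optimizer state) to a single file with model.save() and restore it with load_model(); the file name mnist_cnn.h5 below is just a placeholder:

from keras.models import load_model

# save the trained model to disk (file name is only an example)
model.save('mnist_cnn.h5')

# later, e.g. in another script: restore the model and predict on a new sample
restored = load_model('mnist_cnn.h5')
digit = restored.predict(test_norm[:1]).argmax(axis=1)[0]
print('Predicted digit:', digit)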
. . .