Keras ImageDataGenerator with flow_from_dataframe()

Keras’ ImageDataGenerator class lets you perform image augmentation while training a model. If you are not familiar with data augmentation, please refer to this tutorial, which explains the various transformation methods with examples. You can also refer to this Keras ImageDataGenerator tutorial, which explains how the ImageDataGenerator class works.

Keras’ ImageDataGenerator class provides three methods for loading image data and generating batches of augmented data. These three methods are:

  • .flow()
  • .flow_from_directory()
  • .flow_from_dataframe()

 

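All three methods accomplish the same task, loading image data and generating batches of augmented data, but each takes its input from a different source: .flow() works on arrays already in memory, .flow_from_directory() reads images from class subdirectories, and .flow_from_dataframe() reads file names and labels from a Pandas DataFrame.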

This tutorial explains the flow_from_dataframe() method with an example. The method takes a Pandas DataFrame and the path to a directory and generates batches of augmented/normalized data. Its signature is:

flow_from_dataframe(dataframe, directory=None, x_col='filename', y_col='class', weight_col=None, target_size=(256, 256), color_mode='rgb', classes=None, class_mode='categorical', batch_size=32, shuffle=True, seed=None, save_to_dir=None, save_prefix='', save_format='png', subset=None, interpolation='nearest', validate_filenames=True)
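
As a rough mental model of what the method does with the dataframe, directory, x_col, and y_col arguments, the sketch below pairs each file name with the directory path and yields the rows in batches. It is plain Python rather than Keras code: the helper name iter_batches and the sample rows are hypothetical, and the real method additionally loads, resizes, and augments each image.

```python
import os
import random

def iter_batches(rows, directory, x_col="filename", y_col="class",
                 batch_size=32, shuffle=True, seed=None):
    """Yield lists of (image_path, label) pairs, one list per batch.

    Hypothetical helper for illustration only; it mimics the bookkeeping,
    not the image loading or augmentation that Keras performs.
    """
    order = list(range(len(rows)))
    if shuffle:
        random.Random(seed).shuffle(order)  # reproducible when seed is set
    for start in range(0, len(order), batch_size):
        batch = []
        for i in order[start:start + batch_size]:
            path = os.path.join(directory, rows[i][x_col])
            batch.append((path, rows[i][y_col]))
        yield batch

# Made-up rows with a file-name column and a label column
rows = [
    {"img_code": "66.jpg", "target": "0"},
    {"img_code": "88.jpg", "target": "0"},
    {"img_code": "41.jpg", "target": "1"},
    {"img_code": "71.jpg", "target": "0"},
    {"img_code": "46.jpg", "target": "1"},
]
batches = list(iter_batches(rows, "data/train", x_col="img_code",
                            y_col="target", batch_size=2, shuffle=False))
for batch in batches:
    print(batch)
```

With five rows and a batch size of 2, the last batch holds a single item, which is also how a generator ends up with a smaller final batch.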

Prepare Dataset

For demonstration, we use a fruit dataset with two classes, Banana and Apricot. Each class contains 50 images. You can download the dataset here; save and unzip it in your current working directory. The downloaded dataset contains two .csv files, which we will use with the flow_from_dataframe() function.

Directory Structure

| --- data
|     | --- train [100 images]
|     | --- test  [25 images]
|     | --- train_data.csv
|     | --- test_data.csv
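
Because flow_from_dataframe() looks up each file name from the CSV inside the given directory, it can be worth checking that the two agree before training. The sketch below is only an illustration: the helper find_missing_files is hypothetical, the column name img_code is taken from the CSV used in this tutorial, and the code runs against a temporary directory with empty placeholder files so that it does not depend on the downloaded dataset.

```python
import csv
import os
import tempfile

def find_missing_files(csv_path, image_dir, x_col="img_code"):
    """Return file names listed in the CSV that are absent from image_dir."""
    with open(csv_path, newline="") as f:
        listed = [row[x_col] for row in csv.DictReader(f)]
    return [name for name in listed
            if not os.path.isfile(os.path.join(image_dir, name))]

# Build a tiny stand-in for data/train and data/train_data.csv
with tempfile.TemporaryDirectory() as root:
    image_dir = os.path.join(root, "train")
    os.makedirs(image_dir)
    for name in ("66.jpg", "88.jpg"):
        # Empty placeholder files; real images are not needed for this check
        open(os.path.join(image_dir, name), "wb").close()

    csv_path = os.path.join(root, "train_data.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["img_code", "target"])
        writer.writerows([["66.jpg", "0"], ["88.jpg", "0"], ["41.jpg", "1"]])

    missing = find_missing_files(csv_path, image_dir)

print(missing)
```

To run the same check on the real data, point csv_path at data/train_data.csv and image_dir at data/train.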

Let’s plot a few images from the training data.

In [1]: 
%matplotlib inline
import matplotlib.pyplot as plt
import os

src_path = "data/train"
# List the image files in the training directory (order is OS-dependent)
file_names = os.listdir(src_path)

# Show the first 8 images in a 2 x 4 grid
fig = plt.figure(figsize=(10, 5))
for i, file_name in enumerate(file_names[:8]):
    plt.subplot(2, 4, i + 1)
    img = plt.imread(os.path.join(src_path, file_name))
    plt.imshow(img, cmap=plt.get_cmap('gray'))

Out[1]:

We need to train a classifier that can classify an input fruit image as Banana or Apricot.

Implementing a Training Script

Let’s import the required packages.

In [2]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from keras.preprocessing.image import ImageDataGenerator
import pandas as pd

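Let’s load the CSV files into Pandas DataFrames. The target column is cast to str because class_mode="categorical" expects string labels in y_col.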

In [3]:
train_df = pd.read_csv('data/train_data.csv')
test_df = pd.read_csv('data/test_data.csv')
train_df['target'] = train_df['target'].astype(str)
train_df.head()

Out[3]:
   img_code target
0   66.jpg      0
1   88.jpg      0
2   41.jpg      1
3   71.jpg      0
4   46.jpg      1

Let’s initialize Keras’ ImageDataGenerator class:

In [4]:
src_path_train = "data/train/"
src_path_test = "data/test/"

train_datagen = ImageDataGenerator(
        rescale=1 / 255.0,
        rotation_range=20,
        zoom_range=0.05,
        width_shift_range=0.05,
        height_shift_range=0.05,
        shear_range=0.05,
        horizontal_flip=True,
        fill_mode="nearest",
        validation_split=0.20)

test_datagen = ImageDataGenerator(rescale=1 / 255.0)
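
To make the effect of validation_split=0.20 concrete, the sketch below computes which DataFrame rows would land in each subset. It assumes the slicing behaviour of the keras_preprocessing implementation, where the validation subset is the leading 20% of the rows and the training subset is the remainder; treat that ordering as an assumption to verify against your installed version. The helper split_bounds is hypothetical.

```python
def split_bounds(num_rows, validation_split, subset):
    """Return the (start, stop) row range for a subset.

    Assumption: validation takes the leading fraction of rows and
    training takes the remainder, without shuffling beforehand.
    """
    cut = int(validation_split * num_rows)
    if subset == "validation":
        return 0, cut
    if subset == "training":
        return cut, num_rows
    raise ValueError("subset must be 'training' or 'validation'")

num_rows = 100  # one CSV row per image in data/train
train_range = split_bounds(num_rows, 0.20, "training")
valid_range = split_bounds(num_rows, 0.20, "validation")
print("training rows:  ", train_range, "->", train_range[1] - train_range[0], "images")
print("validation rows:", valid_range, "->", valid_range[1] - valid_range[0], "images")
```

If the split really is positional, a CSV sorted by class could leave one class out of the validation subset, so shuffling the DataFrame rows before building the generators may be a sensible precaution.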

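Let’s initialize our training, validation, and testing generators. The training and validation generators share train_datagen and are separated by the subset argument, which relies on the validation_split=0.20 set above: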

In [5]:
batch_size = 8
train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    directory=src_path_train,
    x_col="img_code",
    y_col="target",
    target_size=(100, 100),
    batch_size=batch_size,
    class_mode="categorical",
    subset='training',
    shuffle=True,
    seed=42
)
valid_generator = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    directory=src_path_train,
    x_col="img_code",
    y_col="target",
    target_size=(100, 100),
    batch_size=batch_size,
    class_mode="categorical",
    subset='validation',
    shuffle=True,
    seed=42
)
test_generator = test_datagen.flow_from_dataframe(
    dataframe=test_df,
    directory=src_path_test,
    x_col="img_code",
    target_size=(100, 100),
    batch_size=1,
    class_mode=None,
    shuffle=False,
)
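
With class_mode="categorical", each label reaches the model as a one-hot vector. The sketch below reproduces that encoding in plain Python, assuming that class indices are assigned to the sorted unique label strings (the generators expose the actual mapping through their class_indices attribute). The helper names are hypothetical.

```python
def build_class_indices(labels):
    """Map each distinct label to an index, in sorted order (assumed)."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}

def one_hot(label, class_indices):
    """Return a list with 1.0 at the label's index and 0.0 elsewhere."""
    vector = [0.0] * len(class_indices)
    vector[class_indices[label]] = 1.0
    return vector

labels = ["0", "0", "1", "0", "1"]  # the first five targets in train_df.head()
class_indices = build_class_indices(labels)
encoded = [one_hot(label, class_indices) for label in labels]
print(class_indices)
print(encoded)
```

This is why the network defined next ends in a layer with two output units, one per class.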

Let’s define the Convolutional Neural Network (CNN)

In [6]:
def prepare_model():
    model = Sequential()
    model.add(Conv2D(32,kernel_size=(3,3),activation='relu',input_shape=(100, 100, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(16, activation='relu'))
    model.add(Dense(2, activation='sigmoid'))
    model.compile(loss="binary_crossentropy",optimizer="adam",metrics=['accuracy'])
    return model

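Let’s train the model using fit_generator. (In recent Keras/TensorFlow releases, fit_generator, evaluate_generator, and predict_generator are deprecated, and model.fit, model.evaluate, and model.predict accept generators directly.)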

In [7]:
model = prepare_model()
model.fit_generator(train_generator,
                    validation_data = valid_generator,
                    steps_per_epoch = train_generator.n//train_generator.batch_size,
                    validation_steps = valid_generator.n//valid_generator.batch_size,
                    epochs=5)

Out[7]:
Epoch 1/5
2/2 [==============================] - 1s 724ms/step - loss: 0.7246 - acc: 0.4219 - val_loss: 1.1718 - val_acc: 0.5062
Epoch 2/5
2/2 [==============================] - 1s 467ms/step - loss: 1.3703 - acc: 0.4653 - val_loss: 1.0518 - val_acc: 0.4938
Epoch 3/5
2/2 [==============================] - 1s 459ms/step - loss: 0.6531 - acc: 0.6719 - val_loss: 0.4665 - val_acc: 0.7375
Epoch 4/5
2/2 [==============================] - 1s 398ms/step - loss: 0.4396 - acc: 0.6858 - val_loss: 0.3845 - val_acc: 0.7500
Epoch 5/5
2/2 [==============================] - 1s 551ms/step - loss: 0.4066 - acc: 0.7344 - val_loss: 0.2907 - val_acc: 0.9250
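
The steps_per_epoch and validation_steps values in the training call use floor division, which silently drops a final partial batch. The sketch below compares floor and ceiling division. The sample counts of 80 and 20 assume an 80/20 split of the 100 training rows with a batch size of 8, and may not match the run that produced the log above.

```python
import math

def steps_floor(num_samples, batch_size):
    """Number of full batches; a trailing partial batch is skipped."""
    return num_samples // batch_size

def steps_ceil(num_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    return math.ceil(num_samples / batch_size)

batch_size = 8
for name, n in (("training", 80), ("validation", 20)):
    print(name, "floor:", steps_floor(n, batch_size),
          "ceil:", steps_ceil(n, batch_size))
```

With 20 validation samples, floor division gives 2 steps (16 images per pass) while ceiling division gives 3 (all 20 images).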

Let’s evaluate our model’s performance on the validation data:

In [8]: 
score = model.evaluate_generator(valid_generator)
print('Validation loss:', score[0])
print('Validation accuracy:', score[1])

Out[8]:
Validation loss: 0.079421
Validation accuracy: 0.9750

Let’s make predictions on the test data using Keras’ predict_generator:

In [9]:
predict = model.predict_generator(test_generator, steps=len(test_generator.filenames))

Keras’ predict_generator returns one score per class for each image. Let’s print the predictions for the first 5 test images.

In [10]: predict[:5]
Out[10]: 
array([[2.3009509e-03, 9.9970609e-01],
       [1.0000000e+00, 1.6062003e-07],
       [1.8170279e-02, 9.9945873e-01],
       [1.0000000e+00, 6.9154913e-07],
       [2.6343190e-04, 9.9991655e-01]], dtype=float32)
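
Notice that the two values in a row need not sum to exactly 1. That is consistent with the model's final layer using a sigmoid activation, which scores each output unit independently, whereas a softmax would normalize the scores across classes. The sketch below illustrates the difference on made-up logits; the numbers are not taken from the trained model.

```python
import math

def sigmoid(x):
    """Squash a single logit into (0, 1), independently of other logits."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Normalize a list of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-4.0, 6.0]  # hypothetical raw outputs for one image
sigmoid_scores = [sigmoid(x) for x in logits]
softmax_scores = softmax(logits)
print("sigmoid:", sigmoid_scores, "sum =", sum(sigmoid_scores))
print("softmax:", softmax_scores, "sum =", sum(softmax_scores))
```

For these logits both activations rank the second class highest, so taking the index of the largest score gives the same label either way.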

To get the predicted class label, take the index of the highest score in each row:

In [11]: 
y_classes = predict.argmax(axis=-1)
y_classes

Out[11]:
array([1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0,
       0, 1, 1])
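
The indices returned by argmax refer to positions in the generator's class mapping, not to the original labels. A minimal sketch of mapping them back is shown below. It assumes a class mapping of {'0': 0, '1': 1}, as train_generator.class_indices would be expected to report for this dataset, and uses hypothetical test file names. Which label corresponds to Banana and which to Apricot depends on how the CSV was labeled.

```python
# First five rows of the prediction array printed in Out[10]
predictions = [
    [2.3009509e-03, 9.9970609e-01],
    [1.0000000e+00, 1.6062003e-07],
    [1.8170279e-02, 9.9945873e-01],
    [1.0000000e+00, 6.9154913e-07],
    [2.6343190e-04, 9.9991655e-01],
]
# Assumed mapping; in practice read it from train_generator.class_indices
class_indices = {"0": 0, "1": 1}
# Hypothetical names; in practice use test_generator.filenames
filenames = ["t1.jpg", "t2.jpg", "t3.jpg", "t4.jpg", "t5.jpg"]

# Invert the mapping so that an index looks up its label
index_to_label = {index: label for label, index in class_indices.items()}

def argmax(row):
    """Return the position of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

predicted_labels = [index_to_label[argmax(row)] for row in predictions]
for name, label in zip(filenames, predicted_labels):
    print(name, "->", label)
```

Because the test generator was created with shuffle=False, the order of the predictions matches the order of test_generator.filenames, which is what makes this pairing valid.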
