Keras’ ImageDataGenerator class lets users perform image augmentation while training a model. If you are not familiar with data augmentation, please refer to this tutorial, which explains the various transformation methods with examples. You can also refer to this Keras ImageDataGenerator tutorial, which explains how the ImageDataGenerator class works.
Keras’ ImageDataGenerator class provides three different methods to load an image dataset and generate batches of augmented data. These three methods are:
- .flow()
- .flow_from_directory()
- .flow_from_dataframe()
Each of these methods accomplishes the same task, loading an image dataset and generating batches of augmented data, but each does so in a different way.
This tutorial explains the flow_from_directory() method with an example. The flow_from_directory() method takes the path to a directory and generates batches of augmented data.
The directory structure is very important when you use the flow_from_directory() method. flow_from_directory() assumes that:
- The root directory contains at least two folders: one for training data and one for test data.
- The train folder contains n sub-directories, one per class, each holding the images of that class.
- The test folder contains a single sub-directory that stores all the test images.
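The assumptions listed above can be checked before any training starts. The following is a minimal sketch using only the Python standard library. The helper name `check_layout` and the folder names `train` and `test` are assumptions made for illustration; they are not part of Keras.

```python
from pathlib import Path


def check_layout(root):
    """Return a list of layout problems under `root` (empty list if none).

    Hypothetical helper: assumes the root holds 'train' and 'test' folders.
    """
    root = Path(root)
    problems = []
    train, test = root / "train", root / "test"

    # Both top-level folders must exist.
    for folder in (train, test):
        if not folder.is_dir():
            problems.append(f"missing folder: {folder.name}")

    # train/ needs one sub-directory per class.
    if train.is_dir() and not any(p.is_dir() for p in train.iterdir()):
        problems.append("train has no class sub-directories")

    # test/ needs exactly one sub-directory holding all test images.
    if test.is_dir():
        test_dirs = [p for p in test.iterdir() if p.is_dir()]
        if len(test_dirs) != 1:
            problems.append("test should contain exactly one sub-directory")

    return problems
```

Calling `check_layout("data")` on a correctly prepared dataset should return an empty list.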
The figure below represents the directory structure:
The syntax to call flow_from_directory() function is as follows:
flow_from_directory(directory, target_size=(256, 256), color_mode='rgb', classes=None, class_mode='categorical', batch_size=32, shuffle=True, seed=None, save_to_dir=None, save_prefix='', save_format='png', follow_links=False, subset=None, interpolation='nearest')
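When `classes=None`, the class labels are inferred from the sub-directory names. The sketch below imitates that inference with the standard library only, assuming the names are taken in alphanumeric order and numbered from 0, as the Keras documentation describes. `infer_class_indices` is a hypothetical helper written for this illustration, not a Keras function.

```python
import os


def infer_class_indices(directory):
    """Map each sub-directory name to an integer label, in sorted order."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))  # ignore loose files
    )
    return {name: index for index, name in enumerate(classes)}
```

For a train folder holding the sub-directories Apricot and Banana, this gives `{'Apricot': 0, 'Banana': 1}`. On a real generator, the mapping actually used is available through its `class_indices` attribute.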
Prepare Dataset
For demonstration, we use a fruit dataset with two classes, Banana and Apricot. Each class contains 50 images. You can download the dataset here, then save and unzip it in your current working directory. We need to train a classifier that assigns an input fruit image to the class Banana or Apricot.
Directory Structure
The directory structure must look like this:
| --- data
|     | --- train
|     |     | --- Apricot [50 images]
|     |     | --- Banana [50 images]
|     | --- test
|     |     | --- predict [25 images]
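After unzipping, it can be worth confirming that each folder holds the expected number of images. This is a small sketch with the standard library. The helper `count_images` and the set of file extensions are assumptions for illustration and may need adjusting to the actual files.

```python
from pathlib import Path

# Assumed set of image extensions; extend it if the dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(root):
    """Return {folder path relative to root: number of image files inside}."""
    root = Path(root)
    counts = {}
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        n = sum(
            1 for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        if n:  # skip folders that only contain other folders
            counts[folder.relative_to(root).as_posix()] = n
    return counts
```

For the layout shown above, `count_images("data")` should report 50 images each for `train/Apricot` and `train/Banana`, and 25 for `test/predict`.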
Let’s plot the images of train data.
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import os

src_path = "data/train/"
# Sort so that the class order does not depend on the file system.
sub_class = sorted(os.listdir(src_path))

fig = plt.figure(figsize=(10,5))

# First row: images 0.jpg to 3.jpg of the first class.
path = os.path.join(src_path,sub_class[0])
for i in range(4):
    plt.subplot(240 + 1 + i)
    img = plt.imread(os.path.join(path,str(i)+'.jpg'))
    plt.imshow(img, cmap=plt.get_cmap('gray'))

# Second row: images 4.jpg to 7.jpg of the second class.
path = os.path.join(src_path,sub_class[1])
for i in range(4,8):
    plt.subplot(240 + 1 + i)
    img = plt.imread(os.path.join(path,str(i)+'.jpg'))
    plt.imshow(img, cmap=plt.get_cmap('gray'))
Out[1]:
Implementing a Training Script
Let’s import the required packages.
In [2]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from keras.preprocessing.image import ImageDataGenerator
Let’s initialize Keras’ ImageDataGenerator class:
In [3]:
src_path_train = "data/train/"
src_path_test = "data/test/"

train_datagen = ImageDataGenerator(
    rescale=1 / 255.0,
    rotation_range=20,
    zoom_range=0.05,
    width_shift_range=0.05,
    height_shift_range=0.05,
    shear_range=0.05,
    horizontal_flip=True,
    fill_mode="nearest",
    validation_split=0.20)

test_datagen = ImageDataGenerator(rescale=1 / 255.0)
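The `validation_split=0.20` argument reserves a fraction of the training images for validation. The sketch below is a simplified model of that split, assuming it is applied to each class folder separately and that the first 20% of a folder's files go to validation. This reflects our reading of the reference implementation, so treat it as an illustration. `split_bounds` is a hypothetical helper, not a Keras function.

```python
def split_bounds(num_files, validation_split, subset):
    """Return the (start, stop) slice of one class's file list for a subset."""
    if subset == "validation":
        low, high = 0.0, validation_split
    else:  # "training"
        low, high = validation_split, 1.0
    return int(low * num_files), int(high * num_files)


images_per_class = 50  # from the dataset description
num_classes = 2        # Apricot and Banana

for subset in ("training", "validation"):
    start, stop = split_bounds(images_per_class, 0.20, subset)
    print(subset, (stop - start) * num_classes)
# prints:
# training 80
# validation 20
```

Under these assumptions, each class contributes 40 training and 10 validation images, which gives 80 and 20 images in total.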
Let’s initialize our training, validation and testing generator:
In [4]:
batch_size = 8

# Training and validation batches both come from the train folder;
# `subset` selects which side of validation_split each generator reads.
train_generator = train_datagen.flow_from_directory(
    directory=src_path_train,
    target_size=(100, 100),
    color_mode="rgb",
    batch_size=batch_size,
    class_mode="categorical",
    subset='training',
    shuffle=True,
    seed=42
)
valid_generator = train_datagen.flow_from_directory(
    directory=src_path_train,
    target_size=(100, 100),
    color_mode="rgb",
    batch_size=batch_size,
    class_mode="categorical",
    subset='validation',
    shuffle=True,
    seed=42
)
# The test images have no labels, so class_mode is None and order is kept.
test_generator = test_datagen.flow_from_directory(
    directory=src_path_test,
    target_size=(100, 100),
    color_mode="rgb",
    batch_size=1,
    class_mode=None,
    shuffle=False,
    seed=42
)
Out[4]:
Found 80 images belonging to 2 classes.
Found 20 images belonging to 2 classes.
Found 25 images belonging to 1 classes.
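The image counts reported in Out[4] determine how many batches each generator yields in one pass over its data. Here is a short sketch of that arithmetic, assuming the number of batches is the sample count divided by the batch size, rounded up. The variable names are chosen for this illustration.

```python
import math

batch_size = 8
samples = {"train": 80, "validation": 20, "test": 25}  # counts from Out[4]

steps_train = math.ceil(samples["train"] / batch_size)       # 10 full batches
steps_valid = math.ceil(samples["validation"] / batch_size)  # 3; the last batch has 4 images
steps_test = math.ceil(samples["test"] / 1)                  # test generator uses batch_size=1

# Shape of one full training batch, given target_size=(100, 100),
# color_mode="rgb" and class_mode="categorical" with 2 classes.
image_batch_shape = (batch_size, 100, 100, 3)
label_batch_shape = (batch_size, 2)

print(steps_train, steps_valid, steps_test)  # prints: 10 3 25
```

Values like these are what one would typically pass as `steps_per_epoch` and `validation_steps` when fitting a model from generators.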
. . .