Introduction to TensorFlow Lite

TensorFlow Lite is TensorFlow’s lightweight solution for running TensorFlow models on mobile, embedded, and IoT devices with low latency. With TensorFlow Lite we can perform classification and regression tasks on-device without incurring server costs.

TensorFlow Lite models are supported on both Android and iOS via C++ and Java APIs. Generally, a TensorFlow model is too big to run as-is on mobile and other embedded devices with limited compute and memory resources. To make it possible to run TensorFlow models on such low-powered devices, TensorFlow introduced a lightweight solution: TensorFlow Lite.

In the following tutorial series, you will get to know TensorFlow Lite in detail:

  • How to convert a trained TensorFlow model to TFLite using the TFLite converter
  • How to run a TFLite model in an Android app
  • How to run a TFLite model in an iOS app
  • How to run a TFLite model on a Raspberry Pi

Before we start with TFLite, we need a model that has been trained on a dataset, typically on a high-powered machine. This trained model can then be converted to the TensorFlow Lite format. We cannot create or train a model using TensorFlow Lite itself; we must start with a regular TensorFlow model and then convert it to TFLite.

A TensorFlow model is a data structure that contains the logic and learned knowledge of a neural network. The trained model consists of various model files and a set of associated checkpoints, and it can be exported for use on a mobile device.

[Figure: TensorFlow to TensorFlow Lite model conversion flow]

The above diagram represents the standard flow for converting a TensorFlow model to TensorFlow Lite. First, we train our model using TensorFlow’s high-level API (Keras) or its low-level API. Once the model is trained, we can save it in various formats depending on the API used during training. If the model is trained using Keras, we can save it either as an HDF5 file or in the SavedModel format. On the other hand, if the model is trained using the low-level API, we can save it in the SavedModel format or as a concrete function.
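
For example, once you have a trained Keras model, saving it in either format is a one-liner (the file and directory names here are just placeholders):

model.save("model.h5")          # HDF5 file (inferred from the .h5 extension)
model.save("saved_model_dir")   # SavedModel directory (the default format in TF 2.x)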

The next step is to convert the saved TensorFlow model to a TensorFlow Lite model using the TFLite converter. A TensorFlow Lite model is an optimized FlatBuffer, identified by the .tflite file extension. FlatBuffers is a serialization format optimized for performance.

We can deploy the TFLite model on a mobile or embedded device and run it locally using the TensorFlow Lite interpreter.

Train a TensorFlow Model

Let’s build a simple Convolutional Neural Network (CNN) to classify CIFAR images. In this tutorial, the model is trained using the Keras API.

Import TensorFlow

import tensorflow as tf

from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np

tf.__version__
'2.3.1'

Download and prepare the CIFAR10 dataset

The CIFAR10 dataset consists of 60,000 images in 10 classes, with 6,000 images in each class. The dataset has 50,000 training images and 10,000 testing images.

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 1093s 6us/step
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

Build & Train the Model

Let’s build and train a simple Convolutional Neural Network (CNN) architecture on the CIFAR10 training data.

model = models.Sequential()
# Convolutional base: three Conv2D layers with max pooling in between
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# Classification head: flatten, then two dense layers (10 logits for 10 classes)
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
Epoch 1/10
1563/1563 [==============================] - 27s 17ms/step - loss: 1.5473 - accuracy: 0.4386 - val_loss: 1.2637 - val_accuracy: 0.5478
Epoch 2/10
1563/1563 [==============================] - 23s 15ms/step - loss: 1.1778 - accuracy: 0.5810 - val_loss: 1.1323 - val_accuracy: 0.5976
Epoch 3/10
1563/1563 [==============================] - 24s 15ms/step - loss: 1.0363 - accuracy: 0.6364 - val_loss: 1.1376 - val_accuracy: 0.5918
Epoch 4/10
1563/1563 [==============================] - 26s 16ms/step - loss: 0.9444 - accuracy: 0.6671 - val_loss: 1.0019 - val_accuracy: 0.6496
Epoch 5/10
1563/1563 [==============================] - 24s 15ms/step - loss: 0.8814 - accuracy: 0.6929 - val_loss: 1.0200 - val_accuracy: 0.6463
Epoch 6/10
1563/1563 [==============================] - 26s 17ms/step - loss: 0.8262 - accuracy: 0.7118 - val_loss: 0.9130 - val_accuracy: 0.6802
Epoch 7/10
1563/1563 [==============================] - 31s 20ms/step - loss: 0.7832 - accuracy: 0.7248 - val_loss: 0.9493 - val_accuracy: 0.6733
Epoch 8/10
1563/1563 [==============================] - 25s 16ms/step - loss: 0.7412 - accuracy: 0.7391 - val_loss: 0.9036 - val_accuracy: 0.6924
Epoch 9/10
1563/1563 [==============================] - 24s 15ms/step - loss: 0.7075 - accuracy: 0.7513 - val_loss: 0.8870 - val_accuracy: 0.6957
Epoch 10/10
1563/1563 [==============================] - 24s 15ms/step - loss: 0.6747 - accuracy: 0.7614 - val_loss: 0.8922 - val_accuracy: 0.6986

Evaluate the Model

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
313/313 - 1s - loss: 0.8922 - accuracy: 0.6986
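
Since model.fit returned a history object above, we can also visualize the training curves using the Matplotlib import from the beginning (a small optional sketch, not part of the original flow):

# Plot training and validation accuracy per epoch
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()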

Convert to a TensorFlow Lite Model

The TensorFlow Lite Converter is a Python API that converts a trained TensorFlow model into the TensorFlow Lite format. The converter lets us reduce the model file size by applying various degrees of quantization, with little degradation in model accuracy. Quantization reduces the precision of the model’s parameters, which by default are 32-bit floating-point numbers. The following post-training quantization options are available in TensorFlow Lite:

  1. Post-training float16 quantization
  2. Post-training dynamic range quantization
  3. Post-training integer quantization

These quantization techniques reduce the model size so that it can fit on low-compute devices. You can refer to this tutorial to get to know each quantization technique in detail with examples. The following example converts a model into a TFLite model with no quantization.

Various methods exist to convert a TensorFlow model to a TensorFlow Lite model. Below are the available options:

When converting TensorFlow 2.x models, use tf.lite.TFLiteConverter:

  1. tf.lite.TFLiteConverter.from_saved_model() (recommended): Converts a SavedModel (see the sketch after this list).
  2. tf.lite.TFLiteConverter.from_keras_model(): Converts a Keras model.
  3. tf.lite.TFLiteConverter.from_concrete_functions(): Converts concrete functions.
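
For example, if you saved the model in the SavedModel format (as in the saving sketch earlier; "saved_model_dir" is just a placeholder name), the recommended path looks like this:

# Convert a SavedModel directory to a TFLite model
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
tflite_model = converter.convert()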

When converting TensorFlow 1.x models, use tf.compat.v1.lite.TFLiteConverter:

  1. tf.compat.v1.lite.TFLiteConverter.from_saved_model() (recommended): Converts a SavedModel.
  2. tf.compat.v1.lite.TFLiteConverter.from_keras_model_file(): Converts a Keras HDF5 model file.
  3. tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(): Converts a frozen GraphDef from a file.
  4. tf.compat.v1.lite.TFLiteConverter.from_session(): Converts a GraphDef from a session.

You can refer to this tutorial to get to know each of the above TFLiteConverter methods in detail. Here, let’s convert the Keras model into TensorFlow Lite using the tf.lite.TFLiteConverter.from_keras_model() function.

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
open("cifar10_model.tflite", "wb").write(tflite_model)
493624

Here, the return value 493624 is the size (in bytes) of the generated TensorFlow Lite model with no quantization. We can significantly reduce the TFLite model size by applying quantization. Refer to this tutorial to learn about the quantization methods.
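
As a rough illustration, here is a minimal sketch of post-training dynamic range quantization using the converter’s optimizations flag, followed by a file-size comparison (the quantized file name is a placeholder, and the exact size reduction depends on the model and TensorFlow version):

import os

# Re-create the converter and enable the default (dynamic range) optimization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
open("cifar10_model_quant.tflite", "wb").write(quantized_tflite_model)

# Compare the on-disk sizes of the float and quantized models
print("float model:    ", os.path.getsize("cifar10_model.tflite"), "bytes")
print("quantized model:", os.path.getsize("cifar10_model_quant.tflite"), "bytes")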

Run inference with the TFLite model

The converted TFLite model can be executed on mobile, embedded, and IoT devices. The TensorFlow Lite Interpreter is used to run inference with a TFLite model: it is a library that takes a TFLite model file, executes the operations on input data, and provides the output.

The interpreter supports multiple platforms and provides APIs to execute a TFLite model from Java, C++, Python, Swift, and Objective-C.

You can refer to this tutorial to run inference with a TFLite model on all of the above-mentioned platforms. Here, let’s make a prediction on a test image using TensorFlow Lite with Python. Python is a great way to run a TFLite model on Linux-based embedded devices such as the Raspberry Pi and Coral.

We need to install the TensorFlow Lite interpreter package to run a TensorFlow Lite model with Python; we don’t need the full TensorFlow package installed on the embedded device. This interpreter-only package is a small part of the entire TensorFlow library that includes only the code required to run a TFLite model, so it is a very lightweight package.

Install TensorFlow Lite Interpreter

To install it, run pip3 install <wheel URL>, selecting the Python wheel URL appropriate for your device from this link.

Execute TensorFlow Lite Interpreter in Python

To distinguish this interpreter-only package from the full TensorFlow package, the Python module provided in the above wheel is named tflite_runtime.

So instead of importing Interpreter from the TensorFlow module, you need to import it from tflite_runtime. However, you can also test the TFLite model with the full TensorFlow package.

Import TensorFlow & Load TFLite model

If you are running TFLite model on full TensorFlow package:

import tensorflow as tf
interpreter = tf.lite.Interpreter(model_path='cifar10_model.tflite')
interpreter.allocate_tensors()

If you are running TFLite model on the interpreter-only package:

import tflite_runtime.interpreter as tflite
interpreter = tflite.Interpreter(model_path='cifar10_model.tflite')
interpreter.allocate_tensors()

# Get input and output details (shapes, dtypes, quantization parameters)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

Let’s print the input_details and output_details variables. input_details describes the input data format: the shape of the input, its data type, and various quantization-related information. Similarly, output_details contains information about the model’s output format.

input_details
[{'name': 'conv2d_input',
  'index': 0,
  'shape': array([ 1, 32, 32,  3], dtype=int32),
  'shape_signature': array([-1, 32, 32,  3], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}}]
output_details
[{'name': 'Identity',
  'index': 19,
  'shape': array([ 1, 10], dtype=int32),
  'shape_signature': array([-1, 10], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}}]

Run the interpreter on a test image

test_image = np.expand_dims(test_images[1], axis=0).astype(np.float32)

plt.figure(figsize=(2,2))
plt.imshow(test_images[1])

[Figure: the test image displayed with Matplotlib]

interpreter.set_tensor(input_details[0]['index'], test_image)
interpreter.invoke()
predictions = interpreter.get_tensor(output_details[0]['index'])

prediction_class = np.argmax(predictions)
prediction_class_name = class_names[prediction_class]
prediction_class_name
'ship'
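
As a quick sanity check (assuming the original Keras model is still in memory in the same session, as it is when testing with the full TensorFlow package), we can verify that the TFLite output closely matches the original model’s prediction:

# The TFLite logits should closely match the Keras model's logits
keras_predictions = model.predict(test_image)
print(class_names[np.argmax(keras_predictions)])               # should also print 'ship'
print(np.allclose(predictions, keras_predictions, atol=1e-5))  # expected: True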
