In the previous tutorial, we trained a TensorFlow Lite model to predict sine function values when given a value between 0 and 2π as an input. We then created a .h header file using the constant bytes that make up the TensorFlow Lite model file, which can be loaded into a C program.
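That header file simply stores the model as a C byte array. Its exact contents depend on how you exported it, but a header generated with a tool such as xxd -i looks roughly like the sketch below (the sine_model name matches what the code in this tutorial expects; the byte values shown are only illustrative, apart from the "TFL3" FlatBuffer identifier that appears near the start of every .tflite file):
// sine_model.h (sketch): the .tflite model stored as a C byte array
unsigned char sine_model[] = {
  0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33,  // "TFL3" identifier
  // ... remaining model bytes ...
};
unsigned int sine_model_len = sizeof(sine_model);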
In this tutorial, we will load the model in Arduino using the TensorFlow Lite library and use it to run inference to generate an approximation of a sinewave.
Note that this is a ridiculous, roundabout way to create a sinewave, but it offers a useful example to show how a nonlinear neural network model can be deployed to an embedded system.
Model Description
Our model is a 3-layer, fully connected neural network with a single floating-point input and a single floating-point output.
If you downloaded the .tflite file in the previous tutorial, you can use Netron to view the model in a slick, graphical interface. Run Netron and use it to open the .tflite file. You can click on the individual layers to get more details about them, such as input/output tensor shapes and data types.
As you can see, our model expects a tensor with 1 floating point element (a scalar value) as an input, and it outputs another scalar value. The input should be between 0 and 2π, and the output should be between -1 and 1.
Install TensorFlow Lite Arduino Library
TensorFlow Lite has support for a few microcontroller boards, which are listed in the TensorFlow Lite for Microcontrollers documentation. At the time this tutorial was released, only 8 microcontroller boards were supported. We will use the pre-compiled Arduino library for TensorFlow Lite, but note that only the Nano 33 BLE Sense is supported (right now).
Open your Arduino IDE (this tutorial was tested on v1.8.11). Go to Sketch > Include Library > Manage Libraries… and search for “TensorFlow.” Install the latest version of the Arduino_TensorFlowLite library (1.15.0-ALPHA was tested for this tutorial).
In Tools > Board, select the Arduino Nano 33 BLE. Plug in your Nano 33 BLE Sense and select the associated Serial port in Tools > Port.
Test Inference
Note: This code was originally developed by Pete Warden of the TensorFlow Lite team to demonstrate TensorFlow Lite capabilities on various microcontroller platforms. I have modified it here to reduce the number of dependent files in Arduino, which should hopefully make it easier to follow.
Copy the following into your Arduino sketch:
/**
 * Test sinewave neural network model
 *
 * Author: Pete Warden
 * Modified by: Shawn Hymel
 * Date: March 11, 2020
 *
 * Copyright 2019 The TensorFlow Authors. All Rights Reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// Import TensorFlow stuff
#include "TensorFlowLite.h"
#include "tensorflow/lite/experimental/micro/kernels/micro_ops.h"
#include "tensorflow/lite/experimental/micro/micro_error_reporter.h"
#include "tensorflow/lite/experimental/micro/micro_interpreter.h"
#include "tensorflow/lite/experimental/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/version.h"

// Our model
#include "sine_model.h"

// Figure out what's going on in our model
#define DEBUG 1

// Some settings
constexpr int led_pin = 2;
constexpr float pi = 3.14159265;                  // Some pi
constexpr float freq = 0.5;                       // Frequency (Hz) of sinewave
constexpr float period = (1 / freq) * (1000000);  // Period (microseconds)

// TFLite globals, used for compatibility with Arduino-style sketches
namespace {
  tflite::ErrorReporter* error_reporter = nullptr;
  const tflite::Model* model = nullptr;
  tflite::MicroInterpreter* interpreter = nullptr;
  TfLiteTensor* model_input = nullptr;
  TfLiteTensor* model_output = nullptr;

  // Create an area of memory to use for input, output, and other TensorFlow
  // arrays. You'll need to adjust this by compiling, running, and looking
  // for errors.
  constexpr int kTensorArenaSize = 5 * 1024;
  uint8_t tensor_arena[kTensorArenaSize];
} // namespace

void setup() {

  // Initialize Serial (used for the plotter as well as debug output)
  Serial.begin(9600);

  // Wait for Serial to connect
#if DEBUG
  while(!Serial);
#endif

  // Let's make an LED vary in brightness
  pinMode(led_pin, OUTPUT);

  // Set up logging (will report to Serial, even within TFLite functions)
  static tflite::MicroErrorReporter micro_error_reporter;
  error_reporter = &micro_error_reporter;

  // Map the model into a usable data structure
  model = tflite::GetModel(sine_model);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    error_reporter->Report("Model version does not match Schema");
    while(1);
  }

  // Pull in only needed operations (should match NN layers)
  // Available ops:
  //  https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/kernels/micro_ops.h
  static tflite::MicroMutableOpResolver micro_mutable_op_resolver;
  micro_mutable_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_FULLY_CONNECTED,
    tflite::ops::micro::Register_FULLY_CONNECTED(),
    1, 3);

  // Build an interpreter to run the model
  static tflite::MicroInterpreter static_interpreter(
    model, micro_mutable_op_resolver, tensor_arena, kTensorArenaSize,
    error_reporter);
  interpreter = &static_interpreter;

  // Allocate memory from the tensor_arena for the model's tensors
  TfLiteStatus allocate_status = interpreter->AllocateTensors();
  if (allocate_status != kTfLiteOk) {
    error_reporter->Report("AllocateTensors() failed");
    while(1);
  }

  // Assign model input and output buffers (tensors) to pointers
  model_input = interpreter->input(0);
  model_output = interpreter->output(0);

  // Get information about the memory area to use for the model's input
  // Supported data types:
  //  https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/c/common.h#L226
#if DEBUG
  Serial.print("Number of dimensions: ");
  Serial.println(model_input->dims->size);
  Serial.print("Dim 1 size: ");
  Serial.println(model_input->dims->data[0]);
  Serial.print("Dim 2 size: ");
  Serial.println(model_input->dims->data[1]);
  Serial.print("Input type: ");
  Serial.println(model_input->type);
#endif
}

void loop() {

#if DEBUG
  unsigned long start_timestamp = micros();
#endif

  // Copy value to input buffer (tensor); for this first test, just use pi
  model_input->data.f[0] = pi;

  // Run inference
  TfLiteStatus invoke_status = interpreter->Invoke();
  if (invoke_status != kTfLiteOk) {
    error_reporter->Report("Invoke failed on input: %f\n", model_input->data.f[0]);
  }

  // Read predicted y value from output buffer (tensor)
  float y_val = model_output->data.f[0];

  // Translate to a PWM LED brightness (clamp negative values to 0)
  int brightness = (int)(255 * y_val);
  if (brightness < 0) brightness = 0;
  analogWrite(led_pin, brightness);

  // Print value
  Serial.println(y_val);

#if DEBUG
  Serial.print("Time for inference (us): ");
  Serial.println(micros() - start_timestamp);
#endif
}
For this first test, we’re simply assigning the value of pi (approximated to 3.14159265) to the input tensor:
model_input->data.f[0] = pi;
With this, we will run inference on just one number (over and over again, as it is in the loop function).
When you run the sketch, open the Serial Monitor to see the output value. Also, note that because the DEBUG flag is set to 1, we get an estimate of how long inference takes.
Notice that the output of the model is always about 0.03. While sin(π) is exactly 0, an output of 0.03 is close enough for our purposes. Additionally, notice that it takes about 1 ms to perform inference. This time depends on the performance of the microcontroller and on the size and complexity of the model; ours is relatively simple, so expect inference to take longer as you start using more complex models.
Examine the Code
Let’s take a quick look at what’s going on in the code.
At the top, we define a number of pointers that we'll use throughout the rest of the sketch. Note that they're in an anonymous namespace, which gives them internal (file-local) linkage; this probably isn't needed in a simple sketch, but it follows the other TensorFlow Lite examples.
// TFLite globals, used for compatibility with Arduino-style sketches
namespace {
  tflite::ErrorReporter* error_reporter = nullptr;
  const tflite::Model* model = nullptr;
  tflite::MicroInterpreter* interpreter = nullptr;
  TfLiteTensor* model_input = nullptr;
  TfLiteTensor* model_output = nullptr;

  // Create an area of memory to use for input, output, and other TensorFlow
  // arrays. You'll need to adjust this by compiling, running, and looking
  // for errors.
  constexpr int kTensorArenaSize = 5 * 1024;
  uint8_t tensor_arena[kTensorArenaSize];
} // namespace
Note that we are setting aside a chunk of memory for an “arena” (essentially, a sandbox of RAM that TensorFlow Lite uses to perform calculations and store tensors). Unfortunately, we must predict the arena size. 5 kB seems to work for this model, but if you have problems during the “allocate tensors” step later, you should try increasing the arena size.
We set up logging, which, for Arduino, will output debugging information to the Serial port. Note that this will output information from within some of the TensorFlow Lite functions, which can be very helpful when something isn’t working.
// Set up logging (will report to Serial, even within TFLite functions)
static tflite::MicroErrorReporter micro_error_reporter;
error_reporter = &micro_error_reporter;
To help reduce the required space, we only pull in the required TensorFlow Lite operations with the following code:
// Pull in only needed operations (should match NN layers)
static tflite::MicroMutableOpResolver micro_mutable_op_resolver;
micro_mutable_op_resolver.AddBuiltin(
  tflite::BuiltinOperator_FULLY_CONNECTED,
  tflite::ops::micro::Register_FULLY_CONNECTED(),
  1, 3);
You will need to pick out the ones that line up with the layers (and other operations) you defined when constructing the model. It can help to look at Netron to see the required operations.
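As a hypothetical example (our sine model only needs FULLY_CONNECTED), a model that also contained a softmax layer would register that op alongside the fully connected one:
// Hypothetical: register any additional ops your model requires
micro_mutable_op_resolver.AddBuiltin(
  tflite::BuiltinOperator_SOFTMAX,
  tflite::ops::micro::Register_SOFTMAX());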
In this next section, we build the interpreter using the parameters we just defined and then allocate the necessary memory using the arena. Then, we get handles to the input and output tensors. Note that these tensors are just one element each (scalars).
// Build an interpreter to run the model
static tflite::MicroInterpreter static_interpreter(
  model, micro_mutable_op_resolver, tensor_arena, kTensorArenaSize,
  error_reporter);
interpreter = &static_interpreter;

// Allocate memory from the tensor_arena for the model's tensors
TfLiteStatus allocate_status = interpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
  error_reporter->Report("AllocateTensors() failed");
  while(1);
}

// Assign model input and output buffers (tensors) to pointers
model_input = interpreter->input(0);
model_output = interpreter->output(0);
If the AllocateTensors() function fails during runtime, you may want to try increasing the arena size defined earlier. There seems to be a bit of trial and error required to get the memory management right.
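For example, a reasonable first step is simply to double the arena and recompile (the exact value is model-dependent trial and error):
// If 5 * 1024 turns out to be too small, try doubling the arena:
constexpr int kTensorArenaSize = 10 * 1024;
uint8_t tensor_arena[kTensorArenaSize];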
In loop, we copy our input values to the input tensor. Note the index of 0; if this were a multi-element tensor, we'd want to copy everything in with something like a for loop (see the sketch below).
// Copy value to input buffer (tensor); for this first test, just use pi
model_input->data.f[0] = pi;
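As a sketch, if the model instead expected a multi-element input (say, a window of samples), the fill loop might look like this; kInputLength and input_samples are made-up names for illustration:
// Hypothetical: copy an array of samples into a multi-element input tensor
for (int i = 0; i < kInputLength; i++) {
  model_input->data.f[i] = input_samples[i];
}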
We tell the interpreter to run inference with the Invoke() function:
// Run inference
TfLiteStatus invoke_status = interpreter->Invoke();
if (invoke_status != kTfLiteOk) {
  error_reporter->Report("Invoke failed on input: %f\n", model_input->data.f[0]);
}
Note that at this time, Invoke is blocking, so we must wait while it performs its calculations.
When it’s done, we can access the output value in our model_output handle:
float y_val = model_output->data.f[0];
Hopefully, this gives you an idea of the TensorFlow Lite calls you need to make to get inference working on a microcontroller!
Connect Inference to Hardware
Connect an LED from pin 2 to GND (through a 100 Ω resistor).
At the top of the program, change the DEBUG flag to the following:
#define DEBUG 0
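With DEBUG set to 0, the sketch no longer waits for a Serial connection before starting, and it skips the inference-timing printouts that would otherwise clutter the Serial Plotter view we use below.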
Then, change the loop() function to the following:
void loop() {

#if DEBUG
  unsigned long start_timestamp = micros();
#endif

  // Get current timestamp and modulo with period
  unsigned long timestamp = micros();
  timestamp = timestamp % (unsigned long)period;

  // Calculate x value to feed to the model
  float x_val = ((float)timestamp * 2 * pi) / period;

  // Copy value to input buffer (tensor)
  model_input->data.f[0] = x_val;

  // Run inference
  TfLiteStatus invoke_status = interpreter->Invoke();
  if (invoke_status != kTfLiteOk) {
    error_reporter->Report("Invoke failed on input: %f\n", x_val);
  }

  // Read predicted y value from output buffer (tensor)
  float y_val = model_output->data.f[0];

  // Translate to a PWM LED brightness (clamp negative values to 0)
  int brightness = (int)(255 * y_val);
  if (brightness < 0) brightness = 0;
  analogWrite(led_pin, brightness);

  // Print value
  Serial.println(y_val);

#if DEBUG
  Serial.print("Time for inference (us): ");
  Serial.println(micros() - start_timestamp);
#endif
}
Notice that we’re now calculating the input to the sine prediction model based on the current timestamp. By doing this, we can effectively control the sinewave’s frequency using the freq variable at the top of the program.
Upload this code and open the Serial Monitor. You should see the values of the model's output fly by.
Watching raw numbers isn't terribly useful, so close the Serial Monitor and select Tools > Serial Plotter. This should show you a slow-moving sinewave plotted against time.
If you take a look at the LED connected to the Arduino board, you should see it rise and fall in brightness in something that approximates a sinusoidal pattern (it stays off during the negative half of the wave, since we clamp negative brightness values to 0).
Try changing the frequency to something higher, say:
constexpr float freq = 100;
Upload this and look at the Serial Plotter. You should see a much better representation of our sinewave.
Going Further
Once again, it’s not a perfect sinewave and certainly not an efficient way to calculate sine, but it’s definitely a useful way to test the TensorFlow Lite functions in Arduino. I hope this has helped you get started with TensorFlow Lite for microcontrollers!