Customizing an Object Detection Model on Your Raspberry Pi

2024-08-15 | By Ramaditya Kotha

This post is a continuation of my last article, which explained how to set up a pre-trained object detection model (EfficientDet-Lite0) on a Raspberry Pi using MediaPipe. This article focuses on customizing your own object detection model and deploying it to your Raspberry Pi.

Overview

In object detection applications, neural networks are trained to detect important features of images. Depending on the model used, they can detect patterns such as areas of high contrast, areas with a dominant color, and shapes. It is mostly in the last layer that these detected features are combined to decide what object is being detected and where. Taking this into account, we can customize a neural network by simply retraining the last layer of an existing object detection model!

Note: Retraining the whole model can provide better accuracy, but it takes significantly longer than retraining the last layer. I will only be covering retraining the last layer in this article, but many resources on how to retrain the whole model are readily available in the Google MediaPipe documentation.
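To make the idea of retraining only the last layer more concrete, here is a toy sketch in plain Python. It is not how MediaPipe implements retraining. The function names (`frozen_features`, `train_last_layer`, `make_toy_data`) and the tiny 16-pixel "images" are hypothetical stand-ins: the fixed feature function plays the role of the pre-trained layers, and only the small classifier on top of it is trained.

```python
import math
import random

def frozen_features(pixels):
    # Stand-in for the pre-trained layers: a fixed mapping that training never changes.
    mean = sum(pixels) / len(pixels)
    contrast = max(pixels) - min(pixels)
    return [mean, contrast]

def predict(weights, bias, features):
    # The "last layer": a single logistic unit on top of the frozen features.
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_last_layer(images, labels, epochs=500, lr=0.5):
    # Features are computed once, because the frozen layers never change.
    feats = [frozen_features(img) for img in images]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            error = predict(weights, bias, f) - y  # gradient of the log loss w.r.t. z
            weights = [w - lr * error * x for w, x in zip(weights, f)]
            bias -= lr * error
    return weights, bias

def make_toy_data(n_per_class=10, seed=0):
    # Hypothetical data: bright "images" are class 1, dark "images" are class 0.
    rng = random.Random(seed)
    images, labels = [], []
    for _ in range(n_per_class):
        images.append([rng.uniform(0.7, 1.0) for _ in range(16)])
        labels.append(1)
        images.append([rng.uniform(0.0, 0.3) for _ in range(16)])
        labels.append(0)
    return images, labels

if __name__ == "__main__":
    images, labels = make_toy_data()
    weights, bias = train_last_layer(images, labels)
    correct = sum(
        (predict(weights, bias, frozen_features(img)) > 0.5) == bool(y)
        for img, y in zip(images, labels)
    )
    print(f"training accuracy: {correct}/{len(labels)}")
```

Only `weights` and `bias` are ever updated, which is why this kind of retraining is so much cheaper than retraining every layer.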

Thankfully, because Google’s MediaPipe library is so easy to use, we don’t have to concern ourselves with the inner workings of retraining a neural network. We’ll also be using Google Colab’s hosted runtimes to perform the retraining, so we don’t even need a powerful computer! We’ll be dividing the retraining process into four steps:

1) Image Collection

2) Labeling

3) Retraining

4) Testing

Image Collection

To start, we’ll need some pictures! These pictures will be fed to the model so that it can be retrained. Despite this being one of the more tedious parts of the process, it’s important to take your time on this step. The images you feed into your model will directly determine how well your model performs. When we take pictures for our model, there are a few things that we should consider.

First and foremost, it is best to take pictures with the same camera that will be used to perform inference. This is because properties of the camera like noise, contrast, aspect ratio, and more can create significant differences in the pixel values of an image. When a neural network looks at an image, it (generally) analyzes most, if not all, of the pixels in the input image, so even differences that aren’t discernible to the human eye can have a large impact on the result.
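As a rough illustration of how camera properties show up in the pixels, the sketch below computes a few simple statistics for a grayscale image stored as a list of rows. The function name `image_stats` and the choice of statistics are my own assumptions for illustration. In practice you would load real frames with an imaging library, which is not shown here.

```python
import statistics

def image_stats(gray):
    # gray: list of rows, each row a list of 0-255 brightness values (hypothetical format).
    height = len(gray)
    width = len(gray[0])
    pixels = [p for row in gray for p in row]
    return {
        "aspect_ratio": width / height,
        "mean_brightness": statistics.fmean(pixels),
        # Population standard deviation as a crude stand-in for contrast.
        "contrast": statistics.pstdev(pixels),
    }

if __name__ == "__main__":
    # Two made-up 2x4 frames of the "same scene" from different cameras.
    camera_a = [[100, 120, 140, 160], [100, 120, 140, 160]]
    camera_b = [[110, 118, 126, 134], [110, 118, 126, 134]]
    print(image_stats(camera_a))
    print(image_stats(camera_b))
```

If the numbers for your training camera and your inference camera differ noticeably on the same scene, that is a hint the model may see the two sources quite differently.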

The fact that neural networks generally read the individual pixels of the input image is the high-level motivation behind Nightshade, a research project at the University of Chicago that offensively targets generative AI models. Models trained on artwork or images poisoned with Nightshade will exhibit unpredictable or unreliable behaviors. I recommend reading about it, as it highlights the ongoing fight between artists and large AI models.

Secondly, you want to ensure you take pictures of your object in as many different environments and lighting conditions as possible. This will allow your model to generalize to more situations and prevent it from overfitting to one lighting condition or feature. When training my model to detect a basket, I only took pictures of it on the ground or on a table. When I tested the model, I found that its accuracy dropped when I held the basket above the ground. Thankfully, for my application, the basket will always be on the ground, so I didn’t have to redo the training process.
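One quick way to check whether a photo set covers a range of lighting conditions is to bucket the images by average brightness and count each bucket. The sketch below assumes you already have one mean brightness value (0 to 255) per image, however you computed it. The bucket names and the thresholds of 85 and 170 are arbitrary assumptions, not values from MediaPipe.

```python
from collections import Counter

BUCKETS = ("dark", "medium", "bright")

def lighting_bucket(mean_brightness):
    # Thresholds split the 0-255 range into thirds; adjust them to taste.
    if mean_brightness < 85:
        return "dark"
    if mean_brightness < 170:
        return "medium"
    return "bright"

def lighting_coverage(mean_values):
    # Count how many images fall into each lighting bucket.
    counts = Counter(lighting_bucket(v) for v in mean_values)
    return {name: counts.get(name, 0) for name in BUCKETS}

if __name__ == "__main__":
    # Hypothetical per-image mean brightness values for a small photo set.
    print(lighting_coverage([40, 95, 120, 180, 200, 210]))
```

A bucket with zero or very few images suggests a lighting condition worth photographing before you start training.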

Finally, you want to ensure you have plenty of other objects in the background, especially those that look like your object. This will reduce the number of false positive detections that your model outputs. Here’s an example of a good training picture: