LEGO Brick Finder with OpenMV and Edge Impulse
2020-09-21 | By ShawnHymel
License: Attribution
In this project, I will walk you through the steps needed to create your own LEGO® brick finder using the OpenMV H7 camera module. We will train a machine learning model to identify a particular piece with Edge Impulse and then deploy that model to the OpenMV, which will scan through captured images and highlight areas that match the piece on a connected LCD.
Finding a particular Lego piece in a pile can be a time-consuming and frustrating task: a veritable needle in a haystack. Growing up, I often solicited my parents' help with this arduous task. Even then, it could take 15 minutes of digging through bins of Lego bricks to uncover that one elusive piece. In this project, I attempt to address this problem using machine learning on microcontrollers (also known as TinyML).
See this tutorial to learn how to set up and get started with the OpenMV camera. All of the code and image dataset used in this project can be found in this GitHub repository.
Even though this is a fun, lighthearted project showcasing the capabilities of machine learning on microcontrollers, it has many potential industrial applications. Scanning an image for a particular object has uses in autonomous vehicles, satellite and aerial image analysis, and X-ray image analysis. See this article to learn more about object detection and recognition.
Here is a video showing the Lego brick finder in action:
Limitations
Note that this project is a proof-of-concept with several major limitations:
- It’s very slow, taking around 10 seconds to identify potential target pieces in a pile of bricks.
- The field of view is limited, and we are only working with 240x240 resolution.
- It only identifies one piece at the moment. More models (or one larger model) would need to be trained to incorporate more target pieces.
- The camera must be a particular distance away from the pile of bricks, and similar lighting between data collection and deployment must be maintained. This can be remedied by collecting more data to create more robust models.
- The target piece must be face up on top of the pile of bricks. Once again, more data would be needed to create better models that could identify pieces from any angle or pieces that are partially obscured.
To create a real, usable version of this project, it’s recommended that you use a faster processor on a handheld device with a better camera and screen (such as a smartphone). You will also need to collect a lot more data consisting of multiple Lego pieces and different lighting, distances, and angles.
Hardware Hookup
You will need the following hardware:
- OpenMV H7 Camera or OpenMV H7 Camera PLUS
- OpenMV LCD Shield
- MicroSD Card
- 2x Mini Breadboard
- Pushbutton
- Tall headers (or bend right-angle headers to be straight)
- Wires
- USB Micro cable
Most machine learning projects begin by collecting data. For our purposes, that means taking a bunch of photos of Lego bricks using the OpenMV camera. To start, we’ll create a simple MicroPython program that turns our OpenMV into a still frame camera.
Solder the tall headers onto the OpenMV camera module. Attach the LCD Shield to the back of the camera module, and attach the camera module to two connected breadboards through a set of tall headers. Insert the microSD card into the OpenMV camera.
Connect one side of the pushbutton to GND on the OpenMV and the other side to pin P4.
Mount the camera module so that it points straight down at a pile of Lego bricks, about 8 inches away from the top of the pile.
Data Collection
Use the OpenMV IDE to upload the following program to your OpenMV camera (Tools > Save open script to OpenMV Cam (as main.py)). Note that you will need to first connect to the camera (Connect icon in the bottom-left corner of the IDE) before uploading the program.
import pyb, sensor, image, lcd, time

# Settings
btn_pin = 'P4'
led_color = 1
file_prefix = "/IMG"
file_suffix = ".bmp"
shutter_delay = 1000                # Milliseconds to wait after button press

# Globals
timestamp = time.ticks()
btn_flag = 0
btn_prev = 1
led_state = True
file_num = 0
filename = file_prefix + str(file_num) + file_suffix

####################################################################################################
# Functions

def file_exists(filename):
    try:
        f = open(filename, 'r')
        exists = True
        f.close()
    except OSError:
        exists = False
    return exists

####################################################################################################
# Main

# Start sensor
sensor.reset()                      # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)   # Set frame size to QVGA (320x240)
sensor.skip_frames(time = 2000)     # Wait for settings to take effect

# Start clock
clock = time.clock()

# Set up LED
led = pyb.LED(led_color)
led.off()

# Set up button
btn = pyb.Pin(btn_pin, pyb.Pin.IN, pyb.Pin.PULL_UP)

# Set up LCD
lcd.init()

# Figure out where to start numbering files
while file_exists(filename):
    file_num += 1
    filename = file_prefix + str(file_num) + file_suffix

# Main while loop
while(True):

    # Update FPS clock
    clock.tick()

    # Get button state
    btn_state = btn.value()

    # Get image, print out resolution and FPS
    img = sensor.snapshot()
    print(img.width(), "x", img.height(), "FPS:", clock.fps())

    # If flag is set, count down to shutter (blink LED) and save image
    if btn_flag == 1:
        if (time.ticks() - timestamp) <= shutter_delay:
            led_state = not led_state
            if led_state:
                led.on()
            else:
                led.off()
        else:
            btn_flag = 0
            led.off()
            img.save(filename)
            print("Image saved to:", filename)

    # If button is released, start shutter countdown
    if (btn_prev == 0) and (btn_state == 1):

        # Set flag and timestamp
        btn_flag = 1
        timestamp = time.ticks()

        # Figure out what to name the file
        while file_exists(filename):
            file_num += 1
            filename = file_prefix + str(file_num) + file_suffix

    # Scale image (in place) and display on LCD
    lcd.display(img.mean_pool(2, 2))

    # Record button state
    btn_prev = btn_state
Make sure that the pile of bricks does NOT contain your target piece. Press the button on the breadboard to snap a photo of the Lego bricks. Note that the red LED will flash for about 1 second before the photo is actually taken (this allows the camera to settle from the shaking caused by the button push).
Add your target piece into the frame and snap another photo. Move the piece slightly, and snap a photo again. Keep doing this until you have the piece in various locations throughout the pile. For my demo, I snapped photos of the target piece in 25 different locations on each background pile of Lego bricks.
Note that the target piece for my demo was the light gray 2x2 round plate with axle hole.
Move the pile of Lego pieces around and repeat the process a few more times. In my demo, I took photos of 3 background Lego piles and 25 target piece photos on each background (for a total of 78 photos).
Data Curation
We need to crop the photos into 32x32 pixel chunks and divide them up into background and target sets. We will train a neural network to classify each 32x32 image as either background or target. To start, copy all of the captured photos from the SD card to your computer. I recommend separating the background and target photos into separate folders to make the curation process easier.
The first part is a manual process, unfortunately. Use an image editing program, such as GIMP, to crop out the target piece in each photo. For my demo, I ended up with 75 32x32 pixel BMP images of the target piece in various locations and with different backgrounds. I put these in a separate folder: datasets/lego/bmp_32px/target.
The second part, thankfully, can be automated. I created a Python script that crops a number of windows from images and saves each window as its own image in another folder. Download divide_images.py and utiles.py from this repository and put them in the same folder somewhere on your computer.
You will need Python 3 installed on your computer for this script to run. Additionally, you will need to install the Pillow package via pip.
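For example, from a command prompt:
pip install Pillow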
Call divide_images.py from a command prompt, providing it with the source images directory, an output directory, and the resolution and hop distance of each crop window. Here is an example call to the script to crop 32x32 images from the background photos:
python divide_images.py -i "../../Python/datasets/lego/raw/background" -o "../../Python/datasets/lego/bmp_32px/background" -n "background" -w 32 -t 32 -l 10
Note that the -n option will set the prefix name for each output image.
When it’s done, you should have hundreds of 32x32 BMP files in the output directory, each containing a random smattering of Lego bricks. None of them should contain your target piece.
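If you are curious how the cropping works (or want to tweak it), here is a minimal sketch of the same sliding-window idea using Pillow. The directory paths, window size, and hop distance below are placeholders; the actual divide_images.py script in the repository handles the command-line options shown above.

import os
from PIL import Image

src_dir = "datasets/lego/raw/background"      # Placeholder: folder of full-size photos
out_dir = "datasets/lego/bmp_32px/background" # Placeholder: folder for cropped windows
win = 32                                      # Window width and height (pixels)
hop = 10                                      # Pixels between the start of adjacent windows

os.makedirs(out_dir, exist_ok=True)
count = 0
for fname in os.listdir(src_dir):
    img = Image.open(os.path.join(src_dir, fname))
    # Slide a win x win window across the image in steps of hop pixels
    for top in range(0, img.height - win + 1, hop):
        for left in range(0, img.width - win + 1, hop):
            crop = img.crop((left, top, left + win, top + win))
            crop.save(os.path.join(out_dir, "background" + str(count) + ".bmp"))
            count += 1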
At this time, Edge Impulse does not support BMP files, so we need to convert them to JPGs. Use a program (such as BulkImageConverter) to convert all your 32x32 BMP files to 32x32 JPG files. I stored them in separate folders. Target JPG files went into lego/jpg_32px/target, and background JPG files went into lego/jpg_32px/background.
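Alternatively, if you have Pillow installed, a short script can handle the batch conversion; the folder names below are just examples matching the layout described above.

import os
from PIL import Image

bmp_dir = "datasets/lego/bmp_32px/target"  # Example: folder of 32x32 BMP files
jpg_dir = "datasets/lego/jpg_32px/target"  # Example: folder for converted JPG files

os.makedirs(jpg_dir, exist_ok=True)
for fname in os.listdir(bmp_dir):
    if fname.lower().endswith(".bmp"):
        # Convert to RGB (JPG does not support palettes or alpha) and save as JPG
        img = Image.open(os.path.join(bmp_dir, fname)).convert("RGB")
        out_name = os.path.splitext(fname)[0] + ".jpg"
        img.save(os.path.join(jpg_dir, out_name), quality=95)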
Machine Learning Model Training with Edge Impulse
Head to edgeimpulse.com, create a new account (if you don’t have one already), and start a new project. On the left pane, click Data acquisition. Click Upload data. In the new window, click Choose Files, and select all of your background JPG images. Leave Automatically split between training and testing selected, and enter the label background.
Click Begin upload and let the process run. When it has finished, repeat the process for the target JPG files (but give them the label target instead).
Click on Create impulse under Impulse design in the left pane. You should see an Image data block. Change the resolution to 32 for width and 32 for height. Add an Image processing block and leave the settings at their defaults. Finally, add a Neural Network (Keras) learning block, and leave the default settings.
Note that you can also try the Transfer Learning (Images) learning block instead. It allows you to choose from a few pre-trained (or partially trained) MobileNet models. Transfer learning lets you fine-tune a network that has already been trained on a large image dataset for your particular application. I recommend trying it out to see if it works for your application. Note that MobileNet is quite accurate, but it may or may not fit on the OpenMV camera.
Click Save Impulse.
Click on Image under Impulse design in the left navigation pane. Click on the Generate features tab. You can explore some of the features in the graphic visualization to see if there is some discernible separation among the samples. Note that this works better for 2 or 3 features (dimensions) rather than images (which, for us, contain 32x32 pixels and 3 channels in each pixel, giving us 3072 input dimensions).
Click Generate features, and wait while Edge Impulse extracts features from all of the images.
When that’s finished, click on NN Classifier under Impulse design (navigation pane). I like to increase the number of training cycles to 100 to help ensure that the model converges. Additionally, I changed the model slightly, as I found that I could decrease the number of filters in the first layer to 16 without losing much accuracy, and it made the final model smaller.
Feel free to play around with different models and layers to find something that works for you. I found that the following model worked well enough for this proof-of-concept:
2D conv (16 filters, 3 kernel size) > 2D conv (16 filters, 3 kernel size) > Flatten > Dropout (rate 0.25).
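For reference, a roughly equivalent model definition in Keras might look like the sketch below (Edge Impulse generates its own code and adds a final softmax layer for the two classes, so the exact generated model may differ):

import tensorflow as tf
from tensorflow.keras import layers, models

# Two small conv layers, flatten, dropout, then a softmax over the 2 classes
model = models.Sequential([
    layers.Conv2D(16, kernel_size=3, activation='relu', input_shape=(32, 32, 3)),
    layers.Conv2D(16, kernel_size=3, activation='relu'),
    layers.Flatten(),
    layers.Dropout(0.25),
    layers.Dense(2, activation='softmax')   # Classes: background, target
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])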
Click Start training, and wait for training to complete. When it’s done, you can scroll down to view the predicted accuracy and confusion matrix for the validation set. You should see very few false positives and false negatives in the confusion matrix (i.e., the counts of correctly classified samples should be the highest).
On the right side of the Last training performance pane, you should see some predicted performance metrics. Hover over the ‘?’ icon to see that these numbers are for an ARM Cortex-M4F running at 80 MHz. It also assumes that the model has been quantized to 8-bit integers (from floating point operations), which means some accuracy is lost to get a smaller and faster model.
Head to Model testing in the navigation pane. Click the checkbox next to Sample Name to select all of the test samples, and click Classify selected. When it’s done, scroll down in the samples pane to see how well it performed on the background and target examples. Expect a few samples to be misclassified, but you should aim for something over 91% accuracy (with 13 target samples and 122 background samples, guessing “background” for every sample would yield 90.4% accuracy, and we want to do better than that).
Click on Deployment in the left navigation pane. Select OpenMV, and click the Build button. When the build process is complete, a .zip file should be automatically downloaded.
Deploy Machine Learning Model
Unzip the downloaded .zip file. Copy the labels.txt and trained.tflite files to the root directory of your OpenMV drive (same directory as main.py). If you are using a microSD card, these files should be in the root directory of the microSD card.
Feel free to look at the ei_image_classification.py example for help on how to use the TensorFlow Lite package in OpenMV. However, we will be writing our own program to identify target piece location.
We will be using a sliding window, similar to how we divided up the training images into 32x32 cropped images. A 32x32 pixel chunk of the live image is cropped out, and inference (with our TensorFlow Lite model) is performed on that chunk to determine whether or not it contains our target piece. The window then slides over a few pixels, and we perform inference on the next cropped chunk. This continues across the whole image.
Any windows that are thought to contain the target piece (the output probability for the target label is over some threshold, such as 0.7) are highlighted and displayed to the user.
In the OpenMV IDE, enter the following program:
# Edge Impulse - OpenMV Image Classification Example

import pyb, sensor, image, time, os, lcd, tf

# Settings
model_path = "trained.tflite"       # Path to tflite model file
labels_path = "labels.txt"          # Path to text file (one label on each line)
btn_pin = 'P4'                      # Pin that button is connected to
led_color = 1                       # Red LED = 1, Green LED = 2, Blue LED = 3, IR LEDs = 4
shutter_delay = 800                 # Milliseconds to wait after button press
led_off_delay = 200                 # Milliseconds to wait after turning off LED to snap photo
target_label = 'target'             # See labels.txt for labels
target_thresh = 0.7                 # NN output for target label must be at least this prob.
sub_w = 32                          # Width (pixels) of window
sub_h = 32                          # Height (pixels) of window
hop = 10                            # Pixels between start of windows (vertical and horizontal)

# Globals
timestamp = time.ticks()
btn_flag = 0
btn_prev = 1
led_state = True
target_locations = []

####################################################################################################
# Main

# Set up sensor
sensor.reset()                      # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)   # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))    # Set 240x240 window.
sensor.skip_frames(time=2000)       # Let the camera adjust.

# Load labels (one label per line)
labels = [line.rstrip('\n') for line in open(labels_path)]

# Start clock
clock = time.clock()

# Set up LED
led = pyb.LED(led_color)
led.off()

# Set up button
btn = pyb.Pin(btn_pin, pyb.Pin.IN, pyb.Pin.PULL_UP)

# Load model
model = tf.load(model_path)

# Set up LCD
lcd.init()

# Main while loop
while(True):

    # Update FPS clock
    clock.tick()

    # Get button state
    btn_state = btn.value()

    # Get image
    img = sensor.snapshot()

    # Initialize top-left markers for sub-image window
    px_left = 0
    px_top = 0

    # If flag is set, count down to shutter (blink LED) and look for target
    if btn_flag == 1:
        if (time.ticks() - timestamp) <= shutter_delay:
            led_state = not led_state
            if led_state:
                led.off()
            else:
                led.on()
        else:

            # Reset flag
            btn_flag = 0

            # Turn off LED to get snapshot, then turn it back on to show it's processing
            led.off()
            time.sleep(led_off_delay)
            img = sensor.snapshot()
            led.on()

            # Crop out sub-image window and run through tflite model (does window contain target?)
            target_locations = []
            total_inference_time = 0
            num_inference_cnt = 0
            while(px_top + sub_h <= img.height()):
                while(px_left + sub_w <= img.width()):

                    # Measure time start
                    start_time = time.ticks()

                    # Crop out window
                    img_crop = img.copy((px_left, px_top, sub_w, sub_h))

                    # Classify window (is it our target?), get output probability
                    tf_out = model.classify(img_crop)
                    target_prob = tf_out[0].output()[labels.index(target_label)]

                    # If it is our target, add location and probability to list
                    if target_prob >= target_thresh:
                        prob_str = str("{:.2f}".format(round(target_prob, 2)))
                        target_locations.append((px_left, px_top, prob_str))

                    # Record time to perform inference
                    total_inference_time += (time.ticks() - start_time)
                    num_inference_cnt += 1

                    # Move window to the right
                    px_left += hop

                # Move window down (and reset back to left)
                px_top += hop
                px_left = 0

            # Turn off LED to show we're done
            led.off()

            # Report average inference time
            print()
            print("Number of possible targets found:", len(target_locations))
            print("Average inference time:", total_inference_time / num_inference_cnt, "ms")
            print("Number of inferences performed:", num_inference_cnt)
            print("Total computation time:", total_inference_time / 1000, "s")

    # Draw target locations and probabilities onto image
    for loc in target_locations:
        img.draw_rectangle((loc[0], loc[1], sub_w, sub_h))
        img.draw_string(loc[0] + 1, loc[1], loc[2], mono_space=False)

    # If button is released, start shutter countdown
    if (btn_prev == 0) and (btn_state == 1):

        # Set flag and timestamp
        btn_flag = 1
        timestamp = time.ticks()

    # Scale image and display on LCD
    #lcd.display(img.mean_pool(2, 2))

    # Display info to terminal
    #print("FPS:", clock.fps())

    # Record button state
    btn_prev = btn_state
With the IDE connected to the camera module, press the Start (run script) button (bottom-left corner of the IDE).
This program loads the trained.tflite model and labels.txt files found on the board (or SD card). When you press the button, it captures a still photo and uses a sliding window to crop out individual parts of the image. Inference is performed on each part using the model, and areas thought to contain the target piece are highlighted, which you can see in the Frame buffer pane (upper-right corner of the IDE).
Press the button on the breadboard to snap a photo and look for instances of the target piece in the frame.
If you open the Serial Terminal, you should see some metrics printed out, including the number of frames thought to contain the target piece and the time to perform inference (which can be upwards of 10 seconds for the whole image).
To use the device independently from the computer, you will first need to uncomment the following line:
lcd.display(img.mean_pool(2, 2))
Upload this file to the OpenMV camera (Tools > Save open script to OpenMV Cam (as main.py)).
If you are on Windows, go into File Explorer and eject the OpenMV (or associated SD card) drive to force writing to complete.
Cycle power to the OpenMV, and it should start to run your newly uploaded code. Position the camera module over your pile of Lego bricks, and press the button to snap a photo. In a few seconds, the LCD should show you the photo with white squares around any areas it thinks contain the target piece.
Resources and Going Further
As mentioned earlier, this project is a proof-of-concept and has many limitations. For it to work at a usable scale, more data would need to be collected for each piece, and models would need to be built for all potential target pieces (which likely means every Lego piece ever made). Additionally, faster hardware is required to perform inference at an acceptable rate (say, better than 1 frame per second). A modern smartphone would likely accomplish this goal.
One way to deploy such a large-scale project would be for Lego (or a third party) to host a database that associates a trained machine learning model with each part. The user could search for a particular piece on their phone (through a site or app), and the phone would download the model for that part.
Then, the user could use their phone’s camera to snap photos (or live video feed) of a pile of Lego bricks. The app would then highlight any parts that matched the target piece chosen by the user.
I hope this project has helped you see how easy it is to get started using machine learning with computer vision! Here are some resources if you would like to try this project and take it further:
Have questions or comments? Continue the conversation on TechForum, DigiKey's online community and technical resource.