
LEGO Brick Finder with OpenMV and Edge Impulse

2020-09-21 | By ShawnHymel

License: Attribution

In this project, I will walk you through the steps needed to create your own LEGO® brick finder using the OpenMV H7 camera module. We will train a machine learning model to identify a particular piece with Edge Impulse and then deploy that model to the OpenMV, which will scan through captured images and highlight areas that match the piece on a connected LCD.

Finding a particular Lego piece in a pile can be a time-consuming and frustrating task: a veritable needle in a haystack. Growing up, I often enlisted my parents' help with this arduous task. Even with their assistance, it could take 15 minutes of digging through bins of Lego bricks to uncover that one elusive piece. In this project, I attempt to address that problem using machine learning on microcontrollers (also known as TinyML).

See this tutorial to learn how to set up and get started with the OpenMV camera. All of the code and image dataset used in this project can be found in this GitHub repository.

Even though this is a fun, lighthearted project showcasing the capabilities of machine learning on microcontrollers, it has many potential industrial applications. Scanning an image for a particular object is useful in autonomous vehicles, satellite and aerial image analysis, and X-ray image analysis. See this article to learn more about object detection and recognition.

Here is a video showing the Lego brick finder in action:

 

Limitations

Note that this project is a proof-of-concept with several major limitations:

  • It’s very slow, taking around 10 seconds to identify potential target pieces in a pile of bricks.
  • The field of view is limited, and we are only working with 240x240 resolution.
  • It can only identify one piece at the moment. More models (or one larger model) would need to be trained to incorporate more target pieces.
  • The camera must be a particular distance away from the pile of bricks, and similar lighting between data collection and deployment must be maintained. This can be remedied by collecting more data to create more robust models.
  • The target piece must be face up on top of the pile of bricks. Once again, more data would be needed to create better models that could identify pieces from any angle or pieces that are partially obscured.

To create a real, usable version of this project, it’s recommended that you use a faster processor on a handheld device with a better camera and screen (such as a smartphone). You will also need to collect a lot more data consisting of multiple Lego pieces and different lighting, distances, and angles.

Hardware Hookup

You will need the following hardware:

  • OpenMV Cam H7 (or OpenMV Cam H7 Plus)
  • OpenMV LCD Shield
  • microSD card
  • Momentary pushbutton
  • Two breadboards and a set of tall male headers

Most machine learning projects begin by collecting data. For our purposes, that means taking a bunch of photos of Lego bricks using the OpenMV camera. To start, we’ll create a simple MicroPython program that turns our OpenMV into a still frame camera.

Solder the tall headers onto the OpenMV camera module. Attach the LCD Shield to the back of the camera module, and attach the camera module to two connected breadboards through a set of tall headers. Insert the microSD card into the OpenMV camera.

Connect one side of the pushbutton to GND on the OpenMV and the other side to pin P4.

OpenMV still camera

Mount the camera module so that it points straight down at a pile of Lego bricks, about 8 inches away from the top of the pile.

Mounted OpenMV Lego finder

Data Collection

Use the OpenMV IDE to upload the following program to your OpenMV camera (Tools > Save open script to OpenMV Cam (as main.py)). Note that you will need to first connect to the camera (Connect icon in the bottom-left corner of the IDE) before uploading the program.

Copy Code
import pyb, sensor, image, lcd, time

# Settings
btn_pin = 'P4'
led_color = 1
file_prefix = "/IMG"
file_suffix = ".bmp"
shutter_delay = 1000 # Milliseconds to wait after button press

# Globals
timestamp = time.ticks()
btn_flag = 0
btn_prev = 1
led_state = True
file_num = 0
filename = file_prefix + str(file_num) + file_suffix

####################################################################################################
# Functions

def file_exists(filename):
    try:
        f = open(filename, 'r')
        exists = True
        f.close()
    except OSError:
        exists = False
    return exists

####################################################################################################
# Main

# Start sensor
sensor.reset() # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA) # Set frame size to QVGA (320x240)
sensor.skip_frames(time = 2000) # Wait for settings to take effect

# Start clock
clock = time.clock()

# Set up LED
led = pyb.LED(led_color)
led.off()

# Set up button
btn = pyb.Pin(btn_pin, pyb.Pin.IN, pyb.Pin.PULL_UP)

# Set up LCD
lcd.init()

# Figure out where to start numbering files
while file_exists(filename):
    file_num += 1
    filename = file_prefix + str(file_num) + file_suffix

# Main while loop
while(True):

    # Update FPS clock
    clock.tick()

    # Get button state
    btn_state = btn.value()

    # Get image, print out resolution and FPS
    img = sensor.snapshot()
    print(img.width(), "x", img.height(), "FPS:", clock.fps())

    # If flag is set, count down to shutter (blink LED) and save image
    if btn_flag == 1:
        if (time.ticks() - timestamp) <= shutter_delay:
            led_state = not led_state
            if led_state:
                led.on()
            else:
                led.off()
        else:
            btn_flag = 0
            led.off()
            img.save(filename)
            print("Image saved to:", filename)

    # If button is released, start shutter countdown
    if (btn_prev == 0) and (btn_state == 1):

        # Set flag and timestamp
        btn_flag = 1
        timestamp = time.ticks()

        # Figure out what to name the file
        while file_exists(filename):
            file_num += 1
            filename = file_prefix + str(file_num) + file_suffix

    # Scale image (in place) and display on LCD
    lcd.display(img.mean_pool(2, 2))

    # Record button state
    btn_prev = btn_state

Make sure that the pile of bricks does NOT contain your target piece. Press the button on the breadboard to snap a photo of the Lego bricks. Note that the red LED will flash for about 1 second before the photo is actually taken (this allows the camera to settle after being shaken by the button press).

OpenMV take photo

Add your target piece into the frame and snap another photo. Move the piece slightly, and snap a photo again. Keep doing this until you have the piece in various locations throughout the pile. For my demo, I snapped photos of the target piece in 25 different locations on each background pile of Lego bricks.

Note that the target piece for my demo was the light gray 2x2 round plate with axle hole.

OpenMV Lego finder target piece

Move the pile of Lego pieces around and repeat the process a few more times. In my demo, I took photos of 3 background Lego piles and 25 target piece photos on each background (for a total of 78 photos).

Data Curation

We need to crop the photos into 32x32 pixel chunks and divide them up into background and target sets. We will train a neural network to classify each 32x32 image as either background or target. To start, copy all of the captured photos from the SD card to your computer. I recommend separating the background and target photos into separate folders to make the curation process easier.

The first part is a manual process, unfortunately. Use an image editing program, such as GIMP, to crop out the target piece in each photo. For my demo, I ended up with 75 BMP images (32x32 pixels each) of the target piece in various locations and with different backgrounds. I put these in a separate folder: datasets/lego/bmp_32px/target.

Divide target photos

The second part, thankfully, can be automated. I created a Python script that crops a number of windows from images and saves each window as its own image in another folder. Download divide_images.py and utiles.py from this repository and put them in the same folder somewhere on your computer.

You will need Python 3 installed on your computer for this script to run. Additionally, you will need to install the Pillow package via pip.
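For example, Pillow can typically be installed from a command prompt with:

Copy Code
python -m pip install Pillow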

Call divide_images.py from a command prompt, providing it with the source images directory, an output directory, and the resolution and hop distance of each crop window. Here is an example call to the script to crop 32x32 images from the background photos:

Copy Code
python divide_images.py -i "../../Python/datasets/lego/raw/background" -o "../../Python/datasets/lego/bmp_32px/background" -n "background" -w 32 -t 32 -l 10

Note that the -n option will set the prefix name for each output image.
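If you are curious what the script does under the hood, here is a minimal sketch of the same sliding-crop idea using Pillow. This is just an illustration, not the actual divide_images.py, and the folder paths are placeholders:

Copy Code
import os
from PIL import Image

src_dir = "raw/background"       # Placeholder: folder of full-size photos
out_dir = "bmp_32px/background"  # Placeholder: folder for 32x32 crops
win = 32                         # Crop window width and height (pixels)
hop = 10                         # Pixels between the start of adjacent windows

os.makedirs(out_dir, exist_ok=True)
count = 0
for fname in os.listdir(src_dir):
    if not fname.lower().endswith((".bmp", ".jpg", ".png")):
        continue
    img = Image.open(os.path.join(src_dir, fname))
    # Slide a win x win window across the image, hopping 'hop' pixels each step
    for top in range(0, img.height - win + 1, hop):
        for left in range(0, img.width - win + 1, hop):
            crop = img.crop((left, top, left + win, top + win))
            crop.save(os.path.join(out_dir, "background" + str(count) + ".bmp"))
            count += 1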

When it’s done, you should have hundreds of 32x32 BMP files in the output directory, each containing a random smattering of Lego bricks. None of them should contain your target piece.

Cropped background images

At this time, Edge Impulse does not support BMP files, so we need to convert them to JPGs. Use a program (such as BulkImageConverter) to convert all your 32x32 BMP files to 32x32 JPG files. I stored them in separate folders. Target JPG files went into lego/jpg_32px/target, and background JPG files went into lego/jpg_32px/background.
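Alternatively, the conversion can be scripted. Here is a minimal Pillow sketch, assuming the folder names above (adjust the paths for your setup, and run it once per folder):

Copy Code
import os
from PIL import Image

src_dir = "lego/bmp_32px/target"  # Placeholder input folder of BMP files
out_dir = "lego/jpg_32px/target"  # Placeholder output folder for JPG files

os.makedirs(out_dir, exist_ok=True)
for fname in os.listdir(src_dir):
    if fname.lower().endswith(".bmp"):
        # Open the BMP, force RGB, and save it as a JPG with the same base name
        img = Image.open(os.path.join(src_dir, fname)).convert("RGB")
        jpg_name = os.path.splitext(fname)[0] + ".jpg"
        img.save(os.path.join(out_dir, jpg_name), quality=95)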

Machine Learning Model Training with Edge Impulse

Head to edgeimpulse.com, create a new account (if you don’t have one already), and start a new project. On the left pane, click Data acquisition. Click Upload data. In the new window, click Choose Files, and select all of your background JPG images. Leave Automatically split between training and testing selected, and enter the label background.

Upload images to Edge Impulse

Click Begin upload and let the process run. When it has finished, repeat the process for the target JPG files (but give them the label target instead).

Click on Create impulse under Impulse design in the left pane. You should see an Image data block. Change the resolution to 32 for width and 32 for height. Add an Image processing block and leave the settings at their defaults. Finally, add a Neural Network (Keras) learning block, and leave the default settings.

Note that you can also try the Transfer Learning (Images) learning block instead. It allows you to choose from a few pre-trained (or partially trained) MobileNet models. Transfer learning lets you start from a network that has already been trained on a large image dataset and fine-tune it for your particular application. I recommend trying out transfer learning to see if it works for your application. Note that MobileNet is quite accurate, but it may or may not fit on the OpenMV camera.

Create a data flow on Edge Impulse

Click Save Impulse.

Click on Image under Impulse design in the left navigation pane. Click on the Generate features tab. You can explore some of the features in the graphic visualization to see if there is some discernible separation among the samples. Note that this works better for 2 or 3 features (dimensions) rather than images (which, for us, contain 32x32 pixels and 3 channels in each pixel, giving us 3072 input dimensions).

Click Generate features, and wait while Edge Impulse extracts features from all of the images.

Extract features in Edge Impulse

When that’s finished, click on NN Classifier under Impulse design (navigation pane). I like to increase the number of training cycles to 100 to help ensure that the model converges. Additionally, I changed the model slightly, as I found that I could decrease the number of filters in the first layer to 16 without losing much accuracy, and it made the final model smaller.

Feel free to play around with different models and layers to find something that works for you. I found that the following model worked well enough for this proof-of-concept:

2D conv (16 filters, 3 kernel size) > 2D conv (16 filters, 3 kernel size) > Flatten > Dropout (rate 0.25).
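Behind the scenes, Edge Impulse builds this network in Keras. For reference, a rough Keras equivalent of the model above looks something like the sketch below. This is not the exact code Edge Impulse generates, and the final dense softmax output layer (one output per class) is my assumption of what gets appended for the background/target classes:

Copy Code
import tensorflow as tf
from tensorflow.keras import layers, models

# Two 3x3 convolutions with 16 filters each, then flatten and dropout
model = models.Sequential([
    layers.Conv2D(16, kernel_size=3, activation='relu', input_shape=(32, 32, 3)),
    layers.Conv2D(16, kernel_size=3, activation='relu'),
    layers.Flatten(),
    layers.Dropout(0.25),
    layers.Dense(2, activation='softmax')  # Assumed output layer: background vs. target
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()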

Neural network design in Edge Impulse

Click Start training, and wait for training to complete. When it’s done, you can scroll down to view the predicted accuracy and confusion matrix for the validation set(s). You should have very few false positives and false negatives in the confusion matrix (i.e., the correctly guessed classes should have the highest counts).

On the right side of the Last training performance pane, you should see some predicted performance metrics. Hover over the ‘?’ icon to see that these numbers are for an ARM Cortex-M4F running at 80 MHz. It also assumes that the model has been quantized to 8-bit integers (from floating point operations), which means some accuracy is lost to get a smaller and faster model.
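Edge Impulse handles the quantization for you, but for reference, post-training int8 quantization with the TensorFlow Lite converter looks roughly like the sketch below. The 'model' variable refers to a trained Keras model (such as the one sketched earlier), and the representative dataset here is a random placeholder; in practice you would feed it real training images:

Copy Code
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder: yield a handful of 32x32 RGB samples so the converter can
    # calibrate the int8 ranges; real training images should be used instead
    for _ in range(100):
        yield [np.random.rand(1, 32, 32, 3).astype(np.float32)]

# 'model' is a trained Keras model (e.g., the sketch shown earlier)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("trained.tflite", "wb") as f:
    f.write(tflite_model)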

Neural network validation in Edge Impulse

Head to Model testing on the navigation pane. Click the checkbox next to Sample Name to select all of the test samples, and click Classify selected. When it’s done, scroll down in the samples pane to look at how well it performed for the background and target examples. Expect a few samples to be misclassified, but you should aim for something over 91% accuracy (with 13 target samples and 122 background samples, simply guessing “background” for everything would yield 90.4% accuracy, and we want to do better than that).

Testing neural network in Edge Impulse

Click on Deployment in the left navigation pane. Select OpenMV, and click the Build button. When the build process is complete, a .zip file should be automatically downloaded.

Build TensorFlow Lite model in Edge Impulse

Deploy Machine Learning Model

Unzip the downloaded .zip file. Copy the labels.txt and trained.tflite files to the root directory of your OpenMV drive (same directory as main.py). If you are using a microSD card, these files should be in the root directory of the microSD card.

Deploy TensorFlow Lite model file to OpenMV

Feel free to look at the ei_image_classification.py example for help on how to use the TensorFlow Lite package in OpenMV. However, we will be writing our own program to identify target piece location.

We will be using a sliding window, similar to how we divided the training images into 32x32 crops. A 32x32 pixel chunk of the live image is cropped out, and inference (with our TensorFlow Lite model) is performed on that chunk to determine whether it contains our target piece. The window then slides over a few pixels, and we perform inference on the next cropped chunk. This continues across the whole image.

Neural network inference with sliding window

Any windows that are thought to contain the target piece (the output probability for the target label is over some threshold, such as 0.7) are highlighted and displayed to the user.
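To get a feel for how much work that is, here is a quick back-of-the-envelope count of the windows, assuming the 240x240 capture, 32-pixel window, and 10-pixel hop used in the program below:

Copy Code
# Number of 32x32 windows in a 240x240 image with a 10-pixel hop
img_w, img_h = 240, 240
sub_w, sub_h = 32, 32
hop = 10

windows_per_row = (img_w - sub_w) // hop + 1  # 21
windows_per_col = (img_h - sub_h) // hop + 1  # 21
print(windows_per_row * windows_per_col)      # 441 inferences per button press

At a few tens of milliseconds per inference, those 441 windows add up to the several-second processing time mentioned earlier.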

In the OpenMV IDE, enter the following program:

Copy Code
# Edge Impulse - OpenMV Image Classification Example

import pyb, sensor, image, time, os, lcd, tf

# Settings
model_path = "trained.tflite" # Path to tflite model file
labels_path = "labels.txt" # Path to text file (one label on each line)
btn_pin = 'P4' # Pin that button is connected to
led_color = 1 # Red LED = 1, Green LED = 2, Blue LED = 3, IR LEDs = 4
shutter_delay = 800 # Milliseconds to wait after button press
led_off_delay = 200 # Milliseconds to wait after turning off LED to snap photo
target_label = 'target' # See labels.txt for labels
target_thresh = 0.7 # NN output for target label must be at least this prob.
sub_w = 32 # Width (pixels) of window
sub_h = 32 # Height (pixels) of window
hop = 10 # Pixels between start of windows (vertical and horizontal)

# Globals
timestamp = time.ticks()
btn_flag = 0
btn_prev = 1
led_state = True
target_locations = []

####################################################################################################
# Main

# Set up sensor
sensor.reset() # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA) # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240)) # Set 240x240 window.
sensor.skip_frames(time=2000) # Let the camera adjust.

# Load tflite model and labels
labels = [line.rstrip('\n') for line in open(labels_path)]

# Start clock
clock = time.clock()

# Set up LED
led = pyb.LED(led_color)
led.off()

# Set up button
btn = pyb.Pin(btn_pin, pyb.Pin.IN, pyb.Pin.PULL_UP)

# Load model
model = tf.load(model_path)

# Set up LCD
lcd.init()

# Main while loop
while(True):

    # Update FPS clock
    clock.tick()

    # Get button state
    btn_state = btn.value()

    # Get image from camera
    img = sensor.snapshot()

    # Initialize top-left markers for sub-image window
    px_left = 0
    px_top = 0

    # If flag is set, count down to shutter (blink LED) and look for target
    if btn_flag == 1:
        if (time.ticks() - timestamp) <= shutter_delay:
            led_state = not led_state
            if led_state:
                led.off()
            else:
                led.on()
        else:

            # Reset flag
            btn_flag = 0

            # Turn off LED to get snapshot, then turn it back on to show it's processing
            led.off()
            time.sleep(led_off_delay)
            img = sensor.snapshot()
            led.on()

            # Crop out sub-image window and run through tflite model (does window contain target?)
            target_locations = []
            total_inference_time = 0
            num_inference_cnt = 0
            while(px_top + sub_h <= img.height()):
                while(px_left + sub_w <= img.width()):

                    # Measure time start
                    start_time = time.ticks()

                    # Crop out window
                    img_crop = img.copy((px_left, px_top, sub_w, sub_h))

                    # Classify window (is it our target?), get output probability
                    tf_out = model.classify(img_crop)
                    target_prob = tf_out[0].output()[labels.index(target_label)]

                    # If it is our target, add location and probability to list
                    if target_prob >= target_thresh:
                        prob_str = str("{:.2f}".format(round(target_prob, 2)))
                        target_locations.append((px_left, px_top, prob_str))

                    # Record time to perform inference
                    total_inference_time += (time.ticks() - start_time)
                    num_inference_cnt += 1

                    # Move window to the right
                    px_left += hop

                # Move window down (and reset back to left)
                px_top += hop
                px_left = 0

            # Turn off LED to show we're done
            led.off()

            # Report average inference time
            print()
            print("Number of possible targets found:", len(target_locations))
            print("Average inference time:", total_inference_time / num_inference_cnt, "ms")
            print("Number of inferences performed:", num_inference_cnt)
            print("Total computation time:", total_inference_time / 1000, "s")

    # Draw target locations and probabilities onto image
    for loc in target_locations:
        img.draw_rectangle((loc[0], loc[1], sub_w, sub_h))
        img.draw_string(loc[0] + 1, loc[1], loc[2], mono_space=False)

    # If button is released, start shutter countdown
    if (btn_prev == 0) and (btn_state == 1):

        # Set flag and timestamp
        btn_flag = 1
        timestamp = time.ticks()

    # Scale image and display on LCD
    #lcd.display(img.mean_pool(2, 2))

    # Display info to terminal
    #print("FPS:", clock.fps())

    # Record button state
    btn_prev = btn_state

With the IDE connected to the camera module, press the Start (run script) button (bottom-left corner of the IDE).

This program loads the trained.tflite model and labels.txt files found on the board (or SD card). When you press the button, it captures a still photo and uses a sliding window to crop out individual parts of the image. Inference is performed on each part using the model, and areas thought to contain the target piece are highlighted, which you can see in the Frame buffer pane (upper-right corner of the IDE).

Press the button on the breadboard to snap a photo and look for instances of the target piece in the frame.

If you open the Serial Terminal, you should see some metrics printed out, including the number of frames thought to contain the target piece and the time to perform inference (which can be upwards of 10 seconds for the whole image).

Run Lego brick finder from OpenMV IDE

To use the device independently from the computer, you will first need to uncomment the following line:

Copy Code
lcd.display(img.mean_pool(2, 2))

Upload this file to the OpenMV camera (Tools > Save open script to OpenMV Cam (as main.py)).

If you are on Windows, go into File Explorer and eject the OpenMV (or associated SD card) drive to force writing to complete.

Cycle power to the OpenMV, and it should start to run your newly uploaded code. Position the camera module over your pile of Lego bricks, and press the button to snap a photo. In a few seconds, the LCD should show you the photo with white squares around any areas it thinks contain the target piece.

Lego brick finder with OpenMV

Resources and Going Further

As mentioned earlier, this project is a proof-of-concept and has many limitations. For it to work at a usable scale, more data would need to be collected for each piece, and models would need to be built for all potential target pieces (which likely means every Lego piece ever made). Additionally, better hardware is required to perform inference at an acceptable rate (say, better than 1 frame per second). A modern smartphone would likely accomplish this goal.

One way to deploy such a large-scale project would be for Lego (or a third party) to host a database that associates a trained machine learning model with each part. The user could search for a particular piece on their phone (through a site or app), and the phone would download the model for that part.

Then, the user could use their phone’s camera to snap photos (or live video feed) of a pile of Lego bricks. The app would then highlight any parts that matched the target piece chosen by the user.

Lego brick finder database

I hope this project has helped you see how easy it is to get started using machine learning with computer vision! Here are some resources if you would like to try this project and take it further:

Manufacturer Part Number: SEN-15325
OPENMV H7 CAMERA
SparkFun Electronics

Manufacturer Part Number: SEN-16989
OPENMV CAM H7 PLUS
SparkFun Electronics

Manufacturer Part Number: LCD-16777
OPENMV LCD SHIELD
SparkFun Electronics
TechForum

Have questions or comments? Continue the conversation on TechForum, DigiKey's online community and technical resource.

Visit TechForum