NVIDIA Jetson Nano - Part 2: Image Classification with Machine Learning
2019-12-23 | By ShawnHymel
License: Attribution
The NVIDIA Jetson Nano is a single-board computer based on the NVIDIA Tegra X1 processor, which combines CPU and GPU capabilities. As a result, it is a great starting platform for doing Edge AI.
If you have not done so already, please follow the steps in the previous tutorial to install Linux and configure your Jetson Nano: Getting Started with the NVIDIA Jetson Nano - Part 1: Setup.
Note that for the following demos, you will want to use a keyboard, mouse, and monitor connected directly to the Jetson Nano. Otherwise, the camera feed will be extremely slow over a network connection.
Live Detection Demo
If you downloaded the COCO models in the previous episode, then you can use them to detect and label objects in real time. To do that, we use the detectnet-camera tool.
First, make sure you have a camera plugged into your Jetson Nano. This can be a CSI camera (the Raspberry Pi Camera Module V2 supposedly works well) or a USB webcam (the Logitech c920 worked for me).
Open a terminal and navigate to the bin directory in aarch64:
cd ~/jetson-inference/build/aarch64/bin/
From there, run the live camera tool. Note that you will need to set the camera parameter to your connected camera: a USB webcam will likely be the device file /dev/video0, or if you’re using a CSI camera, it will be just 0 or 1.
./detectnet-camera.py --network=coco-dog --camera=/dev/video0
This will start a live feed from your camera. It will also attempt to locate any objects in the frame that match the dog model found in the coco-dog network.
If you use an image of a dog (or a real dog), the program should be able to detect it, label it as a dog, and put a blue bounding box around it.
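If you are curious what the script is doing under the hood, here is a minimal sketch of the same loop using the jetson.inference Python bindings built in Part 1. This is a simplified approximation for illustration, not the full detectnet-camera.py script:

import jetson.inference
import jetson.utils

# Load the dog-detection model and open the camera (USB webcam assumed)
net = jetson.inference.detectNet("coco-dog", threshold=0.5)
camera = jetson.utils.gstCamera(1280, 720, "/dev/video0")
display = jetson.utils.glDisplay()

while display.IsOpen():
    img, width, height = camera.CaptureRGBA()    # grab a frame from the camera
    detections = net.Detect(img, width, height)  # find dogs, draw bounding boxes
    display.RenderOnce(img, width, height)       # show the annotated frame
    display.SetTitle("coco-dog | {:.0f} FPS".format(display.GetFPS()))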
Training a Model
Because the Nano is an embedded device, it is not nearly as powerful as a modern desktop or server equipped with a discrete graphics card. As a result, if you plan to train a deep neural network (or other large model) from scratch, we recommend doing so on a laptop, desktop, or server.
NVIDIA offers a training interface called DIGITS that makes training networks much easier, and its documentation walks you through training deep neural networks from scratch on a more powerful machine.
That being said, we can do something called “transfer learning” to retrain an existing network directly on the Nano. Rather than learning every parameter from scratch, we start with a pre-trained network and tweak its parameters to fit our own training data, which takes far less compute and far less data.
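Conceptually, the transfer learning step looks something like the following PyTorch sketch. This is a simplified illustration of what jetson-inference’s train.py does, not the actual script:

import torch
import torchvision

# Start from a ResNet-18 already trained on ImageNet
model = torchvision.models.resnet18(pretrained=True)

# Swap the final layer for one matching our 3 classes: background, fork, spoon
model.fc = torch.nn.Linear(model.fc.in_features, 3)

# Fine-tune with a small learning rate so the pre-trained weights shift only slightly
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)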
To begin, we first need to set up a swap space on our SD card so that the system has extra virtual memory to fall back on during training. Make sure you have at least 4 GB available on your SD card by running the following command:
df -h
Next, create and mount the swap file:
sudo fallocate -l 4G /mnt/4GB.swap
sudo chmod 0600 /mnt/4GB.swap
sudo mkswap /mnt/4GB.swap
sudo swapon /mnt/4GB.swap
If you want the swap file to mount on boot, you will need to modify fstab:
sudo vi /etc/fstab
Scroll to the bottom of this file and press ‘o’ to insert a new line and begin editing. Enter the following line:
/mnt/4GB.swap none swap sw 0 0
You can check to see if the swap space mounted with:
swapon -s
You should see the 4GB.swap file listed.
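You can also verify it with the standard free utility:

free -h

The total in the Swap row should have grown by about 4 GB.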
Next, we need to capture images to create our datasets. I’ll be using 3 different sets of images, as I want my network to identify these categories:
- Background
- Fork
- Spoon
Note that if you are training the network to identify objects, you should take pictures of them against similar backgrounds. With so little data, the network will be sensitive to new backgrounds, new lighting, and so on.
To use the jetson-inference capture tool, first create our datasets directory and labels file:
cd ~
mkdir datasets
cd ~/datasets
mkdir utensils
cd utensils
touch labels.txt
echo "background" >> labels.txt
echo "fork" >> labels.txt
echo "spoon" >> labels.txt
Note that the categories in the labels file need to be on separate lines and in alphabetical order! You can check them with:
cat labels.txt
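You should see the three categories, one per line:

background
fork
spoon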
Next, run the camera-capture tool. If you’re using a USB webcam, you will want to use the /dev/video0 device file. If you’re using a CSI camera, change the camera parameter to 0 or 1 (whichever one works, such as --camera=0). I’m also using a much lower resolution, as it allows for faster training and classification later:
camera-capture --camera=/dev/video0 --width=640 --height=480
In the capture tool, first point the Dataset Path to your ~/datasets/utensils directory. Then, point the Class Labels to the ~/datasets/utensils/labels.txt file. Select your Current Class (e.g. start with “background”). For Current Set, select train. Use the spacebar or button to capture at least 30 images of your intended background.
Next change the Set to val (for validation), and take at least 10 more photos of the same background. Change the Set to test and take yet another 10 photos of the background.
Repeat this process for your fork and spoon images, each time holding up the desired utensil to the camera. You can move the utensil around slightly, but don’t move it too much, or the model will not be able to train on the images properly.
In the end, you should have the following set of images:
- Background
- Train: 30 (or more) images
- Val: 10 (or more) images
- Test: 10 (or more) images
- Fork
- Train: 30 (or more) images
- Val: 10 (or more) images
- Test: 10 (or more) images
- Spoon
- Train: 30 (or more) images
- Val: 10 (or more) images
- Test: 10 (or more) images
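On disk, camera-capture organizes the images by set and class, so your dataset directory should end up looking roughly like this (layout assumed from the tool’s defaults; exact file names will differ):

~/datasets/utensils/
    labels.txt
    train/
        background/  fork/  spoon/
    val/
        background/  fork/  spoon/
    test/
        background/  fork/  spoon/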
Now, it’s time to train! Navigate to the classification directory and run the training program:
cd ~/jetson-inference/python/training/classification/
python train.py --model-dir=utensils ~/datasets/utensils
This can take up to 30 minutes, so be patient (or go get some coffee). When it’s done, we will need to export the model to the Open Neural Network Exchange (ONNX) format:
python onnx_export.py --model-dir=utensils
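To confirm the export worked, list the model directory and look for the .onnx file (this is the same resnet18.onnx we pass to imagenet-camera below):

ls utensils/

You should see resnet18.onnx among the training checkpoints.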
Test It!
With the model trained, we can use it to make classification predictions! Run the following program (changing /dev/video0 for your particular camera and sgmustadio for your username):
imagenet-camera --model=utensils/resnet18.onnx --labels=/home/sgmustadio/datasets/utensils/labels.txt --camera=/dev/video0 --width=640 --height=480 --input_blob=input_0 --output_blob=output_0
It can take around 5 minutes for the engine to start up (TensorRT optimizes the network the first time it loads), so be patient with this one, too. Once you get a live stream of your camera, make sure it is facing the background that you trained it on. Then, hold up a fork or spoon in front of the camera. It should be able to identify the utensil!
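As with the detection demo, there is a rough Python equivalent of this command using the jetson.inference bindings. This is a sketch, not the actual imagenet-camera source: it assumes the bindings accept the same flags through their argv parameter (as the repo’s imagenet-camera.py does) and that the built-in network name is ignored when --model is supplied.

import jetson.inference
import jetson.utils

# Load our custom ONNX model; the flags mirror the imagenet-camera call above
# ("googlenet" is a placeholder name that the --model flag overrides)
net = jetson.inference.imageNet("googlenet", [
    "--model=utensils/resnet18.onnx",
    "--labels=/home/sgmustadio/datasets/utensils/labels.txt",
    "--input_blob=input_0",
    "--output_blob=output_0"])

camera = jetson.utils.gstCamera(640, 480, "/dev/video0")
display = jetson.utils.glDisplay()

while display.IsOpen():
    img, width, height = camera.CaptureRGBA()                 # grab a frame
    class_idx, confidence = net.Classify(img, width, height)  # run inference
    display.RenderOnce(img, width, height)
    display.SetTitle("{:s} ({:.0f}%)".format(net.GetClassDesc(class_idx),
                                             confidence * 100))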
Note that it probably won’t be very accurate, since we used a very small training set!
Going Further
Try training the network on different objects! NVIDIA also has a number of other demos in their Hello AI World documentation that we recommend working through: https://github.com/dusty-nv/jetson-inference#hello-ai-world
Have questions or comments? Continue the conversation on TechForum, DigiKey's online community and technical resource.