How To Get Started with the Google MediaPipe Python SDK on Raspberry Pi
2024-08-14 | By Maker.io Staff
The Google MediaPipe SDK is an open-source machine-learning (ML) framework for handling various ML tasks without requiring in-depth knowledge of complex models and algorithms. Instead, developers can effortlessly build impressive applications using only a few lines of code that run well, even on hardware with limited resources. Read on to learn how to write a Python-based object classifier in just a few minutes.
What is Google MediaPipe?
Google MediaPipe is a robust machine-learning framework that lets developers build and train complex models with only a few lines of code. The framework accomplishes this by allowing developers to plug together pre-built solutions in a graph-based architecture. These solutions are building blocks for performing various common tasks, such as detecting the presence of objects in a video or estimating the sentiment of a text. Developers can combine these blocks to perform more complex tasks that suit their requirements.
Google offers the framework free of charge for many different platforms and languages. The framework can handle multiple input modalities and work with diverse data sources, such as still images, videos, audio snippets, and text. This article discusses the Python-based SDK for Raspberry Pi and how to process video frames from a live camera feed.
Prerequisites
If you want to try the Google MediaPipe SDK on a Raspberry Pi, ensure your system is up and running, connected to the internet, and all relevant updates are installed. In addition, reading up on the basic concepts of machine learning can be tremendously helpful in understanding what the system does and how to troubleshoot it if something goes wrong.
Next, install a few Python packages on the Raspberry Pi, ideally inside a new Python 3 virtual environment. These packages handle the video-processing tasks, and this step also downloads and installs the MediaPipe SDK itself:
pip install opencv-contrib-python numpy mediapipe
Note that MediaPipe currently supports only Python versions 3.7 through 3.10; this installation was tested with Python 3.9.10.
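If you are unsure which interpreter your environment uses, a short check along these lines prints a warning when the version falls outside that range (a minimal sketch; run it with the same interpreter you installed the packages into):

import sys

# MediaPipe currently supports Python 3.7 through 3.10
if not (3, 7) <= sys.version_info[:2] <= (3, 10):
    print(f"Warning: Python {sys.version.split()[0]} may not be supported by MediaPipe")
else:
    print(f"Python {sys.version.split()[0]} is within the supported range")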
Setting up a Camera on the Raspberry Pi
Once you have updated your Raspberry Pi, you will need to decide which camera to use. The default Raspi camera module, a safe and reliable choice, is guaranteed to be compatible with the SBC. However, you also have the option to use a standard USB webcam if your project requires higher-resolution images or special features like thermal imaging. Unfortunately, not all USB cameras are compatible with the Raspberry Pi, so you might have to do some research to find one that works.
Regardless of which camera you choose, a correctly installed device should show up in the list of available video devices:
v4l2-ctl --list-devices
If the search returns more than one device, the first one, commonly camera0 or video0, is usually the correct one to use. However, the exact device name may vary depending on the model and manufacturer:
This screenshot shows an example list of media input devices connected to a Raspberry Pi.
The connected USB camera has two video interfaces.
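To confirm that OpenCV can actually read from the detected device before diving into MediaPipe, a quick test like the following sketch opens the camera and grabs a single frame. It assumes the camera enumerates as index 0 (typically video0); adjust the index if your device list shows otherwise:

import cv2 as cv

cap = cv.VideoCapture(0)  # assumed device index; change it if v4l2-ctl lists another
if not cap.isOpened():
    print("Could not open camera 0 - check the v4l2-ctl output and try another index")
else:
    ret, frame = cap.read()
    if ret:
        height, width = frame.shape[:2]
        print(f"Camera works, frame size: {width}x{height}")
    else:
        print("Camera opened, but no frame was returned")
    cap.release()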
Finally, you will need to select a trained machine-learning model before writing the ML application. A model contains the learned parameters of a machine-learning network generated by a specific training algorithm. At this point, there’s no need to worry if the details of what ML models are and how they are built are unclear. However, it’s essential to understand that models are trained for a specific task and that training one from scratch requires many samples and considerable knowledge to avoid common pitfalls.
Luckily, Google offers pre-trained object detection models that strike a good balance between latency and accuracy, and you can download them for free and import them into your projects to get them up and running quickly. More experienced makers can create and train custom models using MediaPipe, too. The following command downloads a pre-trained model from Google:
wget -q -O efficientdet.tflite https://storage.googleapis.com/mediapipe-models/object_detector/efficientdet_lite0/int8/1/efficientdet_lite0.tflite
Take note of the file path of the downloaded model, as the Python application requires it later.
This image shows the downloaded file in the pi user’s home directory.
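Before moving on, it can be worth verifying that the file landed where the application expects it. The short check below assumes the /home/pi/efficientdet.tflite path used throughout this article:

from pathlib import Path

model_path = Path('/home/pi/efficientdet.tflite')  # adjust if you saved the model elsewhere

if model_path.is_file():
    print(f"Model found: {model_path} ({model_path.stat().st_size // 1024} KiB)")
else:
    print("Model not found - re-run the wget command or fix the path")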
Writing an Everyday Object Detector using Google MediaPipe
With the model in place, it’s time to write the detector application itself. The first few lines of the Python program import the previously installed dependencies:
import cv2 as cv
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
OpenCV is required for reading the camera feed and converting the individual frames to still images for further processing. The MediaPipe imports load the ML framework’s components. Numpy is required for fast matrix operations.
The following variables hold the model options and the classifier objects:
BaseOptions = mp.tasks.BaseOptions
DetectionResult = mp.tasks.components.containers.DetectionResult
ObjectDetector = mp.tasks.vision.ObjectDetector
ObjectDetectorOptions = mp.tasks.vision.ObjectDetectorOptions
VisionRunningMode = mp.tasks.vision.RunningMode

options = ObjectDetectorOptions(
    base_options=BaseOptions(model_asset_path='/home/pi/efficientdet.tflite'),
    running_mode=VisionRunningMode.LIVE_STREAM,
    max_results=2,
    result_callback=annotateFrame)
The options object defines which model to use for making predictions. In this example, the code references the previously downloaded pre-trained model stored in the pi user’s home folder. Further, the options specify that the model operates on a live video feed, should return at most the two most likely detections per frame, and should call the annotateFrame function once it finishes processing a frame.
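For comparison, the same model can also run in the simpler IMAGE mode on a single still image instead of a live stream. The sketch below assumes the same model path and a hypothetical test image at /home/pi/test.jpg; it is a standalone alternative rather than part of the live-stream program developed here:

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

options = vision.ObjectDetectorOptions(
    base_options=python.BaseOptions(model_asset_path='/home/pi/efficientdet.tflite'),
    running_mode=vision.RunningMode.IMAGE,
    max_results=2)

with vision.ObjectDetector.create_from_options(options) as detector:
    image = mp.Image.create_from_file('/home/pi/test.jpg')  # hypothetical test image
    result = detector.detect(image)
    for detection in result.detections:
        top = detection.categories[0]
        print(top.category_name, round(top.score, 2))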
Next, define a callback function for the model to call whenever it finishes classifying the objects in a frame. Note that, in the assembled script, this definition must appear before the options block above, since the options reference it:
MARGIN = 10
ROW_SIZE = 10
FONT_SIZE = 1
FONT_THICKNESS = 1
TEXT_COLOR = (255, 0, 0)

frameToShow = None
processingFrame = False

# Annotate each frame
def annotateFrame(result: DetectionResult, image: mp.Image, timestamp_ms: int):
    global frameToShow, processingFrame
    frameToShow = np.copy(image.numpy_view())
    for detection in result.detections:
        # Draw bounding_box
        bbox = detection.bounding_box
        start_point = bbox.origin_x, bbox.origin_y
        end_point = bbox.origin_x + bbox.width, bbox.origin_y + bbox.height
        cv.rectangle(frameToShow, start_point, end_point, TEXT_COLOR, 3)
        # Draw label
        category = detection.categories[0]
        category_name = category.category_name
        result_text = category_name
        text_location = (MARGIN + bbox.origin_x, MARGIN + ROW_SIZE + bbox.origin_y)
        cv.putText(frameToShow, result_text, text_location, cv.FONT_HERSHEY_PLAIN,
                   FONT_SIZE, TEXT_COLOR, FONT_THICKNESS)
    processingFrame = False
This callback function copies the current frame into a NumPy image and then iterates over all detection results returned by the model. Among other information, each detection result contains the predicted labels sorted by the model’s certainty in descending order, the certainty values, and a bounding box around the detected object. The callback uses OpenCV to draw the bounding box around the detected object and then adds the label to the image. It stores the result in the global frameToShow variable. Finally, the callback resets the processingFrame flag to inform the main loop that a new frame is ready for display.
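Since each category also carries a certainty value, the label could include that score as well. The helper below is only for illustration (the name format_label is made up for this example); inside annotateFrame, result_text = format_label(detection) would replace the plain category name:

def format_label(detection) -> str:
    # Combine the top category name with the model's confidence score
    category = detection.categories[0]
    return f"{category.category_name} ({category.score:.2f})"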
This flowchart outlines the most critical steps of the object detector program.
Finally, the main loop continuously checks whether a frame is ready for display. If so, the main loop shows the annotated image. Otherwise, the main loop uses OpenCV to obtain the current frame from the connected camera. The program then passes that frame to the classifier and sets the busy flag from before. This measure leads to many dropped frames, but it performs better with the Raspberry Pi’s limited computing capabilities. The computer only processes one frame at a time before displaying the result, which leads to a more responsive user experience.
cap = cv.VideoCapture(0)
if not cap.isOpened():
    print("Cannot open camera")
    exit()

with ObjectDetector.create_from_options(options) as detector:
    while True:
        ret, frame = cap.read()
        if not ret:
            print("Can't read frame. Exiting ...")
            break
        if frameToShow is not None:
            cv.imshow("Detection Result", frameToShow)
            frameToShow = None
        elif not processingFrame:
            mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame)
            frame_ms = int(cap.get(cv.CAP_PROP_POS_MSEC))
            detector.detect_async(mp_image, frame_ms)
            processingFrame = True
        if cv.waitKey(1) == ord('q'):
            break

# Release the capture before closing the program
cap.release()
cv.destroyAllWindows()
Processing every frame would introduce a significant delay between reading a frame and displaying the result, meaning that several seconds could pass between a user moving the camera and the program showing the updated image.
Either way, the last “if” block checks whether a user pressed the q-key on the keyboard. If it detects a keypress, it breaks out of the main loop. Finally, the program frees the camera resource and closes the output window.
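One practical note: cap.get(cv.CAP_PROP_POS_MSEC) can return zero for some live webcams, while detect_async expects a timestamp that increases from frame to frame. In that case, a monotonic clock is a common substitute. The helper below is a sketch with a made-up name (frame_timestamp_ms); its return value would simply replace the frame_ms calculation in the main loop:

import time

start_time = time.monotonic()

def frame_timestamp_ms() -> int:
    # Milliseconds elapsed since the program started; always increases
    return int((time.monotonic() - start_time) * 1000)

# In the main loop:
# detector.detect_async(mp_image, frame_timestamp_ms())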
This screenshot shows how the model classifies two objects the camera can see.
In this example, it outputs the two results with the highest certainty: a clock and a cup.
Summary
The Google MediaPipe SDK, a powerful and user-friendly machine-learning framework, enables developers to easily build applications without extensive knowledge of complex models and algorithms. It offers pre-built solutions that can be combined to perform various tasks, such as object detection in videos and sentiment analysis of text. The SDK is available free of charge for many platforms and languages and supports multiple input modalities.
You can use pre-trained object detection models from Google or create custom models using MediaPipe. Writing the detector application involves importing necessary dependencies, defining model options and classifier objects, extracting frames from the video feed, and creating a callback function for the model to use when classifying objects in a frame.
Have questions or comments? Continue the conversation on TechForum, DigiKey's online community and technical resource.
Visit TechForum