Generative AI - Virtual Agent with M5Stack
2023-12-08 | By M5Stack
License: General Public License | Bluetooth / BLE, Single Board Computers, Wifi, Arduino, ESP32, M5Stack
* Thanks for the source code and project information provided by @Johnnie Tien Nguyen
Story
Virtual agent powered by M5Stack Core
Generative AI - Virtual Agent with M5Stack: Your Conversational Companion
In an era of rapid technological advancements, the quest for seamless communication has never been more pressing. With the advent of ChatGPT, Bing, and Google Bard, the world has gained access to powerful language models capable of answering complex questions and generating creative text formats. However, harnessing these capabilities often requires access to computers or laptops, limiting their reach and accessibility.
Enter the Virtual Agent with M5Stack, a device that lets you talk to a generative AI model without a computer or laptop. It is designed to serve as a personal assistant, answering your questions and providing information in real time.
Empowering Communication and Accessibility
The Virtual Agent with M5Stack is a game-changer for those with mobility impairments, visual impairments, or low vision. With its intuitive voice-based interface, users can simply ask questions and receive clear, concise answers. This hands-free approach eliminates the need for typing or screen navigation, making it an ideal tool for everyday communication and information access.
This project addresses that challenge with a smart device built on generative AI and the M5Stack platform, so users can hold natural conversations with a virtual agent anytime, anywhere. The device uses the PaLM 2 API or the ChatGPT API, running on Google infrastructure, to deliver up-to-date information, and it adds integrated text-to-speech (TTS) and speech-to-text (STT) functions.
The device's keyword detection system, built with the TinyML framework from Edge Impulse, activates the virtual agent so that it is ready to receive and respond to user queries. For STT, the device records audio and sends it to Wit.ai, which converts it into text suitable for the chatbot. I also programmed Button A to start a question manually in case keyword detection does not work properly.
The backend chatbot was initially built on ChatGPT 3.5 and was later moved to the Google PaLM 2 API because of its better performance and lower cost. The device sends the user's question to the chatbot, receives the response, and uses the Google TTS library to read it aloud.
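The two ways of starting a question described above (wake word or Button A) can be sketched as a single decision function. This is a minimal illustration in Python rather than the project's firmware; the names TriggerInputs, should_start_listening, and the 0.8 threshold are assumptions made for the example, not values from the original project.

```python
from dataclasses import dataclass

# Assumed confidence cutoff for the wake-word classifier; the real value
# would be tuned against the trained keyword model.
WAKE_THRESHOLD = 0.8


@dataclass
class TriggerInputs:
    """Hypothetical snapshot of the two trigger sources on the device."""
    keyword_score: float    # classifier confidence for the wake word, 0.0 to 1.0
    button_a_pressed: bool  # True while Button A is held down


def should_start_listening(inputs: TriggerInputs,
                           threshold: float = WAKE_THRESHOLD) -> bool:
    """Start recording if Button A is pressed or the wake word is detected."""
    return inputs.button_a_pressed or inputs.keyword_score >= threshold


if __name__ == "__main__":
    # Wake word heard clearly, button not pressed.
    print(should_start_listening(TriggerInputs(0.93, False)))
    # Wake word missed, button used as the fallback.
    print(should_start_listening(TriggerInputs(0.10, True)))
    # Neither trigger fired.
    print(should_start_listening(TriggerInputs(0.10, False)))
```

Treating the button as an unconditional override keeps the fallback path independent of the classifier, so a poorly performing keyword model never blocks the user from asking a question.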
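The question-and-answer flow (recorded audio to text, text to chatbot, reply to speech) can be sketched as one function that takes each stage as a parameter. This is a rough Python illustration under stated assumptions: answer_question, FALLBACK_REPLY, and the stand-in lambdas are hypothetical, and no real Wit.ai, PaLM 2, ChatGPT, or Google TTS call is made here.

```python
from typing import Callable

# Assumed spoken reply when speech recognition returns nothing usable.
FALLBACK_REPLY = "Sorry, I did not catch that."


def answer_question(audio: bytes,
                    speech_to_text: Callable[[bytes], str],
                    chatbot: Callable[[str], str],
                    text_to_speech: Callable[[str], None]) -> str:
    """Run one conversation turn and return the reply that was spoken."""
    question = speech_to_text(audio).strip()
    if question:
        reply = chatbot(question)
    else:
        # Nothing was recognized, so skip the chatbot call entirely.
        reply = FALLBACK_REPLY
    text_to_speech(reply)
    return reply


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without network access or hardware.
    fake_stt = lambda audio: "what is an M5Stack"
    fake_chatbot = lambda question: "You asked: " + question
    spoken = []
    answer_question(b"\x00\x01", fake_stt, fake_chatbot, spoken.append)
    print(spoken)
```

Passing the chatbot in as a parameter mirrors the move from ChatGPT 3.5 to PaLM 2 described above: the backend can be swapped without touching the recording or playback stages.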
This project not only demonstrates the transformative potential of generative AI but also highlights the versatility of the M5Stack platform in facilitating accessible and interactive communication. With its potential to empower individuals with low vision, the elderly, or those seeking an alternative to screen-based interactions, this device represents a significant step towards a more inclusive and connected future.
A demo of the agent chatbot can be tried here (please note that it is in Vietnamese):
https://multilchatbotpalm2-johnnietien.streamlit.app/
So far this is what I have achieved: