Unleashing Edge Intelligence on XILINX FPGA through Corazon-AI
The Rising Need for EdgeAI
Traditionally, Artificial Intelligence (AI) solutions have all been dependent on the cloud with most of the data filtering, pruning, and computing taken care of on the cloud. Now, there is a need for real-time pre-emptive action in industries and smart cities with decision and computation on the edge.
(Image source: iWave)
Speed and privacy are now critical parameters in AI solutions across industries and verticals, and they influence the choice of technology on the edge. Applications ranging from medical imaging to intelligent traffic management now need high-performance intelligent devices featuring increased hardware acceleration and decreased latency.
To meet the growing demand for edge AI devices and to build on the strong case for an FPGA on the edge, iWave has launched Corazon-AI, an EdgeAI Solution built around the Xilinx Zynq® UltraScale+™ MPSoC.
AI Engine on an FPGA, GPU and CPU
To get more performance from an AI Engine running on CPU cores, designers first scaled up clock frequency and then increased the number of processor cores. Targeting multi-core platforms and optimizing models for them proved challenging for programmers. This gave rise to the GPU, which offers a massively parallel array on which to run an AI Engine.
GPUs achieve high throughput by using large batch sizes, but a batch cannot start processing until all of its inputs have arrived, which increases latency. GPU solutions therefore trade latency against performance, leaving developers with a tough decision. The AI Engine on an FPGA provides more parallelism, increasing the AI Engine's capability while achieving better throughput with a reduced batch size. FPGAs can hence resolve the compromise between performance and latency.
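The batching trade-off described above can be illustrated with simple arithmetic. The following is a minimal sketch, not a measurement of any device: the function names, the 30 frames-per-second input rate and the per-batch compute time are all assumptions chosen for illustration.

```python
def batch_fill_wait_ms(batch_size: int, frames_per_second: float) -> float:
    """Time (ms) the first frame of a batch waits for the remaining frames.

    Assumes frames arrive at a constant rate and processing starts only
    once the whole batch has been collected.
    """
    if batch_size < 1 or frames_per_second <= 0:
        raise ValueError("batch_size must be >= 1 and frames_per_second > 0")
    return (batch_size - 1) / frames_per_second * 1000.0


def worst_case_latency_ms(batch_size: int,
                          frames_per_second: float,
                          compute_ms_per_batch: float) -> float:
    """Worst-case latency (ms) for the first frame: waiting plus compute."""
    return batch_fill_wait_ms(batch_size, frames_per_second) + compute_ms_per_batch


if __name__ == "__main__":
    # Hypothetical camera rate and compute time, for illustration only.
    for batch in (1, 8, 32):
        wait = batch_fill_wait_ms(batch, frames_per_second=30.0)
        total = worst_case_latency_ms(batch, 30.0, compute_ms_per_batch=10.0)
        print(f"batch={batch:2d}  wait={wait:7.1f} ms  worst case={total:7.1f} ms")
```

Under these assumptions, a batch size of 1 adds no waiting time, while at 30 frames per second a batch of 8 makes the first frame wait roughly 233 ms before any processing begins. This is why a platform that reaches good throughput at small batch sizes can respond faster.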
Because the AI Engine runs on the FPGA, its processing functionality can be customized down to the logic gate level. This configuration enables direct access to hardware I/O without the added latency of internal bus structures, so data arriving from a dedicated hardware interface can be processed immediately, with low latency, and acted on in real time. This makes an FPGA more responsive and allows for more dedicated, special-purpose operations.
Therefore, when flexibility, latency, performance and optimization are weighed together, the FPGA emerges as the front-runner and an ideal fit for AI on the edge.
(Image source: iWave)
Corazon-AI – The heart of AI
Corazon-AI has an integrated Xilinx® FPGA AI Engine called the DPU (Deep Learning Processor Unit) to perform the AI application acceleration. The DPU is a configurable computation engine dedicated to and optimized for convolutional neural networks.
Corazon-AI has the interfaces to connect up to 8 IP Cameras, multiple USB Cameras and an SDI Camera. The ability to capture multi-angle, high-resolution video frames and process them on the built-in AI Inference engine is a further strength.
With support for high-speed connectivity such as Dual Gigabit Ethernet and multiple wireless connectivity options such as Wi-Fi, Bluetooth and Cellular, the device can communicate with the cloud and servers when necessary. It is also modular in architecture.
An ideal application for Corazon-AI would be an intelligent toll management system. With the provision to connect 8 IP Cameras, cameras across lanes can capture individual images and video streams, with models running simultaneously on each of the input streams. At present, the whole video stream is transferred from the toll plaza to the control center, where the processing and decisions take place. With Corazon-AI, decisions and data filtering can be done on the edge, which eliminates the need to have multiple gateways at each toll plaza.
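The idea of running a model on each camera stream and forwarding only the results can be sketched as follows. This is an illustrative outline only, not iWave's software: `Event`, `process_stream`, `process_all` and `fake_infer` are hypothetical names, frames are stand-in strings, and the placeholder inference function takes the place of a real model running on the DPU.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    """A detection worth forwarding to the control center (hypothetical type)."""
    camera_id: int
    frame_index: int
    label: str


def process_stream(camera_id: int,
                   frames: Iterable[str],
                   infer: Callable[[str], Optional[str]]) -> List[Event]:
    """Run inference on every frame of one stream; keep only positive results."""
    events = []
    for index, frame in enumerate(frames):
        label = infer(frame)
        if label is not None:  # frames with nothing of interest are dropped here
            events.append(Event(camera_id, index, label))
    return events


def process_all(streams: Dict[int, Iterable[str]],
                infer: Callable[[str], Optional[str]],
                max_workers: int = 8) -> List[Event]:
    """Process all camera streams concurrently, one task per camera."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_stream, cam, frames, infer)
                   for cam, frames in streams.items()]
        results: List[Event] = []
        for future in futures:  # collected in submission order
            results.extend(future.result())
    return results


def fake_infer(frame: str) -> Optional[str]:
    """Placeholder for a real model: flags frames that contain a car."""
    return "vehicle" if frame == "car" else None


if __name__ == "__main__":
    lanes = {0: ["empty", "car"], 1: ["empty"], 2: ["car", "car"]}
    for event in process_all(lanes, fake_infer):
        print(event)
```

Only the small `Event` records would need to leave the device in this sketch, rather than the full video streams.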
Corazon-AI can provide an ideal fit in applications where there is a need to connect multiple cameras and a need for computing and data-driven decisions on the edge such as Traffic Management, Video Surveillance, Smart Parking, and Automated Video Inspection Sorting.
(Image source: iWave)
DPU helps achieve high throughput and low latency
The DPU is available in several configurable hardware architectures (B512, B800, B1024, B1152, B1600, B2304, B3136 and B4096) and supports convolution, deconvolution and depthwise convolution. Each architecture can be configured with up to 3 internal cores. Dedicated AXI interfaces are provided for instruction access, configuration access and data access, and the data access AXI master interface supports a configurable width of 64 or 128 bits.
DPU’s peak theoretical performance with different configurations. (Image source: iWave)
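A peak theoretical figure of this kind can be estimated with a short calculation. The sketch below assumes that the number in each architecture name is the peak number of operations per clock cycle, which is how the Xilinx DPU documentation is commonly read. The 300 MHz clock is a hypothetical value, and `peak_gops` is an illustrative helper, not part of any Xilinx or iWave tool.

```python
# Peak operations per clock cycle, assumed to match the number in each name.
DPU_OPS_PER_CYCLE = {
    "B512": 512, "B800": 800, "B1024": 1024, "B1152": 1152,
    "B1600": 1600, "B2304": 2304, "B3136": 3136, "B4096": 4096,
}


def peak_gops(arch: str, clock_mhz: float, cores: int = 1) -> float:
    """Peak theoretical performance in GOPS (giga-operations per second).

    peak = operations per cycle x clock frequency x number of cores
    """
    if arch not in DPU_OPS_PER_CYCLE:
        raise ValueError(f"unknown DPU architecture: {arch}")
    if not 1 <= cores <= 3:
        raise ValueError("the article describes up to 3 cores")
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    # ops/cycle * (MHz * 1e6 cycles/s) / 1e9 ops per GOP
    return DPU_OPS_PER_CYCLE[arch] * clock_mhz * cores / 1000.0


if __name__ == "__main__":
    # 300 MHz is an assumed clock used only to make the numbers concrete.
    for name in DPU_OPS_PER_CYCLE:
        print(f"{name}: {peak_gops(name, clock_mhz=300.0):.1f} GOPS per core")
```

These are upper bounds under the stated assumptions. Real models achieve less because of memory bandwidth, layer shapes and scheduling overhead.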
Performance of Several Models. (Image source: iWave)
(Image source: iWave)
* These models were pruned using the Xilinx pruning tool.
* The accuracy is based on 8-bit fixed-point quantization.
* Measured on the Corazon-AI platform with a single B4096 core and 16 threads.
About iWave Systems Technologies Pvt. Ltd.
iWave Systems Technologies Pvt. Ltd., established in 1999, focuses on product engineering services involving embedded hardware and software, FPGA design and development. With over 18 years of experience with FPGAs, iWave provides custom design services and a wide array of Xilinx System on Modules based on the Zynq-7000 series and the Zynq UltraScale+ MPSoC series.
You can get in touch with us for inquiries and further information at mktg@iwavesystems.com.