Accelerating Network Performance: The Impact of RDMA over Converged Ethernet (RoCE)
2024-12-27
The rapid evolution of compute-intensive applications has heightened the need for faster, more efficient, and scalable network solutions. One technology that addresses this demand is Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE). RoCE moves data directly between the memory of two systems without involving the CPU in the data path, which reduces latency and improves overall system performance. iWave, an FPGA design house, has implemented a 100G Ethernet solution by integrating AMD’s ERNIC IP (Embedded RDMA enabled NIC) into its embedded computing modules portfolio. This integration is intended to bring RDMA capabilities to high-performance applications.
Figure 1: RoCE facilitates direct data transfers between systems without the need for CPU intervention, significantly reducing latency and improving overall system performance. (Image source: iWave)
Understanding RDMA over Converged Ethernet (RoCE)
RDMA enables direct memory transfers between hosts or servers, bypassing the CPU. This frees CPUs to focus on application execution and data processing, and improves network performance in three ways: reduced latency, lower CPU load, and increased bandwidth, all in a cost-effective manner. RoCE is a network protocol that carries RDMA operations over Ethernet networks. Because it uses the existing Ethernet infrastructure, RoCE is an appealing option for organizations looking to enhance performance without overhauling their current network setups.
Types of RoCE
RoCE comes in two versions, RoCE v1 and RoCE v2, which differ in how packets are encapsulated and therefore in how far they can travel across a network.
- RoCE v1: This protocol allows communication between two hosts situated within the same Ethernet broadcast domain (VLAN). It uses Ethertype 0x8915. Frame length is limited to 1500 bytes for standard Ethernet frames and 9000 bytes for Ethernet jumbo frames.
- RoCE v2: Addressing the limitations of RoCE v1, RoCE v2 introduces enhancements in packet encapsulation by incorporating IP and UDP headers. This modification enables RoCE v2 to function seamlessly across both Layer 2 (Data Link Layer) and Layer 3 (Network Layer) networks, thus supporting Layer 3 routing and scalability across multiple subnets. Often referred to as Routable RoCE (RRoCE), RoCE v2 also adds support for IP multicast, further broadening its applicability.
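The encapsulation difference between the two versions can be illustrated with a minimal Python sketch that classifies a raw Ethernet frame. Ethertype 0x8915 comes from the description above. UDP destination port 4791 is the IANA-assigned port for RoCE v2 and is an addition not stated in this article. The function and helper names are hypothetical, and the sketch assumes untagged frames (no VLAN header) and IPv4 only.

```python
import struct

ROCE_V1_ETHERTYPE = 0x8915   # RoCE v1 is carried directly in an Ethernet frame
IPV4_ETHERTYPE = 0x0800
UDP_PROTOCOL = 17
ROCE_V2_UDP_PORT = 4791      # assumption added here: IANA-assigned RoCE v2 port


def classify_roce(frame: bytes) -> str:
    """Return 'RoCE v1', 'RoCE v2', or 'not RoCE' for an untagged Ethernet frame."""
    if len(frame) < 14:
        return "not RoCE"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ROCE_V1_ETHERTYPE:
        return "RoCE v1"
    if ethertype == IPV4_ETHERTYPE and len(frame) >= 14 + 20:
        ihl = (frame[14] & 0x0F) * 4      # IPv4 header length in bytes
        protocol = frame[14 + 9]          # protocol field of the IPv4 header
        if protocol == UDP_PROTOCOL and len(frame) >= 14 + ihl + 8:
            (dst_port,) = struct.unpack_from("!H", frame, 14 + ihl + 2)
            if dst_port == ROCE_V2_UDP_PORT:
                return "RoCE v2"
    return "not RoCE"


def make_frame(ethertype: int, payload: bytes = b"") -> bytes:
    """Build a minimal test frame with zeroed MAC addresses."""
    return bytes(12) + struct.pack("!H", ethertype) + payload


def make_ipv4_udp(dst_port: int) -> bytes:
    """Build a bare 20-byte IPv4 header plus an 8-byte UDP header (checksums zero)."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64,
                     UDP_PROTOCOL, 0, bytes(4), bytes(4))
    udp = struct.pack("!HHHH", 49152, dst_port, 8, 0)
    return ip + udp


if __name__ == "__main__":
    print(classify_roce(make_frame(ROCE_V1_ETHERTYPE)))
    print(classify_roce(make_frame(IPV4_ETHERTYPE, make_ipv4_udp(ROCE_V2_UDP_PORT))))
```

Because RoCE v2 traffic is ordinary UDP over IP from the network's point of view, standard Layer 3 routers can forward it, which is what makes it routable across subnets.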
ERNIC IP: enhancing RDMA capabilities
The ERNIC (Embedded RDMA enabled NIC) IP is a customizable RDMA network interface controller IP core designed for integration with AMD FPGAs, MPSoCs, and soft MAC IP implementations. It provides high throughput, low latency, and a fully hardware-offloaded, reliable data transfer mechanism over standard Ethernet. iWave has implemented a 100G Ethernet solution using its Zynq UltraScale+ MPSoC powered development kit, which integrates AMD’s ERNIC IP.
The Zynq UltraScale+ MPSoC development kit is specifically tailored for prototyping and evaluating 100G Ethernet solutions, employing high-speed QSFP-28 connectors.
Demo setup
A typical demo setup (Figure 2) consists of:
- iWave’s Zynq UltraScale+ MPSoC ZU19EG powered development kit
- Advantech Mellanox ConnectX-5 100G NIC
- Sync 1588 PTP enabled 1G NIC
- MTP Cable, QSFP-28 modules, and CAT6 RJ45 Ethernet cable
- Ubuntu 22.04 Server PC
Figure 2: The typical setup for the Zynq UltraScale+ MPSoC development kit. (Image source: iWave)
System architecture overview
The system architecture divides work between the Processing System (PS) and Programmable Logic (PL) components to optimize data transfer. The implementation also features Precision Time Protocol (PTP) synchronization, which is crucial for real-time applications. With throughput sufficient to handle 8K video at over 100 frames per second, the design suits applications in datacenters, multimedia, and high-performance computing.
The high-level architecture of the system, depicted in Figure 3, highlights the distinct roles of the PS and PL components within the Zynq UltraScale+ MPSoC. The PS is a hard Arm Cortex-A53-based processor subsystem that handles system configuration, control, and diagnostics. Key components of this architecture include:
- 100G Ethernet MAC Driver: Ensures robust performance and low-latency data transmission at 100 Gb/s
- ERNIC Controller Driver: Responsible for managing incoming data to DDR and facilitating communication between the user application and ERNIC IP through efficient doorbell exchanges
- RDMA Core and User Space Libraries: Ensure compatibility and optimal performance for RDMA operations across both kernel and user spaces
Figure 3: Highlights of the distinct roles of the processing system and programmable logic components within the Zynq UltraScale+ MPSoC. (Image source: iWave)
The AMD ERNIC IP offloads the RoCE v2 stack onto the FPGA, with the ERNIC Controller managing the handshaking between various modules to facilitate data transfer. The ERNIC Controller generates work queue entries and sends notifications (doorbells) to the ERNIC IP. Concurrently, the Zynq UltraScale+ MPSoC’s 100G Ethernet subsystem manages the MAC and physical layers, while the Data Pattern Generator produces raw data and video data patterns.
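The work queue and doorbell handshake described above can be pictured with a toy Python model. This is only a conceptual sketch: the class names, fields, and opcode string are hypothetical, and it does not reflect the actual ERNIC register layout or driver interface. The idea it illustrates is that software posts work queue entries into a queue and then notifies the hardware of the new producer index, after which the hardware consumes every entry up to that index.

```python
from dataclasses import dataclass, field


@dataclass
class WorkQueueEntry:
    opcode: str        # illustrative label such as "RDMA_WRITE"
    local_addr: int    # address of the buffer to transfer (hypothetical field)
    length: int        # number of bytes to transfer


@dataclass
class ToyNic:
    """Stands in for the hardware side of the handshake."""
    consumer_index: int = 0
    completed: list = field(default_factory=list)

    def ring_doorbell(self, entries: list, producer_index: int) -> None:
        # Process every entry posted since the previous doorbell.
        while self.consumer_index < producer_index:
            self.completed.append(entries[self.consumer_index])
            self.consumer_index += 1


class SendQueue:
    """Stands in for the software side that posts work and rings the doorbell."""

    def __init__(self, nic: ToyNic) -> None:
        self.nic = nic
        self.entries = []
        self.producer_index = 0

    def post(self, wqe: WorkQueueEntry) -> None:
        # Posting only writes the entry; the hardware is not yet informed.
        self.entries.append(wqe)
        self.producer_index += 1

    def notify(self) -> None:
        # One doorbell can cover several posted entries.
        self.nic.ring_doorbell(self.entries, self.producer_index)


if __name__ == "__main__":
    nic = ToyNic()
    sq = SendQueue(nic)
    sq.post(WorkQueueEntry("RDMA_WRITE", 0x1000, 4096))
    sq.post(WorkQueueEntry("RDMA_WRITE", 0x2000, 4096))
    sq.notify()
    print(len(nic.completed), "entries consumed")
```

Separating the post step from the doorbell lets software batch several entries behind a single notification, which is one reason doorbell-style interfaces keep per-transfer CPU cost low.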
Precision time protocol (PTP)
The Precision Time Protocol (PTP, IEEE 1588) synchronizes clocks across systems on an Ethernet network. PTP timestamps are vital for real-time applications, which rely on them for synchronized, low-latency data exchanges with timing resolved at the nanosecond level.
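As a rough illustration of how PTP derives a clock correction, the sketch below applies the standard IEEE 1588 delay request-response arithmetic to four timestamps. The function name and the sample values are hypothetical, and the calculation assumes the network delay is the same in both directions, which real deployments only approximate.

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Estimate slave clock offset and one-way path delay, in the units of the inputs.

    t1: master clock time when the Sync message is sent
    t2: slave clock time when the Sync message is received
    t3: slave clock time when the Delay_Req message is sent
    t4: master clock time when the Delay_Req message is received
    """
    master_to_slave = t2 - t1   # path delay plus clock offset
    slave_to_master = t4 - t3   # path delay minus clock offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


if __name__ == "__main__":
    # Hypothetical nanosecond timestamps: the slave clock runs 500 ns ahead
    # of the master and the one-way path delay is 200 ns.
    offset_ns, delay_ns = ptp_offset_and_delay(1000, 1700, 2000, 1700)
    print(offset_ns, delay_ns)  # 500.0 200.0
```

The slave subtracts the estimated offset from its clock, so any asymmetry between the two directions shows up directly as a residual synchronization error.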
Key highlights of the setup
The notable features of this setup include:
- Implementation of 100G Ethernet over RoCE v2 utilizing AMD ERNIC IP
- Reliable Connection Transport Type
- RDMA SEND, RDMA READ, and RDMA WRITE functionalities for packet handling
- Support for RDMA Send with Immediate and RDMA Write with Immediate message types
- Performance testing for RDMA using XRPING and PERFTEST applications
- Custom Data Pattern Generator for RAW and video data patterns
- Insertion of PTP timestamps alongside data
Throughput measurements for video data transfers from the Zynq UltraScale+ MPSoC development kit to the server PC show that the setup can handle 8K video at over 100 fps and 4K video at more than 400 fps.
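A quick back-of-the-envelope check shows how these frame rates relate to a 100 Gb/s link. The article does not state the pixel format, so the sketch below assumes uncompressed video at 24 bits per pixel and ignores protocol overhead. Both values are assumptions made for illustration, and the function name is hypothetical.

```python
def video_gbps(width: int, height: int, fps: float, bits_per_pixel: int = 24) -> float:
    """Raw video bandwidth in Gb/s for uncompressed frames (assumed 24 bpp by default)."""
    return width * height * bits_per_pixel * fps / 1e9


if __name__ == "__main__":
    rate_8k = video_gbps(7680, 4320, 100)   # 8K UHD at 100 fps
    rate_4k = video_gbps(3840, 2160, 400)   # 4K UHD at 400 fps
    print(f"8K @ 100 fps: {rate_8k:.2f} Gb/s")   # 79.63 Gb/s
    print(f"4K @ 400 fps: {rate_4k:.2f} Gb/s")   # 79.63 Gb/s
```

Under these assumptions the two cases need the same raw bandwidth, because a 4K frame has one quarter of the pixels of an 8K frame, and both fit within a 100 Gb/s link with headroom for packet headers.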
Potential applications
The integration of RDMA over Converged Ethernet and ERNIC IP opens new avenues across various industries, significantly enhancing connectivity, performance, and efficiency in a range of applications, including:
- Datacenters and cloud computing: Facilitating efficient server communication and accelerating data processing in cloud architectures
- Video/image capture and transfer: Beneficial for multimedia applications, broadcasting, and virtual reality (VR) environments
- Storage solutions: Enabling faster data transfers between storage devices and servers, thereby improving storage system performance
- High-performance computing (HPC): Enhancing data transfer speeds and reducing latency within HPC clusters for quicker computational tasks and simulations
- IoT edge devices: Enabling real-time data collection and transmission from sensors and devices
As the demand for faster and more efficient data transfer solutions continues to rise, RDMA over Converged Ethernet and ERNIC IP are poised to play a pivotal role in the future of high-performance computing.
Conclusion
iWave’s extensive portfolio of FPGA and SoC FPGA platforms, combined with its deep technical expertise, enables customers to develop cutting-edge products that leverage the latest advancements in artificial intelligence (AI), machine learning, and edge computing. By partnering with iWave, companies can accelerate their product development, reduce risk, and stay ahead of the competition in an increasingly complex technological landscape.
For more information or to discuss custom requirements, please reach out to us at mktg@iwave-global.com
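Disclaimer: The opinions, beliefs, and viewpoints expressed by the various authors and/or forum participants on this website do not reflect the opinions, beliefs, and viewpoints of DigiKey or the official policies of DigiKey.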