By Bruno Ferreira
Publication Date: 2026-02-04 11:00:00
In any datacenter, whether it’s built for AI or not, fast networked communication across nodes is just as important as the speed of the nodes themselves. For AI work, developers are steered toward vendor-specific networking libraries like Nvidia’s NCCL or AMD’s RCCL. Now, in a new paper, a group of South Korean scientists has proposed HetCCL, a vendor-agnostic library that allows clusters composed of GPUs from both vendors to operate as one.
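To ground what these collective communication libraries actually do, here’s a toy sketch of an allreduce, the workhorse collective in distributed AI training. This is purely illustrative: plain Python lists stand in for per-GPU buffers, and the function name `allreduce_sum` is made up for this example, not part of NCCL’s, RCCL’s, or HetCCL’s actual API.

```python
# Toy model of the allreduce collective that libraries like NCCL and RCCL
# implement across real GPUs. Each "rank" (one GPU) holds a buffer; after
# an allreduce, every rank holds the elementwise sum of all the buffers.
def allreduce_sum(buffers):
    # buffers: one list of numbers per rank (stand-ins for GPU memory)
    reduced = [sum(vals) for vals in zip(*buffers)]
    # every rank receives an identical copy of the reduced result
    return [list(reduced) for _ in buffers]

if __name__ == "__main__":
    # three "GPUs", each with a local gradient buffer
    ranks = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(allreduce_sum(ranks))
    # every rank now holds [111, 222, 333]
```

The real libraries do this over PCIe, NVLink, or the network with optimized ring or tree algorithms; the semantics, though, are exactly this: everyone contributes, everyone gets the combined result.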
Although it can simply be used for communication between multiple GPUs in a single machine, collective communication in a datacenter often ends up using good ol’ Remote Direct Memory Access (RDMA) to let applications pass data to a GPU somewhere else on the network. Think of sending network packets directly into a device’s memory (in this case GPU VRAM), rather than going through the driver, the TCP/IP stack, and the OS networking layer, burning a metric ton of CPU cycles in the process.