Alibaba Cloud opts for Ethernet over Nvidia’s interconnect, using its own High Performance Network to connect 15,000 GPUs in a data center
Alibaba Cloud engineer Ennan Zhai shared research via GitHub on the design of data centers used for LLM training. The document describes how Alibaba used Ethernet to enable 15,000 GPUs to communicate. The company developed the High Performance Network (HPN) to overcome issues with uneven traffic distribution. Alibaba Cloud divided its data centers into hosts with …