At the 2024 Build conference, Azure CTO Mark Russinovich traced the evolution of Azure’s infrastructure, focusing on its AI platform. Azure’s hardware has moved from a single basic server design to a range of server types, which now include GPUs and AI accelerators for demanding workloads.
The changes made in 2023 show how Azure’s infrastructure has grown alongside AI models. The rapid growth of models such as GPT-4, with over a billion parameters, has driven the development of massive distributed supercomputers that can train them efficiently. Microsoft now operates over 30 AI supercomputers globally, which has cut training times significantly.
Like training, inference for AI models requires substantial computing power. Microsoft’s Maia hardware introduces new liquid cooling systems for efficient inference processing. The Azure POLCA Project aims to make data centers more efficient by optimizing power consumption during inference operations, so that more servers can run in the same data center.
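The arithmetic behind fitting more servers into a fixed power budget can be illustrated with a minimal sketch. It is not the POLCA implementation: the function names, the power budget, and the per-server wattages below are all hypothetical, chosen only to show how capping the power provisioned per server raises the server count.

```python
def servers_that_fit(budget_watts: int, per_server_watts: int) -> int:
    """Return how many servers fit if each is provisioned per_server_watts."""
    if per_server_watts <= 0:
        raise ValueError("per_server_watts must be positive")
    return budget_watts // per_server_watts


def oversubscription_gain(budget_watts: int, peak_watts: int, capped_watts: int):
    """Compare provisioning at peak power with provisioning at a capped power.

    Returns (baseline_servers, capped_servers, fractional_gain).
    All inputs are hypothetical illustration values, not Azure figures.
    """
    baseline = servers_that_fit(budget_watts, peak_watts)
    capped = servers_that_fit(budget_watts, capped_watts)
    if baseline == 0:
        raise ValueError("budget is too small for even one server at peak power")
    return baseline, capped, (capped - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical row: 100 kW budget, 5 kW peak per server, 4 kW cap.
    baseline, capped, gain = oversubscription_gain(100_000, 5_000, 4_000)
    print(f"{baseline} servers at peak, {capped} when capped ({gain:.0%} more)")
```

With these made-up numbers, capping each server at 4 kW instead of provisioning for a 5 kW peak raises the count from 20 to 25 servers. A real system would also have to decide when and how to throttle servers if actual draw approaches the budget.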
Managing training data is also a challenge. Microsoft’s Storage Accelerator system distributes data across clusters so that it loads faster. High-bandwidth networks are crucial for AI workloads, and Microsoft has invested heavily in InfiniBand connections for both internal and customer services.
Project Forge, Microsoft’s software stack, helps manage Azure’s AI infrastructure by scheduling operations and optimizing resource management. By utilizing a virtual GPU pool called “One Pool,” Azure ensures consistent utilization across its platform. Project Flywheel further guarantees performance consistency for different AI models.
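The idea of treating GPUs in many clusters as one shared pool can be sketched as follows. This is a toy model, not Project Forge or One Pool: the `Cluster` and `GpuPool` classes, the best-fit placement rule, and the cluster sizes are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    """A hypothetical cluster with a fixed number of GPUs."""
    name: str
    total_gpus: int
    used_gpus: int = 0

    @property
    def free_gpus(self) -> int:
        return self.total_gpus - self.used_gpus


class GpuPool:
    """Presents several clusters as a single pool of GPUs."""

    def __init__(self, clusters: List[Cluster]):
        self.clusters = list(clusters)

    def allocate(self, gpus_needed: int) -> Optional[str]:
        """Place a job on the cluster with the least free capacity that still fits.

        Returns the chosen cluster's name, or None if no cluster can host the job.
        Best-fit placement is an assumption of this sketch.
        """
        candidates = [c for c in self.clusters if c.free_gpus >= gpus_needed]
        if not candidates:
            return None
        best = min(candidates, key=lambda c: c.free_gpus)
        best.used_gpus += gpus_needed
        return best.name

    def utilization(self) -> float:
        """Fraction of all GPUs in the pool that are currently in use."""
        total = sum(c.total_gpus for c in self.clusters)
        used = sum(c.used_gpus for c in self.clusters)
        return used / total if total else 0.0


if __name__ == "__main__":
    pool = GpuPool([Cluster("cluster-a", 8), Cluster("cluster-b", 16)])
    for demand in (4, 8, 16):
        print(demand, "GPUs ->", pool.allocate(demand))
    print("pool utilization:", pool.utilization())
```

Callers ask the pool for GPUs rather than a specific cluster, which is what lets a scheduler keep utilization even across the fleet. A production scheduler would also handle priorities, preemption, and jobs that span clusters, none of which this sketch attempts.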
Additionally, confidential computing capabilities provide security for training custom models on Azure. Microsoft’s focus on efficient and responsive AI infrastructure is evident in its investment in hardware and software advancements. The experience gained from running OpenAI on Azure has allowed Microsoft to bring its tools and techniques for AI applications to a wider audience, showcasing a commitment to innovation in cloud computing.
Article Source
https://www.infoworld.com/article/3715661/inside-todays-azure-ai-cloud-data-centers.amp.html