AI and the need for purpose-built cloud infrastructure | Azure blog and updates


The advancement of AI has been remarkable, with solutions that push the envelope by augmenting human understanding, preferences, intent, and even spoken language. AI improves our knowledge and understanding by helping us deliver faster, more insightful solutions that drive transformation beyond our imagination. However, with this rapid growth and change, AI's demand for computing power has skyrocketed, outpacing what Moore's Law can deliver. As AI powers a wide range of key applications, including natural language processing, robotic process automation, machine learning, and deep learning, AI silicon manufacturers are finding new, innovative ways to get more out of every piece of silicon, such as advanced mixed-precision capabilities, so that AI innovators can do more with less. At Microsoft, our mission is to empower every person and organization on the planet to achieve more, and with Azure's purpose-built AI infrastructure we intend to deliver on that promise.

Azure High Performance Computing offers scalable solutions

The need for purpose-built infrastructure for AI is evident: one that not only scales up to leverage multiple accelerators within a single server, but also scales out to combine many servers (each with multiple accelerators) distributed across a high-performance network. High-performance computing (HPC) technologies have greatly advanced multidisciplinary science and engineering simulations. Innovations in hardware and software, the modernization and acceleration of applications by exposing parallelism, and advances in communications now drive AI infrastructure as well. A scalable AI computing infrastructure combines memory from individual graphics processing units (GPUs) into a large, shared pool to handle larger and more complex models. Combined with the incredible vector processing capabilities of GPUs, high-speed memory pools have proven extremely effective in processing large, multi-dimensional arrays of data to enhance insights and accelerate innovation.
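As a rough illustration of why pooling memory across GPUs matters, the sketch below estimates how many GPUs are needed just to hold a model's weights. The function name, the 80 GB per-GPU capacity, and the 175-billion-parameter model are illustrative assumptions, not figures from this article, and the estimate ignores activations, gradients, and optimizer state, which add considerably more in practice.

```python
import math


def gpus_needed(num_params: int, bytes_per_param: int, gpu_mem_gb: float = 80.0) -> int:
    """Minimum number of GPUs whose combined memory holds the weights alone.

    gpu_mem_gb is an assumed per-GPU capacity (decimal gigabytes).
    """
    total_bytes = num_params * bytes_per_param
    gpu_bytes = gpu_mem_gb * 1e9
    return max(1, math.ceil(total_bytes / gpu_bytes))


# Hypothetical model size, chosen only to make the arithmetic concrete.
params = 175_000_000_000

fp32_gpus = gpus_needed(params, bytes_per_param=4)  # 32-bit weights
fp16_gpus = gpus_needed(params, bytes_per_param=2)  # 16-bit weights

print(f"FP32 weights: at least {fp32_gpus} GPUs")
print(f"FP16 weights: at least {fp16_gpus} GPUs")
```

Under these assumptions the weights alone do not fit on a single accelerator at either precision, which is the case a shared memory pool addresses. Halving the bytes per parameter also roughly halves the GPU count, one reason mixed precision lets the same hardware do more.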

With the added capability of a high-bandwidth, low-latency interconnect fabric, a scale-out AI-first infrastructure can significantly shorten time to solution through advanced parallel communication methods that interleave computation and communication across large numbers of compute nodes. Azure scale-up and scale-out AI-first infrastructure combines the attributes of both vertical and horizontal system scaling to handle the most demanding AI workloads. Azure's AI-first infrastructure today offers leadership-class price, compute, and energy-efficient performance.
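A toy cost model can show why overlapping communication with computation shortens time to solution on a scale-out system. This is a minimal sketch under strong assumptions: compute divides perfectly across nodes, every step has a fixed communication cost, and overlap is ideal. The function names and all numbers are hypothetical.

```python
def step_time(compute_s: float, comm_s: float, overlap: bool) -> float:
    """Time for one training step on one node.

    Without overlap the node computes, then communicates.
    With ideal overlap the slower of the two hides the other.
    """
    return max(compute_s, comm_s) if overlap else compute_s + comm_s


def time_to_solution(total_compute_s: float, nodes: int, steps: int,
                     comm_s_per_step: float, overlap: bool) -> float:
    """Total wall-clock time, assuming compute splits evenly across nodes."""
    compute_per_step = total_compute_s / (nodes * steps)
    return steps * step_time(compute_per_step, comm_s_per_step, overlap)


# Hypothetical workload: 6400 s of single-node compute, 100 steps, 8 nodes,
# and 2 s of gradient exchange per step.
serial = time_to_solution(6400, nodes=8, steps=100, comm_s_per_step=2, overlap=False)
overlapped = time_to_solution(6400, nodes=8, steps=100, comm_s_per_step=2, overlap=True)

print(f"without overlap: {serial:.0f} s, with overlap: {overlapped:.0f} s")
```

In this model, adding nodes shrinks the compute term but not the communication term, so a faster interconnect (a smaller `comm_s_per_step`) and better overlap become more important as the system scales out.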

Cloud infrastructure specially designed for AI

Microsoft Azure, in partnership with NVIDIA, delivers purpose-built AI supercomputers in the cloud to handle the most demanding real-world workloads at scale while meeting price/performance and time-to-solution requirements. And with the advanced machine learning tools available, you can accelerate the integration of AI into your workloads to drive smarter simulations and make smarter decisions.

Microsoft Azure is the only global public cloud service provider offering purpose-built AI supercomputers with massively scalable, scale-up and scale-out IT infrastructure consisting of NVIDIA Ampere A100 Tensor Core GPUs interconnected by NVIDIA InfiniBand. Optionally available Azure Machine Learning tools make it easy to adopt Azure's AI-first infrastructure, from early development stages to enterprise-grade production deployments.

Scale-up and scale-out infrastructures with NVIDIA GPUs and NVIDIA Quantum InfiniBand networks are among the most powerful supercomputers in the world. Microsoft Azure placed in the top 15 of the Top500 supercomputers worldwide, and five systems in the top 50 currently use Azure infrastructure with NVIDIA A100 Tensor Core GPUs. Twelve of the top 20 supercomputers on the Green500 list use NVIDIA A100 Tensor Core GPUs.

Source: Top500 list, November 2022; Green500 list, November 2022.

Azure is paving the way beyond exascale AI supercomputing with a total solution approach that combines the latest GPU architectures designed for the most compute-intensive AI training and inference workloads and optimized software to harness the power of GPUs. And this supercomputer-class AI infrastructure will be made widely available to researchers and developers in organizations of all sizes around the world to support Microsoft’s stated mission. Organizations that need to expand their existing on-premises HPC or AI infrastructure can take advantage of Azure’s dynamically scalable cloud infrastructure.

In fact, Microsoft Azure works closely with customers across all industry segments. Their growing needs for AI technology, research, and applications are met, augmented, or accelerated with Azure's AI-first infrastructure. Some of these collaborations and applications are detailed below:

Retail and AI

Microsoft Azure AI-first cloud infrastructure and toolchain with NVIDIA is having a significant impact on retail. With a GPU-accelerated computing platform, customers can quickly work through models and identify the best-performing model. Benefits include:

  • Deliver 50x performance improvements for classic data analytics and machine learning (ML) processes at scale with an AI-first cloud infrastructure.
  • By leveraging RAPIDS with NVIDIA GPUs, retailers can train their machine learning algorithms up to 20x faster. This means they can take larger data sets and process them faster and more accurately, allowing them to respond to purchasing trends in real time and realize inventory cost savings at scale.
  • Reduce the total cost of ownership (TCO) for large data science operations.
  • Increase ROI on forecasts, resulting in cost savings from fewer stockouts and misplaced inventory.

With autonomous checkout, retailers can offer their customers smoother and faster shopping experiences while increasing sales and margins. Benefits include:

  • Offer your customers a better and faster checkout experience and reduce waiting times in the queue.
  • Increase sales and margins.
  • Reduce shrinkage – the loss of inventory due to theft, such as shoplifting or ticket switching at self-checkout lanes – which costs retailers $62 billion annually, according to the National Retail Federation.

In either case, these data-driven solutions require sophisticated deep learning models—models far more sophisticated than what machine learning alone provides. This level of sophistication, in turn, requires an AI-first infrastructure and an optimized AI toolchain.

Customer story (video): Everseen and NVIDIA create a seamless shopping experience that positively impacts the bottom line.


Manufacturing and AI

In manufacturing, proactive predictive maintenance, compared to routine or time-based preventative maintenance, can get ahead of a problem before it occurs and save companies from costly downtime. Benefits of Azure and NVIDIA cloud infrastructure built specifically for AI include:

  • GPU-accelerated computing enables AI on an industrial scale, leveraging unprecedented amounts of sensor and operational data to optimize operations, improve time to insight, and reduce costs.
  • Process more data faster with higher accuracy, enabling faster response time to potential device failures before they even happen.
  • Achieve a 50 percent reduction in false positives and a 300 percent reduction in false negatives.

Traditional computer vision methods, typically used in automated optical inspection (AOI) machines in production environments, require intensive human and capital investments. Benefits of GPU-accelerated infrastructure include:

  • Consistent performance with guaranteed quality of service, whether on-premises or in the cloud.
  • GPU-accelerated computing enables AI at industrial scale, leveraging unprecedented amounts of sensor and operational data to optimize operations, improve quality, improve time to insight, and reduce costs.
  • By leveraging RAPIDS with NVIDIA GPUs, manufacturers can train their machine learning algorithms up to 20x faster.

Each of these examples requires an AI-first infrastructure and toolchain to significantly reduce false positives and negatives in predictive maintenance and address subtle nuances in ensuring overall product quality.

Customer Story (Video): Microsoft Azure and NVIDIA give BMW the computing power for automated quality control.

As we have seen, AI is ubiquitous and its application is growing rapidly. The reason is simple. AI enables companies of all sizes to gain better insights and apply those insights to accelerate innovation and business outcomes. An optimized AI-first infrastructure is critical to developing and deploying AI applications.

Azure is the only cloud service provider with a purpose-built, AI-optimized infrastructure consisting of NVIDIA Ampere A100 Tensor Core GPUs interconnected by Mellanox InfiniBand, for AI applications of any scale and enterprises of any size. At Azure, we have a purpose-built AI-first infrastructure that empowers every person and organization in the world to achieve more. Come and do more with Azure!

Learn more about purpose-built infrastructure for AI
