The AI inference market is booming, prompting cloud computing giant and Nvidia partner Amazon Web Services to offer a new cloud instance that addresses the growing cost of scaling inference.
AWS announced this week that it will release a new cloud instance running on Nvidia’s Tensor Core GPUs, with an emphasis on machine learning inference.
Inference “is the big market,” Nvidia CEO Jensen Huang stressed during the GPU maker’s technology conference this week, noting that an estimated 80 to 90 percent of the cost of machine learning at scale is devoted to AI inference.
The new “G4” instances, which run on the AWS Elastic Compute Cloud (EC2) and are powered by Nvidia Tesla T4 GPUs, are expected to be available in the coming weeks, according to Matt Garman, vice president of compute services at AWS.
Along with machine learning inference, the latest GPU service in the cloud targets video transcoding, media processing and other graphics-intensive applications, the partners said Monday (March 18).
“Machine learning is a great fit for the cloud,” Garman said. The cloud giant and the GPU specialist have been collaborating since 2010, when AWS released an EC2 cloud instance based on Nvidia’s Fermi-generation Tesla GPUs, making it perhaps the first “GPU as a service.”
Despite the steady performance increases of GPU instances in the cloud, “A lot of our customers are still trying to figure out how exactly do you incorporate machine learning into your applications,” Garman said. Among the first uses of GPU clusters in the cloud was running machine learning training workloads.
As more ML training and testing shifted to the cloud, AWS in 2017 rolled out SageMaker, its machine learning and deep learning stack, designed to streamline a workflow that previously involved preparing data, selecting training algorithms, scaling to production and, ultimately, retraining the model.
The partners are touting their collaboration as a way to speed up iteration in applications ranging from materials property research to drug discovery. Garman noted that AWS EC2 P3 instances running Nvidia Tesla V100 GPUs for HPC applications cut one startup’s drug design studies from two months to six hours.
The value proposition of running AI in the cloud extends to freeing up the precious time of data scientists and machine learning specialists, Garman stressed.