Nvidia’s Groq 3 LPU Signals AI’s Inferencing Era

By Dina Genkina
Publication Date: 2026-03-16 21:04:00

This week, over 30,000 people are descending upon San Jose, Calif., to attend Nvidia GTC, the so-called Super Bowl of AI—a nickname that may or may not have been coined by Nvidia. At the main event, Nvidia CEO Jensen Huang took the stage to announce (among other things) a new line of next-generation Vera Rubin chips that represent a first for the GPU giant: a chip designed specifically to handle AI inference. The Nvidia Groq 3 language processing unit (LPU) incorporates intellectual property Nvidia licensed from the start-up Groq last Christmas Eve for US $20 billion.

“Finally, AI is able to do productive work, and therefore the inflection point of inference has arrived,” Huang told the crowd. “AI now has to think. In order to think, it has to inference. AI now has to do; in order to do, it has to inference.”

Training and inference tasks have distinct computational requirements. Training is throughput-bound: it churns through huge amounts of data in parallel batches and can take weeks, and it requires running costly backpropagation. Inference, by contrast, must be run on each user’s query as it comes in, and it skips backpropagation entirely. With inference, the most important thing is low latency—users expect the chatbot to answer quickly, and for thinking or reasoning models, inference runs many times before the user even sees an output.
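To make the distinction concrete, here is a minimal sketch using a hypothetical one-parameter linear model (a toy example, not anything from Nvidia’s or Groq’s stack): inference is a single forward pass, while training adds a backpropagation step after each forward pass.

```python
# Toy model: y = w * x, with a single trainable weight w.

def forward(w, x):
    """Inference is just this forward pass: one multiply per query."""
    return w * x

def train_step(w, x, y, lr=0.1):
    """A training step runs the forward pass AND backpropagation:
    compute the gradient of the squared-error loss, then update w."""
    pred = forward(w, x)
    grad = 2 * (pred - y) * x   # d/dw of (w*x - y)^2
    return w - lr * grad

# Inference: one low-latency forward pass per incoming query.
print(forward(2.0, 3.0))  # 6.0

# Training: repeated forward + backward passes over many examples.
w = 0.0
for _ in range(50):
    w = train_step(w, x=1.0, y=3.0)
print(round(w, 2))  # converges toward 3.0
```

Even in this toy, the cost profiles differ: the training loop does twice the per-example work (forward plus gradient) and runs many times, while serving a query costs a single forward pass whose wall-clock latency is what the user actually feels.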

Over the past few years, inference-specific chip start-ups have been experiencing a sort of Cambrian explosion, with different companies exploring distinct approaches to speed…