By DataCenterKnowledge
Publication Date: 2026-04-24 15:01:00
A recent wave of releases – including OpenAI’s GPT-5.5 model and new Nvidia guidance on building agents – points to a shift in how AI runs in production. Instead of responding to discrete prompts, systems are becoming persistent agents that execute multi-step tasks, call tools, and maintain context over time.
That shift breaks a core assumption of modern AI infrastructure: workloads arrive as short, stateless requests optimized for throughput.
Agent workloads hold state and run in bursts that interleave GPU compute with I/O and coordination, making demand harder to plan for – less predictable, less batchable, and more dependent on system-wide coordination.
From Stateless Inference to Long-Lived Processes
Nvidia’s recent framing centers on agents that plan, execute, and iterate across tasks while interacting with external tools and environments. That introduces a different execution model from traditional inference, which runs in tight loops optimized for tokens per second.
“With agentic, we’re moving from stateless, single-shot inference to long-lived, stateful processes,” Matt Kimball, vice president and principal analyst at Moor Insights & Strategy, told Data Center Knowledge. “These agents don’t just generate tokens. They maintain context, call tools, wait on external systems, and resume.”
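The contrast Kimball describes – single-shot inference versus a long-lived process that keeps context, calls tools, and resumes – can be sketched in a few lines of Python. This is purely illustrative; every name here (AgentSession, call_tool, agent_step) is hypothetical and does not correspond to any vendor's API:

```python
from dataclasses import dataclass, field

@dataclass
class AgentSession:
    """Long-lived state carried across steps (illustrative sketch)."""
    goal: str
    history: list = field(default_factory=list)  # accumulated context
    done: bool = False

def call_tool(name: str, arg: str) -> str:
    # Stand-in for an external tool or system the agent waits on.
    return f"{name}({arg}) -> ok"

def stateless_infer(prompt: str) -> str:
    # Traditional inference: each call is independent, nothing persists.
    return f"answer({prompt})"

def agent_step(session: AgentSession) -> AgentSession:
    # One iteration: call a tool, record the result in context, resume later.
    session.history.append(call_tool("search", session.goal))
    if len(session.history) >= 3:  # toy stopping condition
        session.done = True
    return session

# The same session object is resumed across steps until the task ends,
# unlike the stateless call, which retains nothing between requests.
session = AgentSession(goal="find report")
while not session.done:
    session = agent_step(session)
print(len(session.history))  # steps of retained context
```

The point of the sketch is structural: the stateless path touches no memory between calls, while the agentic path keeps a session alive across tool calls and waits, which is exactly the scheduling profile that breaks throughput-tuned serving assumptions.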
That variability disrupts how systems have been tuned.
“Traditional inference is built on tight…