By Steven J. Vaughan-Nichols
Publication Date: 2026-03-24 15:20:00
The marriage of Kubernetes and AI has arrived in llm‑d, a replicable Kubernetes blueprint to deploy inference stacks for any model, on any accelerator, in any cloud.
On Tuesday at KubeCon Europe 2026 in Amsterdam, IBM Research, Red Hat, and Google Cloud announced the donation of llm‑d, their open‑source distributed inference framework, to the Cloud Native Computing Foundation (CNCF) as a sandbox project.
The move, supported by founding collaborators NVIDIA and CoreWeave along with AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI, establishes llm‑d as a community‑governed blueprint for scalable, vendor‑neutral large language model (LLM) inference.
Launched in 2025, llm‑d was built to make serving foundation models at scale predictable, portable, and cloud‑native. It transforms inference from an improvised, model‑by‑model challenge into a replicable, production‑grade, Kubernetes‑based system. llm‑d was created by Neural Magic, which Red…