Build with Kimi K2.5 Multimodal VLM Using NVIDIA GPU-Accelerated Endpoints | NVIDIA Technical Blog

By Anu Srivastava
Publication Date: 2026-02-04 19:46:00

Kimi K2.5 is the newest open vision language model (VLM) in the Kimi family of models. It is a general-purpose multimodal model that excels at high-demand tasks such as agentic AI workflows, chat, reasoning, coding, and mathematics.  

The model was trained using the open source Megatron-LM framework, which provides GPU-accelerated, scalable training of massive transformer-based models through several types of parallelism (tensor, data, and sequence).  
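Tensor parallelism, for example, splits an individual weight matrix across GPUs so that each device computes only a slice of a layer's output. The following is a minimal sketch of that idea in plain PyTorch on a single device, not the Megatron-LM implementation: it shows that column-wise shards of a linear layer reproduce the unsharded result when their outputs are gathered.

```python
import torch

torch.manual_seed(0)

# A toy "unsharded" linear layer: y = x @ W, with W of shape (d_in, d_out).
d_in, d_out, n_shards = 8, 16, 2
x = torch.randn(4, d_in)          # batch of 4 token embeddings
W = torch.randn(d_in, d_out)

# Reference output computed without any sharding.
y_full = x @ W

# Column-parallel sharding: each "GPU" holds a slice of W's output columns
# and computes a slice of y. In Megatron-LM these shards live on different
# devices; here they are plain tensors to illustrate the math.
W_shards = torch.chunk(W, n_shards, dim=1)
y_shards = [x @ w for w in W_shards]

# Gathering (concatenating) the per-shard outputs recovers the full result.
y_gathered = torch.cat(y_shards, dim=1)
print(torch.allclose(y_full, y_gathered))  # True
```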

The model architecture builds on leading state-of-the-art large open models to balance efficiency and capability. It is a mixture-of-experts (MoE) design composed of 384 experts plus a single dense layer, which allows for smaller experts and specialized routing across modalities. Only about 3.2% of the model's parameters are activated per token.
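In practice, a lightweight router selects a small subset of experts for each token, so only a fraction of the total parameters does work on any given forward pass. The snippet below is a simplified sketch of top-k expert routing, assuming a generic softmax router and tiny expert MLPs; the layer sizes, expert count, and k value are illustrative, not Kimi K2.5's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Simplified mixture-of-experts layer with top-k routing (illustrative only)."""
    def __init__(self, d_model=32, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)          # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                     # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)            # (tokens, n_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)    # keep only the top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(5, 32)
layer = TinyMoELayer()
print(layer(tokens).shape)  # torch.Size([5, 32]); each token used only 2 of the 8 experts
```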

| Kimi K2.5 | |
|---|---|
| Modalities | Text, image, video |
| Total parameters | 1T |
| Active parameters | 32.86B |
| Activation rate | 3.2% |
| Input context length | 262K |
| Additional configuration information | |
| # experts | 384 |
| # shared experts | |
| # experts per token | |
| # layers | 61 (1 dense, 60 MoE) |
| # attention heads | 64 |
| Vocab size | ~164K |

Table 1. Specifications and configuration details for the Kimi K2.5 model
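Because the model is served through NVIDIA GPU-accelerated endpoints, one way to try it is an OpenAI-compatible chat completions call. The sketch below assumes the hosted endpoint follows the OpenAI-compatible schema NVIDIA uses for other hosted models at integrate.api.nvidia.com; the model identifier "moonshotai/kimi-k2.5", the environment variable name, and the image-URL message format are placeholders to verify against the model card on build.nvidia.com.

```python
import os
from openai import OpenAI

# NVIDIA-hosted, OpenAI-compatible endpoint (assumed; confirm on the model card).
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

# "moonshotai/kimi-k2.5" is a placeholder model ID; use the ID listed on the model card.
response = client.chat.completions.create(
    model="moonshotai/kimi-k2.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is happening in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)
```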

For vision capability, the large training vocabulary of 164K contains vision-specific tokens. Kimi created the MoonViT3d Vision Tower for the visual processing component of this model, which converts images and video frames into…