Build with Kimi K2.5 Multimodal VLM Using NVIDIA GPU-Accelerated Endpoints | NVIDIA Technical Blog

By Anu Srivastava
Publication Date: 2026-02-04 19:46:00

Kimi K2.5 is the newest open vision language model (VLM) in the Kimi family of models. It is a general-purpose multimodal model that excels at high-demand tasks such as agentic AI workflows, chat, reasoning, coding, and mathematics.  

The model was trained using the open source Megatron-LM framework, which provides GPU-accelerated, scalable training of massive transformer-based models through several types of parallelism (tensor, data, and sequence).  
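Tensor parallelism, for example, splits an individual weight matrix across GPUs so that each device computes only a slice of a layer's output. The following is a minimal sketch of that idea in plain PyTorch on a single device, not the Megatron-LM implementation: it shows that column-wise shards of a linear layer reproduce the unsharded result when their outputs are gathered.

```python
import torch

torch.manual_seed(0)

# A toy "unsharded" linear layer: y = x @ W, with W of shape (d_in, d_out).
d_in, d_out, n_shards = 8, 16, 2
x = torch.randn(4, d_in)          # batch of 4 token embeddings
W = torch.randn(d_in, d_out)

# Reference output computed without any sharding.
y_full = x @ W

# Column-parallel sharding: each "GPU" holds a slice of W's output columns
# and computes a slice of y. In Megatron-LM these shards live on different
# devices; here they are plain tensors to illustrate the math.
W_shards = torch.chunk(W, n_shards, dim=1)
y_shards = [x @ w for w in W_shards]

# Gathering (concatenating) the per-shard outputs recovers the full result.
y_gathered = torch.cat(y_shards, dim=1)
print(torch.allclose(y_full, y_gathered))  # True
```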

The model architecture builds on leading state-of-the-art large open models to balance efficiency and capability. It is a mixture-of-experts (MoE) design composed of 384 experts plus a single dense layer, which allows for smaller experts and specialized routing across modalities. Only about 3.2% of the model's parameters are activated per token.
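In practice, a lightweight router selects a small subset of experts for each token, so only a fraction of the total parameters does work on any given forward pass. The snippet below is a simplified sketch of top-k expert routing, assuming a generic softmax router and tiny expert MLPs; the layer sizes, expert count, and k value are illustrative, not Kimi K2.5's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Simplified mixture-of-experts layer with top-k routing (illustrative only)."""
    def __init__(self, d_model=32, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)          # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                     # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)            # (tokens, n_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)    # keep only the top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(5, 32)
layer = TinyMoELayer()
print(layer(tokens).shape)  # torch.Size([5, 32]); each token used only 2 of the 8 experts
```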

| Kimi K2.5 | |
|---|---|
| Modalities | Text, image, video |
| Total parameters | 1T |
| Active parameters | 32.86B |
| Activation rate | 3.2% |
| Input context length | 262K |
| Additional configuration information | |
| # experts | 384 |
| # shared experts | |
| # experts per token | |
| # layers | 61 (1 dense, 60 MoE) |
| # attention heads | 64 |
| Vocab size | ~164K |

Table 1. Specifications and configuration details for the Kimi K2.5 model
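Because the model is served through NVIDIA GPU-accelerated endpoints, one way to try it is an OpenAI-compatible chat completions call. The sketch below assumes the hosted endpoint follows the OpenAI-compatible schema NVIDIA uses for other hosted models at integrate.api.nvidia.com; the model identifier "moonshotai/kimi-k2.5", the environment variable name, and the image-URL message format are placeholders to verify against the model card on build.nvidia.com.

```python
import os
from openai import OpenAI

# NVIDIA-hosted, OpenAI-compatible endpoint (assumed; confirm on the model card).
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

# "moonshotai/kimi-k2.5" is a placeholder model ID; use the ID listed on the model card.
response = client.chat.completions.create(
    model="moonshotai/kimi-k2.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is happening in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)
```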

For vision capability, the large training vocabulary of 164K contains vision-specific tokens. Kimi created the MoonViT3d Vision Tower for the visual processing component of this model, which converts images and video frames into…