By Anu Srivastava
Publication Date: 2026-02-04 19:46:00
Kimi K2.5 is the newest open vision language model (VLM) in the Kimi family of models. It is a general-purpose multimodal model that excels at today's high-demand tasks, including agentic AI workflows, chat, reasoning, coding, and mathematics.
The model was trained using the open source Megatron-LM framework. Megatron-LM provides GPU-optimized, scalable training of massive transformer-based models through several forms of parallelism: tensor, data, and sequence parallelism.
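To make tensor parallelism concrete, here is a minimal sketch of a column-parallel linear layer, one of the core patterns Megatron-LM applies. The weight matrix is split column-wise across devices (simulated here as array shards), each shard computes its slice, and the slices are gathered. All names and sizes are illustrative, not Megatron-LM APIs:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # batch of 4 tokens, hidden size 8
W = rng.standard_normal((8, 12))       # full weight matrix of a linear layer

shards = np.split(W, 4, axis=1)        # 4-way column split, one shard per "device"
partial = [x @ w for w in shards]      # each device computes only its columns
y = np.concatenate(partial, axis=1)    # all-gather along the column axis

assert np.allclose(y, x @ W)           # matches the unsharded computation
```

Because each device holds only a fraction of the weights and activations, much larger layers fit in memory than any single GPU could hold on its own.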
The model architecture builds on leading state-of-the-art large open models for efficiency and capability. It is a mixture-of-experts (MoE) design composed of 384 routed experts and a single dense layer, which allows for smaller individual experts and specialized routing for different modalities; only a few experts run per token (see the sketch after the table below), so Kimi K2.5 achieves a 3.2% parameter activation rate per token.
| Kimi K2.5 | |
| --- | --- |
| Modalities | Text, image, video |
| Total parameters | 1T |
| Active parameters | 32.86B |
| Activation rate | 3.2% |
| Input context length | 262K tokens |
| Additional configuration information | |
| # experts | 384 |
| # shared experts | 1 |
| # experts per token | 8 |
| # layers | 61 (1 dense, 60 MoE) |
| # attention heads | 64 |
| Vocab size | ~164K |
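The sparse activation follows from the routing scheme in the table: for each token, a router scores all 384 experts and dispatches the token to only the top 8 (plus the 1 shared expert every token passes through). Below is a hypothetical, toy-sized sketch of that top-k routing pattern, not Kimi's actual router; all dimensions and names are made up for brevity:

```python
import numpy as np

NUM_EXPERTS = 384   # routed experts, per the table above
TOP_K = 8           # experts activated per token
D_MODEL = 16        # toy hidden size; the real model's is far larger

rng = np.random.default_rng(0)
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def route(token: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return indices and normalized weights of the top-k experts for one token."""
    logits = token @ router_w                 # one score per expert
    top_idx = np.argsort(logits)[-TOP_K:]     # indices of the 8 highest scores
    weights = np.exp(logits[top_idx])
    weights /= weights.sum()                  # softmax over the selected experts
    return top_idx, weights

token = rng.standard_normal(D_MODEL)
idx, w = route(token)
print("experts:", idx, "weights:", np.round(w, 3))
# Only 8 of 384 routed experts (plus the shared expert) execute for this token,
# which is why so small a fraction of the total parameters is active per token.
```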
For vision capability, the large ~164K-token training vocabulary contains vision-specific tokens. Kimi created the MoonViT3d Vision Tower as the visual processing component of this model, which converts images and video frames into visual token embeddings that the language model consumes alongside text tokens.
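The details of MoonViT3d are beyond this overview, but the generic ViT-style pattern it builds on is simple: cut each image (or video frame) into patches, flatten each patch, and project it into the model's hidden space so it becomes a sequence of tokens. A toy sketch of that pattern, with made-up sizes and no claim about MoonViT3d's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 32          # toy image resolution
P = 8               # patch size -> (32 // 8) ** 2 = 16 patches
D_MODEL = 16        # toy embedding width

image = rng.standard_normal((H, W, 3))
proj = rng.standard_normal((P * P * 3, D_MODEL))   # patch-to-embedding projection

patches = (
    image.reshape(H // P, P, W // P, P, 3)
         .transpose(0, 2, 1, 3, 4)        # group pixels by patch
         .reshape(-1, P * P * 3)          # one flattened row per patch
)
tokens = patches @ proj                   # (16, D_MODEL) visual token embeddings
print(tokens.shape)                       # these join the text token stream
```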