Kimi K2.5: The Open-Source Model Behind Cursor's Composer 2

Author: Himansh
Published: March 21, 2026
Reading time: 8 min
TheAITechPulse.com

Image: Kimi K2.5, Moonshot AI's 1-trillion-parameter open-weight model powering Cursor's Composer 2.

On January 27, 2026, Beijing-based Moonshot AI released Kimi K2.5, a 1-trillion-parameter open-weight model with native multimodality and an "agent swarm" capability. Within two months it had become the foundation of Cursor's Composer 2, the model behind a coding assistant reportedly valued at $50 billion, a sign that Chinese open-weight models now sit inside widely used AI products. Here is what makes K2.5 different and why it matters.

Quick verdict: Kimi K2.5 delivers frontier-level performance at a fraction of the cost of proprietary models. It's the first open-weight system to combine a 256K context window, built-in vision, and the ability to spawn 100 parallel agents. It suits developers who need coding, reasoning, and multimodal workflows without paying closed-model premiums.

Architecture: Sparse Power

Kimi K2.5 uses a Mixture-of-Experts (MoE) design: 1 trillion total parameters, but only 32 billion (about 3%) active per token. It has 61 layers (one dense, 60 MoE) with 384 experts per MoE layer, and a router sends each token to the top 8 of them. Because only the selected experts run, inference compute scales with the 32 billion active parameters rather than the full trillion, while the full expert pool retains broad knowledge.
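The routing step described above can be sketched in a few lines of plain Python. This is a minimal illustration of generic top-k expert selection, not Moonshot's implementation: the random router scores, the softmax over the selected experts, and the `route_token` helper are all assumptions made for the example. Only the 384 and 8 figures come from the article.

```python
import math
import random
from heapq import nlargest

NUM_EXPERTS = 384   # experts per MoE layer (from the article)
TOP_K = 8           # experts activated per token (from the article)


def route_token(router_logits, top_k=TOP_K):
    """Return {expert_index: gate_weight} for the top_k highest-scoring experts.

    ASSUMPTION: gate weights are a softmax over the selected logits only,
    so they sum to 1. Real routers may use a different gating function.
    """
    top = nlargest(top_k, range(len(router_logits)), key=router_logits.__getitem__)
    peak = max(router_logits[i] for i in top)          # subtract max for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
gates = route_token(logits)

active_fraction = 32e9 / 1e12   # active vs. total parameters, from the article
print(f"experts used: {len(gates)} of {NUM_EXPERTS}")
print(f"active parameter share: {active_fraction:.1%}")
```

Each token's output would then be the gate-weighted sum of the 8 selected experts' outputs; the other 376 experts in that layer are never evaluated for that token.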

The context window is 256,000 tokens, enough to ingest a sizeable codebase or several long research papers in one prompt. Multi-Head Latent Attention (MLA) compresses the key-value cache by roughly 10×, which keeps memory use manageable on long-context tasks.
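To see why a 10× cache reduction matters at this context length, here is a rough back-of-the-envelope calculation. It does not implement MLA; it only applies the article's compression ratio to a baseline cache size. The per-layer cache width (`UNCOMPRESSED_VALUES`) and the 2-byte value size are hypothetical placeholders, not published K2.5 figures.

```python
def kv_cache_gib(tokens, layers, values_per_token_per_layer, bytes_per_value=2):
    """Size in GiB of the key-value cache for a single sequence."""
    return tokens * layers * values_per_token_per_layer * bytes_per_value / 2**30


CONTEXT_TOKENS = 256_000   # from the article
LAYERS = 61                # from the article
MLA_COMPRESSION = 10       # "~10x" from the article

# HYPOTHETICAL: numbers cached per token per layer (keys + values) by an
# uncompressed attention layout. Chosen for illustration only.
UNCOMPRESSED_VALUES = 2 * 8192

baseline = kv_cache_gib(CONTEXT_TOKENS, LAYERS, UNCOMPRESSED_VALUES)
compressed = baseline / MLA_COMPRESSION

print(f"uncompressed cache at full context: {baseline:.1f} GiB")
print(f"with ~10x compression:              {compressed:.1f} GiB")
```

Whatever the true per-layer width is, the cache grows linearly with context length, so a fixed 10× reduction is the difference between a full-length prompt fitting on an accelerator or not.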

Total parameters: 1T
Active per token: 32B
Context window: 256K
Input price / 1M tokens: $0.60

Native Multimodality and Agent Swarm

K2.5 was built by continued pre-training of the Kimi K2 base model on roughly 15 trillion mixed visual and text tokens. Its vision encoder, MoonViT (400M parameters), encodes images and video frames into tokens that the same transformer layers process alongside text. That means you can:

Upload a UI screenshot and get back working React or Tailwind CSS code
Reason over video content (VideoMMMU: 86.6%)
Process scanned documents at 92.3% on OCRBench, matching Gemini 3

The most distinctive feature is Agent Swarm: the model can spawn and coordinate up to 100 parallel sub-agents, each handling an independent subtask. Trained with Parallel-Agent Reinforcement Learning (PARL), the orchestrator decomposes a complex request, delegates the pieces, and synthesizes the results. Compared with a single agent, this yields:

BrowseComp accuracy: 60.6% → 78.4%
WideSearch F1: 72.7% → 79.0%
Execution time for parallel tasks drops by a factor of 3 to 4.5
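The decompose, delegate, and synthesize flow described above is a fan-out and gather pattern, which can be sketched with Python's `asyncio`. This is only an illustration of the concurrency shape under a 100-agent cap. `run_sub_agent` and `orchestrate` are hypothetical names, the sub-agent is a stub that sleeps instead of calling a model, and nothing here reflects how PARL trains the orchestrator.

```python
import asyncio

MAX_PARALLEL_AGENTS = 100  # ceiling quoted in the article


async def run_sub_agent(subtask: str) -> str:
    """HYPOTHETICAL sub-agent: a real one would call the model with tools."""
    await asyncio.sleep(0.01)          # stands in for model and tool latency
    return f"result for {subtask!r}"


async def orchestrate(subtasks: list[str]) -> list[str]:
    """Run independent subtasks concurrently, never more than the cap at once."""
    limit = asyncio.Semaphore(MAX_PARALLEL_AGENTS)

    async def guarded(subtask: str) -> str:
        async with limit:
            return await run_sub_agent(subtask)

    # gather preserves input order, so results line up with subtasks
    return await asyncio.gather(*(guarded(s) for s in subtasks))


subtasks = [f"search source {i}" for i in range(250)]
results = asyncio.run(orchestrate(subtasks))
print(len(results), results[0])
```

With 250 subtasks and a cap of 100, the work finishes in roughly three waves rather than 250 sequential steps, which is the kind of wall-clock saving the execution-time figure above refers to.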

Benchmarks: Where It Leads

Kimi K2.5 holds its own against closed-source giants:

Benchmark	Kimi K2.5	vs Competitors
AIME 2025 (math reasoning)	96.1%	Leads field
SWE-Bench Verified (coding)	76.8%	Top open-source
Humanity's Last Exam (with tools)	50.2%	Beats GPT-5.2 (45.5%)
LiveCodeBench v6	85.0%	Competitive
VideoMMMU (multimodal)	86.6%	Matches Gemini 3
OCRBench	92.3%	Matches Gemini 3

Weaknesses: the model can be verbose (generating 5–6× more tokens than needed), which raises output-token costs, and it sometimes hallucinates on factual questions. But for coding, reasoning, and agent workflows, it's highly competitive.

Pricing and Licensing: Built for Scale

Kimi K2.5 is released under a Modified MIT License. Commercial use is free; the one added condition is that products with more than 100 million monthly active users or more than $20 million in monthly revenue must display "Kimi K2.5" credit prominently.
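The license thresholds can be expressed as a small check. This sketch assumes that exceeding either threshold on its own triggers the attribution requirement; that reading, and the `attribution_required` helper itself, are assumptions for illustration, so confirm against the license text before relying on it.

```python
MAU_THRESHOLD = 100_000_000          # monthly active users, from the article
REVENUE_THRESHOLD_USD = 20_000_000   # monthly revenue in USD, from the article


def attribution_required(monthly_active_users: int, monthly_revenue_usd: float) -> bool:
    """True if the product would need to display "Kimi K2.5" credit.

    ASSUMPTION: exceeding either threshold alone triggers the requirement.
    """
    return (monthly_active_users > MAU_THRESHOLD
            or monthly_revenue_usd > REVENUE_THRESHOLD_USD)


print(attribution_required(5_000_000, 1_000_000))     # small product
print(attribution_required(150_000_000, 1_000_000))   # large user base
print(attribution_required(5_000_000, 25_000_000))    # high revenue
```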

Through providers like Fireworks AI, API pricing is aggressive:

$0.60 per million input tokens
$2.50 per million output tokens
$0.30 per million cached input tokens
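To turn these rates into a per-request figure, a small calculator helps. The prices are the ones listed above; the `request_cost_usd` helper is hypothetical, and it assumes cached input tokens are billed at the cached rate instead of the regular input rate, which may not match every provider's billing rules.

```python
PRICE_PER_MILLION_USD = {
    "input": 0.60,
    "output": 2.50,
    "cached_input": 0.30,
}


def request_cost_usd(input_tokens: int, output_tokens: int,
                     cached_input_tokens: int = 0) -> float:
    """Cost of one request in USD.

    cached_input_tokens is the portion of input_tokens served from cache.
    ASSUMPTION: cached tokens are billed at the cached rate only.
    """
    fresh_input = input_tokens - cached_input_tokens
    if fresh_input < 0:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    total = (fresh_input * PRICE_PER_MILLION_USD["input"]
             + cached_input_tokens * PRICE_PER_MILLION_USD["cached_input"]
             + output_tokens * PRICE_PER_MILLION_USD["output"])
    return total / 1_000_000


# A long-context coding request: 200K input tokens (150K cached), 4K output.
cost = request_cost_usd(200_000, 4_000, cached_input_tokens=150_000)
print(f"${cost:.4f}")
```

Because output tokens cost roughly four times as much as input tokens, a verbose model's answer length can matter more to the bill than the size of the prompt.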

For local deployment, the model ships in INT4 (~595GB). Unsloth's 1.8-bit quant reduces that to ~240GB, which can run on a machine with a single 24GB GPU provided the remaining weights (roughly 216GB) are offloaded to system RAM or disk, at a substantial cost in speed. Inference engines such as vLLM, SGLang, and KTransformers support the model.
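The sizes above follow from simple arithmetic on bits per weight, sketched here. The helper names are hypothetical, and the calculation gives only a lower bound: the published sizes are larger, presumably because some tensors are kept at higher precision and the files carry metadata.

```python
TOTAL_PARAMS = 1e12   # 1 trillion parameters, from the article


def weight_size_gb(params: float, bits_per_weight: float) -> float:
    """Lower-bound size in GB if every weight used exactly bits_per_weight bits."""
    return params * bits_per_weight / 8 / 1e9


def offload_gb(model_gb: float, gpu_vram_gb: float) -> float:
    """Portion of the model that cannot fit in GPU memory."""
    return max(0.0, model_gb - gpu_vram_gb)


int4_floor = weight_size_gb(TOTAL_PARAMS, 4)        # article reports ~595GB
low_bit_floor = weight_size_gb(TOTAL_PARAMS, 1.8)   # article reports ~240GB
spill = offload_gb(240, 24)                         # 1.8-bit quant on a 24GB GPU

print(f"INT4 lower bound:    {int4_floor:.0f} GB")
print(f"1.8-bit lower bound: {low_bit_floor:.0f} GB")
print(f"offloaded to RAM:    {spill:.0f} GB")
```

The last figure is the practical constraint: about nine-tenths of the quantized model lives outside the GPU, so system RAM capacity and bandwidth determine whether local inference is usable.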

⚡ The Cursor Connection: In March 2026, developers discovered that Cursor's Composer 2 was built on Kimi K2.5 (plus continued pre-training and RL). Cursor's leadership acknowledged the base model, noting they accessed it through Fireworks AI under an authorized commercial license. The episode validated that open-source models can power billion-dollar products without sacrificing performance.

💡 Bottom line: Kimi K2.5 is a proof point that open-source AI has caught up to — and in some areas surpassed — closed models. Its combination of MoE efficiency, native multimodality, and agent swarm makes it a foundation for the next generation of autonomous software development.

Frequently Asked Questions

What is Kimi K2.5?

Kimi K2.5 is a 1-trillion-parameter open-weight model released by Moonshot AI in January 2026. It uses a Mixture-of-Experts architecture with 32B active parameters, a 256K context window, native vision, and a built-in agent swarm that can coordinate up to 100 parallel sub-agents.

Is Kimi K2.5 free to use?

Yes. The weights are available under a modified MIT license that permits commercial use; only products above 100M MAU or $20M in monthly revenue must display "Kimi K2.5" credit. Hosted API access through Fireworks AI starts at $0.60 per million input tokens.

How does Kimi K2.5 compare to GPT-5 or Claude?

It's competitive on coding (SWE-Bench 76.8%) and excels at multimodal reasoning (VideoMMMU 86.6%) and agentic workflows. It can be more verbose and less polished on factual recall, but costs a fraction of proprietary models.

Can I run Kimi K2.5 locally?

Yes, with quantization (INT4 ~595GB, 1.8-bit ~240GB) and an inference engine such as vLLM or KTransformers. Even the 1.8-bit quant needs a machine with enough system RAM or fast disk to hold the weights that do not fit on the GPU.

Why is Cursor using Kimi K2.5?

Cursor evaluated multiple base models and found Kimi K2.5 had the best perplexity for coding. They added continued pre-training and reinforcement learning to build Composer 2, which, according to Cursor, beats Claude Opus at one-tenth the cost.

Sources: Moonshot AI official blog (Jan 27, 2026), Fireworks AI documentation (Feb 2026), Artificial Analysis (Mar 2026), Cursor engineering blog (Mar 19, 2026). — Himansh, TheAITechPulse.com

About the Author

Himansh is the founder of TheAITechPulse, where he analyzes AI tools, productivity software, and emerging tech for practical business use.

He focuses on real-world testing, ROI-driven evaluations, and actionable implementation guides for small businesses and solo founders.
