MoonshotAI: Kimi Linear 48B A3B Instruct
Model Type: Open Weight Model (48B parameters)
Recommended Use Cases: Text Generation
Kimi Linear 48B A3B Instruct is Moonshot AI's efficient hybrid linear-attention model, released in October 2025. It is built around Kimi Delta Attention (KDA) and, at 1M-token context, reduces KV cache usage by up to 75% and delivers up to 6x higher decoding throughput than full attention.
Overview
Kimi Linear uses a hybrid architecture that interleaves Kimi Delta Attention (KDA) layers with Multi-Head Latent Attention (MLA) layers in a 3:1 ratio, i.e. three KDA layers for every MLA layer. Moonshot AI reports that this design outperforms full attention while substantially reducing memory usage and increasing throughput, which makes it well suited to long-context and agentic workloads.
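To see how a 3:1 ratio lines up with the reported KV cache savings, here is a back-of-the-envelope sketch. It assumes that an MLA layer sits at every fourth position, that only MLA layers keep a per-token cache, and that each cached layer costs the same per token as a layer in the full-attention baseline. The `HybridConfig` fields, layer count, and per-token byte cost are hypothetical placeholders, not Kimi Linear's real configuration, and KDA's fixed-size recurrent state is ignored.

```python
from dataclasses import dataclass


@dataclass
class HybridConfig:
    # Hypothetical values for illustration only, not the real model config.
    num_layers: int = 24             # assumed layer count
    kda_per_mla: int = 3             # 3:1 KDA-to-MLA ratio from the model card
    kv_bytes_per_token: int = 1024   # assumed per-layer cache cost of one token


def layer_schedule(cfg: HybridConfig) -> list[str]:
    """Assumed interleaving: KDA, KDA, KDA, MLA, repeated."""
    period = cfg.kda_per_mla + 1
    return ["MLA" if (i + 1) % period == 0 else "KDA" for i in range(cfg.num_layers)]


def kv_cache_bytes(cfg: HybridConfig, schedule: list[str], context_len: int) -> int:
    """Only MLA layers grow a per-token cache; KDA state is constant and ignored here."""
    return schedule.count("MLA") * cfg.kv_bytes_per_token * context_len


cfg = HybridConfig()
schedule = layer_schedule(cfg)
context_len = 1_000_000
# Baseline: every layer is full attention and caches every token.
full = cfg.num_layers * cfg.kv_bytes_per_token * context_len
hybrid = kv_cache_bytes(cfg, schedule, context_len)

print(schedule[:8])
print(f"cache reduction: {1 - hybrid / full:.0%}")  # cache reduction: 75%
```

With a quarter of the layers caching keys and values, the per-token cache shrinks to about 25% of an all-full-attention stack. In practice the fixed KDA state adds a small, context-independent overhead on top of this idealized figure.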
Key Features
- Kimi Delta Attention (KDA): Linear attention with channel-wise gating for fine-grained memory control
- Up to 75% KV cache reduction: Substantially lower memory requirements
- Up to 6x decoding throughput: At 1M context length, compared to full attention
- 1M token context: Native support for extremely long sequences
- Drop-in replacement: Can replace full attention architectures
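KDA is a gated delta rule whose forget gate is a per-channel vector rather than a single scalar. The sketch below implements one recurrent step under our reading of the Kimi Linear technical report, S_t = (I − β_t k_t k_tᵀ) Diag(α_t) S_{t−1} + β_t k_t v_tᵀ with output o_t = S_tᵀ q_t. The function name `kda_step` and the toy dimensions are hypothetical; the FLA kernels use a chunkwise-parallel form plus learned projections and normalization that are omitted here.

```python
Matrix = list[list[float]]


def kda_step(S: Matrix, q: list[float], k: list[float], v: list[float],
             alpha: list[float], beta: float) -> tuple[Matrix, list[float]]:
    """One step of a gated delta rule with channel-wise decay (illustrative sketch).

    S is a d_k x d_v state, alpha holds one decay in [0, 1] per key channel,
    and beta is a scalar write strength in [0, 1].
    """
    d_k, d_v = len(S), len(S[0])
    # 1) Channel-wise forgetting: scale row i of the state by alpha[i].
    S = [[alpha[i] * S[i][j] for j in range(d_v)] for i in range(d_k)]
    # 2) Delta rule: read the value currently stored under key k ...
    pred = [sum(k[i] * S[i][j] for i in range(d_k)) for j in range(d_v)]
    # ... then write only the correction (v - pred), scaled by beta.
    S = [[S[i][j] + beta * k[i] * (v[j] - pred[j]) for j in range(d_v)]
         for i in range(d_k)]
    # 3) Read out with the query: o = S^T q.
    out = [sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)]
    return S, out


d_k, d_v = 4, 3
S = [[0.0] * d_v for _ in range(d_k)]
k1, k2 = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]
# Store [1, 2, 3] under k1 with no decay.
S, _ = kda_step(S, q=k1, k=k1, v=[1.0, 2.0, 3.0], alpha=[1.0] * d_k, beta=1.0)
# Store [4, 5, 6] under k2 while key channel 0 decays by half, then read back k1.
S, out = kda_step(S, q=k1, k=k2, v=[4.0, 5.0, 6.0], alpha=[0.5, 1.0, 1.0, 1.0], beta=1.0)
print(out)  # [0.5, 1.0, 1.5]
```

Writing under k2 leaves the entry stored under k1 intact apart from the 0.5 decay on key channel 0, and the state stays d_k × d_v however many tokens are processed, so KDA layers need no growing KV cache.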
Technical Specifications
| Specification | Value |
|---|---|
| Total Parameters | 48B |
| Active Parameters | 3B |
| Architecture | Hybrid KDA + MLA (3:1 ratio) |
| Context Length | 1M tokens |
| Attention Design | Channel-wise gating with DPLR (Diagonal-Plus-Low-Rank) transition matrices |
| Position Encoding | NoPE (No Position Embeddings) |
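The DPLR entry can be made concrete with a short derivation. Assuming the KDA state transition takes the gated-delta-rule form (I − β_t k_t k_tᵀ) Diag(α_t), with α_t a per-channel decay vector and β_t a scalar write strength, expanding it shows a diagonal term minus a rank-one term:

```latex
\begin{aligned}
A_t &= \left(I - \beta_t\, k_t k_t^{\top}\right)\operatorname{Diag}(\alpha_t) \\
    &= \operatorname{Diag}(\alpha_t) - \beta_t\, k_t \left(\operatorname{Diag}(\alpha_t)\, k_t\right)^{\top} \\
    &= \underbrace{\operatorname{Diag}(\alpha_t)}_{\text{diagonal}}
       - \underbrace{(\beta_t k_t)\,(\alpha_t \odot k_t)^{\top}}_{\text{rank one}}
\end{aligned}
```

In a general DPLR matrix D − a bᵀ the vectors a and b are free parameters; here both are tied to the key k_t. The Kimi Linear report describes KDA as such a constrained DPLR variant and credits it with lower computational cost than the general formulation.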
Availability
- Open Weights: Hugging Face (moonshotai/Kimi-Linear-48B-A3B-Instruct)
- Inference: vLLM with FLA (Flash Linear Attention) kernel
- KDA Kernel: Open-sourced in the FLA library
Use Cases
- Long-context document analysis
- Agentic workflows with extended trajectories
- Memory-constrained deployments
- High-throughput inference scenarios
- Test-time scaling applications
Role in Series
Moonshot AI's Kimi models offer different capabilities:
- Kimi Linear: Most efficient, best for long context and throughput (this model)
- Kimi K2 0905: Agentic coding specialist, 256K context
- Kimi K2 Thinking: Extended reasoning with tool orchestration
- Kimi K2.5: Maximum capability with vision and Agent Swarm