MoonshotAI iconMoonshotAI: Kimi Linear 48B A3B Instruct

Model Type

Open weight model icon

Open Weight Model

48B parameters

Recommended Use Cases

Text Generation

Kimi Linear 48B A3B Instruct is Moonshot AI's efficient hybrid linear attention model released in October 2025, featuring Kimi Delta Attention (KDA) for 75% less KV cache usage and up to 6x faster decoding at 1M context.

Overview

Kimi Linear introduces a novel hybrid architecture that combines Kimi Delta Attention (KDA) with Multi-Head Latent Attention (MLA) in a 3:1 ratio. This design achieves superior performance while dramatically reducing memory usage and increasing throughput, making it ideal for long-context and agentic workloads.

Key Features

  • Kimi Delta Attention (KDA): Linear attention with channel-wise gating for fine-grained memory control
  • 75% KV cache reduction: Dramatically lower memory requirements
  • 6x decoding throughput: At 1M context length compared to full attention
  • 1M token context: Native support for extremely long sequences
  • Drop-in replacement: Can replace full attention architectures

Technical Specifications

SpecificationValue
Total Parameters48B
Active Parameters3B
ArchitectureHybrid KDA + MLA (3:1 ratio)
Context Length1M tokens
Attention DesignChannel-wise gating with DPLR matrices
Position EncodingNoPE (No Position Embeddings)

Availability

  • Open Weights: Hugging Face (moonshotai/Kimi-Linear-48B-A3B-Instruct)
  • Inference: vLLM with FLA (Flash Linear Attention) kernel
  • KDA Kernel: Open-sourced in FLA library

Use Cases

  • Long-context document analysis
  • Agentic workflows with extended trajectories
  • Memory-constrained deployments
  • High-throughput inference scenarios
  • Test-time scaling applications

Role in Series

Moonshot AI's Kimi models offer different capabilities:

  • Kimi Linear: Most efficient, best for long context and throughput (this model)
  • Kimi K2 0905: Agentic coding specialist, 256K context
  • Kimi K2 Thinking: Extended reasoning with tool orchestration
  • Kimi K2.5: Maximum capability with vision and Agent Swarm

Links