Mistral has taken a different approach to the foundation model race. Instead of scaling up a single dense model, Mistral has built Small 4 by consolidating three previously separate models — Magistral for reasoning, Pixtral for vision, and Devstral for code — into a single 119B-parameter MoE architecture with 128 experts and just 6B active parameters per forward pass.
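The efficiency claim follows from the MoE arithmetic. A quick sketch using only the figures quoted above (119B total, 6B active; the routing top-k is not stated, so none is assumed):

```python
# Back-of-the-envelope view of the MoE compute savings described above.
# Figures come from the article; nothing else about routing is assumed.
total_params = 119e9   # all 128 experts combined
active_params = 6e9    # parameters used per forward pass

active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.1%} of total weights")
```

Roughly 5% of the weights are exercised per token, so per-token compute scales with the 6B active slice while memory must still hold all 119B parameters.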
The result is a 40% reduction in end-to-end completion time and a 3x throughput improvement over its predecessor, along with a 256K context window and configurable reasoning intensity: developers can dial reasoning effort up or down to match task complexity.
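Configurable reasoning intensity might be surfaced as a per-request knob. A minimal sketch in the common OpenAI-style chat-completions shape — the model identifier and the `reasoning_effort` field name are assumptions for illustration, not confirmed API details:

```python
# Sketch of selecting reasoning intensity per request. The payload follows
# the widespread OpenAI-style chat-completions shape; the model name and
# the "reasoning_effort" field are assumed, not confirmed API details.
def build_request(prompt: str, effort: str = "low") -> dict:
    """Assemble a chat request dict, rejecting unknown effort levels."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown reasoning effort: {effort}")
    return {
        "model": "mistral-small-4",  # hypothetical identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # assumed parameter name
    }

# Cheap request for a simple lookup; expensive one for multi-step work.
quick = build_request("What year was Rust 1.0 released?", effort="low")
deep = build_request("Prove this invariant holds after the refactor.", effort="high")
```

The appeal of a per-request knob is that one deployed model can serve both latency-sensitive lookups and deliberate multi-step reasoning without routing to different backends.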
Released under Apache 2.0, Small 4 is fully open-weight and available for self-hosting. For teams that previously had to route between separate specialized models, this consolidation simplifies infrastructure while maintaining competitive performance across all three domains.