NVIDIA: Nemotron 3 Super (free)
NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model that activates just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token-generation throughput than leading open models. The model features a 1M-token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE lets it call 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open, with weights, datasets, and recipes released under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere, from workstation to cloud.
Pricing per 1M Tokens
| Token Type | Price |
| --- | --- |
| Input (Prompt) | Free |
| Output (Completion) | Free |
| Cache Read | Free |
| Cache Write | Free |
| Image | N/A |
Specifications
| Specification | Value |
| --- | --- |
| Context Length | 262K |
| Max Output Tokens | 262K |
| Input Modalities | Text |
| Output Modalities | Text |
| Tokenizer | Other |
| Instruct Type | N/A |
| Top Provider Context | 262K |
| Top Provider Max Output | 262K |
| Moderated | No |
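Since the model is typically consumed through an OpenAI-compatible chat-completions API, the specs above can be applied directly when assembling a request. The sketch below is a minimal illustration of clamping `max_tokens` against the 262K context limit; the model slug is a hypothetical placeholder (check your provider's model list for the exact identifier), and the ~4 characters/token estimate is a rough assumption since the tokenizer is listed only as "Other".

```python
# Hedged sketch: building a chat-completion payload for this model.
# MODEL_ID is a hypothetical slug, not confirmed by this page.
MODEL_ID = "nvidia/nemotron-3-super:free"
CONTEXT_LENGTH = 262_144  # 262K context from the spec table above

def build_request(messages: list, max_tokens: int = 4096) -> dict:
    """Assemble a request dict, clamping max_tokens so the estimated
    prompt plus the completion stay within the context window."""
    # Crude ~4 chars/token estimate; tokenizer is listed as "Other".
    prompt_chars = sum(len(m["content"]) for m in messages)
    est_prompt_tokens = prompt_chars // 4 + 1
    budget = max(1, CONTEXT_LENGTH - est_prompt_tokens)
    return {
        "model": MODEL_ID,
        "messages": messages,
        "max_tokens": min(max_tokens, budget),
    }

req = build_request([{"role": "user", "content": "Summarize this contract."}])
```

The resulting `req` dict can be sent as the JSON body of a POST to any OpenAI-compatible `/chat/completions` endpoint; since all token types are priced Free here, the clamp matters for context fit rather than cost.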
Last updated: March 23, 2026
First tracked: March 23, 2026