Llama-3.3-Nemotron-Super-49B-v1.5
nvidia · released 2025-10-10 · other license
Llama-3.3-Nemotron-Super-49B-v1.5 is NVIDIA's July 2025 reasoning model, distilled from Llama-3.3-70B via neural architecture search (NAS) down to 49B parameters with a 128K-token context window (131,072 tokens). NVIDIA's own evaluations report 97.4% on MATH-500, 82.71% on AIME 2025, 71.97% on GPQA Diamond and 73.58% on LiveCodeBench, all in reasoning mode.
Key specs
| Spec | Value |
|---|---|
| Type | Local open-weight |
| Parameters | 49.87B total |
| Architecture | nemotron-nas |
| Context window | 131K tokens |
| Knowledge cutoff | 2024-03-31 |
| Modalities | text |
| Recommended backends | — |
| Minimum viable rig | — |
Benchmark scores
| Metric | Value |
|---|---|
| GPQA Diamond | 71.97% |
| SWE-bench Verified | — |
| AIME 2025 | 82.71% |
| MMLU-Pro | 79.53% |
| BFCL v3 (tool use) | 71.75% |
| Composite score | 6.46 |
| Community rating | No reviews yet |
VRAM & disk per quantization
| Quant | VRAM | Disk | RAM | Context |
|---|---|---|---|---|
| Q4_K_M | 30.4 GB | 28.9 GB | — | 131K |
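
The Q4_K_M row can be sanity-checked against the parameter count with a little arithmetic. The sketch below is illustrative only: it assumes the table uses decimal gigabytes (10^9 bytes) and treats the gap between VRAM and disk as runtime overhead (KV cache, buffers), neither of which the table states. The function names are hypothetical helpers, not part of any library.

```python
# Figures copied from the tables on this page.
PARAMS = 49.87e9   # total parameters
DISK_GB = 28.9     # Q4_K_M file size on disk
VRAM_GB = 30.4     # Q4_K_M VRAM requirement as listed

BYTES_PER_GB = 1e9  # ASSUMPTION: decimal gigabytes, not GiB


def bits_per_weight(disk_gb: float, params: float) -> float:
    """Average stored bits per parameter implied by a file size."""
    return disk_gb * BYTES_PER_GB * 8 / params


def runtime_overhead_gb(vram_gb: float, disk_gb: float) -> float:
    """VRAM beyond the raw weights (ASSUMED to be KV cache and buffers)."""
    return vram_gb - disk_gb


def weights_size_gb(params: float, bits: float) -> float:
    """Size of the weights alone at a given average bit width."""
    return params * bits / 8 / BYTES_PER_GB


if __name__ == "__main__":
    print(f"Q4_K_M: {bits_per_weight(DISK_GB, PARAMS):.2f} bits per weight")
    print(f"Overhead: {runtime_overhead_gb(VRAM_GB, DISK_GB):.1f} GB")
    print(f"16-bit weights: {weights_size_gb(PARAMS, 16):.2f} GB")
```

Under these assumptions the listed file size works out to a little over 4.6 bits per weight, and the same parameter count at 16 bits per weight would need roughly 100 GB for the weights alone, before any KV cache.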
Strengths & weaknesses
Strengths: excellent accuracy/efficiency tradeoff via NAS; fits on a single H200; strong reasoning (MATH-500 97.4%, AIME 2025 82.71%); reasoning on/off toggle; tool calling; 128K context
Weaknesses: older pretraining data (2023 cutoff inherited from Llama 3.3); NVIDIA Open Model License is more restrictive than Apache 2.0 or MIT; low score on Humanity's Last Exam (HLE, 7.64)