Llama-3.3-Nemotron-Super-49B-v1.5

NVIDIA · released 2025-10-10 · NVIDIA Open Model License

Llama-3.3-Nemotron-Super-49B-v1.5 is NVIDIA's July 2025 reasoning model, distilled from Llama-3.3-70B via neural architecture search (NAS) down to 49B parameters with a 128K-token context window. NVIDIA's own evaluations report 97.4% on MATH500, 82.71% on AIME 2025, 71.97% on GPQA, and 73.58% on LiveCodeBench, all in reasoning mode.

Key specs

Type	Local open-weight
Parameters	49.87B total
Architecture	nemotron-nas
Context window	131,072 tokens (128K)
Knowledge cutoff	2024-03-31
Modalities	text
Recommended backends	—
Minimum viable rig	—

Benchmark scores

GPQA Diamond	71.97%
SWE-bench Verified	—
AIME 2025	82.71%
MMLU-Pro	79.53%
BFCL v3 (tool use)	71.75%
Composite score	6.46
Community rating	No reviews yet

VRAM & disk per quantization

Quant	VRAM	Disk	RAM	Context
Q4_K_M	30.4 GB	28.9 GB	—	131K
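
As a rough sanity check on the Q4_K_M row, the disk size and parameter count imply an average number of bits stored per weight, and the gap between VRAM and disk shows how much headroom the listing adds on top of the weights file. The sketch below uses only figures from this page; it assumes "GB" means 10^9 bytes, and the function names are illustrative. The page does not say what the extra VRAM covers, so the code reports the difference without attributing it.

```python
# Figures copied from the Key specs table and the Q4_K_M row on this page.
PARAMS = 49.87e9   # total parameters
DISK_GB = 28.9     # Q4_K_M file size on disk
VRAM_GB = 30.4     # Q4_K_M VRAM figure listed for the 131K context


def implied_bits_per_weight(disk_gb: float, params: float,
                            bytes_per_gb: float = 1e9) -> float:
    """Average bits stored per parameter for a file of the given size.

    Assumption: 1 GB = 1e9 bytes. Pass bytes_per_gb=2**30 if the
    listing actually uses GiB.
    """
    return disk_gb * bytes_per_gb * 8 / params


def vram_headroom_gb(vram_gb: float, disk_gb: float) -> float:
    """VRAM listed beyond the size of the weights file."""
    return vram_gb - disk_gb


bpw = implied_bits_per_weight(DISK_GB, PARAMS)
headroom = vram_headroom_gb(VRAM_GB, DISK_GB)
print(f"{bpw:.2f} bits/weight, {headroom:.1f} GB beyond the weights file")
```

The result lands a little above 4 bits per weight, which is what one would expect from a mixed 4-bit quantization that keeps some tensors at higher precision.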

Strengths & weaknesses

Strengths: excellent accuracy/efficiency tradeoff via NAS; fits on a single H200; strong reasoning (MATH500 97.4%, AIME 2025 82.71%); reasoning on/off toggle; tool calling; 128K context

Weaknesses: older pretraining data (2023 cutoff inherited from Llama 3.3); NVIDIA Open Model License is more restrictive than Apache/MIT; low HLE score (7.64)
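
The single-H200 claim under Strengths can be checked with back-of-the-envelope arithmetic. The sketch below is not NVIDIA's sizing method. It assumes the weights are served unquantized in BF16 (16 bits per weight) and that an H200 has 141 GB of memory; neither figure appears on this page, and both are marked as assumptions in the code. It also ignores KV cache and activation memory, so the headroom it prints is an upper bound on what is left for them.

```python
PARAMS = 49.87e9         # total parameters, from the Key specs table
H200_MEMORY_GB = 141.0   # assumption: H200 memory capacity, not stated on this page


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


bf16_gb = weights_gb(PARAMS, 16)   # assumption: BF16 serving precision
fp8_gb = weights_gb(PARAMS, 8)     # hypothetical 8-bit variant, for comparison
headroom_gb = H200_MEMORY_GB - bf16_gb

print(f"BF16 weights: {bf16_gb:.2f} GB")
print(f"8-bit weights: {fp8_gb:.2f} GB")
print(f"Left for KV cache and activations at BF16: {headroom_gb:.2f} GB")
```

Under these assumptions the BF16 weights come to just under 100 GB, leaving roughly 41 GB on one card.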