Llama 3 3 Nemotron Super 49B V1 5

nvidia · released 2025-10-10 · other license

Llama-3.3-Nemotron-Super-49B-v1.5 is NVIDIA's July 2025 reasoning model distilled from Llama-3.3-70B via NAS (49B, 128K context). NVIDIA's evals report 97.4% MATH500, 82.71% AIME 2025, 71.97% GPQA and 73.58% LiveCodeBench in reasoning mode.

Key specs

TypeLocal open-weight
Parameters49.87B total
Architecturenemotron-nas
Context window131K tokens
Knowledge cutoff2024-03-31
Modalitiestext
Recommended backends
Minimum viable rig

Benchmark scores

GPQA Diamond71.97%
SWE-bench Verified
AIME82.71%
MMLU-Pro79.53%
BFCL v3 (tool use)71.75%
Composite score6.46
Community ratingNo reviews yet

VRAM & disk per quantization

QuantVRAMDiskRAMContext
Q4_K_M30.4 GB28.9 GB131K

Strengths & weaknesses

Strengths: Excellent accuracy/efficiency tradeoff via NAS; fits on a single H200; Strong reasoning (MATH-500 97.4, AIME25 82.71); Reasoning on/off toggle, tool calling, 128K context

Weaknesses: Older pretraining freshness (2023 cutoff from Llama 3.3); NVIDIA Open Model License more restrictive than Apache/MIT; low HLE (7.64)