DeepSeek R1 Distill Llama 70B

deepseek-ai · released 2025-01-23 · MIT license

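DeepSeek-R1-Distill-Llama-70B (January 2025) is the largest DeepSeek-R1 distilled model: Llama-3.3-70B-Instruct fine-tuned on reasoning samples generated by DeepSeek-R1. Reported scores are AIME 2024 70.0 (pass@1), MATH-500 94.5, GPQA Diamond 65.2 and LiveCodeBench 57.5; the last three are the best in the distill family. Model weights are released under the MIT license.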

Key specs

Type	Local open-weight
Parameters	70.55B total
Architecture	llama
Context window	131K tokens
Knowledge cutoff	2024-07-31
Modalities	text
Recommended backends	—
Minimum viable rig	—

Benchmark scores

GPQA Diamond	65.2%
SWE-bench Verified	—
AIME 2024 (pass@1)	70.0%
MMLU-Pro	—
BFCL v3 (tool use)	—
Composite score	6.71
Community rating	No reviews yet

VRAM & disk per quantization

Quant	VRAM	Disk	RAM	Context
Q4_K_M	42.4 GB	40.9 GB	—	131K
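The table does not say how the 42.4 GB VRAM figure splits between weights and KV cache. The sketch below shows the rough arithmetic. It assumes "GB" means 10^9 bytes, an unquantized fp16 KV cache, and the published Llama-3.3-70B attention config (80 layers, 8 KV heads, head dimension 128), none of which this page states. The constants and helper names are illustrative only.

```python
# Rough memory arithmetic for the Q4_K_M row (a sketch, not a sizing tool).
# Assumptions: "GB" = 10**9 bytes; fp16 KV cache; the config values below are
# taken from the base Llama-3.3-70B config, not from this page.

PARAMS = 70.55e9          # total parameters (Key specs)
Q4_K_M_DISK_GB = 40.9     # Q4_K_M file size (table above)

N_LAYERS = 80             # assumed: num_hidden_layers of Llama-3.3-70B
N_KV_HEADS = 8            # assumed: grouped-query attention KV heads
HEAD_DIM = 128            # assumed: 8192 hidden size / 64 attention heads


def bits_per_weight(disk_gb: float, params: float) -> float:
    """Effective bits per weight implied by a file size in GB (10**9 bytes)."""
    return disk_gb * 1e9 * 8 / params


def kv_cache_bytes(tokens: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one K and one V vector per layer, KV head and token."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem
    return tokens * per_token


if __name__ == "__main__":
    print(f"{bits_per_weight(Q4_K_M_DISK_GB, PARAMS):.2f} bits/weight")
    for ctx in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"{ctx:>7} tokens -> {gib:5.1f} GiB fp16 KV cache")
```

Under these assumptions the Q4_K_M file works out to roughly 4.6 bits per weight, and each token of fp16 KV cache costs 320 KiB. A full 131,072-token cache alone would then need about 40 GiB on top of the weights, so the 42.4 GB figure presumably reflects a much shorter active context or a quantized cache.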

Strengths & weaknesses

Strengths: Leads the R1 distill family on MATH-500 (94.5) and GPQA Diamond (65.2); strong coding (LiveCodeBench 57.5); weights released under the MIT license

Weaknesses: Derived from Llama-3.3-70B-Instruct, so the Llama 3.3 license terms also apply; heavy 70B footprint (about 41 GB on disk even at Q4_K_M); long chain-of-thought outputs add latency
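Because the model writes a long chain of thought before its final answer, client code usually separates the two. The sketch below assumes the reasoning is delimited by `<think>` ... `</think>`, as in DeepSeek-R1 outputs. The opening tag may be missing when the chat template has already placed it in the prompt, so the helper splits on the closing tag only. `split_reasoning` is a hypothetical helper, not part of any library.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    Assumes the reasoning ends with a closing </think> tag. A leading <think>
    tag is optional. If no closing tag is present (e.g. the output was cut off
    mid-thought), everything is returned as reasoning and the answer is "".
    """
    # partition() returns (text, "", "") when the separator is absent,
    # which yields an empty answer for truncated outputs.
    head, _, tail = text.partition("</think>")
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    raw = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4."
    reasoning, answer = split_reasoning(raw)
    print(repr(reasoning))  # '2 + 2 is 4.'
    print(repr(answer))     # 'The answer is 4.'
```

Splitting on the closing tag rather than matching both tags keeps the helper working whether or not the template injects `<think>` itself. Truncated generations, which are common when long reasoning hits a token limit, degrade to an empty answer instead of raising.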