DeepSeek R1 Distill Llama 70B

deepseek-ai · released 2025-01-23 · MIT license

DeepSeek-R1-Distill-Llama-70B (January 2025) is the strongest model in the R1 distill family. It is built on Llama-3.3-70B and scores 70.0 on AIME 2024, 94.5 on MATH-500, 65.2 on GPQA Diamond, and 57.5 on LiveCodeBench. The model weights are released under the MIT license.

Key specs

Type: Local open-weight
Parameters: 70.55B total
Architecture: llama
Context window: 131K tokens
Knowledge cutoff: 2024-07-31
Modalities: text
Recommended backends: not listed
Minimum viable rig: not listed

Benchmark scores

GPQA Diamond: 65.2%
SWE-bench Verified: not reported
AIME: 70%
MMLU-Pro: not reported
BFCL v3 (tool use): not reported
Composite score: 6.71
Community rating: No reviews yet

VRAM & disk per quantization

Quant  | VRAM    | Disk    | RAM        | Context
Q4_K_M | 42.4 GB | 40.9 GB | not listed | 131K
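
The Q4_K_M figures can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an estimate, not the method used to produce the listing. It assumes roughly 4.8 bits per weight on average for Q4_K_M, and it assumes the standard Llama 70B attention shape (80 layers, 8 key/value heads, head dimension 128) with an fp16 KV cache. The function names are hypothetical. The listing does not say whether its VRAM figure includes the KV cache at full context, so the two quantities are computed separately.

```python
# Rough memory arithmetic for a quantized Llama-style 70B model.
# All constants marked ASSUMPTION are illustrative, not taken from the listing.

PARAMS = 70.55e9        # total parameters, from the spec list
BITS_PER_WEIGHT = 4.8   # ASSUMPTION: approximate average for Q4_K_M

N_LAYERS = 80           # ASSUMPTION: standard Llama 70B shape
N_KV_HEADS = 8          # ASSUMPTION: grouped-query attention, 8 KV heads
HEAD_DIM = 128          # ASSUMPTION: per-head dimension
KV_BYTES = 2            # ASSUMPTION: fp16 cache, 2 bytes per value


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def kv_bytes_per_token() -> int:
    """Bytes of KV cache per token: keys and values for every layer."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES


def kv_cache_gb(context_tokens: int) -> float:
    """Approximate KV cache size in GB for a given context length."""
    return kv_bytes_per_token() * context_tokens / 1e9


if __name__ == "__main__":
    print(f"weights:        {weight_gb(PARAMS, BITS_PER_WEIGHT):.1f} GB")
    print(f"KV @ 8K ctx:    {kv_cache_gb(8_192):.1f} GB")
    print(f"KV @ 131K ctx:  {kv_cache_gb(131_072):.1f} GB")
```

Under these assumptions the weights alone come to about 42 GB, close to the listed VRAM figure, while an fp16 KV cache at the full 131K context would add roughly as much again. Running at a shorter context or with a quantized KV cache reduces that second term.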

Strengths & weaknesses

Strengths: Best of the R1 distill family (MATH-500 94.5, GPQA Diamond 65.2); Strong coding (LiveCodeBench 57.5); MIT model license

Weaknesses: Built on Llama-3.3-70B, so the base Llama 3.3 license terms apply in addition to the MIT license; heavy 70B memory footprint; long chain-of-thought (CoT) outputs add latency
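
Because the chain of thought is emitted inline with the answer, downstream code usually wants to separate the two. R1-family models wrap their reasoning in `<think>...</think>` tags; the sketch below splits a completion on those tags. The helper name `split_reasoning` is hypothetical, and the fallback for a missing opening tag assumes a chat template that pre-fills `<think>` in the prompt, which may not match every serving setup.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string.

    If no complete <think>...</think> block is found, fall back to splitting
    on a lone </think>. If that is absent too, the whole text is returned
    as the answer with empty reasoning.
    """
    match = THINK_RE.search(text)
    if match is not None:
        reasoning = match.group(1).strip()
        answer = (text[:match.start()] + text[match.end():]).strip()
        return reasoning, answer

    # ASSUMPTION: the opening tag may have been supplied by the prompt template.
    head, sep, tail = text.partition("</think>")
    if sep:
        return head.strip(), tail.strip()
    return "", text.strip()


if __name__ == "__main__":
    raw = "<think>2 + 2 is 4.</think>\n\nThe answer is 4."
    reasoning, answer = split_reasoning(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Keeping the reasoning separate makes it possible to log or discard it and to measure how many of the generated tokens were spent on thinking rather than on the final answer.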