DeepSeek R1 Distill Llama 70B
deepseek-ai · released 2025-01-23 · MIT license
DeepSeek-R1-Distill-Llama-70B (January 2025) is the strongest model in the R1 distill family, scoring 70.0 on AIME 2024, 94.5 on MATH-500, 65.2 on GPQA Diamond, and 57.5 on LiveCodeBench. It is built on Llama-3.3-70B, and its weights are released under the MIT license.
Key specs
| Type | Local open-weight |
|---|---|
| Parameters | 70.55B total |
| Architecture | llama |
| Context window | 131K tokens |
| Knowledge cutoff | 2024-07-31 |
| Modalities | text |
| Recommended backends | — |
| Minimum viable rig | — |
Benchmark scores
| GPQA Diamond | 65.2% |
|---|---|
| SWE-bench Verified | — |
| AIME 2024 | 70.0% |
| MMLU-Pro | — |
| BFCL v3 (tool use) | — |
| Composite score | 6.71 |
| Community rating | No reviews yet |
VRAM & disk per quantization
| Quant | VRAM | Disk | RAM | Context |
|---|---|---|---|---|
| Q4_K_M | 42.4 GB | 40.9 GB | — | 131K |
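The Q4_K_M row can be sanity-checked with a little arithmetic. The sketch below uses only the figures listed on this page (70.55B parameters, 40.9 GB on disk, 42.4 GB VRAM) and assumes 1 GB = 10^9 bytes; the function names are illustrative, not part of any library. It derives the average bits stored per weight and the gap between the file size and the listed VRAM figure, and compares against an unquantized 16-bit copy of the same weights.

```python
# Figures copied from the tables on this page.
PARAMS_BILLIONS = 70.55   # total parameters, in billions
Q4_K_M_DISK_GB = 40.9     # listed disk size for Q4_K_M
Q4_K_M_VRAM_GB = 42.4     # listed VRAM for Q4_K_M


def implied_bits_per_weight(disk_gb: float, params_billions: float) -> float:
    """Average bits stored per parameter (assumes 1 GB = 1e9 bytes)."""
    return disk_gb * 8 / params_billions


def weights_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone at a given bit width."""
    return params_billions * bits_per_weight / 8


bpw = implied_bits_per_weight(Q4_K_M_DISK_GB, PARAMS_BILLIONS)
headroom_gb = Q4_K_M_VRAM_GB - Q4_K_M_DISK_GB
fp16_gb = weights_size_gb(PARAMS_BILLIONS, 16)

print(f"Implied bits per weight: {bpw:.2f}")             # about 4.64
print(f"VRAM beyond the weights: {headroom_gb:.1f} GB")  # 1.5 GB
print(f"16-bit weights for comparison: {fp16_gb:.1f} GB")  # 141.1 GB
```

The roughly 1.5 GB gap between disk and VRAM is small, so the listed VRAM figure likely covers the weights plus a modest runtime allowance; that reading is an assumption, since the page does not state how much KV cache is included at the 131K context.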
Strengths & weaknesses
Strengths: best of the R1 distill family (MATH-500 94.5, GPQA Diamond 65.2); strong coding (LiveCodeBench 57.5); MIT model license
Weaknesses: built on Llama-3.3-70B, so the base Llama 3.3 license terms also apply; heavy 70B memory footprint; long chain-of-thought (CoT) outputs add latency