DeepSeek R1 Distill Qwen 32B
deepseek-ai · released 2025-01-20 · MIT license
DeepSeek-R1-Distill-Qwen-32B (January 2025) distills DeepSeek-R1's reasoning into Qwen2.5-32B. Reported scores: AIME 2024 72.6, MATH-500 94.3, GPQA Diamond 62.1, LiveCodeBench 57.2, beating o1-mini on several benchmarks. Released under the MIT license.
Key specs
| Type | Local open-weight |
|---|---|
| Parameters | 32.76B total |
| Architecture | qwen2 |
| Context window | 131K tokens |
| Knowledge cutoff | — |
| Modalities | text |
| Recommended backends | — |
| Minimum viable rig | — |
Benchmark scores
| GPQA Diamond | 62.1% |
|---|---|
| SWE-bench Verified | — |
| AIME 2024 (pass@1) | 72.6% |
| MMLU-Pro | — |
| BFCL v3 (tool use) | — |
| Composite score | 6.63 |
| Community rating | No reviews yet |
VRAM & disk per quantization
| Quant | VRAM | Disk | RAM | Context |
|---|---|---|---|---|
| Q4_K_M | 20.5 GB | 19 GB | — | 131K |
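
As a rough sanity check on the row above, the short sketch below derives the average bits stored per weight from the listed disk size and parameter count, and the gap between the VRAM and disk figures. It assumes "GB" means 10^9 bytes and that the VRAM figure is the weights plus runtime overhead; the table states neither, and the function names are illustrative only.

```python
# Back-of-the-envelope check of the Q4_K_M row, using only figures from this page.
# Assumption: "GB" means 10^9 bytes (the page does not say GB vs GiB).

PARAMS_BILLION = 32.76   # total parameters, from "Key specs"
Q4_DISK_GB = 19.0        # Q4_K_M disk size, from the table
Q4_VRAM_GB = 20.5        # Q4_K_M VRAM figure, from the table


def implied_bits_per_weight(disk_gb: float, params_billion: float) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return disk_gb * 8 / params_billion


def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for a given average bit width."""
    return params_billion * bits_per_weight / 8


bpw = implied_bits_per_weight(Q4_DISK_GB, PARAMS_BILLION)
overhead_gb = Q4_VRAM_GB - Q4_DISK_GB
fp16_gb = estimate_weights_gb(PARAMS_BILLION, 16)

print(f"implied bits per weight: {bpw:.2f}")
print(f"VRAM beyond weights:     {overhead_gb:.1f} GB")
print(f"16-bit weights estimate: {fp16_gb:.2f} GB")
```

Under these assumptions the listed disk size works out to roughly 4.64 bits per weight, and the VRAM figure leaves about 1.5 GB beyond the weights themselves. Actual memory use depends on the backend and on how much of the 131K context is filled, so treat this as an estimate only.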
Strengths & weaknesses
Strengths: Strong math reasoning (AIME 2024 72.6 pass@1, MATH-500 94.3); beats o1-mini on several benchmarks as a dense 32B model; MIT model license
Weaknesses: Reasoning is distilled, with weaker general chat breadth than full R1; long chain-of-thought (CoT) output increases latency and cost