DeepSeek V3
deepseek-ai · released 2024-12-26 · DeepSeek Model License
DeepSeek-V3 (Dec 2024) is the original 671B MoE chat model. Its report lists 75.9% on MMLU-Pro, 59.1% on GPQA Diamond, 90.2% on MATH-500, and 42.0% on SWE-bench Verified. It was the strongest open model of its time and predates the R1/V3.1 reasoning upgrades.
Key specs
| Spec | Value |
|---|---|
| Type | Local open-weight |
| Parameters | 684.53B total in the checkpoint (671B main model plus MTP module weights) · MoE, 37B active |
| Architecture | deepseek_v3 |
| Context window | 164K tokens |
| Knowledge cutoff | 2024-07-31 |
| Modalities | text |
| Recommended backends | — |
| Minimum viable rig | — |
Benchmark scores
| Metric | Value |
|---|---|
| GPQA Diamond | 59.1% |
| SWE-bench Verified | 42.0% |
| AIME | 39.2% |
| MMLU-Pro | 75.9% |
| BFCL v3 (tool use) | — |
| Composite score | 5.61 |
| Community rating | No reviews yet |
VRAM & disk per quantization
| Quant | VRAM | Disk | RAM | Context |
|---|---|---|---|---|
| Q4_K_M | 398.5 GB | 397 GB | — | 164K |
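As a rough sanity check on the Q4_K_M row, the listed sizes can be turned into an effective bits-per-weight figure. The sketch below uses only numbers from this page and assumes decimal gigabytes (1 GB = 10^9 bytes) and that the 684.53B parameter count is the one the quantized file covers. Neither assumption is confirmed by the page, and the function names are illustrative.

```python
# Figures copied from the tables on this page.
TOTAL_PARAMS = 684.53e9   # total parameters listed under "Key specs"
Q4_K_M_DISK_GB = 397.0    # listed disk size for Q4_K_M
Q4_K_M_VRAM_GB = 398.5    # listed VRAM requirement for Q4_K_M

BYTES_PER_GB = 1e9        # assumption: the page uses decimal gigabytes


def bits_per_weight(disk_gb: float, params: float) -> float:
    """Average number of bits stored per parameter in the quantized file."""
    return disk_gb * BYTES_PER_GB * 8 / params


def runtime_overhead_gb(vram_gb: float, disk_gb: float) -> float:
    """Memory listed beyond the weights themselves (buffers, cache, etc.)."""
    return vram_gb - disk_gb


bpw = bits_per_weight(Q4_K_M_DISK_GB, TOTAL_PARAMS)
overhead = runtime_overhead_gb(Q4_K_M_VRAM_GB, Q4_K_M_DISK_GB)

print(f"Effective bits per weight: {bpw:.2f}")
print(f"Listed VRAM beyond weights: {overhead:.1f} GB")
```

Under these assumptions the file averages a little over 4.6 bits per weight, and the table leaves about 1.5 GB beyond the weights. The page does not say whether that margin covers a KV cache for the full 164K context, so treat the VRAM figure as a floor when planning hardware.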
Strengths & weaknesses
Strengths: Landmark efficient open-weights MoE (671B total, 37B active); Strong math/code for a 2024 non-reasoning model (MATH-500 90.2%)
Weaknesses: Original Dec-2024 release, now behind newer models on GPQA Diamond and SWE-bench Verified; Weights use the custom DeepSeek Model License rather than plain MIT