DeepSeek V3

deepseek-ai · released 2024-12-26 · DeepSeek Model License

DeepSeek-V3 (Dec 2024) is the original 671B MoE chat model. Its report lists 75.9% on MMLU-Pro, 59.1% on GPQA Diamond, 90.2% on MATH-500 and 42.0% on SWE-bench Verified. It was the strongest open model of its time and predates the R1/V3.1 reasoning upgrades.

Key specs

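Type: Local open-weight
Parameters: 684.53B total · MoE, 37B active (checkpoint total; the 671B figure cited elsewhere excludes the multi-token prediction module weights)
Architecture: deepseek_v3
Context window: 164K tokens
Knowledge cutoff: 2024-07-31
Modalities: text
Recommended backends: not listed
Minimum viable rig: not listed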

Benchmark scores

GPQA Diamond: 59.1%
SWE-bench Verified: 42%
AIME: 39.2%
MMLU-Pro: 75.9%
BFCL v3 (tool use): not listed
Composite score: 5.61
Community rating: No reviews yet

VRAM & disk per quantization

Quant    VRAM      Disk    RAM  Context
Q4_K_M   398.5 GB  397 GB  n/a  164K
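
As a rough sanity check on the Q4_K_M row, the listed sizes can be turned into an average number of bits stored per weight. The sketch below is illustrative, not part of the original page. It assumes the two sizes in the row are VRAM and disk, in that order, that "GB" means 10^9 bytes, and that the 684.53B total from the key specs is the parameter count behind the file size. The function names are hypothetical.

```python
# Back-of-the-envelope check of the Q4_K_M row.
# Assumptions (not stated on the page): GB = 1e9 bytes, the 397 GB figure is
# the on-disk size, and all 684.53B parameters are stored in the file.

TOTAL_PARAMS = 684.53e9      # "684.53B total" from the key specs
Q4_K_M_DISK_BYTES = 397e9    # assumed Disk column
Q4_K_M_VRAM_GB = 398.5       # assumed VRAM column


def bits_per_weight(file_bytes: float, n_params: float) -> float:
    """Average bits stored per parameter in a quantized checkpoint."""
    return file_bytes * 8 / n_params


def estimated_size_gb(n_params: float, bpw: float) -> float:
    """Inverse of bits_per_weight: checkpoint size in GB for a given bit width."""
    return n_params * bpw / 8 / 1e9


bpw = bits_per_weight(Q4_K_M_DISK_BYTES, TOTAL_PARAMS)
runtime_overhead_gb = Q4_K_M_VRAM_GB - Q4_K_M_DISK_BYTES / 1e9

print(f"{bpw:.2f} bits per weight")                  # prints "4.64 bits per weight"
print(f"{runtime_overhead_gb:.1f} GB beyond weights")  # prints "1.5 GB beyond weights"
```

Under these assumptions the row works out to about 4.64 bits per weight, with about 1.5 GB of listed VRAM beyond the weights themselves. The same `estimated_size_gb` helper can give a first guess at the size of other bit widths, though real quantization formats mix precisions across tensors, so actual files will differ.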

Strengths & weaknesses

Strengths: Landmark efficient open-weights MoE (671B total, 37B active); strong math and code results for a 2024 non-reasoning model (MATH-500 90.2%)

Weaknesses: Original Dec-2024 release, now behind newer models on GPQA and SWE-bench; weights are under the DeepSeek Model License rather than plain MIT