AI

DeepSeek-R1

Foundation1.5B7B8B14B32B70B671B

DeepSeek's first-generation open reasoning model with 671B parameters (37B active), trained via reinforcement learning to achieve o1-level performance on math, code, and reasoning.

Model Overview

DeepSeek-R1 is an open-source reasoning model trained via large-scale reinforcement learning on DeepSeek-V3-Base, without relying on supervised fine-tuning as a preliminary step. It achieves performance comparable to OpenAI o1 across math, code, and reasoning tasks. The model naturally emerged with chain-of-thought reasoning behaviors including self-verification and reflection. DeepSeek also released six distilled dense models (1.5B, 7B, 8B, 14B, 32B, 70B) based on Qwen2.5 and Llama3 series. The latest version DeepSeek-R1-0528 significantly improves reasoning and inference capabilities, approaching O3 and Gemini 2.5 Pro on several benchmarks.

Capabilities

Inputs

Text

Outputs

Text, code

Task categories

Reasoning, coding, math, general language, chinese language tasks

Languages

English, chinese, multilingual

Tags

Reasoning, open-source, moe, long-context, coding, math, chain-of-thought, distillation

Model Details

Developer
DeepSeek AI
Version
DeepSeek-R1-0528
Release date
January 20, 2025
Model type
Foundation
Context window
160K tokens
Open source
Yes
Commercial use
Allowed
Fine-tunable
Yes
License
MIT

Architecture

Type
MoE (Mixture of Experts) — based on DeepSeek-V3 architecture
Base model
DeepSeek-V3-Base
Active params
37B
Total params
671B

Benchmarks

MMLU

90.8%

C_Eval

91.8%

FRAMES

82.5%

MATH_500

97.3%

MMLU_Pro

84%

AIME_2024

79.8%

ArenaHard

92.3%

CNMO_2024

78.8%

MMLU_Redux

92.9%

GPQA_Diamond

71.5%

Safety & Compliance

Notes
MIT license allows commercial use, modifications, and derivative works including distillation. Qwen-based distills follow Apache 2.0; Llama-based distills follow their respective Llama licenses.
Red teamed
Yes
Open weights
Yes
Distillation allowed
Yes
Usage recommendations
Set temperature 0.5-0.7 (0.6 recommended). Avoid system prompts. Enforce <think> token at start of output for best reasoning performance.
Training
Reinforcement learning, supervised fine-tuning, cold-start data, RLHF
Fine-tunable
Yes
Base model
DeepSeek-V3-Base
Usage recommendations
Set temperature 0.5-0.7 (0.6 recommended). Avoid system prompts. Enforce <think> token at start of output for best reasoning performance.