DeepSeek-R1

Name: DeepSeek-R1
Brand: DeepSeek AI

Foundation1.5B7B8B14B32B70B671B

DeepSeek's first-generation open reasoning model with 671B parameters (37B active), trained via reinforcement learning to achieve o1-level performance on math, code, and reasoning.

Paper Link Use this Model

Model Overview

DeepSeek-R1 is an open-source reasoning model trained via large-scale reinforcement learning on DeepSeek-V3-Base, without relying on supervised fine-tuning as a preliminary step. It achieves performance comparable to OpenAI o1 across math, code, and reasoning tasks. The model naturally emerged with chain-of-thought reasoning behaviors including self-verification and reflection. DeepSeek also released six distilled dense models (1.5B, 7B, 8B, 14B, 32B, 70B) based on Qwen2.5 and Llama3 series. The latest version DeepSeek-R1-0528 significantly improves reasoning and inference capabilities, approaching O3 and Gemini 2.5 Pro on several benchmarks.

Capabilities

Inputs

Text

Outputs

Text, code

Task categories

Reasoning, coding, math, general language, chinese language tasks

Languages

English, chinese, multilingual

Model Details

Developer

DeepSeek AI

Version

DeepSeek-R1-0528

Release date

January 20, 2025

Model type

Foundation

Context window

160K tokens

Open source

Yes

Commercial use

Allowed

Fine-tunable

Yes

License

MIT

Architecture

Type

MoE (Mixture of Experts) — based on DeepSeek-V3 architecture

Base model

DeepSeek-V3-Base

Active params

37B

Total params

671B

Benchmarks

MMLU

90.8%

C_Eval

91.8%

FRAMES

82.5%

MATH_500

97.3%

MMLU_Pro

84%

AIME_2024

79.8%

ArenaHard

92.3%

CNMO_2024

78.8%

MMLU_Redux

92.9%

GPQA_Diamond

71.5%

Safety & Compliance

Notes

MIT license allows commercial use, modifications, and derivative works including distillation. Qwen-based distills follow Apache 2.0; Llama-based distills follow their respective Llama licenses.

Red teamed

Yes

Open weights

Yes

Distillation allowed

Yes

Usage recommendations

Set temperature 0.5-0.7 (0.6 recommended). Avoid system prompts. Enforce <think> token at start of output for best reasoning performance.

Training

Reinforcement learning, supervised fine-tuning, cold-start data, RLHF

Fine-tunable

Yes

Base model

DeepSeek-V3-Base

Usage recommendations

Set temperature 0.5-0.7 (0.6 recommended). Avoid system prompts. Enforce <think> token at start of output for best reasoning performance.

Variations

Name	Parameters	Context Window
deepseek-r1:1.5b	1.5B	128K tokens
deepseek-r1:7b	7B	128K tokens
deepseek-r1:8b	8B	128K tokens
deepseek-r1:14b	14B	128K tokens
deepseek-r1:32b	32B	128K tokens
deepseek-r1:70b	70B	128K tokens
deepseek-r1:671b	671B	160K tokens

Use Cases

Mathematical reasoningCode generationCode debuggingScientific reasoningLogic problemsMultilingual tasksResearch assistanceDistillation training