
DeepSeek: DeepSeek R1 Zero
deepseek/deepseek-r1-zero
DeepSeek-R1-Zero is a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. It has 671B parameters, of which 37B are active in an inference pass.
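To put the two parameter counts in proportion, the short sketch below computes the share of parameters active in one inference pass. It uses only the 671B and 37B figures stated above; the constant and function names are illustrative, not part of any DeepSeek API.

```python
# Parameter counts as stated in the description (in billions).
TOTAL_PARAMS_B = 671
ACTIVE_PARAMS_B = 37


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used in one inference pass."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"{fraction:.1%} of parameters are active per pass")
# prints "5.5% of parameters are active per pass"
```

Roughly one parameter in eighteen takes part in any single pass, so per-token compute is far lower than the total size alone would suggest.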
It demonstrates remarkable performance on reasoning tasks. Through RL alone, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero.
However, DeepSeek-R1-Zero has known problems, including endless repetition, poor readability, and language mixing. See DeepSeek R1 for the SFT model.
Context (avg): 164K
Released: Mar 6, 2025
Knowledge Cutoff: Jul 2024